#110: Data Democratization with Redash Transcript
00:00 Are you asked to generate reports for your company's data?
00:02 Has someone suggested that you buy or deploy massive BI software that's expensive,
00:07 closed source, and generally underwhelming?
00:09 Well, it's Redash and Python to the rescue.
00:11 Today you'll meet Arik Fraimovich, the creator of Redash, whose goal is to make your company data-driven
00:17 by connecting to any data source to easily visualize your data.
00:21 Not only is it cool open source software, but it's an example of someone taking a successful open source project
00:27 and building a business on top of it.
00:30 This is Talk Python to Me, episode 110, recorded April 27, 2017.
00:59 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
01:06 This is your host, Michael Kennedy.
01:08 Follow me on Twitter where I'm @mkennedy.
01:10 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.
01:17 This episode has been brought to you by Intel and Hired.
01:21 Be sure to check out what they're offering during their segments.
01:23 It helps support the show.
01:25 Arik, welcome to Talk Python.
01:27 Good to be here.
01:28 Yeah, it's great to have you here.
01:29 It's time to democratize some data and break it free from all the places it's captured
01:35 and use Python to do it, right?
01:37 Yep, yep, yep, yep.
01:38 That's how it started.
01:39 And it's still the mission.
01:40 Like, we're not there yet.
01:43 And it's a lot beyond the technology itself.
01:46 It's also educating people.
01:48 But yeah, that's the goal.
01:51 It's an awesome goal.
01:52 And I'm looking forward to chatting with you about it.
01:54 But before we get into that, let's hear your story.
01:56 How did you get into programming in Python?
01:57 It started over 20 years ago when I got my first PC, which was an XT computer.
02:04 I think there were newer computers at the time, but that's what we could afford.
02:08 And as a kid, there wasn't much to do with it.
02:11 So I started exploring and luckily stumbled upon QBasic, which was shipped with DOS at the time.
02:17 And it had a few example programs and games along with it.
02:21 So basically, I just started running them and then messing around with the source code
02:26 and looking at what results my changes were causing.
02:31 So that was my first start with programming.
02:34 Later, I found some older kid at school who was programming during breaks.
02:40 And I was like standing behind him, looking over his shoulder, looking what he's doing,
02:44 catching up new functions and stuff, and then going back home to try.
02:48 And eventually, my parents bought me some books.
02:51 And that's how I really started programming.
02:54 From then, I kept learning, you know, started connecting to the internet, exploring more.
02:59 And I think that I got a bit into web development at the time.
03:03 And then the biggest jump was when I was 18, we have a mandatory army service.
03:08 I was lucky to join one of our intelligence, military intelligence units, where I was becoming a developer.
03:16 And there I was doing stuff that is completely different from what I'm doing today.
03:20 A lot of C++ and hardware interfaces. I gained a lot of interesting experiences, like debugging on a helicopter.
03:27 Debugging on a helicopter.
03:28 How interesting.
03:30 Yeah, yeah.
03:31 It's not, I can't share that much details.
03:33 But yeah, that's, that happened.
03:35 Anyway, when I finished my army service, I got back into the web and web stuff.
03:43 Although I had the chance to do pretty much everything like full stack and mobile development, like native mobile development and anything, whatever the job requires.
03:53 Yeah.
03:54 So, and Python, I think that I started with Python.
03:57 I'm not really sure.
03:58 But definitely, I've been doing Python when Google App Engine was released.
04:03 So, that's around 2008.
04:05 I had some detour in Ruby for a few years.
04:09 But then four years ago, when I joined EverythingMe, I went back to developing mainly in Python.
04:14 Been doing some Go in the past years.
04:18 But there's nothing like Python.
04:20 Yeah, that's right.
04:21 And, okay, so this is interesting.
04:24 You talked about, what was it called?
04:26 EverythingMe?
04:27 Yeah, yeah.
04:28 Yeah, so for a while, you worked at this company called EverythingMe.
04:31 And that was like a launcher app for Android phones.
04:36 Is that right?
04:37 Yeah.
04:37 Yeah, that's right.
04:38 Okay.
04:39 And that was super popular, right?
04:41 Like between 10 and 50 million downloads or something on that scale?
04:45 Yeah, something like that.
04:47 Yeah, that was the scale.
04:48 I don't remember the exact numbers by now.
04:50 But yeah, that was the scale.
04:51 Were you guys using Python there?
04:53 Or was that mostly just things like Swift and Objective-C?
04:56 Or not Objective-C, Java, I guess.
04:58 So yeah, our client was Android only.
05:00 So that's Java.
05:02 But we had a lot of...
05:04 Because probably today we would call it AI-based launcher or whatever the current buzzwords are.
05:10 But it was personalized and it was learning you over time and adapting to you.
05:16 I mean, in various ways in the UI.
05:18 So there was a lot of machine learning and logic on the backend.
05:23 And most of it was in Python.
05:25 And then we had a lot of machine learning and learning.
05:26 Some of our new code was in Go.
05:28 But I had...
05:30 Like I was doing mainly Python.
05:32 And then that shut down, which is actually a really cool story, which we'll get to in a minute, which kind of gave you an excuse to do something even more cool.
05:41 But it seems like they came back.
05:44 Is that right?
05:44 The application itself had come back.
05:47 It was a bit weird for us.
05:49 But yeah, there was another company that bought the IP, I think, or something like that.
05:54 I'm not really sure about the details.
05:55 But they brought back that.
05:57 Yeah, because I read your Medium post about how when that shut down, you went to work on this project.
06:02 And then I did some research to look at just what that was historically.
06:06 And I'm like, oh, wait, it's still in the Play Store.
06:08 And there's still an active website.
06:09 And okay, I figured somebody must have purchased it.
06:13 Yeah, exactly.
06:13 Okay.
06:14 Well, that brings us to what you're doing today.
06:17 So what are you doing day to day for programming?
06:20 These days I'm working on Redash.
06:22 And I'm working on the three levels.
06:25 One is Redash itself.
06:27 So the code.
06:28 Trying to build the Redash community because it's an open source project.
06:31 So trying to build the open source community around it.
06:34 But it's also the company that I started when EverythingMe was shut down.
06:38 So I'm also trying to build a company.
06:40 And yeah, so that's my day to day.
06:44 And I'm trying to balance between all of this.
06:46 Usually one of the aspects is suffering more than others.
06:49 But I'm trying to learn over time how to improve the balance.
06:53 That's really cool.
06:54 So one of the things I always love to talk about and share with everybody is how people
06:59 have an open source project and how they, in some way or another, make a business about
07:04 it.
07:05 So I really want to dig into that.
07:06 But before we do, let's talk about just Redash itself.
07:08 What is Redash?
07:10 Where did it come from?
07:11 Things like that.
07:12 Okay, cool.
07:13 So at EverythingMe, at some point, when we started, we collected event data, like
07:20 about usage and stuff like that with Splunk.
07:22 You can compare it to Elasticsearch with Kibana, something like that.
07:28 And once we started getting actual usage, it just didn't scale well with the amount of data
07:35 that we started to collect.
07:36 And we've been looking into different alternatives and decided to go with Redshift at the time.
07:41 It was really around the time that Redshift was just introduced.
07:44 It was a bit over four years ago, I think.
07:47 All right.
07:47 Tell us all what Redshift is.
07:48 It's from AWS, right?
07:50 Yeah.
07:50 So basically, it was a mobile app, but you can compare it to Google Analytics, like the data
07:54 that Google Analytics collects.
07:57 So it's the same thing, like a user clicked this button, user swipe left, user swipe right,
08:02 user did that, looked at that, and stuff like that.
08:04 And for us, it was really important because along with Splunk, we've been using Flurry,
08:10 I think, and some of the off-the-shelf analytics products.
08:13 But most of them really didn't work for us because we've been creating quite a unique product.
08:20 Like it's a launcher.
08:22 So it's not an application that you start, do something, and then close it.
08:26 It's something that runs all the time.
08:29 So the concept of session is completely different from how other apps see it or how the off-the-shelf
08:35 products see it.
08:36 So being able to access the raw data gives us a lot of power and ability to really understand
08:42 what people are doing.
08:43 So that was Redshift.
08:45 So you said Redshift is a column store database.
08:49 Columnar, yeah.
08:50 Columnar, yeah.
08:51 Which is not necessarily the same as a standard relational database.
08:55 It's not a document or key value store type of database.
08:59 Yes.
09:00 How would you describe that?
09:01 Like what does it mean to be a columnar database?
09:03 Redshift has two properties.
09:04 One of them is that it's columnar.
09:07 And what it means is quite easy to understand is that it stores data in columns.
09:12 So when you write a row, it takes each column and stores them, each one of them, along with
09:17 the other values of that column.
09:19 And what it allows, it allows for easier compression.
09:23 Like it can really compress your data really well.
09:26 Because think of like if you have a column which is a boolean.
09:31 So it's super easy to compress it if you have only these booleans versus if you store them
09:36 by rows.
09:36 Because then it gets harder to compress this row because it has all the different kinds
09:43 of data.
09:44 And this compression both saves you space.
09:47 And when you're dealing with large, large amount of data, it's important.
09:51 But it also makes processing much, much faster because you need to load smaller amounts of
09:58 data each time.
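[Editor's sketch] The boolean-column point above can be made concrete with a few lines of Python. This is only an illustration under assumptions: the `rows` data and the `pack_bools` / `unpack_bools` helpers are invented here, and bit-packing is just one simple encoding, not a description of how Redshift actually stores data. Once the values of one column sit next to each other, a boolean column can be stored at eight flags per byte.

```python
# Hypothetical event rows: (user_id, event_name, is_new_user)
rows = [(i, "click" if i % 3 else "swipe", i % 10 == 0) for i in range(10_000)]

# Columnar layout: pull each column out into its own list.
user_ids, event_names, is_new_user = (list(column) for column in zip(*rows))


def pack_bools(flags):
    """Pack a list of booleans into bytes, eight flags per byte."""
    packed = bytearray((len(flags) + 7) // 8)
    for index, flag in enumerate(flags):
        if flag:
            packed[index // 8] |= 1 << (index % 8)
    return bytes(packed)


def unpack_bools(packed, count):
    """Reverse pack_bools: read `count` flags back out of the packed bytes."""
    return [bool(packed[index // 8] & (1 << (index % 8))) for index in range(count)]


packed = pack_bools(is_new_user)
print(len(is_new_user), "flags stored in", len(packed), "bytes")
```

A query that only needs `is_new_user` would also read just this one packed column and skip the other two, which is the "load smaller amounts of data" effect mentioned above.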
09:59 The other property of Redshift is it's I think the term is MPP is massively parallel.
10:05 What's the other P?
10:07 I don't remember.
10:08 Massively parallel.
10:10 That's just them.
10:11 So why MPP?
10:13 Massively parallel processing maybe.
10:14 Yeah, exactly.
10:15 Yeah, you're right.
10:16 They have, like when you run a query on Redshift, they compile binary code from it, from your
10:23 query.
10:24 Distribute this code to all the nodes in your cluster.
10:27 And then each node runs the query and then the master node aggregates it all.
10:33 So it's all sophisticated ways to process large amounts of data.
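[Editor's sketch] The distribute-then-aggregate flow described above can be imitated in plain Python. Everything here is an assumption for illustration: the `node_slices` data, the function names, and the use of threads as stand-ins for cluster nodes. It shows the shape of the idea (same work on every slice, then a merge step), not Redshift's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" holds one slice of the events table.
node_slices = [
    [{"user": "a", "event": "click"}, {"user": "b", "event": "swipe"}],
    [{"user": "a", "event": "click"}, {"user": "c", "event": "click"}],
    [{"user": "b", "event": "open"}],
]


def count_by_event(rows):
    """Work done on one node: a partial COUNT(*) grouped by event."""
    counts = {}
    for row in rows:
        counts[row["event"]] = counts.get(row["event"], 0) + 1
    return counts


def run_distributed(slices):
    """Leader step: run the same work on every slice, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(count_by_event, slices))
    totals = {}
    for partial in partials:
        for event, count in partial.items():
            totals[event] = totals.get(event, 0) + count
    return totals


totals = run_distributed(node_slices)
print(totals)
```

The fixed cost of fanning work out and merging it back is also why this style pays off on large tables and feels slow on small ones.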
10:38 That's cool when you're processing large amounts of data.
10:41 It's not cool when you have small tables because then the penalty of like waiting for the compilation
10:47 step and all that is like it's too high compared to if you just ran this query on a normal database.
10:53 So you need to know when you should use it and when not.
10:57 But it's a great tool.
10:58 If you like on the Google side, you have BigQuery, which is again a columnar database.
11:04 But BigQuery has the advantage that it's you might call it serverless because with Redshift, you need to decide what the size of cluster you have.
11:13 You need to maintain it.
11:15 You need to vacuum your data like you do with Postgres.
11:18 With BigQuery, you don't have all this hassle that you just load that into it, run your queries.
11:23 You don't worry about anything.
11:25 Google takes care of it.
11:26 It comes with the downside that you pay per query.
11:30 They have an interesting pricing model where you need to pay basically based on the amount of data that your query is scanning.
11:38 So you need to be really aware what your query is doing and what columns you're picking to query and stuff like that.
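[Editor's sketch] A rough way to see why column selection matters under pay-per-scan pricing. The column sizes and the price constant below are made-up numbers for illustration only, and `estimate_cost` is a hypothetical helper; check the provider's current pricing before relying on any figure.

```python
# Hypothetical table: bytes stored per column (a columnar engine reads whole columns).
column_bytes = {
    "user_id": 80 * 10**9,
    "event_name": 40 * 10**9,
    "payload": 900 * 10**9,
    "created_at": 80 * 10**9,
}

# Assumed price, for illustration only.
PRICE_PER_TB_SCANNED = 5.00


def estimate_cost(selected_columns):
    """Estimate the cost of a query that touches only the given columns."""
    scanned = sum(column_bytes[name] for name in selected_columns)
    return scanned / 10**12 * PRICE_PER_TB_SCANNED


narrow = estimate_cost(["user_id", "created_at"])
wide = estimate_cost(list(column_bytes))  # what a SELECT * would touch

print(f"two columns: ${narrow:.2f}, all columns: ${wide:.2f}")
```

In this toy table, most of the cost of the wide query comes from the single large `payload` column, which is why picking columns explicitly matters.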
11:47 That is an interesting business model to like, as your code runs less efficiently, you pay more.
11:53 But that's a pretty straightforward way to, I mean, that's how you use it up, right?
11:56 Yeah, exactly.
11:57 And the thing is that while it's not straightforward this way, with Redshift, it's practically the same.
12:04 Because if you write a big query, you're waiting a longer time, you take more resources off your cluster.
12:09 So you will end up eventually having to buy a larger cluster.
12:13 So it means it costs you more.
12:15 It costs you in human hours, like when your analyst is staring at the screen and waiting for his query to return.
12:22 So it's less explicit, but it's practically the same thing.
12:27 But obviously, from a financial standpoint, it's much easier to say, OK, we have this and that budget.
12:33 That's the kind of cluster that we're going to buy.
12:36 And that's it versus BigQuery where you, OK, I'm not sure how much it's going to cost me.
12:41 Let's hope it will be OK.
12:42 Yeah, it's definitely harder to predict.
12:45 But OK, very interesting.
12:47 All right.
12:47 So you said, look, we have this high performance, massive scale database and sort of regardless of which one you choose.
12:55 And we have all this data coming in.
12:57 And you're like, well, now we want to look at it and query it and do reports on it and things like that, right?
13:03 Yes, exactly.
13:04 So we had this project.
13:05 So we had this project of, OK, let's start piping our data into Redshift and to understand how we should do it and all that.
13:13 And when we started to get to the finish point of this, we said, OK, we're starting to have data in Redshift.
13:20 What are we going to do with it?
13:21 And we've been really spoiled by Splunk because Splunk has a really good user interface that allows you to query the data that you send into it.
13:30 And they have their own language that some people like it, some others not.
13:35 I didn't use it enough to form an opinion, but it was obviously lacking with Redshift because Redshift is just a database.
13:41 They don't handle the UI.
13:44 And at this point, we decided, OK, let's look at what the big boys are using.
13:50 Let's find the BI tool.
13:53 And we looked at Tableau, Yellowfin and maybe some others, but all of them failed with Redshift.
13:59 They might be great tools for traditional BI, let's call it, but with Redshift, what we've been doing at the time and probably many others doing,
14:07 we had a huge table, which was our raw events data, and we wanted to start running queries on top of it.
14:14 And it's something that's super hard to do with tools like Tableau or Yellowfin.
14:19 I think that they improved over the years and adapted to better support databases like Redshift.
14:25 But it's definitely not the same thing as being able just to open an editor, write your query and run it on top of the table.
14:33 We tried to find a tool.
14:36 And at the same time, because we started already collecting data, anyone who wanted access to this data, we just created a user in Redshift,
14:46 told him to connect with any SQL client that he likes.
14:51 And people started querying the data, getting results and sharing them.
14:56 But the way they've been sharing them is basically sending CSV or Excel files over email, which is okay.
15:03 But then you get into questions like, okay, we see an issue in the data.
15:09 Okay, we see that we have a drop in conversion.
15:12 Now we start with questions like, do we really have an issue?
15:16 Maybe the issue is with the data that we collect, because it's a new pipeline.
15:21 Maybe we have a bug, and it's not really a drop in conversion.
15:25 Or maybe it's just the way the person who wrote this query, he made a mistake.
15:30 And he calculated the conversion wrong.
15:33 So you start reverse engineering the kind of query that he ran, because he obviously, since then, closed his editor.
15:39 He doesn't remember what he did.
15:41 He just has a CSV file to show.
15:43 And it was really, really frustrating.
15:45 I'm sure.
15:46 And you can't even verify it some of the times, right?
15:49 Because the person might not remember exactly what they did.
15:51 Yeah.
15:52 Yeah.
15:52 And this is like an exact true story, where I was reverse engineering.
15:56 I knew what query that person started with.
15:59 So I started with that query, and I'm running it.
16:03 And I see, no, that's not the same numbers.
16:05 Okay, let's change this and that.
16:06 Not the same numbers.
16:07 And then I followed the steps until I got the same numbers.
16:11 And okay, he had a mistake in the query.
16:12 And this is not something like, I had a similar experience in a previous working place.
16:18 So the problem felt familiar.
16:21 And me and our CTO at the time, Joey Simhon, we started talking about, how about if we had a JSFiddle-like UI for queries, where a person can write a query, get the results, and then share this URL with others, where the other person gets both the results, but also the query.
16:42 So that you can see what he did, and maybe understand if it's like, do some peer review.
16:49 And also, if you want to keep digging in, you have the query to start with.
16:54 So we had this idea floating.
16:57 And then we had one of our hackathons.
17:00 And this is when Redash was born.
17:02 Nice.
17:03 Yeah, you know you have a good product or idea when it's solving a problem that keeps coming back.
17:09 Right?
17:09 You're like, oh, we've been here before.
17:11 Why is there nothing that actually does this well?
17:13 Yeah, yeah.
17:14 And I think that only after I realized, yeah, that's actually something that I wanted to build back then.
17:20 Hmm, interesting.
17:21 And many people that adopted Redash since then have been telling me, yeah, we built the same thing internally, but we dropped it for Redash because, well, we can't really maintain it for a long time.
17:32 And here's a product that someone's already maintaining and using.
17:35 So let's switch to it.
17:37 Right.
17:37 So Redash kind of is made up of two different parts.
17:41 You've got, like you said, this JSFiddle-like query editor.
17:45 And it's actually really nice.
17:47 It has auto-completion for the columns and tables and all sorts of stuff, right?
17:52 Yeah.
17:52 It depends on the database that you connect to because not all of them will have support for loading the schema.
17:58 But if it's a connector that supports loading the schema, you will have auto-complete.
18:03 And we have some other features that we adopted over time, like query snippets.
18:07 And you can have parameters inside your query so you can create something more interactive that the end user, let's call it, can play with instead of changing the query every time.
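[Editor's sketch] One way to picture query parameters is a template with named placeholders that are filled in from typed user input. The `{{ name }}` placeholder style, the table and column names, and the `render_query` helper are assumptions made for this illustration, not a copy of Redash's code. Parsing each value by its declared type means arbitrary text never reaches the SQL.

```python
import re
from datetime import date

# Hypothetical query template; the {{ name }} placeholder style is an assumption.
QUERY_TEMPLATE = """
SELECT event_name, COUNT(*) AS events
FROM events
WHERE created_at >= DATE '{{ start_date }}'
GROUP BY event_name
LIMIT {{ max_rows }}
"""

# Each parameter declares a parser, so input is validated before it reaches the SQL.
PARAM_TYPES = {"start_date": date.fromisoformat, "max_rows": int}


def render_query(template, values):
    """Replace every {{ name }} placeholder with its parsed, validated value."""

    def substitute(match):
        name = match.group(1)
        parsed = PARAM_TYPES[name](values[name])  # raises ValueError on bad input
        return str(parsed)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


sql = render_query(QUERY_TEMPLATE, {"start_date": "2017-04-20", "max_rows": "14"})
print(sql)
```

The person viewing the query only changes the two values; the query text itself stays the same and stays reviewable.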
18:17 I see.
18:18 So can you have, like, so the other half of the story is you build the queries.
18:21 The other half is the dashboards and visualizations.
18:23 So on your visualization, could you have, like, a slider that is, like, a number of days you want to average across or something like that?
18:31 And you can slide it and it'll change the query.
18:33 So unfortunately, the parameters are not as slick as a slider or something today.
18:37 So currently, they're just input boxes where you can, the most that you can customize is, like, say, okay, this is a number input box or it's a date input box and the UI will adapt accordingly.
18:48 But it's not as simple as you can't.
19:06 able to give these people a way to create interactive stuff is super important.
19:10 Sure. And being the person who writes that query where they're like, well, I asked for
19:13 one day, but now could you do it for seven days? You know, that's also not fun to be
19:18 doing that, right?
19:19 Yeah, exactly. Although I've seen more than once that people who are not like your traditional,
19:25 someone who, like, obviously developers will know to write queries. Then there are the
19:31 product managers who some of them had some engineering background, they know to write
19:35 queries. But I've seen marketing people learn to write queries just because, well, they
19:40 started with asking someone to write a query for them. And then over time, they started
19:45 peeking at the SQL and saying, okay, that's not that frightening, let's change the number of
19:50 days from seven to 14, running the query and hey, I got the results I wanted, that's cool.
19:56 Then the next step is that they, okay, let's just learn SQL and there are some good resources
20:01 online to learn SQL. They learn SQL, then they feel really empowered because now they can
20:07 have direct access to the data without anyone in the middle. It creates some issues sometimes
20:13 because many times the company's organizations actually won't have a clear schema of the data.
20:21 The data model is not that obvious.
20:24 Especially on these event streams, right? Where you're just like dropping data and that's streaming
20:28 in or something.
20:28 So yeah, so the event stream is definitely hard because it's also harder to, like, every question
20:34 beyond like, okay, how many events we had yesterday, how many unique users and obvious stuff like that
20:40 becomes really complex query. So this is something that I really want to tackle in the future.
20:47 And basically to make it easier to create more sensible models around the data. And I have some
20:52 ideas here, but that's, it will take time to get here. But yeah, so, so it's not that obvious to give
20:58 people a way to play with data. But I think that the fact that in Redash, you can always see the
21:03 query, always have some peer review. It makes it much more safe, let's call it, to do it. And I see
21:11 really good success stories around this.
21:14 Yeah. And you have the ability, very much JSFiddle-like, to share your queries and, you know,
21:20 save them in the dashboard. And then people can like fork them off and say, I'm going to make a copy
21:25 of this and then I'll tweak it myself and save my version and stuff like that. Right. So that's pretty
21:31 helpful.
21:31 Yep. So basically the flow is that you can write the query, get the results. From here, you can just share
21:37 the table that you got. That's cool. You can access it with an API if you want to connect it to some
21:42 other tool. But you can also visualize it. And there are several types of visualizations.
21:46 The default ones like charts, maps, Sankey, word cloud and other stuff. And then you can group
21:53 several visualizations into a dashboard. And that's shareable as well. That's basically the scope of it.
22:00 Yeah, that's cool. And so really, I think, you know, the way people probably perceive it is like,
22:05 here's a site internally we can go to, or I guess it could be public as well, that we can go to and
22:10 it'll show us the stats for our company. If we've got an app or web app, how are people using it? If you
22:16 got sales, like what sales versus leads doing, right? And there's a lot of different ways to visualize it.
22:22 One of the things I saw that you guys had that I really appreciated, because I've had to build it
22:26 before, is cohort analysis for like subscription services, or you know, those those types of things,
22:34 you know, users come and then they fade out. And that was really nice. So you have a lot of nice
22:38 visualizations. And they're pretty interactive. And they look really good. They're certainly things
22:43 like you would be happy to put in front of the CEOs or whatever. And go, here's your dashboard. Enjoy this.
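[Editor's sketch] For readers who have not built a cohort analysis before, here is a minimal version in Python. The `activity` log and the weekly granularity are assumptions for illustration; a real report would usually be computed with a SQL query over the events table. Users are grouped by the first week they appeared, and each row shows what share of that group was still active in later weeks.

```python
from collections import defaultdict

# Hypothetical activity log: (user, week_number_when_active)
activity = [
    ("ann", 0), ("ann", 1), ("ann", 2),
    ("bob", 0), ("bob", 1),
    ("cat", 1), ("cat", 2),
    ("dan", 1),
]

# A user's cohort is the first week they were seen.
first_week = {}
for user, week in sorted(activity, key=lambda item: item[1]):
    first_week.setdefault(user, week)

# retention[cohort][weeks_since_first_seen] = set of users active in that week
retention = defaultdict(lambda: defaultdict(set))
for user, week in activity:
    cohort = first_week[user]
    retention[cohort][week - cohort].add(user)

for cohort in sorted(retention):
    size = len(retention[cohort][0])
    shares = [len(retention[cohort][offset]) / size for offset in sorted(retention[cohort])]
    print(f"cohort week {cohort}: " + ", ".join(f"{share:.0%}" for share in shares))
```

Reading across a row shows the "users come and then they fade out" pattern: each later week holds a smaller share of the original cohort.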
22:48 In many cases, the CEO is the most active user on Redash, like at EverythingMe, usually the CEO was
22:56 the one to spot data issues, because he was like, all the time on Redash, and he would spot various
23:02 changes in data. And we had, because we've been using Redshift from the early days. So we had cases
23:08 where a data for a whole day just disappears, and then reappears. So he was spotting that just before
23:16 anyone else. Sure. I live in this. I know this looks weird. I've been watching this for 14 days.
23:22 Yeah. So he was really challenging us to build the better alerting and monitoring mechanism for our
23:28 data flow, so that we can spot it before him. But I know that other companies as well, like the most
23:34 active user is the CEO. And he's like starting his morning, checking from the phone, the stats,
23:39 and then goes to email. So yeah, it's really, it's really nice in that way. And, and really,
23:45 like the visualization is something that we really want to find a way to get more contributions from
23:52 the community. Because this is like a venue for people to really become creative and share stuff
23:59 that they know. And I think that something that really, really, really help with it is if we will
24:04 have some plugin model. So today you need, like, if you want to add a visualization, you need to make
24:09 a pull request to the main repository. And you need like to start the whole project for that, which
24:15 over time became a bit hard. So I think that once we have a plugin model, it might catalyze more
24:20 development around this area. Yeah, I was thinking that a plugin model, I was going to ask you about
24:25 that. That seems like a really good idea.
24:27 We all love Python for its tremendous productivity benefits, but getting the best performance takes
24:46 some work. What if you could get out of the box, easy access to high performance Python? Intel distribution
24:52 for Python developers delivers just that. Get close to 100 times better performance for certain
24:57 functions when using NumPy, SciPy, Scikit-learn, linked with optimized native libraries like Intel
25:03 Math Kernel Library, access efficient multi-threading, and Python projects like Numba and Scithon.
25:08 Try the Intel distribution for Python and experience performance today at talkpython.fm/intel.
25:15 And profile your Python and native C, C++ applications for performance hotspots with
25:21 Intel VTune Amplifier. With Intel, it's all about performance.
25:25 We also talked about the community. Like if you check out the GitHub repo at
25:37 github.com/getredash/redash. And when somebody, one of my listeners suggested that I have you on the
25:44 show and that we talk about this, I'm like, okay, this is a pretty cool looking project, but like how
25:48 many people really care about it? And then I went and looked at the GitHub repo and there's like 65,
25:53 almost 6,500 stars. There's like 135 contributors. I'm like, whoa, people are using this a lot. And it
26:00 seems like such a nice alternative to like Microsoft Excel with like some BI plugin or, you know, something
26:10 like that. Or all the various like big corporate commercial things that you have to buy into to
26:18 sort of do your BI dashboard type stuff. Yeah. And most of them are many times will be too complex for
26:26 many use cases and overpriced. So basically they've been priced like, let's charge as much as
26:33 we can, which is not really friendly for smaller companies or for different economies. Like
26:39 Redash has users really all around the world. And we see like people from Asia, Africa, South America,
26:45 where they have a completely different economy and it's really hard for them to afford all the other
26:51 tools. So being able to like to give them something that's really affordable, like almost free, like
26:58 it's open source. So it's really free except for the, I don't know, the cost of the server really gives
27:03 them a, I don't know, the same playing field as others, which is kind of cool. We have like, besides that,
27:10 we have all the big names that use it and that everyone knows like Amazon Atlassian, Mozilla,
27:16 Cloud, Outbrain and others. But it's really cool for me that I know that it's being used everywhere
27:23 and not just, you know, like in the United States, in the Silicon Valley. And actually I think that in
27:29 the Silicon Valley, it's probably less popular than in many other areas. Right. Cause they have the money to burn
27:34 on whatever they want. Yeah. But it is cool. It's cool that it's used at some of these major tech
27:39 companies. It's also cool that it's enabling places where there was just nothing possible or much less
27:45 possible before. So you talked about supporting different databases or connectors. You actually
27:53 have a quite a few integrations, you call them. What are some of the integrations that you guys have?
27:58 We started with Redshift and Postgres and basically it's the same thing. It's used like Redshift. You can
28:04 talk to it with a Postgres client. So this was the first connector that we had. I think the second one
28:09 was MySQL and the third one was BigQuery. But since then we had many more and most of them came as
28:17 contributions from the community. Basically, someone found Redash. He wanted to connect it to his database and
28:24 he created the connector. And I think there are two reasons why this kind of contribution
28:30 became popular, probably because there is a really simple API that you need to implement to add a connector to Redash.
28:37 And also it's like the thing that you must create to be able to connect to your database. So it's been really
28:45 motivating people to do it. So yeah, today it connects to many databases that I never used or heard of before.
28:51 So it's like InfluxDB or Impala?
28:53 It's actually InfluxDB I used.
28:55 Okay.
28:56 But Impala, yeah.
28:58 Yeah, yeah. Cool. And also it connects to like the standard sort of big company ones like Oracle, Microsoft SQL Server.
29:05 You've got MongoDB and MySQL, like what you already mentioned, and Cassandra and some others, right?
29:13 Yeah. And with all of them, you can say, well, let's say with all the SQL ones, you can just write the regular SQL query that you will write for the database.
29:23 Like we don't do any processing on your query. We send it as is. And this is a decision that allows us to support all these databases so easily because adding a database is just a matter of having a proper driver in Python for it.
29:38 And even that's not really a limitation as like Amazon Athena, which is something they introduced recently.
29:45 They released only a JDBC connector for it.
29:48 And basically what I did is write a simple microservice in Java that uses the JDBC driver and exposes an interface that Redash can talk to.
30:02 I see.
30:02 But most of the others, we just use the Python driver.
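[Editor's sketch] To give a feel for how small a connector can be when the query is passed through untouched, here is a toy interface with one implementation on top of Python's built-in `sqlite3` driver. The class and method names (`QueryRunner`, `get_schema`, `run_query`) are hypothetical and chosen for this illustration; they are not presented as Redash's actual connector API.

```python
import sqlite3


class QueryRunner:
    """Hypothetical connector interface; names are illustrative, not Redash's API."""

    def get_schema(self):
        """Return {table_name: [column, ...]} so an editor could offer auto-complete."""
        raise NotImplementedError

    def run_query(self, query):
        """Run the query text unchanged and return (columns, rows)."""
        raise NotImplementedError


class SqliteRunner(QueryRunner):
    def __init__(self, connection):
        self.connection = connection

    def get_schema(self):
        tables = self.connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        schema = {}
        for (table,) in tables:
            info = self.connection.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [column[1] for column in info]  # index 1 is the column name
        return schema

    def run_query(self, query):
        cursor = self.connection.execute(query)  # passed through exactly as written
        columns = [description[0] for description in cursor.description]
        return columns, cursor.fetchall()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE events (user_id INTEGER, name TEXT)")
connection.executemany("INSERT INTO events VALUES (?, ?)", [(1, "click"), (2, "swipe")])

runner = SqliteRunner(connection)
schema = runner.get_schema()
columns, rows = runner.run_query(
    "SELECT name, COUNT(*) AS n FROM events GROUP BY name ORDER BY name"
)
print(schema, columns, rows)
```

Supporting another database in this toy design means writing one more subclass around that database's Python driver, which matches the "proper driver in Python" point above.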
30:06 Now, if it's a database, if it's not like, if it's a database like MongoDB, then you just need to write a JSON that describes the query, the MongoDB query.
30:17 And it's almost similar to the syntax that you might use if you wrote the query inside some Mongo shell.
30:25 Not exactly the same, but quite similar.
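[Editor's sketch] A JSON document that describes a query might look like the one below. The key names (`collection`, `query`, `fields`, `limit`) and the in-memory `DATABASE` are assumptions for illustration, and this toy evaluator only understands exact-match filters; it is not the connector's real implementation.

```python
import json

# Hypothetical query document; the key names are assumptions for illustration.
QUERY_TEXT = """
{
    "collection": "events",
    "query": {"event": "click"},
    "fields": ["user", "event"],
    "limit": 2
}
"""

# Stand-in for a document database: collection name -> list of documents.
DATABASE = {
    "events": [
        {"user": "a", "event": "click", "ts": 1},
        {"user": "b", "event": "swipe", "ts": 2},
        {"user": "c", "event": "click", "ts": 3},
        {"user": "d", "event": "click", "ts": 4},
    ]
}


def run_json_query(text, database):
    """Parse the JSON description and apply filter, limit, and field selection."""
    spec = json.loads(text)
    wanted = spec.get("query", {})
    matches = [
        doc for doc in database[spec["collection"]]
        if all(doc.get(key) == value for key, value in wanted.items())
    ]
    if spec.get("limit") is not None:
        matches = matches[: spec["limit"]]
    fields = spec.get("fields")
    if fields:
        matches = [{name: doc[name] for name in fields if name in doc} for doc in matches]
    return matches


results = run_json_query(QUERY_TEXT, DATABASE)
print(results)
```

With a real MongoDB driver, the parsed `query` and `fields` values would be handed to the driver's find call instead of being evaluated by hand.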
30:27 But we also use this for other, like we have connector to Google Analytics, for example.
30:32 So again, you write a JSON that describes the data that you want to fetch from Google Analytics and you can get it into Redash.
30:41 Similarly, we have support for Google Spreadsheets, Jira and some other stuff.
30:46 And what this allows is that you can create one dashboard that shows data from the multiple data sources that you have.
30:54 It became quite common these days that many companies will have different type of data sources.
31:00 And it really helps them to be able to just show data from all of them instead of having different silos of UIs that show data from each one or having to build it in-house yourself.
31:13 Right. So maybe you could have like Google Analytics, like a Google Analytics widget on your dashboard right next to, so for web traffic and conversions, right next to like your sales numbers for the day.
31:28 And you could see how those relate potentially or something like that, right?
31:31 Yep. Yep. Yep.
31:32 Nice. Do you have integration with things like Stripe?
31:33 No. And it's something that comes up and maybe in the future we will have, but it's much easier today to just use something like, I don't know, Segment or Stitch, which are companies that give you the service of you connect all these web hooks to them or they just use their REST API, whatever.
31:54 And they can write the data into your database or maybe BigQuery or whatever you want.
32:02 And then it's easy to query that in Redash.
32:05 I see. Let them handle all the API bits and the callbacks.
32:08 It's not only that. You get also the advantage that you have all the data in one database so you can start joining the data easily.
32:17 Right. Okay.
32:18 And it's something that you can do in the hosted version of Redash, that you can write a query that runs across different data sources, but it's still not the same thing as having it in the same database and easily manipulating the data.
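The advantage of landing everything in one database is that a plain SQL join can relate the data sets. A minimal sketch, using an in-memory SQLite database and made-up `page_views` and `payments` tables (both hypothetical, standing in for data loaded from a web analytics service and a payment provider):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical tables, as if a loading service had written both into one database.
db.executescript("""
    CREATE TABLE page_views (day TEXT, visits INTEGER);
    CREATE TABLE payments (day TEXT, amount REAL);
    INSERT INTO page_views VALUES ('2017-04-26', 1000), ('2017-04-27', 1500);
    INSERT INTO payments VALUES ('2017-04-26', 200.0), ('2017-04-27', 450.0);
""")

# One query relates traffic to revenue because both tables live side by side.
rows = db.execute("""
    SELECT v.day, v.visits, p.amount, p.amount / v.visits AS revenue_per_visit
    FROM page_views AS v
    JOIN payments AS p ON p.day = v.day
    ORDER BY v.day
""").fetchall()

for row in rows:
    print(row)
```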
32:32 Sure.
32:32 So we've talked about a lot of the databases and data sources on the back end.
32:38 What is Redash itself built in?
32:40 It's a Python web app, right?
32:42 Yep.
32:42 Yeah. So it's both Python and JavaScript because it's a single page application. I think there are new names for that today. But basically, the Python side is an API.
32:54 And then there is the front end application that uses this API to present the UI.
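The split described here, a Python API that returns JSON and a front end that renders it, can be sketched with nothing but the standard library. This is an illustration under stated assumptions: the `/api/results/<id>` URL, the `QUERY_RESULTS` data, and the `api_app` and `call` names are hypothetical, and a plain WSGI callable is used instead of any particular web framework.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory data standing in for stored query results.
QUERY_RESULTS = {"1": {"columns": ["day", "total"], "rows": [["2017-04-27", 42]]}}


def api_app(environ, start_response):
    """Serve /api/results/<id> as JSON; anything else is a 404."""
    path = environ.get("PATH_INFO", "")
    prefix = "/api/results/"
    result = QUERY_RESULTS.get(path[len(prefix):]) if path.startswith(prefix) else None
    status = "200 OK" if result is not None else "404 Not Found"
    payload = result if result is not None else {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(app, path):
    """Invoke the WSGI app directly, the way a test client would."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


print(call(api_app, "/api/results/1"))
print(call(api_app, "/api/results/999"))
```

A single page application would fetch the same JSON from the browser and draw the tables and charts on the client side.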
33:01 I see. So you've got like a Flask backend. There's a bunch of JSON services. And then you've got what, Angular running on the front end?
33:09 It's a bit embarrassing these days to say it. But yeah, we're using Angular 1. So yeah, when we started, it was quite new and fun. But these days, yeah.
33:18 Isn't that the problem with JavaScript? It's like, we got the most amazing thing. It's just taken off. And like six months later, it's not. Right? There's five other choices. It's hard to really pick a winner. I mean, it's just nobody seems to reign very long in that world.
33:33 Yeah, it's a bit frustrating in this sense. Ember has been quite surprising in the sense that they kept just moving ahead. And they've been able to really pick up the good parts of each new iteration of frameworks.
33:50 But it has its own issues in other senses. But it was really impressive to see how Ember just evolved over the years.
33:57 And I think that there is a feeling that things are starting to stabilize.
34:02 That the community is starting to adopt less, like the same tools and start to evolve them instead of, yeah, let's reinvent it again.
34:11 And it seems that React is going to be among the top winners along the side with Vue. I'm not sure how to pronounce it, but it's V-U-E.
34:21 Okay.
34:21 And probably Angular will stay because of all the enterprise players that use it.
34:27 But it's probably not going to be the tool of choice for new companies and smaller companies.
34:34 I really like the Angular 2 and by now it's probably Angular 4 syntax is really like, I just don't like how it looks, the API they created.
34:43 But that's just a personal thing.
34:45 So, yeah, we're still with Angular 1.
34:47 I really hope that I could devote the time to migrate to something else.
34:51 But every time it comes up, I need to wear the product manager hat and say, the user doesn't care if it's Angular React or whatever.
35:00 So, let's focus on things that the user cares about.
35:03 Yeah, I mean, it really doesn't matter too much until maybe there are contributors that's like, I would love to contribute, but I'm not writing Angular.
35:11 Or it's really in the way of evolving the product, right?
35:15 Yeah, exactly.
35:16 I'm sure that there are a few things that I can, like, the aspect of attracting new contributors is definitely an issue here.
35:25 Because many people will just stay away from Angular these days and will be bummed by that.
35:32 And I can see some technical challenges that will be much easier, including the plugin model and stuff like that, if we adopted something else.
35:41 What would you pick if you could pick anything?
35:43 It's a good question.
35:44 It will be probably either React or Vue.
35:48 There are some things that I really like about Vue or whatever its name.
35:52 Vue, maybe?
35:53 Yeah.
35:54 Yeah, it's probably pronounced like Vue, but yeah.
35:57 Anyway, V-U-E.
35:58 So, there are many concepts there that I really like, but it seems that most of the mindshare is around React.
36:06 And as an open source project, I think it's important to adopt the more common technologies so it's easy for people to jump in and contribute instead of picking stuff that might make more sense from a technical point of view.
36:21 Yeah, there's really that balance, isn't there?
36:23 I mean, you definitely want to make it as easy as possible for contributors to contribute and extend it.
36:29 Yeah, and in that sense, Angular was a really bad choice.
36:32 I remember an instance where a coworker of mine wanted to add some feature, and I started to walk him through the things that he would need to understand to add it.
36:43 And then after like three minutes, I said, you know what?
36:46 Never mind.
36:47 I'll just implement it myself.
36:48 It will take less time than to explain it.
36:50 On the other hand, there was some internal app that I wrote later on with React, and I wanted someone to help me with a specific component.
36:59 And he didn't even need to look at React documentation.
37:03 He figured out the API from the code that I already had in place.
37:07 So it really reassured this feeling that I had that React has a really good learning curve.
37:14 So even if someone is not familiar with it, it's super easy to understand.
37:18 And it has other issues where it's like the ecosystem around it is a bit complex at this time.
37:27 But yeah, I don't know.
37:29 It will take time until I will have to make this decision again.
37:32 So we'll see how things will shift until then.
37:35 Yeah, maybe someday you can rewrite the front end.
37:38 But for now, there's more to do.
37:41 So one of the things I wanted to talk to you about, make sure that we covered on the show, is when you first started this project, it was an open source project.
37:52 And then recently, we talked about EverythingMe and its shutting down.
37:56 And that gives you a chance to say, okay, well, if I'm not doing that anymore, now what?
38:00 And you decided to try to take Redash and make it a proper company, as you talked about in the beginning, right?
38:09 I think there are a lot of people working on open source things.
38:13 And it feels to me like the standard way to level up off of open source stuff is to, all right, I've created something.
38:23 Let's just say like Flask.
38:24 I don't actually know how Armin Ronacher works this angle.
38:28 But let's suppose I create Flask.
38:29 Flask is popular.
38:30 So I try to do consulting for Flask projects.
38:33 And that's fine.
38:35 But I feel like in a lot of cases, it doesn't work very well.
38:38 You're still trading in your time for money, which is not always the best thing to be doing.
38:42 So you said like you actually thought through the different business models.
38:47 And right now, you're basically offering, in addition to the free open source version, hosted Redash as a service, right?
38:54 Yeah, that's correct.
38:55 So you want to talk about your thinking through those different business models and whatnot, stuff like that?
39:00 Sure, sure.
39:01 About half a year ago, I did a talk at a local meetup about what I learned from the experience of taking an open source project and turning it into a company.
39:11 And there have been three lessons.
39:14 The first one is that if you want to be able to work on some open source project, like your project, don't start a company.
39:22 That was a mistake.
39:24 And it's like an obvious thing.
39:27 Starting a company has its own challenges.
39:30 It has its own demands.
39:33 And it's not easy.
39:36 You have to learn about accounting and all sorts of stuff that you're not trained in, marketing, for example.
39:42 Yeah, so there is that.
39:44 But there is the whole thing of creating a business, which has nothing to do with creating a project or creating a product even.
39:53 Like you have your open source project.
39:55 That's cool.
39:56 But it has nothing to do with creating a business around it.
40:00 And I know that I made mistakes.
40:05 Some of them intentional around the business side of Redash.
40:09 Because the reason I started the company was to make sure that Redash will stay here for the long term.
40:17 So basically, the project doesn't serve the company.
40:20 Like it's not that I'm building a company using the project.
40:24 I'm building the company for the project.
40:26 Like I'm building the company to be able to work on the project for the long term.
40:31 Obviously, I'm also making it for my own selfish reasons.
40:35 Because like I always wanted to create a company the way I see that it should be.
40:40 But there is some balance where usually you will invest more in the business itself.
40:48 I tipped more towards the project because that was the goal.
40:52 Now, in terms of business model, luckily we can see a lot of open source companies these days.
40:59 And you can see different models they do.
41:02 Many of them actually, if you look into it, they do all of them.
41:06 So basically, you can either like have some dual licensing or two versions.
41:13 Like have your community version and your enterprise version.
41:18 Something like GitLab.
41:20 Yeah, that's what GitLab is doing.
41:22 So they have their community GitLab version that anyone can download and install.
41:25 And there's the enterprise version of GitLab.
41:28 And these days there are, I think, two flavors of their enterprise version, which you can buy a license for and install them.
41:34 So that's one model.
41:35 There is the model of offering some professional services around your software.
41:40 And there is the model of offering a hosted version of your software, which is something that GitLab is doing as well.
41:47 But it's not their main thing.
41:49 The product that I took inspiration most from is Sentry, which is actually Python as well.
41:56 And I exchanged a few emails with David Cramer, who was really generous with his time and shared some advice.
42:02 And what they're doing is that they have a SaaS offering, which is exactly the same as their open source project.
42:10 When I started, I decided to keep everything open source and offer it as a service as the SaaS offering.
42:20 And basically, the idea is that anyone who doesn't want to manage his own installation of Redash and have the hassle of, like, uptime and keeping up with the upgrades and all this stuff, he can use the hosted version.
42:34 What I learned since then is it takes time to scale a SaaS business, especially when it's not your only focus.
42:41 Like, I'm always, like, juggling between, okay, we have the open source version and I have some production issue with the SaaS and you need to prioritize.
42:50 And also, like, building a SaaS business, again, like, at this point, Redash is good enough to just keep selling it.
42:59 Like, to focus on finding the market, finding the market and blah, blah, blah, blah, blah.
43:04 But because it's also open source, it has to keep moving forward because no one likes, like, a stagnating open source project.
43:13 So there is this balance which is not that easy to maintain, especially with limited resources.
43:19 So besides deciding, like, on this business model, I also decided to keep the company bootstrapped and not raise any money.
43:27 There were different reasons for that.
43:30 Essentially, it's just the thing that I felt most comfortable with.
43:35 This portion of Talk Python to Me is brought to you by Hired.
43:38 Hired is the platform for top Python developer jobs.
43:42 Create your profile and instantly get access to thousands of companies who will compete to work with you.
43:47 Take it from one of Hired's users who recently got a job and said, I had my first offer within four days and I ended up getting eight offers in total.
43:54 I've worked with recruiters in the past, but they were pretty hit and miss.
43:58 I tried LinkedIn, but I found Hired to be the best.
44:00 I really like knowing the salary up front and privacy was also a huge seller for me.
44:05 Well, that sounds pretty awesome, doesn't it?
44:07 But wait until you hear about the signing bonus.
44:08 Everyone who accepts a job from Hired gets a $300 signing bonus.
44:12 And as Talk Python listeners, it gets even sweeter.
44:15 Use the link talkpython.fm/Hired and Hired will double the signing bonus to $600.
44:20 Opportunity is knocking.
44:22 Visit talkpython.fm/Hired and answer the door.
44:27 There have been a few local investors who've been really interested.
44:30 And they even had a funny thing that one of our SaaS users, we have an integration with Slack that you can share visualizations from Redash on Slack.
44:41 And he has a Slack channel with his investors.
44:43 So he started sharing with them KPIs and stuff like that from Redash in Slack.
44:48 And so his investors saw it and he said like, hey, what's that?
44:52 That looks cool.
44:53 Can you connect me with the founder?
44:54 I might want to invest.
44:56 But I passed that opportunity.
44:58 Yeah, that's a hard decision to make, right?
45:00 But as soon as you take funding, the clock starts ticking for growth and other types of things that are maybe short term and not necessarily where you want to focus.
45:11 Well, it's a long discussion like bootstrap versus VC and all.
45:15 But essentially what I realized is that I prefer to keep it bootstrapped.
45:19 So let's just move with it.
45:22 It definitely puts many constraints on what you can do.
45:25 Up to this day, I'm still working mostly alone on it.
45:29 Like full time, I'm the only employee at the company.
45:32 I'm using some freelancers, but full time, I'm the only one.
45:36 I am currently trying to hire another developer.
45:41 But it takes time.
45:42 So it's still only me.
45:43 But hopefully soon there will be another person.
45:47 So do you feel like having your open source project also powering a SaaS company?
45:55 Do you feel like Redash is better because of the exposure and experience you got running the SaaS company?
46:01 Yeah, that's for sure.
46:02 The hosted version of Redash is probably one of the largest deployments of Redash.
46:08 And it allows me both to stress test it and to find weak points in terms of performance and stuff like that.
46:15 But also it gives me visibility into how people use it.
46:19 So as an open source project, you practically have no idea how people use your project, what they do with it, how they use it, what kind of connectors and all that.
46:28 So you need to make decisions really blindly compared to a SaaS business where they have a lot of data that they can collect.
46:36 Yeah, it's not like EverythingMe, for example, which started this whole thing where you have all these stats coming in.
46:42 And now I'm sure you have like all sorts of usage stats on like features, how frequently they're used and the type of errors you run into.
46:49 And you wouldn't get that if it was just cloned off GitHub and run the script.
46:53 Yeah, exactly.
46:54 And so this as well improves the project itself because it allows to make more informed decisions.
47:00 I remember there was one instance where someone wanted to switch just some behavior.
47:06 And it felt to me that what he proposed is not common enough.
47:10 But then I checked the data and I realized that he's right.
47:13 So it really, really helps to be able to see how people use it and to have more direct access to them and to be able to understand what they do with Redash.
47:22 It has some downsides towards the open source project because there are things, like the deployment story and stuff like that, that I might be investing more into if it was only a self-hosted solution.
47:38 We still have quite an easy way to deploy Redash.
47:42 Like I maintain images for Amazon and Google Cloud.
47:46 And also there are Docker images.
47:48 So it's not that hard to start using Redash.
47:51 And in the last version, I even added some wizard that allows you to set up the user without having to use the CLI.
47:57 That's eventually improving, but it could move faster if I wasn't also working on the SaaS project.
48:03 Yeah, it's easy to see it that way.
48:05 But at the same time, the SaaS project lets you put all of your time into this open source project and not do like consulting for a bank where you're building forms over data type web apps.
48:18 Yeah, exactly.
48:19 And at the beginning, I've been doing some custom development, paid custom development on top of Redash, but still like doing some sort of consulting work.
48:30 While over time, I could focus more on the product itself.
48:33 So it definitely has its benefits.
48:36 And it's still a recurring dilemma for me, like around whether I should invest more in the SaaS or should I find a business model around the self-hosted version?
48:48 Because even today, there are many more users who use the open source version, the self-hosted one, than ones who use the SaaS version.
48:57 And there are people who even start with the SaaS and then like they use the trial of the SaaS.
49:02 Oh, that's cool.
49:03 Let's move to the open source one for different reasons.
49:05 Some of them just don't want to open their database to an external company.
49:10 Others are just cheap.
49:13 And I even compared the numbers with Sentry and they said that they see a better ratio between the open source users and the SaaS users.
49:24 And I think that part of the reason is the kind of product I'm building, where you really need to give access to your data to another company, and many people don't feel comfortable with that.
49:38 So I always like it's a recurring dilemma of whether I should find either another business model or an additional business model where I can offer something to the self-hosted crowd.
49:51 And the obvious thing here is just some kind of an enterprise edition.
49:56 But my concern here is that once I start offering an enterprise edition, there are two problematic outcomes here.
50:03 One is that usually enterprise clients, they're not only buying software, they're also buying support, which means selling services.
50:12 Right.
50:12 What's your SLA and all of that kind of stuff?
50:15 Yeah.
50:15 Yeah.
50:16 And you start deviating into the world of selling hours instead of selling software.
50:22 That's one issue.
50:23 And another issue is that they have completely different demands than what the rest of the users need.
50:29 And then there is the concern of turning Redash into Jira instead of keeping it as Trello.
50:36 And this is like something that it's a slippery slope where you find yourself with a bloated product because of different bigger clients with weird needs.
50:47 And today it's easier to ignore them because they're not paying you.
50:51 But once a huge chunk of your revenue comes from them, you need to be more attentive.
50:56 So these concerns keep me on the SaaS model, but it's like a recurring dilemma.
51:02 And I hope that one day I will have a definitive answer here and I will know like, yeah, that's what we're sticking with.
51:08 Yeah, it's not super straightforward, I can tell.
51:11 But I think it's really great that you're making it work on either path.
51:16 And the SaaS one is definitely adding value to the world.
51:19 So nice work.
51:20 And I'm sure it's inspirational to a lot of people to hear it.
51:23 Yeah, I hope so.
51:24 I hope so.
51:25 All right.
51:26 Well, I think we're going to probably have to leave it there more or less for Redash.
51:30 So let me ask you two final questions before I let you out of here.
51:34 When you're working on Redash and you open up an editor, a Python editor, what one is it?
51:39 So until not long ago, it was Atom.
51:42 But recently I started using Sublime again.
51:45 So I guess it's both of them.
51:47 And obviously with Vim mode enabled.
51:49 Yeah, of course.
51:50 Very nice.
51:51 And what out of the 100,000 plus PyPI packages, what one do you think is notable that people maybe don't know of, but they should check out?
52:00 So the first one is PeeWee.
52:02 P-E-E-W-E-E.
52:04 Yeah, exactly.
52:05 It's an ORM.
52:06 It's a Python ORM.
52:08 I think it's not, it should be much more popular than it is.
52:11 I think it's the most Pythonic ORM.
52:13 Yeah, exactly.
52:14 I really don't understand how SQLAlchemy is getting more mind share than PeeWee.
52:20 It's really like a mystery to me because PeeWee is much more Pythonic.
52:24 And Redash has been using PeeWee up until like half a year ago when I decided to migrate to SQLAlchemy.
52:32 And no offense to the SQLAlchemy people, but I think that was one of the biggest mistakes of 2016 for me.
52:40 Yeah, I should have stayed with PeeWee.
52:41 How interesting.
52:42 Yeah, you're right.
52:43 The SQLAlchemy definitely gets a lot of the mind share.
52:46 SQLAlchemy is great.
52:47 But I do think also PeeWee is quite cool.
52:50 Basically, you give it Lambda expressions or generator expression type things, and it transforms that into the actual SQL query.
52:57 I think that's glorious.
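The general mechanism behind this style of ORM is operator overloading: comparing a field object with a value returns an expression object instead of a boolean, and that object knows how to render itself as SQL. The toy below is only a sketch of that idea, not PeeWee's implementation; the `Field`, `Expression`, and `select` names are made up for the example.

```python
class Expression:
    """A node holding SQL text plus the parameters to bind to it."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = params

    def __and__(self, other):
        return Expression(f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other):
        return Expression(f"({self.sql} OR {other.sql})", self.params + other.params)


class Field:
    """A column whose comparison operators build Expression objects."""

    def __init__(self, name):
        self.name = name

    def _compare(self, operator, value):
        # The value is kept as a bound parameter rather than pasted into the SQL.
        return Expression(f"{self.name} {operator} ?", [value])

    def __eq__(self, value):
        return self._compare("=", value)

    def __gt__(self, value):
        return self._compare(">", value)

    def __lt__(self, value):
        return self._compare("<", value)


def select(table, where):
    """Render a SELECT statement and return it with its parameters."""
    return f"SELECT * FROM {table} WHERE {where.sql}", where.params


age = Field("age")
country = Field("country")
sql, params = select("users", (age > 18) & (country == "IL"))
print(sql)
print(params)
```

The parentheses around each comparison matter, because `&` binds more tightly than `>` and `==` in Python.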
52:58 Yeah, and just simple to use and fun.
53:01 Yeah, exactly.
53:02 And I also noticed that there's an extension sort of package that will convert it to an asyncio variation.
53:08 So you can basically create async coroutines and use PeeWee queries and await the queries, which is, that's just icing on the cake.
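One simple way to await a blocking database query from a coroutine is to push it onto a worker thread. The sketch below shows that approach with the standard-library `sqlite3` module and `asyncio.to_thread` (Python 3.9+). It is not the API of the extension mentioned here, which may work differently; `blocking_count` and `count_async` are hypothetical names.

```python
import asyncio
import sqlite3


def blocking_count(limit):
    """A blocking query; the connection is opened inside the worker thread."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE numbers (n INTEGER)")
        connection.executemany("INSERT INTO numbers VALUES (?)",
                               [(i,) for i in range(10)])
        cursor = connection.execute("SELECT COUNT(*) FROM numbers WHERE n < ?", (limit,))
        return cursor.fetchone()[0]
    finally:
        connection.close()


async def count_async(limit):
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop stays free while the query executes.
    return await asyncio.to_thread(blocking_count, limit)


async def main():
    # Both queries are awaited concurrently.
    return await asyncio.gather(count_async(3), count_async(7))


counts = asyncio.run(main())
print(counts)
```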
53:18 That's nice.
53:18 Oh, that's kind of cool.
53:19 Yeah, it's very cool.
53:21 So thanks for that.
53:22 Those are awesome recommendations.
53:23 Now, final call to action, like, how do people check out Redash?
53:28 How do they check out the hosted thing?
53:29 What do you need from the community?
53:31 Things like that.
53:32 Go to redash.io and you have all the information there.
53:35 It's super easy to start, either if you go with the hosted version or even if you start with your own deployment.
53:41 It's really a few minutes and you can start querying your data.
53:45 And probably if you have a database, you need Redash.
53:48 And contributions are always welcomed, like both code but also documentation.
53:53 Like, this is something that we're not getting enough of.
53:56 But, yeah, any contribution is eventually welcomed.
53:59 All right.
54:00 Well, sounds good.
54:01 It's definitely a cool project.
54:02 Eric, thank you for sharing it with everyone.
54:04 Thank you for inviting me.
54:05 You bet.
54:06 Bye.
54:06 This has been another episode of Talk Python to Me.
54:10 Today's guest has been Eric Framovich.
54:14 And this episode has been sponsored by Intel and Hired.
54:17 The Intel distribution for Python delivers the high-performance Intel C libraries built right into Python.
54:23 Get close to 100 times better performance for certain functions when using NumPy, SciPy, and Scikit-learn.
54:29 Check them out at talkpython.fm/intel.
54:32 Hired wants to help you find your next big thing.
54:35 Visit talkpython.fm/hired to get five or more offers with salary and equity presented right up front
54:41 and a special listener signing bonus of $600.
54:45 Are you or your colleagues trying to learn Python?
54:47 Well, be sure to visit training.talkpython.fm.
54:50 We now have year-long course bundles and a couple of new classes released just this week.
54:56 Have a look around.
54:57 I'm sure you'll find a class you'll enjoy.
54:59 Be sure to subscribe to the show.
55:01 Open your favorite podcatcher and search for Python.
55:03 We should be right at the top.
55:05 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.
55:14 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.
55:19 Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music.
55:25 You can browse his tracks he has for sale on iTunes and listen to the full-length version of the theme song.
55:31 This is your host, Michael Kennedy.
55:33 Thanks so much for listening.
55:34 I really appreciate it.
55:36 Smix, let's get out of here.
55:38 Outro Music.