
#110: Data Democratization with Redash Transcript

Recorded on Thursday, Apr 27, 2017.

00:00 Michael Kennedy: Are you asked to generate reports for your company's data? Has someone suggested that you buy or deploy massive BI software that's expensive, closed source, and generally underwhelming? Well, it's Redash and Python to the rescue. Today you'll meet Arik Fraimovich, the creator of Redash, whose goal is to make your company data-driven by connecting to any data source to easily visualize your data. Not only is it cool open source, but it's an example of someone taking a successful open source project and building a business on top of it. This is Talk Python To Me, Episode 110, recorded April 27, 2017. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode has been brought to you by Intel and Hired. Be sure to check out what they're offering during their segments and help support the show. Arik, welcome to Talk Python.

01:27 Arik Fraimovich: Ah, good to be here.

01:28 Michael Kennedy: Yeah, it's great to have you here. It's time to democratize some data, break it free from all the places where it's captured, and use Python to do it, right?

01:36 Arik Fraimovich: Yeah, yeah, yeah, yeah. That's how it started, and it's still the mission. We're not there yet, and it goes a lot beyond the technology itself. It's educating people, but yeah. That's the goal.

01:50 Michael Kennedy: It's an awesome goal. And I'm looking forward to chatting with you about it, but before we get into that, let's hear your story. How did you get into programming in Python?

01:58 Arik Fraimovich: It started over 20 years ago when I got my first PC, which was an XT computer. I think there were newer computers at the time, but that's what we could afford. As a kid there wasn't much to do with it, so I started exploring and luckily stumbled upon QBasic, which shipped with DOS at the time. It came with example programs and games, so basically I just started running them, then messing around with the source code and seeing what results my changes caused. That was my start with programming. Later I found an older kid at school who was programming during breaks, and I was standing behind him, looking over his shoulder at what he was doing, picking up new functions and stuff, and then going back home to try them. Eventually my parents bought me some books, and that's how I really started programming. From then on I kept learning: I started connecting to the internet, exploring more, and I got a bit into web development at the time. And then the biggest jump was when I was 18. We have mandatory Army service, and I was lucky to join one of our military intelligence units, where I became a developer. There I was doing stuff that is completely different from what I'm doing today, a lot of C++ and hardware interfaces. I gained a lot of interesting experiences, like debugging on a helicopter.

03:27 Michael Kennedy: Debugging on a helicopter. How, how interesting.

03:29 Arik Fraimovich: Yeah, yeah. I can't share too many details, but yeah, that happened. Anyway, when I finished my Army service I got back into the web and web stuff, although I had the chance to do pretty much everything, like full stack and native mobile development, whatever the job required. As for Python, I'm not really sure when I started, but I was definitely doing Python when Google App Engine was released, so that's around 2008. I had a detour into Ruby for a few years. But then four years ago, when I joined Everything Me, I went back to developing mainly in Python. I've been doing some Go in the past year, but there's nothing like Python.

04:20 Michael Kennedy: Yeah, that's right. And okay, so this is interesting. You talked about what was it called? Everything Me? What?

04:28 Arik Fraimovich: Yeah.

04:28 Michael Kennedy: Yeah, so for a while you worked at this company called Everything Me and that was like a launcher app for Android phones, is that right?

04:37 Arik Fraimovich: Yeah, yeah. That's right.

04:39 Michael Kennedy: Okay, and that was super popular, right? Like between 10 and 15 million downloads or something on that scale.

04:46 Arik Fraimovich: Yeah, something like that. Yeah, that was the scale. I don't remember the exact number by now but yeah, that was the scale.

04:51 Michael Kennedy: Were you guys using Python there or was that mostly just things like Swift and Objective-C or not Objective-C. Java, I guess.

04:58 Arik Fraimovich: So yeah, our client was... so that's Java. But today we would probably call it an AI-based launcher, or whatever the current buzzwords are. It was personalized: it learned about you over time and adapted to you in various ways in the UI. So there was a lot of machine learning and logic on the back end, and most of it was in Python. Some of our new code was in Go, but I was doing mainly Python.

05:32 Michael Kennedy: And then that shut down, which is actually a really cool story that we'll get to in a minute, and which kind of gave you an excuse to do something even cooler. But it seems like the app came back, is that right?

05:44 Arik Fraimovich: The application itself has come back. It was a bit weird for us, but yeah, there was another company that bought the IP I think or something like that. I'm not really sure about the details but they brought back the app.

05:57 Michael Kennedy: Yeah, 'cause I read your Medium post about how when that shut down you went to work on this project and then I did some research to look at just what that was historically and like, oh wait, it's still in the Play Store and there's still the active website and okay. I figured somebody must have purchased it.

06:13 Arik Fraimovich: Yeah, exactly.

06:14 Michael Kennedy: Okay, well that brings us to what you're doing today. So what are you doing day to day for programming?

06:21 Arik Fraimovich: These days I'm working on Redash, and I'm working on three levels. One is Redash itself, so the code. Another is building the Redash community: it's an open source project, so I'm trying to build the open source community around it. And it's also the company that I started when Everything Me was shut down, so I'm still trying to build a company. So that's my day to day, and I'm trying to balance between all of this. Usually one of the aspects suffers more than the others, but I'm trying to learn over time how to improve the balance.

06:53 Michael Kennedy: That's really cool. One of the things I always love to talk about and share with everybody is how people take an open source project and, in some way or another, make a business out of it. So I really want to dig into that, but before we do, let's talk about Redash itself. What is Redash? Where did it come from? Things like that.

07:12 Arik Fraimovich: Okay, cool. So at Everything Me, when we started, we collected event data, about usage and stuff like that, with Splunk. You can compare it to Elasticsearch with Kibana. Once we started getting actual usage, it just didn't scale well with the amount of data we were collecting, so we looked into different alternatives and decided to go with Redshift. It was really around the time Redshift was just introduced, a bit over four years ago, I think.

07:47 Michael Kennedy: Alright, tell us all what Redshift is. It's from AWS, right?

07:50 Arik Fraimovich: Yeah. So basically it was a mobile app, but you can compare the data to what Google Analytics collects. It's the same thing: a user clicked this button, a user swiped left, a user swiped right, a user looked at that, and stuff like that. For us it was really important, because along with Splunk we'd been using Flurry, I think, and some off-the-shelf analytics products, but most of them really didn't work for us because we were creating quite a unique product. It's a launcher, so it's not an application that you start, do something in, and then close. It runs all the time. So the concept of a session is completely different from how other apps work and how the off-the-shelf products model it. Being able to access the raw data gave us a lot of power and the ability to really understand what people were doing. So that was Redshift.

08:45 Michael Kennedy: So you said Redshift is a column store database.

08:49 Arik Fraimovich: Columnar.

08:50 Michael Kennedy: Columnar, yeah. Which is not necessarily the same as a standard relational database. It's not a document or key value store type of database.

09:00 Arik Fraimovich: Yes.

09:00 Michael Kennedy: How would you describe that? Like what does it mean to be a columnar database?

09:03 Arik Fraimovich: Redshift has two properties. One of them is that it's columnar, and what that means is quite easy to understand: it stores data in columns. So when you write a row, it takes each column's value and stores it along with the other values of that column. What that allows is easier compression; it can compress your data really well. Think of a column which is a boolean: it's super easy to compress when it holds only these booleans, versus storing by rows, where it gets harder to compress because each row mixes all the different kinds of data. This compression saves you space, which is important when you're dealing with large amounts of data, but it also makes processing much, much faster, because you need to load smaller amounts of data each time. The other property, I think the term is MPP, is Massively Parallel... what's the other P? I don't remember. Massively Parallel something, so MPP.
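To make the compression point concrete, here is a minimal Python sketch (not how Redshift actually stores its blocks) that lays a tiny, made-up events table out column by column and applies run-length encoding, one of the simplest column encodings. Values that repeat within a column collapse into a few (value, count) runs, which is much harder when each stored unit is a whole mixed-type row.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A tiny, made-up events table, one tuple per row: (day, event, is_new_user).
rows = [
    ("2017-04-27", "swipe_left", True),
    ("2017-04-27", "swipe_left", True),
    ("2017-04-27", "click", True),
    ("2017-04-27", "click", False),
    ("2017-04-28", "click", False),
    ("2017-04-28", "click", False),
]

# Columnar layout: transpose so each column's values sit next to each other.
columns = dict(zip(["day", "event", "is_new_user"], zip(*rows)))

for name, values in columns.items():
    runs = run_length_encode(values)
    print(f"{name}: {len(values)} values stored as {len(runs)} runs {runs}")
```

Each six-value column here shrinks to two runs; real column stores combine encodings like this with general-purpose compression.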

10:13 Michael Kennedy: Massively Parallel Processing maybe.

10:15 Arik Fraimovich: Yeah, exactly, you're right. When you run a query on Redshift, they compile binary code from your query, distribute this code to all the nodes in your cluster, each node runs the query, and then the master node aggregates it all. So it's all sophisticated ways to process large amounts of data. That's cool when you're processing large amounts of data. It's not cool when you have small tables, because then the penalty of waiting for the compilation step and all that is too high compared to just running the query on a normal database. So you need to know when you should use it and when not, but it's a great tool. On the Google side, you have BigQuery, which is again a columnar database, but BigQuery has the advantage that it's what you might call serverless. With Redshift you need to decide what size of cluster you have, you need to maintain it, and you need to vacuum your data like you do with Postgres. With BigQuery you don't have all this hassle: you just load your data into it, run your queries, and don't worry about anything. Google takes care of it. It comes with the downside that you pay per query. They have an interesting pricing model where you pay basically based on the amount of data your query scans. So you need to be really aware of what your query is doing, which columns you're picking to query, and stuff like that.
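The scatter-gather flow Arik describes (send work to every node, let each node process its slice, merge on the master node) can be sketched in plain Python. This is a hedged illustration only: threads stand in for cluster nodes, the round-robin partitioning is an arbitrary choice, and there is no query compilation step.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def node_partial_count(partition):
    """Work done on one node: count events per name in its slice of the data."""
    return Counter(event for _user, event in partition)


def master_aggregate(partials):
    """Work done on the master node: merge the per-node partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def mpp_count(rows, num_nodes=3):
    # Distribute rows round-robin across the simulated nodes.
    partitions = [rows[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_partial_count, partitions))
    return master_aggregate(partials)


events = [("u1", "click"), ("u2", "swipe"), ("u1", "click"), ("u3", "click"), ("u2", "swipe")]
print(mpp_count(events))
```

Counting decomposes into per-node partial counts that can simply be summed, so the master only has to merge small intermediate results instead of the raw rows.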

11:47 Michael Kennedy: That's an interesting business model: as your code runs less efficiently, you pay more. But that's a pretty straightforward way to... I mean, that's how you use it up, right?

11:56 Arik Fraimovich: Yeah, exactly. And the thing is, while it's not as explicit, with Redshift it's practically the same, because if you write a big query, you wait longer and take more of your cluster's resources, so you'll eventually end up having to buy a larger cluster, which means it costs you more. It also costs you in human hours, when your analyst is staring at the screen waiting for his query to return. So it's less explicit, but it's practically the same thing. But obviously, from a financial standpoint, it's much easier to say, "Okay, we have this and that budget. That's the kind of cluster we're going to buy, and that's it," versus BigQuery, where it's, "Okay, I'm not sure how much it's gonna cost me. Let's hope it will be okay."
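Here is a back-of-the-envelope sketch of a pay-per-bytes-scanned model. The column sizes are made up and `PRICE_PER_TIB` is a placeholder, not a quote of any provider's actual price; the point is only that in a columnar store, selecting fewer columns means fewer bytes scanned and a smaller bill.

```python
# Hypothetical per-column storage sizes for a raw events table, in bytes.
COLUMN_BYTES = {
    "user_id": 40 * 10**9,
    "event_name": 25 * 10**9,
    "event_time": 30 * 10**9,
    "payload": 900 * 10**9,
}

PRICE_PER_TIB = 5.0  # placeholder price per TiB scanned; check current pricing
TIB = 2**40


def estimated_cost(selected_columns, price_per_tib=PRICE_PER_TIB):
    """Estimate cost when billing is proportional to bytes scanned in the selected columns."""
    scanned = sum(COLUMN_BYTES[col] for col in selected_columns)
    return scanned / TIB * price_per_tib


narrow = estimated_cost(["user_id", "event_name"])
wide = estimated_cost(COLUMN_BYTES)  # iterating the dict yields every column, like SELECT *
print(f"two columns: ${narrow:.2f}, SELECT *: ${wide:.2f}")
```

With Redshift the same inefficiency shows up as longer waits and, eventually, a bigger cluster rather than a per-query line item.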

12:43 Michael Kennedy: Yeah, it's definitely harder to predict, but okay, very interesting. Alright, so you said, "Look, we have this high performance, massive scale database, and sort of regardless of which one you choose, and we have all this data coming in." You're like, "Now we wanna look at it and query it and do reports on it," and things like that, right?

13:03 Arik Fraimovich: Yeah, exactly. So we had this project of, "Okay, let's start by loading our data into Redshift and understand how we should do it," and all that. When we started getting to the finish line of this, we said, okay, we're starting to have data in Redshift. What are we going to do with it? We'd been really spoiled by Splunk, because Splunk has a really good user interface that allows you to query the data you send into it. They have their own language, which some people like and others don't; I didn't use it enough to form an opinion. But that was obviously lacking with Redshift, because Redshift is just the database. They don't handle the UI. At this point we decided, okay, let's look at what the big boys are using. Let's find a BI tool. We looked at Tableau, Yellow, and maybe some others, but all of them failed with Redshift. They might be great tools for traditional BI, let's call it, but with Redshift, what we were doing at the time, and probably many others were doing it too, was having a huge table of raw events data that we wanted to run queries on top of. That's something that's super hard to do with tools like Tableau or Yellow. I think they've improved over the years and adapted to better support databases like Redshift, but it's definitely not the same thing as being able to just open an editor, write your query, and run it on top of the table. While we were trying to find a tool, we had already started collecting data, so for anyone who wanted access to this data, we just created a user in Redshift and told them to connect with any SQL client they liked. People started querying for data, getting results, and sharing them, but the way they shared them was basically by sending CSV or Excel files over email. Which is okay, but then you get into questions. Say we see an issue in the data: we see that we have a drop in conversion.
Now we start with questions like, "Do we really have an issue?" Maybe the issue is with the data that we collect, because it's a new pipeline. Maybe we have a bug and it's not really a drop in conversion. Or maybe the person who wrote this query made a mistake and calculated the conversion wrong. So you start reverse engineering the query he ran, because he has obviously since closed his editor and doesn't remember what he did. He just has his CSV file to show, and it was really, really frustrating.

15:45 Michael Kennedy: I'm sure and you can't even verify it some of the times, right? 'Cause the person might not remember exactly what they did.

15:51 Arik Fraimovich: Yeah, yeah. A true story: I was reverse engineering. I knew what query that person started with, so I started with that query, ran it, and saw, no, those aren't the same numbers. Okay, let's change this and that. Not the same numbers. Then I followed the steps until I got the same numbers, and okay, here's the mistake in the query. I'd had a similar experience at my previous workplace, so the problem felt familiar. Mim, our CTO at the time, and I started talking: how about a JSFiddle-like UI for queries, where a person can write a query, get the results, and then share the URL with others, so the other person gets both the results and the query? That way you can see what he did and maybe do some peer review, and if you want to keep digging in, you have the query to start with. So we had this idea floating around, and then we had a hackathon, and this is when Redash was born.
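The core idea, keeping the SQL together with the results it produced and handing out an identifier instead of a CSV, fits in a few lines. This is a hypothetical sketch using an in-memory SQLite database; `QueryStore`, `SavedQuery`, and the `/queries/<id>` link format are illustrative names, not Redash's actual models or URLs.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    """A query kept together with the results it produced."""
    sql: str
    columns: list
    rows: list
    query_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])


class QueryStore:
    """Hypothetical store: run a query once, keep SQL and results under a shareable id."""

    def __init__(self, connection):
        self.connection = connection
        self.saved = {}

    def run_and_save(self, sql):
        cursor = self.connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        saved = SavedQuery(sql=sql, columns=columns, rows=cursor.fetchall())
        self.saved[saved.query_id] = saved
        return saved.query_id

    def open_shared(self, query_id):
        # Whoever opens the link sees the exact SQL, not just an exported file.
        return self.saved[query_id]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT, converted INTEGER)")
db.executemany("INSERT INTO events VALUES (?, ?)", [("u1", 1), ("u2", 0), ("u3", 0), ("u4", 1)])

store = QueryStore(db)
link = store.run_and_save("SELECT AVG(converted) AS conversion_rate FROM events")
shared = store.open_shared(link)
print(f"/queries/{link}", shared.sql, shared.rows)
```

Because the recipient gets the query alongside the numbers, checking a suspicious conversion figure means reading one SQL statement instead of reverse engineering a spreadsheet.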

17:02 Michael Kennedy: Nice, yeah you know you have a good product or idea when it's solving a problem that keeps coming back, right? You're like, oh, we've been here before. Why is there nothing that actually does this well?

17:13 Arik Fraimovich: Yeah, yeah. And I think only afterwards did I realize that, yeah, that's actually something I wanted to build back then. And many people who adopted Redash have been telling me, "Yeah, we built the same thing ourselves, but we dropped it for Redash, because we couldn't really maintain it for a long time, and here's a product that someone is already maintaining and using, so let's switch to it."

17:37 Michael Kennedy: Right, so Redash is kind of made up of two different parts. You've got, like you said, this JSFiddle-like query editor, and it's actually really nice. It has autocompletion for the columns and tables and all sorts of stuff, right?

17:52 Arik Fraimovich: Yeah, it depends on the database that you connect to, because we don't support loading the schema for all of them. But if it's a connector that supports loading the schema, you'll have autocomplete. We have some other features that we adopted over time, like query snippets. And you can have parameters inside your queries, so you can create something more interactive that the end user, let's call it, can play with instead of changing the query every time.
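For databases that expose their catalog, schema-driven autocomplete boils down to introspecting table and column names once and filtering them by the typed prefix. The sketch below does this against SQLite (`sqlite_master` and `PRAGMA table_info` are real SQLite features); `load_schema` and `complete` are hypothetical helpers, and other databases would need their own catalog queries.

```python
import sqlite3


def load_schema(connection):
    """Return {table_name: [column names]} by introspecting a SQLite database."""
    tables = [name for (name,) in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        schema[table] = [row[1] for row in connection.execute(f'PRAGMA table_info("{table}")')]
    return schema


def complete(prefix, schema):
    """Suggest table and column names that start with the typed prefix (case-insensitive)."""
    candidates = set(schema)
    for columns in schema.values():
        candidates.update(columns)
    return sorted(c for c in candidates if c.lower().startswith(prefix.lower()))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT, event_name TEXT, event_time TEXT)")
db.execute("CREATE TABLE users (user_id TEXT, signup_date TEXT)")
schema = load_schema(db)
print(complete("ev", schema))  # ['event_name', 'event_time', 'events']
```

Loading the schema once and completing from memory keeps each keystroke from hitting the database.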

18:17 Michael Kennedy: I see. So one half of the story is you build the query. The other half is the dashboards and visualizations. So on your visualization, could you have something like a slider for the number of days you wanna average across? You slide it and it changes the query.

18:33 Arik Fraimovich: So unfortunately, the parameters are not as slick as a slider today. Currently they're just input boxes, and the most you can customize is to say, okay, this is a number input box, or it's a date input box, and the UI will adapt accordingly. But it's definitely an area where I, and hopefully others, want to invest more and improve it, because not everyone knows how to write the right queries. Needing someone to write the query for you for every small change is not fun, so being able to give these people a way to create interactive stuff is super important.
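Typed input boxes can be sketched as a template renderer that validates each value by its declared type before substituting it. The mustache-style `{{ name }}` placeholder syntax, the `render` and `parse_value` helpers, and the two supported types are assumptions for illustration, not Redash's implementation.

```python
import re
from datetime import date

# Assumed placeholder syntax: {{ name }}, with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def parse_value(raw, kind):
    """Validate a raw input-box value according to its declared type."""
    if kind == "number":
        return str(int(raw))  # int() raises ValueError for non-integer input
    if kind == "date":
        return "'" + date.fromisoformat(raw).isoformat() + "'"  # quoted SQL date literal
    raise ValueError(f"unsupported parameter type: {kind}")


def render(sql, values, kinds):
    """Replace every {{ name }} placeholder with its validated, typed value."""
    return PLACEHOLDER.sub(lambda m: parse_value(values[m.group(1)], kinds[m.group(1)]), sql)


template = (
    "SELECT day, COUNT(*) AS events FROM events "
    "WHERE day >= {{ start_date }} GROUP BY day ORDER BY day LIMIT {{ days }}"
)
kinds = {"start_date": "date", "days": "number"}
print(render(template, {"start_date": "2017-04-01", "days": "14"}, kinds))
```

Validating by type (an integer must parse as an integer, a date as an ISO date) keeps a free-form input box from becoming a SQL injection hole; where the driver supports it, bound parameters are the stronger option.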

19:10 Michael Kennedy: Sure. And being the person who writes that query where they're like, "Well, I ask for one day, but now could you do it for seven days?" You know, that's also not fun if you're doing that, right?

19:19 Arik Fraimovich: Yeah, exactly. I've seen it more than once with people who aren't your traditional query writers. Obviously developers will write queries. Then there are the product managers, some of whom have an engineering background, and they write queries too. But I've seen marketing people learn to write queries, just because they started by asking someone to write a query for them, and then over time they started picking at the SQL and saying, "Okay, that's not that frightening. Let's change the number of days from seven to 14, run the query, and, hey, I got the results I wanted." That's cool, and then the next step is, "Okay, let's just learn SQL." There are some good resources online to learn SQL. They learn SQL, and then they feel really empowered, because now they have direct access to the data without anyone in the middle. It creates some issues sometimes, because many times companies, organizations actually, won't have a clear schema for the data. The data model is not that obvious.

20:24 Michael Kennedy: Especially on these event streams, right? Where you're just dropping in data as it streams in.

20:28 Arik Fraimovich: Yeah, event streams are definitely hard, because every question beyond "how many events did we have yesterday," "how many unique users," and obvious stuff like that becomes a really complex query. This is something I really want to tackle in the future: basically, to make it easier to create more sensible models around the data. I have some ideas, but it will take time to get there. So yeah, it's not that obvious how to give people a way to play with data, but I think the fact that in Redash you can always see the query, and someone can always peer review it, makes it much safer, let's call it, to do it. And I've heard really good success stories around this.

21:13 Michael Kennedy: Yeah, and very much like JSFiddle, you have the ability to share your queries and save them in the dashboard, and people can fork them off and say, "Okay, I'm gonna make a copy of this, tweak it myself, and save my version," and stuff like that, right? So that's pretty helpful.

21:31 Arik Fraimovich: Yup. So basically the flow is that you run the query and get the results. From here you can just share the table that you got. That's cool. You can access it with an API if you want to connect it to some other tool, but you can also visualize it, and there are several types of visualizations. The default ones are charts, maps, a word cloud, and other stuff. Then you can group several visualizations into a dashboard, and that's shareable as well. That's basically the scope of it.

22:00 Michael Kennedy: Yeah, that's cool. And I think the way people probably perceive it is: here's a site we can go to internally, or I guess it could be public as well, and it'll show us the stats for our company. If we've got an app or a web app, how are people using it? Or if you've got sales, what are sales versus leads doing, right? And there are a lot of different ways to visualize it. One of the things I saw that you guys had, which I really appreciated because I've had to build it before, is cohort analysis for subscription services and those types of things, where users come and then they fade out. It was really nice to see a lot of nice visualizations; they're pretty interactive and they look really good. They're certainly things you would be happy to put in front of the CEO or whoever and go, "Here's your dashboard, enjoy this."
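A retention cohort table, like the one Michael mentions, groups users by the week they signed up and reports what share of each cohort was still active N weeks later. Here is a minimal sketch over hypothetical `(user_id, signup_week, active_week)` tuples, with integer week numbers as a simplifying assumption; `retention_table` is an illustrative helper, not Redash code.

```python
from collections import defaultdict


def retention_table(activity):
    """
    activity: iterable of (user_id, signup_week, active_week) with integer week numbers.
    Returns {signup_week: [share of the cohort active 0, 1, 2, ... weeks after signup]}.
    """
    cohort_users = defaultdict(set)
    active_by_offset = defaultdict(lambda: defaultdict(set))
    for user, signup_week, active_week in activity:
        cohort_users[signup_week].add(user)
        active_by_offset[signup_week][active_week - signup_week].add(user)

    table = {}
    for cohort, users in sorted(cohort_users.items()):
        max_offset = max(active_by_offset[cohort])
        table[cohort] = [
            round(len(active_by_offset[cohort][offset]) / len(users), 2)
            for offset in range(max_offset + 1)
        ]
    return table


activity = [
    ("a", 1, 1), ("a", 1, 2), ("a", 1, 3),
    ("b", 1, 1), ("b", 1, 2),
    ("c", 2, 2), ("c", 2, 3),
    ("d", 2, 2),
]
print(retention_table(activity))  # {1: [1.0, 1.0, 0.5], 2: [1.0, 0.5]}
```

Each row of the result is one cohort's "fade out" curve, which is what a cohort visualization draws as a heat map.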

22:48 Arik Fraimovich: In many cases, the CEO is the most active user in Redash. At Everything Me, the CEO was usually the one to spot data issues, because he was on Redash all the time and would spot various changes in the data. And because we'd been using Redshift from the early days, we had cases where the data for a whole day just disappeared and then reappeared, so he was spotting that before anyone else.

23:16 Michael Kennedy: Oh, sure. I live in this. I know this looks weird. I've been watching this for 14 days.

23:23 Arik Fraimovich: Yeah, so he was really challenging us to build better alerting and monitoring mechanisms for the data flow, so that we could spot issues before him. And I know that at our company as well, the most active user is the CEO: he starts his morning checking the stats from his phone, and then goes to email. So yeah, it's really nice in that way. Visualizations are an area where I really want to find a way to get more contributions from the community, because it's a venue for people to become creative and share stuff that they know. I think something that would really, really help with that is a plug-in model. Today, if you want to add a visualization, you need to make a pull request to the main repository, and you need to set up the whole project for that, which over time has become a bit hard. So I think that once we have a plug-in model, it might catalyze more development in this area.
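The missing-day incident suggests a simple data-flow check: walk the calendar and flag days with no events at all, or days whose volume collapsed relative to the previous day that had data. This is a hypothetical sketch; the function name and the 50% `drop_ratio` threshold are arbitrary choices, not a description of Redash's alerting feature.

```python
from datetime import date, timedelta


def find_gaps_and_drops(daily_counts, drop_ratio=0.5):
    """
    daily_counts: {date: number_of_events}. Flags days with no data at all, and days
    whose volume fell below drop_ratio times the last day that had data.
    """
    alerts = []
    days = sorted(daily_counts)
    day, previous = days[0], None
    while day <= days[-1]:
        count = daily_counts.get(day, 0)
        if count == 0:
            alerts.append((day, "missing"))
        elif previous and count < drop_ratio * previous:
            alerts.append((day, "drop"))
        previous = count if count else previous  # compare against the last non-empty day
        day += timedelta(days=1)
    return alerts


counts = {
    date(2017, 4, 20): 1000,
    date(2017, 4, 21): 1050,
    # 2017-04-22 is absent: the whole day "disappeared"
    date(2017, 4, 23): 990,
    date(2017, 4, 24): 300,
}
print(find_gaps_and_drops(counts))
```

Running a check like this on a schedule, and notifying someone when it returns anything, catches the gap before the CEO does.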

24:22 Michael Kennedy: Yeah, I was thinking about a plug-in model; I was gonna ask you about that. That seems like a really good idea. We all love Python for its tremendous productivity benefits, but getting the best performance takes some work. What if you could get easy, out-of-the-box access to high-performance Python? The Intel Distribution for Python delivers just that. Get close to one hundred times better performance for certain functions when using NumPy, SciPy, and scikit-learn linked with optimized native libraries like the Intel Math Kernel Library. Access efficient multithreading in Python projects like Numba and Cython. Try the Intel Distribution for Python and experience the performance today at talkpython.fm/Intel. And profile your Python and native C/C++ applications for performance hot spots with Intel VTune Amplifier. With Intel, it's all about performance. We talked about the community. Check out the GitHub repo at github.com/getRedash/Redash. When one of my listeners suggested that I have you on the show and that we talk about this, I was like, okay, this is a pretty cool-looking project, but how many people really care about it? And then I went and looked at the GitHub repo: there are almost 6,500 stars and about 135 contributors. I was like, whoa, people are using this a lot. And it seems like such a nice alternative to Microsoft Excel with some BI plug-in or something like that, or all the various big, corporate commercial things that you have to buy into to do your BI dashboard-type stuff.

26:21 Arik Fraimovich: Yeah, and many of them will often be too complex for many use cases, and overpriced. They basically price by "let's charge as much as we can," which is not really friendly for small companies or for different economies. Redash has users all around the world, and we see people from Asia, Africa, and South America, where they have a completely different economy and it's really hard for them to afford all the other tools. So being able to give them something that's really affordable, almost free (it's open source, so it's really free except for, I don't know, the cost of the server), really gives them the same playing field as the others, which is kind of cool. Besides that, we have all the big names that everyone knows using it, like Amazon, Mozilla, SoundCloud, Outbrain, and others. But it's really cool for me to know that it's being used everywhere, not just in the United States or in Silicon Valley. Actually, I think that in Silicon Valley it's probably less popular than in many other areas.

27:32 Michael Kennedy: Right, 'cause they have the money to burn on whatever they want.

27:36 Arik Fraimovich: Yeah.

27:38 Michael Kennedy: But it's cool. It's cool that it's used at some of these major tech companies. It's also cool that it's enabling places where nothing, or much less, was possible before. So you talked about supporting different databases, or connectors. You actually have quite a few integrations, as you call them. What are some of the integrations that you guys have?

27:58 Arik Fraimovich: We started with Redshift and Postgres, and basically it's the same thing: you can talk to Redshift with a Postgres client. So this was the first connector that we had. I think the second one was MySQL and the third one was BigQuery, but since then we've added many more, and most of them came as contributions from the community. Basically, someone found Redash, wanted to connect it to his database, and created the connector. I think there are two reasons why this kind of contribution became popular: there is a really simple API that you need to implement to add a connector to Redash, and the connector is the thing you must create to be able to connect to your database at all. So it's been really motivating people to do it. Today it connects to many databases that I've never used myself.
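To show why a small interface makes connectors easy to contribute, here is a hypothetical Python sketch: a base class with a single `run_query` method, a registry keyed by type name, and a SQLite implementation that hands the query text to the driver unchanged. Redash's real connector base class has its own names and signatures; nothing below is its actual API.

```python
import sqlite3
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical minimal connector interface: query text in, columns and rows out."""

    registry = {}  # maps a type name such as "sqlite" to its connector class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Connector.registry[cls.type_name] = cls  # register every concrete connector

    @abstractmethod
    def run_query(self, query):
        """Run the query unchanged and return {"columns": [...], "rows": [...]}."""


class SQLiteConnector(Connector):
    type_name = "sqlite"

    def __init__(self, path=":memory:"):
        self.connection = sqlite3.connect(path)

    def run_query(self, query):
        cursor = self.connection.execute(query)  # no rewriting: the driver gets the raw text
        columns = [description[0] for description in cursor.description]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return {"columns": columns, "rows": rows}


connector = Connector.registry["sqlite"]()
print(connector.run_query("SELECT 6 * 7 AS answer"))
```

In this design, adding another database means writing one small subclass around that database's Python driver.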

28:50 Michael Kennedy: Like InfluxDB or Impala.

28:53 Arik Fraimovich: Actually, InfluxDB I did use.

28:56 Michael Kennedy: Okay.

28:57 Arik Fraimovich: But Impala yeah, never.

28:59 Michael Kennedy: Yeah, yeah, cool. And also connects to like the standard sort of big company ones like Oracle, Microsoft SQL Server. You've got--

29:07 Arik Fraimovich: Yeah.

29:08 Michael Kennedy: MongoDB and MySQL, like you already mentioned, and Cassandra and some others, right?

29:13 Arik Fraimovich: Yeah. And with all of them, well, let's say with all the SQL ones, you can just write the regular SQL query that you would write for that database. We don't do any processing on your query; we send it as is, and this decision is what allows us to support all these databases so easily, because adding a database is just a matter of having a proper Python driver for it. And even that's not really a limitation. For example, with Amazon Athena, which Amazon introduced recently, they released only a JDBC connector, so what I did was write a simple microservice in Java that uses this JDBC driver and exposes an interface Redash can talk to.

30:02 Michael Kennedy: I see.

30:03 Arik Fraimovich: But for most of the others, we just use the Python driver. Now, if it's a non-SQL database like MongoDB, then you write JSON that describes the query, which is similar to the syntax you might use if you wrote the code in the Mongo shell. Not exactly the same, but quite similar. We also use this approach for other sources; for example, we have a connector to Google Analytics. Again, you write JSON that describes the data you want to crunch from Google Analytics, and you can get it into Redash. Similarly, we have support for Google Spreadsheets, Jira, and some other stuff, and what this allows is that you can create one dashboard that shows data from the multiple data sources you have. It's become quite common these days for companies to have many different types of data sources, and it really helps them to be able to show data from all of them, instead of having different silos of UIs that each show data from one source, or having to build it in-house.
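For non-SQL sources, the query itself can be a JSON document. The format below (`collection`, an exact-match `query`, optional `fields` and `limit`) is a deliberately simplified, hypothetical one; Redash's MongoDB syntax is richer and differs in its details. The sketch parses the JSON and applies it to an in-memory list of documents.

```python
import json

# Hypothetical in-memory "database" of document collections.
COLLECTIONS = {
    "events": [
        {"user": "u1", "type": "click", "screen": "home"},
        {"user": "u2", "type": "swipe", "screen": "apps"},
        {"user": "u1", "type": "click", "screen": "apps"},
    ]
}


def run_json_query(text):
    """Run a simplified JSON query: exact-match filter, field projection, optional limit."""
    spec = json.loads(text)
    documents = COLLECTIONS[spec["collection"]]
    criteria = spec.get("query", {})
    matches = [d for d in documents if all(d.get(k) == v for k, v in criteria.items())]
    fields = spec.get("fields")
    if fields:
        matches = [{f: d.get(f) for f in fields} for d in matches]
    limit = spec.get("limit")
    return matches[:limit] if limit is not None else matches


query = '{"collection": "events", "query": {"type": "click"}, "fields": ["user", "screen"]}'
print(run_json_query(query))
```

Because every source returns plain rows, results from a JSON-described query can sit on the same dashboard as results from a SQL database.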

31:13 Michael Kennedy: Right, so maybe you could have a Google Analytics widget on your dashboard, for web traffic and conversions, right next to your sales numbers for the day, and you could see how those potentially relate, or something like that, right?

31:31 Arik Fraimovich: Yup, yup.

31:32 Michael Kennedy: Do you have integration with things like Stripe?

31:34 Arik Fraimovich: No. It's something that comes up, and maybe in the future we will have it, but today it's much easier to just use something like Segment or Stitch. They're companies that give you a service where you connect all these webhooks to them, or just use their REST API, whatever, and they write the data into your database, or maybe BigQuery, or wherever you want, and then it's easy to query that in Redash.

32:06 Michael Kennedy: I see, let them handle all the API bits and the callbacks.

32:09 Arik Fraimovich: It's not only that; you also get the advantage that you have all the data in one database, so you can start joining the data easily.

32:17 Michael Kennedy: Right, okay.

32:18 Arik Fraimovich: And it's something that you can do in the hosted version of Redash, where you can write a query that runs across different data sources, but it's still not the same thing as having it in the same database and easily manipulating the data.
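To illustrate the point about joining, here is a minimal sketch, assuming a sync service like the ones Arik mentions has already landed both datasets in one SQL database. The table names, columns, and numbers are made up, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE web_traffic (day TEXT PRIMARY KEY, visits INTEGER);
    CREATE TABLE sales (day TEXT PRIMARY KEY, orders INTEGER);
    """
)
# Hypothetical rows: one table as if synced from an analytics tool,
# the other as if synced from a payments service.
conn.executemany("INSERT INTO web_traffic VALUES (?, ?)",
                 [("2017-04-26", 1000), ("2017-04-27", 1250)])
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2017-04-26", 20), ("2017-04-27", 50)])

# With both datasets in one database, relating them is a plain JOIN.
rows = conn.execute(
    """
    SELECT w.day,
           w.visits,
           s.orders,
           ROUND(100.0 * s.orders / w.visits, 1) AS conversion_pct
    FROM web_traffic AS w
    JOIN sales AS s ON s.day = w.day
    ORDER BY w.day
    """
).fetchall()
```

Without the shared database, you would fetch both result sets separately and match them up in application code; a single database lets the query engine do that work.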

32:32 Michael Kennedy: Sure, so we've talked about a lot of the databases and data sources on the back end. What is Redash itself built in? It's a Python web app, right?

32:42 Arik Fraimovich: Yup, yeah. So it's both Python and JavaScript, because it's a single-page application, or whatever the new name for that is today, but basically the Python side is an API, and then there is the front-end application that uses this API to present the UI.

33:01 Michael Kennedy: I see, so you've got like a Flask back end. There's a bunch of JSON services, and then you've got, what, Angular running on the front end?

33:09 Arik Fraimovich: It's a bit embarrassing to say these days, but yeah, we're using Angular 1. When we started it was quite new and fun, but these days, yeah.

33:18 Michael Kennedy: Isn't that the problem with JavaScript? It's like, we've got the most amazing thing, it's just taking off, and six months later it's not, right? There are five other choices. It's hard to really pick a winner. Nobody seems to reign very long in that world.

33:33 Arik Fraimovich: Yeah, it's a bit frustrating in that sense. Ember has been quite surprising, in that they just kept moving ahead and were able to really pick up the good parts of each new iteration of frameworks. It has its own issues in other respects, but it was really impressive to see how Ember evolved over the years. And I think there is a feeling that things are starting to stabilize: the community is starting to adopt the same tools and evolve them, instead of saying, let's reinvent it again. It seems that React is going to be among the top winners, alongside Vue. I'm not sure how to pronounce it, but it's V-U-E. And Angular will probably stay because of all the enterprise players that use it, but it's probably not going to be the tool of choice for new companies and small companies. I don't really like Angular 2, and by now it's probably Angular 4. The syntax is really, I just don't like how it looks, the API they created, but that's just a personal thing. So yeah, we're still on Angular 1. I really hope that I can devote the time to migrate to something else, but every time it comes up I need to wear the product manager hat and say the user doesn't care if it's Angular or React or whatever, so let's focus on the things that the user cares about.

35:04 Michael Kennedy: Yeah, I mean it really doesn't matter too much until maybe there are contributors saying, "I would love to contribute, but I'm not writing in Angular," or it's really getting in the way of evolving the product, right?

35:15 Arik Fraimovich: Yeah, exactly. Attracting new contributors is definitely an issue here, because many people just stay away from Angular these days and we're hurt by that. And I can see some technical challenges, including the plug-in model and stuff like that, that would be much easier if we adopted something else.

35:41 Michael Kennedy: What would you pick if you could pick anything?

35:43 Arik Fraimovich: It's a good question. It would probably be either React or Vue. There are some things that I really like about Vue, or whatever its name is.

35:53 Michael Kennedy: View, maybe. Yeah.

35:54 Arik Fraimovich: Yeah, it's probably pronounced like "view," but anyway, V-U-E. There are many concepts there that I really like, but it seems that most of the mind share is around React, and as an open source project I think it's important to adopt the more common technologies, so it's easy for people to jump in and contribute, instead of picking things that might make more sense from a technical point of view.

36:22 Michael Kennedy: Yeah, there's really that balance, isn't there? You definitely want to make it as easy as possible for your contributors to contribute and extend it.

36:29 Arik Fraimovich: Yeah, and in that sense, Angular was a really bad choice. I remember an instance where a co-worker at Everything Me wanted to add some feature, and I started to walk him through the things he would need to understand to edit it, and after about three minutes I said, "You know what? Never mind, I will just implement it myself. It will take less time than explaining it." On the other hand, with the World Summit internal app that I wrote later on with React, I wanted someone to help me with a specific component, and he didn't even need to look at React's documentation. He figured out the API from the code I already had in place, so it really reinforced my feeling that React has a really good learning curve. Even if someone is not familiar with it, it's super easy to understand. It has other issues, like the ecosystem around it being a bit complex at this time, but yeah, I don't know. It will take time, and I will have to make this decision again, so we'll see how things shift until then.

37:36 Michael Kennedy: Yeah, maybe some day you can rewrite the front end, but for now, there's more to do. So one of the things I wanted to make sure we covered on the show: when you first started this project, it was an open source project, and then, as we talked about, Everything Me shut down, and that gave you a chance to say, "Okay, well, if I'm not doing that anymore, now what?" And you decided to try to take Redash and make it a proper company, as you talked about in the beginning, right?

38:09 Arik Fraimovich: Yup, yup, yup.

38:10 Michael Kennedy: I think there are a lot of people working on open source things, and it feels to me like the standard way to level up off of open source work is: I've created something. Let's just say Flask. I don't actually know how it works for Flask specifically, but let's suppose I created Flask. Flask is popular, so I try to do consulting for Flask projects. And that's fine, but I feel like in a lot of cases it doesn't work very well. You're still trading your time for money, which is not always the best thing to be doing. So you actually thought through the different business models, and right now you're basically offering, in addition to the free open source version, a hosted Redash as a service, right?

38:54 Arik Fraimovich: Yeah, that's correct.

38:55 Michael Kennedy: So do you want to talk about how you thought through those different business models, stuff like that?

39:00 Arik Fraimovich: Sure, sure. About half a year ago, I did a talk at a local meetup about what I learned from the experience of taking an open source project and turning it into a company, and there were three lessons. The first one is that if you want to be able to work on an open source project, like your own project, don't start a company. That was a mistake, because, and it's an obvious thing, starting a company has its own challenges. It has its own demands, and it's not easy.

39:36 Michael Kennedy: You have to learn about accounting and all sorts of stuff that you're not trained in, marketing for example.

39:42 Arik Fraimovich: Yeah, so there is that, but there is the whole thing of creating a business, which has nothing to do with creating a project or even creating a product. You have your open source project. That's cool. But it has nothing to do with creating a business around it. And I know that I made mistakes around the business side of Redash, some of them intentional, because the reason I started the company was to make sure that Redash would stay around for the long term. So basically the project doesn't serve the company; it's not that I'm building a company using the project. I'm building the company for the project, to be able to work on the project for the long term. Obviously I'm also doing it for my own selfish reasons, because I always wanted to create a company the way I think it should be, but there is a balance, where usually you would invest more in the business itself. I tipped more towards the project because that was the goal. Now, in terms of business model, luckily we can see a lot of open source companies these days and the different models they use. Many of them, if you look into it, actually do all of them. So basically you can have some dual licensing, or two versions, like a community version and an enterprise version. Something like what GitLab is doing. They have the community version of GitLab that anyone can download and install, and there's the enterprise version of GitLab, which I think comes in two flavors, that you can buy a license for and install. So that's one model. There is the model of offering professional services around your software. And there is the model of offering a hosted version of your software, which is something GitLab is doing as well, but it's not their main thing. The project I took the most inspiration from is Sentry, which is actually Python as well.
And I exchanged a few emails with David Cramer, who was really generous with his time and shared some advice, and what they're doing is that they have a SaaS offering which is exactly the same as their open source project. When I started, I decided to do practically the same thing: keep everything open source, and offer it as a service, as the SaaS offering. The idea is that anyone who doesn't want to manage their own installation of Redash, with the hassle of uptime and keeping up with upgrades and all this stuff, can use the hosted version. What I've learned since then is that it takes time to scale a SaaS business, especially when it's not your only focus. I'm always juggling: okay, we have the open source version, and I have some production issue with the SaaS, and you need to prioritize. And then, in terms of building a SaaS business, at this point Redash is good enough that I could just keep selling it and focus on finding the market, and blah, blah, blah. But because it's also open source, it has to keep moving forward, because no one likes a stagnating open source project. So there is this balance, which is not that easy to maintain, especially with limited resources. Besides deciding on this business model, I also decided to keep the company bootstrapped and not raise any money. There were different reasons for that. Essentially it was just the thing I felt most comfortable with.

43:36 Michael Kennedy: This portion of Talk Python To Me is brought to you by Hired. Hired is the platform for top Python developer jobs. Create your profile and instantly get access to thousands of companies who will compete to work with you. Take it from one of Hired's users who recently got a job and said, "I had my first offer within four days and I ended up getting eight offers in total. I've worked with recruiters in the past but they were pretty hit and miss. I tried LinkedIn, but I found Hired to be the best. I really like knowing the salary upfront and privacy was also a huge seller for me." Well, that sounds pretty awesome, doesn't it? But wait until you hear about the signing bonus. Everyone who accepts a job from Hired gets a $300 signing bonus. And as Talk Python listeners, it gets even sweeter. Use the link talkpython.fm/hired and Hired will double the signing bonus to $600. Opportunity is knocking. Visit talkpython.fm/hired and answer the door.

44:27 Arik Fraimovich: There have been a few local investors who have been really interested, and there was even a funny thing: we have an integration with Slack, so you can share visualizations from Redash on Slack, and one of our SaaS users has a Slack channel with his investors. He started sharing KPIs and stuff like that from Redash in Slack, and one of his investors said, "Hey, what's that? That looks cool. Can you connect me with the founder? I might want to invest." But I passed on that opportunity.

44:58 Michael Kennedy: Yeah, that's a hard decision to make, right? But as soon as you take funding the clock starts ticking for growth and other types of things that are maybe short term and not necessarily where you want to focus.

45:11 Arik Fraimovich: Well, it's a long discussion, bootstrapping versus VC and all, but essentially what I realized is that I prefer to keep it bootstrapped, so let's just move with it. It definitely puts many constraints on what you can do. Up to this day, I'm still working mostly alone on it. Full time, I'm the only employee in the company. I'm using some freelancers, but full time I'm the only one. I am currently trying to hire another developer. It takes time, so it's still only me, but hopefully soon there will be another person.

45:47 Michael Kennedy: So, with your open source project also powering a SaaS company, do you feel like Redash is better because of the exposure and the experience you got running the SaaS company?

46:01 Arik Fraimovich: Yeah, that's for sure. The hosted version of Redash is probably one of the largest deployments of Redash, and it allows me both to stress test it and find weak points in terms of performance and stuff like that, and it also gives me visibility into how people use it. As an open source project, you practically have no idea how people use your project, what they do with it, what kind of connectors they use and all that. So you need to make decisions really blindly, compared to a SaaS business, where you have a lot of data to collect.

46:37 Michael Kennedy: Yeah, it's like Everything Me, for example, which started this whole thing, where you have all these stats coming in. I'm sure you now have all sorts of usage stats on features, how frequently they're used, and what types of errors you run into, and you wouldn't get that if people just cloned it off GitHub and ran the script.

46:53 Arik Fraimovich: Yeah, exactly, and so this also improves the project itself, because it allows me to make more informed decisions. I remember one instance where someone wanted to change some behavior, and it felt to me that what he proposed was not common enough, but then I checked the data and realized that he was right. So it really, really helps to be able to see how people use it, to have more direct access to them, and to understand what they do with Redash. It has some downsides for the open source project, because there are things like the deployment story that I might invest more in if it were only a self-hosted solution. We still have quite an easy way to deploy Redash. I maintain images for Amazon and Google Cloud, and there are also Docker images, so it's not that hard to start using Redash, and the last version even added a wizard that allows you to set up the user without having to use the CLI. So it's improving, but it could move faster if I wasn't also working on the SaaS project.

48:04 Michael Kennedy: Yeah, it's easy to see it that way, but at the same time the SaaS project lets you put all of your time into this open source project, and not do, like, consulting for a bank where you're building forms-over-data web apps.

48:18 Arik Fraimovich: Yeah, exactly, and at the beginning I was doing some paid custom development on top of Redash, so still some sort of consulting work, but over time I could focus more on the product itself. So it definitely has its benefits. And still, it's a recurring dilemma for me whether I should invest more in the SaaS or find a business model around the self-hosted version, because even today there are many more users of the open source, self-hosted version than of the SaaS version. There are even people who start with the SaaS trial, say, "Oh, that's cool," and then move to the open source one for different reasons. Some of them just don't want to open their database to an external company. Others are just cheap. I even compared numbers with Sentry, and they said they see a better ratio between open source users and SaaS users. I think part of the reason is the kind of product I'm building, where you really need to give another company access to your data, and many people don't feel comfortable with that. So it's a recurring dilemma whether I should find another business model, or an additional one, where I can offer something to the self-hosted crowd. The obvious thing here is some kind of enterprise edition, but my concern is that once I start offering an enterprise edition, there are two problematic outcomes. One is that enterprise clients usually aren't only buying software. They're also buying support, which means selling services.

50:12 Michael Kennedy: Right. What's your SLA and all of that kind of stuff. Yeah.

50:16 Arik Fraimovich: Yeah, and you start drifting into the world of selling hours instead of selling software. That's one issue, and another is that they have completely different demands from what the rest of the users need. Then there is the concern of turning Redash into Jira instead of keeping it as it is. It's a slippery slope, where you find yourself with a bloated product because of bigger clients with varied needs. Today it's easy to ignore them because they're not paying you, but once a huge chunk of your revenue comes from them, you need to be more attentive. So these concerns keep me on the SaaS model, but it's a recurring dilemma, and I hope that one day I will have a definitive answer and know, yeah, that's what we're sticking with.

51:08 Michael Kennedy: Yeah, it's not super straightforward I can tell, but I think it's really great that you're making it work on either path and the SaaS one is definitely adding value to the world. So, nice work and I'm sure it's inspirational to a lot of people to hear.

51:23 Arik Fraimovich: Yeah, I hope so.

51:25 Michael Kennedy: Yeah, hope so. Alright, well, I think we're probably gonna have to leave it there for Redash, so let me ask you two final questions before I let you outta here. When you're working on Redash and you open up a Python editor, which one is it?

51:40 Arik Fraimovich: So until not long ago it was Atom, but recently I started using Sublime again so I guess it's both of them. And obviously with a dim mode enabled.

51:49 Michael Kennedy: Yeah, of course. Very nice. And out of the hundred thousand plus PyPI packages, are there any you think are notable that people maybe don't know of but should check out?

52:00 Arik Fraimovich: So the first one is Peewee.

52:03 Michael Kennedy: P-E-E-W-E-E?

52:04 Arik Fraimovich: Yeah, exactly. It's an ORM.

52:06 Michael Kennedy: Mm-hmm.

52:08 Arik Fraimovich: It's a Python ORM. I think it should be much more popular than it is.

52:11 Michael Kennedy: I think it's the most Pythonic ORM.

52:14 Arik Fraimovich: Yeah, exactly. I really don't understand how SQLAlchemy is getting more mind share than Peewee. It's really a mystery to me, because Peewee is much more Pythonic. Redash was using Peewee up until about half a year ago, when I decided to migrate to SQLAlchemy, and no offense to the SQLAlchemy people, but I think that was one of my biggest mistakes of 2016. Yeah, I should've stayed with Peewee.

52:42 Michael Kennedy: How interesting. Yeah, you're right. SQLAlchemy definitely gets a lot of the mind share. SQLAlchemy is great, but I do think Peewee is also quite cool. Basically you give it lambda expressions or generator-expression type things, and it transforms that into the actual SQL query. I think that's glorious.

52:59 Arik Fraimovich: Yeah, and it's just simple to use and fun.
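Peewee builds queries from ordinary Python comparisons on model fields, for example `User.age > 18` inside `.where(...)`, combined with `&` and `|`. (Translating lambdas or generator expressions into SQL is closer to what Pony ORM does.) The hypothetical toy below is not Peewee's code; it only shows the operator-overloading trick that makes this style possible: comparing a field object returns a SQL fragment with bound parameters instead of a bool. SQLite stands in for a real database.

```python
import sqlite3


class Expr:
    """A SQL fragment plus the parameters it binds."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def __and__(self, other):
        return Expr(f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other):
        return Expr(f"({self.sql} OR {other.sql})", self.params + other.params)


class Field:
    """A column reference whose comparison operators build Expr objects."""

    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        # Values become "?" placeholders, never string-formatted into the SQL.
        return Expr(f'"{self.name}" {op} ?', (value,))

    def __eq__(self, value):
        return self._compare("=", value)

    def __ne__(self, value):
        return self._compare("!=", value)

    def __lt__(self, value):
        return self._compare("<", value)

    def __gt__(self, value):
        return self._compare(">", value)

    # Defining __eq__ removes the default __hash__; restore identity hashing.
    __hash__ = object.__hash__


def select_where(table, condition):
    """Render a SELECT for `table` filtered by an Expr (hypothetical helper)."""
    return f'SELECT * FROM "{table}" WHERE {condition.sql}', condition.params


age, name = Field("age"), Field("name")
# & binds tighter than comparisons, so each comparison needs parentheses.
sql, params = select_where("users", (age > 18) & (name != "admin"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", 30), ("bob", 15), ("admin", 40)])
adults = conn.execute(sql, params).fetchall()
```

The parentheses around each comparison are needed for the same reason in Peewee's own `&`/`|` syntax: Python's bitwise operators bind more tightly than comparisons.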

53:01 Michael Kennedy: Yeah, exactly, and I also noticed that there's an extension package that converts it to an asyncio variation. So you can basically create async coroutines, use Peewee queries, and await the queries, which is just icing on the cake. That's nice.

53:19 Arik Fraimovich: Oh, that's kind of cool.
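The extension Michael mentions has its own API, which is not shown here. As a hedged, stdlib-only stand-in for the same idea, awaiting a database query from a coroutine, this sketch hands a blocking sqlite3 query to a worker thread with `asyncio.to_thread` (Python 3.9+). `run_query` is a hypothetical helper.

```python
import asyncio
import sqlite3


def run_query(sql, params=()):
    """Blocking query; runs in a worker thread when awaited below."""
    # Open the connection inside the worker: by default a sqlite3 connection
    # may only be used in the thread that created it.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


async def main():
    # Both blocking queries run concurrently in the default thread pool,
    # while the event loop stays free for other coroutines.
    first, second = await asyncio.gather(
        asyncio.to_thread(run_query, "SELECT 1 + 1"),
        asyncio.to_thread(run_query, "SELECT ? * 3", (7,)),
    )
    return first, second


results = asyncio.run(main())
```

A dedicated async driver avoids the thread hop entirely; handing work to a thread pool is simply the option that works with any blocking driver.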

53:20 Michael Kennedy: Yeah, it's very cool. So thanks for those, they're awesome recommendations. Now, a final call to action: how do people check out Redash? How do they check out the hosted version? And what do you need from the community? Things like that.

53:32 Arik Fraimovich: Go to Redash.io and you have all the information there. It's super easy to start. Whether you go with the hosted version or start with your own deployment, it really takes a few minutes and you can start querying your data. And if you have a database, you probably need Redash. Contributions are always welcome, both code and documentation. Documentation is something we're not getting enough of, but yeah, any contribution is welcome.

53:59 Michael Kennedy: Alright, well, sounds good. It's definitely a cool project. Arik, thank you for sharing with everyone.

54:04 Arik Fraimovich: Thank you for inviting me.
