A Stroll Down Startup Lane

Episode #414, published Sun, May 7, 2023, recorded Sat, Apr 22, 2023


At PyCon 2023, there was a section of the expo floor dedicated to new Python-based companies called Startup Row. I wanted to bring their stories and the experience of talking with these new startups to you. So in this episode, we'll talk with founders from these companies for 5 to 10 minutes each.



Episode Deep Dive

Guests Introduction and Background

Devin Peterson (Ponder) Devin co-founded Ponder to build upon the open source library Modin, which originated as his PhD project at Berkeley. At PyCon’s Startup Row, he talks about enabling data scientists to write familiar pandas code while pushing computation to databases or distributed systems. Devin shares how their open-core approach caters to both open source users and enterprise needs.

Josh Albrecht (Generally Intelligent) Josh is the CTO of Generally Intelligent, an AI research company in San Francisco focused on building agents that can operate independently and safely. He discusses how their AI agents can handle browser and desktop tasks, delving into concepts of large language models and the difference between local and cloud-based inference.

Mo Sarwat (Wherobots) Mo is the co-founder and CEO of Wherobots, whose mission is enabling organizations to leverage geospatial (space-and-time) data. They build a database infrastructure that automates geospatial analytics, bridging structured and unstructured data in real time. Mo’s background includes significant work in open source geospatial databases and Apache Sedona.

Douwe and Jack (Neptyne) Douwe (a long-time Python developer) and Jack co-founded Neptyne, aiming to build Python-programmable spreadsheets with AI assistance. They talk about bridging the gap between data science notebooks and user-friendly spreadsheets, eliminating repeated tasks by letting Python handle data cleaning, automation, and advanced analysis directly in cells.

Federico Garza and Cristian Challu (Nixtla) Federico (CTO) and Cristian (co-founder) are behind Nixtla, an open-core time series forecasting platform. They develop open source libraries like StatsForecast and NeuralForecast, enabling businesses and developers to quickly predict future values for use cases like demand forecasting or climate data modeling.

Piero Molino (Predibase) Piero is the CEO of Predibase and the original author of Ludwig, an open source machine learning framework donated to the Linux Foundation. He explains how Predibase builds on Ludwig for the enterprise, from data connections to large-scale cloud compute to one-click model deployment. Their configuration-driven approach aims to cut months of ML pipeline work down to days.

Nikhil Rao (Pynecone) Nikhil is the CEO and co-founder of Pynecone. He discusses creating a pure Python framework to build full-stack web apps, bridging front-end interactivity and back-end logic without requiring multiple programming languages. By compiling the Python-defined UI to a Next.js front end and using FastAPI on the back end, Pynecone aims to be the go-to framework for Python developers building rich web applications.

What to Know If You’re New to Python

If you’re just getting started with Python and want to follow along with these startup stories, here are a few essentials:

Code and Data: Familiarize yourself with basic Python data structures (lists, dictionaries) and widely used libraries like pandas.
Foundational Frameworks: Understand how web frameworks like FastAPI or data libraries like Dask fit into the picture.
Interactive Tools: Tools such as Jupyter notebooks or even “Python-powered spreadsheets” (like Neptyne) can help you iterate quickly.
Communities and Conferences: PyCon’s Startup Row showcases how Python fosters innovation. Following these startups can inspire your own Python journey.

Key Points and Takeaways

The Vibrant Ecosystem of Python Startups As evidenced by PyCon’s Startup Row, Python’s versatility powers a wide variety of cutting-edge companies. Founders use Python as the backbone for everything from spreadsheets and geospatial analytics to machine learning and web frameworks. Many incorporate an open-core business model, balancing free community-driven libraries with paid enterprise features. This synergy of community and commerce highlights Python’s broad appeal across nearly every industry domain.
- Links and Tools:
  - Startup Row at PyCon US
  - PyCon
Bridging Python Data Science into Databases (Ponder) Devin Peterson explains that while pandas makes data exploration seamless, it doesn’t naturally scale to large enterprise data or big workloads living in databases. Ponder solves this by translating familiar pandas calls into efficient SQL queries, letting data scientists and developers maintain their Python-first workflow without manually writing SQL or running Spark jobs. This approach can drastically cut data movement, saving time and resources.
- Links and Tools:
  - Modin
  - Dask
  - Ray
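To make the Ponder idea concrete, here is a minimal sketch of pushing a `describe()`-style summary into the database. It uses only Python's built-in `sqlite3`. A single aggregate query runs where the data lives, and only five numbers come back. The function names `describe_sql` and `describe_in_db` are hypothetical, and this is not Ponder's implementation. It also skips the quartiles that `df.describe()` reports, which are a large part of why the real generated SQL gets long.

```python
import math
import sqlite3


def describe_sql(table: str, column: str) -> str:
    """Build one aggregate query that computes describe()-style stats inside the database."""
    # Identifiers are interpolated as-is, so only pass trusted table and column names.
    return (
        f"SELECT COUNT({column}), AVG({column}), MIN({column}), MAX({column}), "
        f"SUM({column} * {column}) FROM {table}"
    )


def describe_in_db(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Run the aggregate query and finish the arithmetic on the handful of values returned."""
    count, mean, lo, hi, sum_sq = conn.execute(describe_sql(table, column)).fetchone()
    if count > 1:
        # Sample standard deviation (n - 1 denominator, as pandas uses).
        # Fine for a sketch; very large values can lose precision with this formula.
        std = math.sqrt((sum_sq - count * mean * mean) / (count - 1))
    else:
        std = float("nan")
    return {"count": count, "mean": mean, "std": std, "min": lo, "max": hi}


# Usage: the rows stay in SQLite; only one result row crosses back into Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (distance REAL)")
conn.executemany("INSERT INTO trips VALUES (?)", [(1.0,), (2.0,), (3.0,), (4.0,)])
stats = describe_in_db(conn, "trips", "distance")
print(stats)
```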
Generally Intelligent’s Vision for AI Agents Josh Albrecht shares how Generally Intelligent is building AI agents that can act on their own in digital environments such as your browser, desktop, or code editor. A coding agent, for example, could write tests, run them, and fix errors. It would also raise uncertain points with the user rather than passively generating text. The team trains its own large language models on expensive GPU clusters. Josh notes they avoid the term AGI because it means different things to different people. Instead, they describe the goal as more capable, safer, and more robust AI systems, built with open source Python tools such as PyTorch.
- Links and Tools:
  - Generally Intelligent
  - PyTorch
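The generate, test, and fix cycle Josh describes can be sketched as a small loop. This is a toy that rests on several assumptions:

- `propose` stands in for a language model. Here it is a hypothetical scripted proposer that just returns canned drafts.
- Each candidate defines a function named `f`.
- Test cases are (input, expected) pairs.

A real agent would feed the failure messages back into its prompt. The loop passes them along, but the stub ignores them.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Case = Tuple[int, int]  # (input, expected output)


def run_tests(func: Callable[[int], int], cases: Sequence[Case]) -> List[str]:
    """Return human-readable failure messages; an empty list means every case passed."""
    failures = []
    for arg, expected in cases:
        try:
            got = func(arg)
        except Exception as exc:  # a crash counts as a failed test, not a crashed agent
            failures.append(f"f({arg}) raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"f({arg}) returned {got}, expected {expected}")
    return failures


def agent_loop(
    propose: Callable[[List[str]], str], cases: Sequence[Case], max_attempts: int = 5
) -> Optional[str]:
    """Ask for code, run the tests, and retry with the failures until the tests pass."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        source = propose(feedback)  # the proposer sees the previous round's failures
        namespace: dict = {}
        exec(source, namespace)  # defines `f`; never exec untrusted code outside a sandbox
        feedback = run_tests(namespace["f"], cases)
        if not feedback:
            return source
    return None


def make_scripted_proposer(drafts: List[str]) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for a language model: returns canned drafts in order."""
    remaining = iter(drafts)

    def propose(feedback: List[str]) -> str:
        # A real agent would put `feedback` into its prompt; this stub ignores it.
        return next(remaining, drafts[-1])

    return propose


# Usage: the first draft computes a square instead of a factorial; the second is correct.
DRAFTS = [
    "def f(n):\n    return n * n\n",
    "def f(n):\n    out = 1\n    for k in range(2, n + 1):\n        out *= k\n    return out\n",
]
CASES = [(0, 1), (1, 1), (4, 24), (5, 120)]
result = agent_loop(make_scripted_proposer(DRAFTS), CASES)
print(result)
```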
Geospatial Data Analytics for Real-Time Insights (Wherobots) Mo describes how Wherobots helps organizations extract deeper insights by analyzing their data through a “space-and-time lens.” Whether tracking packages in delivery, insuring homes in hurricane-prone zones, or monitoring climate patterns for agriculture, geospatial data requires specialized queries and optimizations. Their open-core library, built on Apache Sedona, marries Python APIs with advanced geospatial queries at enterprise scale.
- Links and Tools:
  - GeoPandas
  - Apache Sedona
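As a sketch of a space-and-time query, the snippet below keeps only the location pings that fall inside both a radius and a time window. Distance is measured with the haversine great-circle formula. The `Ping` record and the sample coordinates are hypothetical. The linear scan is for illustration only; engines like Apache Sedona rely on spatial indexing and distributed execution to run this kind of filter at scale.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Ping:
    """One location report from a moving asset (hypothetical schema)."""
    asset_id: str
    lat: float
    lon: float
    at: datetime


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pings_near(
    pings: Iterable[Ping],
    center: Tuple[float, float],
    radius_km: float,
    start: datetime,
    end: datetime,
) -> List[Ping]:
    """Keep pings inside the time window [start, end] and within radius_km of center."""
    lat0, lon0 = center
    return [
        p
        for p in pings
        if start <= p.at <= end and haversine_km(lat0, lon0, p.lat, p.lon) <= radius_km
    ]


# Usage with made-up pings: one near Salt Lake City, one in Denver, one near SLC but too late.
SLC = (40.76, -111.89)
pings = [
    Ping("truck-1", 40.77, -111.90, datetime(2023, 4, 22, 10, 0)),
    Ping("truck-2", 39.74, -104.99, datetime(2023, 4, 22, 10, 5)),
    Ping("truck-3", 40.75, -111.88, datetime(2023, 4, 22, 18, 0)),
]
hits = pings_near(pings, SLC, 25.0, datetime(2023, 4, 22, 9, 0), datetime(2023, 4, 22, 12, 0))
print([p.asset_id for p in hits])
```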
Python-Powered Spreadsheets for Data Science (Neptyne) Douwe and Jack show how Neptyne aims to unify the power of Python with the accessibility of spreadsheets. Instead of hand-written IF formulas or complex pivot tables, Neptyne’s cells can run arbitrary Python functions that call APIs, clean data, and apply advanced AI models. By bridging the gap between notebooks and spreadsheets, they cater to collaborative workflows, especially for business users who want the power of Python without learning Jupyter directly.
- Links and Tools:
  - Neptyne
  - Requests (for API calls)
  - pandas
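The Python-in-a-cell idea can be illustrated with a toy grid. Any cell can hold either a plain value or a Python function that reads other cells. The `Sheet` class and its `column` helper are hypothetical and are not Neptyne's API. The sketch has no dependency tracking or cycle detection, so a formula that refers to itself will raise `RecursionError`.

```python
from typing import Any, Dict, List


class Sheet:
    """A toy grid in which a cell holds either a plain value or a Python formula."""

    def __init__(self) -> None:
        self.cells: Dict[str, Any] = {}

    def __setitem__(self, ref: str, value: Any) -> None:
        self.cells[ref] = value

    def __getitem__(self, ref: str) -> Any:
        value = self.cells.get(ref)  # empty cells read as None
        # Formulas are callables that receive the sheet, so they can read other cells.
        # They are recomputed on every read (no caching).
        return value(self) if callable(value) else value

    def column(self, letter: str, first_row: int, last_row: int) -> List[Any]:
        """Evaluate a vertical range such as A1:A3 and return its values in order."""
        return [self[f"{letter}{row}"] for row in range(first_row, last_row + 1)]


# Usage: messy names in column A, one Python formula in B1 that cleans them all.
sheet = Sheet()
sheet["A1"] = "  ada lovelace "
sheet["A2"] = "GRACE HOPPER"
sheet["A3"] = "alan turing  "
sheet["B1"] = lambda s: [name.strip().title() for name in s.column("A", 1, 3)]
print(sheet["B1"])
```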
Time Series Forecasting and the Open-Core Model (Nixtla) Federico and Cristian discuss the importance of time series forecasting for industries like logistics, finance, and agritech. Through Nixtla, they offer open source forecasting libraries such as StatsForecast and NeuralForecast, letting developers combine state-of-the-art statistical and neural network approaches. Their roadmap includes hosted services and user-friendly APIs intended to lower the barrier to generating accurate forecasts at scale.
- Links and Tools:
  - StatsForecast
  - NeuralForecast
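A common first baseline in time series forecasting is the seasonal naive method: each future step repeats the value from the same point in the previous season. StatsForecast ships baseline models of this kind alongside its statistical models. The plain-Python version below is only a minimal sketch with hypothetical function names. It is paired with mean absolute error for scoring a forecast against held-out values.

```python
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], season_length: int, horizon: int) -> List[float]:
    """Forecast each future step with the value observed exactly one season earlier."""
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need a positive season_length and at least one full season of history")
    last_season = list(history[-season_length:])
    # Step h (0-based) repeats position h mod season_length of the most recent season.
    return [last_season[h % season_length] for h in range(horizon)]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Usage: two "years" of quarterly demand, then a forecast for the next six quarters.
demand = [10, 20, 30, 40, 12, 22, 32, 42]
forecast = seasonal_naive(demand, season_length=4, horizon=6)
print(forecast)  # [12, 22, 32, 42, 12, 22]
print(mean_absolute_error([13, 21, 33, 41], forecast[:4]))  # 1.0
```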
Declarative Machine Learning Platforms (Predibase) Piero describes how Predibase builds on Ludwig, an open source tool he created at Uber. Ludwig allows users to define ML pipelines in a simple configuration file rather than writing extensive boilerplate code. Predibase adds collaboration features, cloud-based scale-out, model repositories, and easy data connections. The goal of this end-to-end system is to shorten project timelines while letting users either customize every detail or stick with out-of-the-box configurations.
- Links and Tools:
  - Ludwig
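To show what configuration-driven ML means in practice, the sketch below lets a config dictionary, rather than code, decide which columns feed a model. The `input_features` and `output_features` lists with `name` and `type` keys follow the general shape of a Ludwig config. `train_from_config` itself is a hypothetical toy. It only supports a single numeric input and a single numeric output, and it fits an ordinary least-squares line. This is not how Ludwig or Predibase train models.

```python
from typing import Callable, Dict, List

# A declarative spec: which columns go in, which come out, and their types.
# The keys mirror the general shape of a Ludwig config; the trainer below is a toy.
CONFIG = {
    "input_features": [{"name": "sqft", "type": "number"}],
    "output_features": [{"name": "price", "type": "number"}],
}


def train_from_config(config: Dict, rows: List[Dict[str, float]]) -> Callable[[Dict], Dict]:
    """Hypothetical trainer: fit a least-squares line for one number -> number mapping."""
    inputs, outputs = config["input_features"], config["output_features"]
    if len(inputs) != 1 or len(outputs) != 1:
        raise NotImplementedError("this toy supports exactly one input and one output feature")
    if any(f["type"] != "number" for f in inputs + outputs):
        raise NotImplementedError("this toy only supports number features")

    x_name, y_name = inputs[0]["name"], outputs[0]["name"]
    xs = [row[x_name] for row in rows]
    ys = [row[y_name] for row in rows]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)  # zero if every x is identical
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(row: Dict) -> Dict[str, float]:
        return {y_name: intercept + slope * row[x_name]}

    return predict


# Usage: swapping column names or types happens in CONFIG, not in the training code.
rows = [{"sqft": 1000, "price": 200}, {"sqft": 2000, "price": 300}, {"sqft": 3000, "price": 400}]
model = train_from_config(CONFIG, rows)
print(model({"sqft": 2500}))
```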
Pure Python Web Framework for Full-Stack Apps (Pynecone) Nikhil explains how Pynecone compiles a Python-based front end down to Next.js React components while using FastAPI on the server. This approach removes the need for a separate JavaScript framework on the client alongside Python on the back end. Developers keep their logic in Python, connecting the Python library ecosystem to a production-ready UI stack.
- Links and Tools:
  - Next.js
  - FastAPI
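The core trick of compiling Python UI code to a JavaScript front end can be illustrated with a tiny renderer. It turns a tree of Python objects into JSX markup. The `Component` class, the `to_jsx` function, and the component names are hypothetical and are not Pynecone's API. A real framework also generates the event handlers and keeps state in sync with the Python back end, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List, Union


@dataclass
class Component:
    """A UI node described in Python: a tag name, children, and props."""
    tag: str
    children: List[Union["Component", str]] = field(default_factory=list)
    props: Dict[str, str] = field(default_factory=dict)


def to_jsx(node: Union[Component, str]) -> str:
    """Render a component tree to a JSX string, escaping text so it stays literal."""
    if isinstance(node, str):
        # HTML entities are valid in JSX text; braces are escaped so they are not read as expressions.
        return escape(node).replace("{", "&#123;").replace("}", "&#125;")
    attrs = "".join(f' {key}="{escape(value)}"' for key, value in sorted(node.props.items()))
    inner = "".join(to_jsx(child) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


# Usage: a small page defined entirely in Python.
page = Component(
    "VStack",
    [
        Component("Heading", ["Counter"]),
        Component("Button", ["Add 1"], {"colorScheme": "blue"}),
    ],
)
print(to_jsx(page))
# <VStack><Heading>Counter</Heading><Button colorScheme="blue">Add 1</Button></VStack>
```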
Open-Core Business Models and Enterprise Adoption Many guests emphasized an open-core approach: providing a powerful free library plus commercial add-ons for security, scalability, or enterprise features. This model encourages community contribution while funding full-time development. It can also create a natural funnel for companies that outgrow free tiers. Ponder, Wherobots, and Nixtla are all betting that Python-based open source can sustain a business this way.
- Links and Tools:
  - Apache License 2.0
  - PyPI
Collaboration, Speed, and Python’s Momentum A recurring theme in the interviews is how Python’s ecosystem simplifies collaboration across roles (data scientists, engineers, business teams). Tools like Neptyne for spreadsheets or Predibase for model pipelines enable faster iteration. Founders highlight that the language’s friendly syntax and wide library availability remain major draws, as does the tradition of open source at the heart of many Python-based tools.

- Links and Tools:
  - pandas
  - Flask
  - PyCon US

Interesting Quotes and Stories

Devin Peterson (On the origin of Modin): “I had a moment there where a data scientist was like, ‘I don’t want your tool, can you just make my tool run faster?’ And so I was like, ah, yes, wait a second, this is actually a real project.”

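Josh Albrecht (On coding agents): “You can imagine for a coding agent, you can not only generate the function, but also generate some tests, run those tests, see errors in those tests, try and fix the errors, kind of do that whole life cycle… It’s not just an autocomplete in your editor.”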

Mo Sarat (On geospatial): “Location is a fundamental dimension of data. Once you add time, you see the real story unfolding of your moving assets, your climate signals, your risk maps.”

Douwe (On Python in spreadsheets): “The next time you have to clean data, you’re not manually editing cells. It’s a Python script in that cell that just runs, no matter how big or small your spreadsheet is.”

Piero Molino (On building Ludwig): “I built Ludwig at Uber because I was tired of rewriting the same ML boilerplate. Then I realized open sourcing it would let me do it once, and do it right.”

Key Definitions and Terms

Open-Core Model: A business model where a company offers a core open-source project for free, then provides enterprise or paid features on top.
AGI (Artificial General Intelligence): AI systems designed with broad capabilities resembling general human intelligence, rather than domain-specific tasks.
Time Series Forecasting: Predicting future values of data based on historical, time-ordered information (e.g., demand, weather).
Geospatial Analytics: Analysis focusing on data associated with a geographic location, often combined with timestamps for advanced insights.
Configuration-driven ML (Ludwig): An approach where machine learning pipelines are defined via structured configuration files rather than extensive code.

Learning Resources

Here are some courses from Talk Python Training that match topics and tools highlighted in these startup stories:

Getting started with Dask For anyone looking to scale their pandas workflows or handle distributed computation, much like Ponder’s approach of going beyond local data frames.
Move from Excel to Python with Pandas Perfect for those intrigued by Neptyne’s vision of Python-powered spreadsheets who want to transition from manual spreadsheets to Python data tooling.
Full Web Apps with FastAPI Helps with building the type of web apps that Pynecone is enabling, using FastAPI for your backend.
Fundamentals of Dask Another in-depth option for distributed computing and large-scale data handling that resonates with many of the ML and big-data themes in these startup journeys.

Overall Takeaway

These conversations from PyCon’s Startup Row shine a spotlight on Python’s ability to power diverse, innovative startups across AI, geospatial analytics, large-scale data science, and beyond. Each founder has harnessed Python’s simplicity, massive ecosystem, and open source community to solve complex business challenges, often with the aim of turning months of overhead into days. Whether you’re building a new machine learning product, bridging the gap between data science and spreadsheets, or crafting a full-stack web app without writing JavaScript, the message is clear: Python remains one of the most collaborative and rapidly evolving ecosystems for startups and enterprises alike. Take inspiration from their stories, tap into the open source movement, and consider how Python can turn your next big idea into a successful, real-world solution.

Links from the show

Ponder: ponder.io
Generally Intelligent: generallyintelligent.com
Wherobots: wherobots.ai
Neptyne: neptyne.com
Nixtla: nixtla.io
Predibase: predibase.com
Pynecone: pynecone.io
Watch this episode on YouTube: youtube.com
Episode #414 deep-dive: talkpython.fm/414
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript



00:00 At PyCon 2023, there was a section of the expo floor dedicated to new Python-based companies

00:06 called Startup Row. I wanted to bring their stories and the experience of talking with

00:11 these new startups to you. So in this episode, we talk with the founders of these companies for

00:16 about five to 10 minutes each. This is Talk Python To Me, episode 414, recorded on location at PyCon

00:23 in Salt Lake City on April 22nd, 2023.

00:27 Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy.

00:44 Follow me on Mastodon, where I'm @mkennedy and follow the podcast using @talkpython,

00:50 both on fosstodon.org. Be careful with impersonating accounts on other instances. There are many.

00:56 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:01 We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over

01:06 at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:12 This episode is brought to you by Sentry and us over at Talk Python Training. Please check out what

01:19 we're both offering during our segments. It really helps support the show.

01:24 We kick off the interviews with Devin Peterson from Ponder. Ponder is taking Modin, a distributed

01:29 compute library for Python, and pushing data science compute directly into the database.

01:34 Welcome to Talk Python here on Startup Row.

01:36 Thank you. Thank you.

01:37 Yeah, it's fantastic to have you here. You know, we met yesterday here at PyCon US,

01:42 and you were telling me about your project Ponder and how it's built upon Modin, the open source

01:49 project. And as I looked around, I'm like, everyone here has a story. And I just thought it'd be so

01:53 great to have you on the show along with all the others and just kind of tell your story. You know,

01:58 how did you, how did you get here to Startup Row at PyCon?

02:01 Yeah, it's interesting. So Modin started as my PhD project, and I was doing my PhD at Berkeley,

02:07 and I started in the genomics world, trying to build large scale data science tools for,

02:13 for, you know, the people who actually do the science. I'm not a biologist myself. I don't know

02:18 the first thing about biology, honestly.

02:20 But you got some good programming skills, and they can always use that applied to their data,

02:23 right?

02:23 Right. The problem was we were building tools in Spark, and it was really hard for these Spark-like

02:27 APIs to translate natively to the way that they were reasoning about data. And like,

02:31 they're using Python. And so there's a very kind of natural way that scientists think about,

02:36 you know, interacting with data that's not Spark, right? It's not intuitive, as intuitive in Spark,

02:42 even PySpark, right?

02:43 A lot of Python people avoid databases as much as they can, at least SQL and directly talking to them like

02:48 that.

02:48 Yeah, totally. Because like, often the way, when you're exploring data there, you have a mental

02:53 model of how you want to interact with the data. And that is not SQL often. Like, it's just the way

03:00 that it is.

03:01 Yeah.

03:01 So yeah, I had a moment there where a data scientist was like, if I don't, I don't want your tool,

03:08 can you just make my tool run faster? And so I was like, ah, yes, wait a second, this is actually

03:14 a real project. And so we, I started like looking into pandas and looking into like, you know, the,

03:19 the world of, you know, databases and, and like the kind of academic space. And nobody had really

03:24 dug that deep into pandas because in, in the academic sense, everybody was like, okay, pandas

03:29 is just a bad database. That's what database people thought at the time. So we did a bunch of work and

03:35 it kind of turned out that's not the case. They're, they're totally new things. And so from there, we built

03:41 Modin and, and now with Ponder, we're kind of extending that to, to basically bridge these

03:46 two worlds where you can use Python, but we're, we're generating SQL on the backend and able to

03:51 run pandas directly in your database or your data warehouse.

03:54 Yeah. Fantastic. So when I first heard about what you're doing at Ponder, it, I immediately thought

04:00 of Dask and Dask is another popular startup success, open source startup success story with

04:06 Matthew Rocklin forming Coiled and stuff. And I mean, I think they may have outgrown startup

04:11 row, but you know, good for them. Yeah, totally. My first thought was, okay, well, how is this

04:16 different than Dask? But the big difference is Dask is grid computing and yours runs in the database.

04:21 Yeah. For Ponder, definitely. Open source Modin also integrates with Dask clusters as well. So

04:27 Dask has Dask data frame and that runs on Dask clusters. We can also run Modin open source on,

04:33 on Dask clusters. It's very important to us that whatever infrastructure that you

04:36 have, you can run pandas on top of that. So Ponder is the next level of that, where if you have,

04:41 if your data is in the database, it doesn't leave, right? We can just execute it directly there.

04:46 And you know, all of your assumptions from Python and pandas hold true in the database,

04:51 even though the database actually doesn't like the assumptions that you might have in pandas,

04:54 right? We emulate those behaviors and we we've done a lot of work to actually make that feel very

04:58 native. So that is a key difference with, with Ponder and, and Dask though, is that your data never

05:04 leaves the database. So you don't have to have a separate Dask cluster to kind of pull the data

05:08 into and execute on it there. You can just run things natively in the database or the data warehouse.

05:13 So if you have a large database, you already have a probably powerful database server. Why

05:17 suck, transfer all the data off of that, load it into something else, analyze it and throw it away,

05:21 right? Just like make it run there.

05:23 Exactly. Exactly. Yeah. So maybe a quick elevator pitch type of thing might be like,

05:29 you all take pandas and turn it into SQL statements that run on the database,

05:33 but people get to program in pandas.

05:35 Yes, exactly.

05:36 That's exactly it. Yes. Some things that are really, really native in pandas,

05:41 like describe, for example, df.describe. Super, super common.

05:45 It seems easy. Like it just gives me some summary stats.

05:48 Yes, exactly. That's 300 lines of SQL.

05:51 No.

05:51 Like you wouldn't believe it looking at it though, because you know, it seems so simple and it's,

05:57 it is a simple, simple output, right? I want to get some summary statistics for my data,

06:01 but SQL is so declarative and the language itself doesn't lend itself well to this type of iterative,

06:07 interactive kind of like workflow. So.

06:09 Right. And the notebooks remember step by step, they have like a history sort of a memory,

06:14 whereas in SQL, every statement is standalone.

06:16 Exactly. So all or nothing basically. And you have to do the whole thing up front. And that's

06:22 the thing people love about pandas is that you can incrementally build these things up.

06:25 So, so we're giving that interface to SQL basically.

06:28 Awesome. All right. Well, let's wrap this up with a bit of a bit of a talk,

06:32 how you got to startup row. How'd you start this company?

06:35 Yeah.

06:35 Where are you? Like so many people are excited to take their open source work and instead of making

06:40 it their side job or something they do part-time at their company, make it their full-time energy.

06:45 And you're there. How'd you do it?

06:46 Yeah. So the way that we started was we talked to a lot of companies where they basically asked us,

06:54 can you make this work on top of our infrastructure? We didn't support, we, you know,

06:58 we only supported in the open source Ray and Dask. And we saw a motion there to have kind of an open

07:02 core model. So we follow the open core model where these more enterprise-y features like,

07:06 you know, security features and being able to push into data warehouses, right? Like an individual,

07:11 you know, consultant may not have, you know, a data warehouse. They probably don't, right? But,

07:16 but enterprises do. And these are the types of features that enterprises really care about. So

07:19 this open core model, I think, lended itself really well to our business, particularly because,

07:24 you know, enterprises will pay for these features. And so, yeah. And then we, we went out and we raised

07:30 this, a seed round and, you know, saw the opportunity to come here and be in, in PyCon Startup Row. And

07:36 fortunately, you know, it's, it's a competitive process. Really it is. Yeah. We're, we're very,

07:41 we feel very fortunate to be, you know, chosen among the few that are chosen here. But yeah,

07:46 that's kind of our journey is, is basically, you know, starting talking like, so for folks out there who are

07:51 like interested in this, talk to people who are using this, people who are interested in the

07:55 problem that you're solving and figure out where the gaps are and kind of ask questions. Don't be

07:59 afraid to ask, like, would you pay for this? Or how much would you pay for this? Those, those questions,

08:03 they're uncomfortable to ask. And like, especially the developer who's not used to presenting salesy type

08:09 marketing things. You always, salespeople as kind of, yeah, I got it. It's a necessary evil.

08:14 Totally. It totally is. Yeah. So, but, but you have to ask, because how do you know if you can kind of

08:19 take that next step? Unless you ask, Hey, would you pay $50 a month for this? Would you pay $10 a

08:24 month for this? Right. You can't know unless you, unless you really go out there and ask. So that's

08:29 what I would encourage folks to do if they're interested in this is, you know, find those gaps

08:32 and, and, and really ask the hard questions that are kind of hard, but yeah. Awesome. Well,

08:37 congratulations. Thanks for taking the time to talk to us. Thank you. Thank you. Yeah, you bet. Bye.

08:41 Next up is Generally Intelligent and Josh Albrecht. Generally Intelligent is an independent

08:45 research company developing AI agents with general intelligence that can be safely deployed in the

08:51 real world. Josh, welcome to Talk Python To Me. Hey, thanks. Hey, it's great to have you here. Tell

08:55 people quickly who you are. Yeah. So I'm Josh, Josh Albrecht. I'm the CTO of Generally Intelligent.

09:00 We're an AI research company based in San Francisco. Awesome. I love the humbleness. Generally,

09:06 generally intelligent, right? You're not a super genius, but no, it's a clever name. I like it.

09:11 Thank you. Yeah. Yeah. And you know, what, what's the problem you're solving here?

09:15 So yeah, we, you know, kind of, as it says on the tin, like we're working on artificial general

09:19 intelligence. We don't usually like to use that term because it can mean lots of different things to

09:22 lots of different people. But in, in general, what we're working on is making more capable,

09:27 safer, more robust AI systems. And in particular, we're focused on agents. So systems that can act

09:32 on their own. And like right now, mostly what we're focused on is agents that can work kind of in your

09:38 browser, on your desktop, in your code editor, those kinds of virtual environments and digital

09:42 environments. How much of this are you envisioning running locally versus running on a big cluster in

09:47 the cloud? Yeah, I think it'd be nice someday in the future to have things run totally locally. But

09:51 right now, a lot of these technologies do require a large cluster of GPUs, which are very expensive.

09:57 And most people don't even have, you know, a GPU or have a bunch of GPUs at home. So it's kind of hard

10:01 to actually get it running locally. Hopefully someday in the future, we'll be able to do that. But for now,

10:05 you'll, you'll probably need internet access to use a lot of these things. Right. Okay. So you're

10:08 envisioning a bunch of these agents that have access to an API that can quickly respond. Right,

10:15 right. Over there. Yeah. Okay. So give us some ideas, you know. Yeah. So what this looks like

10:21 concretely, you can imagine like a coding agent. So one thing you can do with GitHub Copilot right now is

10:26 you can write a function declaration and a doc string and have it generate the function. But you can imagine for a

10:31 coding agent, you can not only generate the function, but also generate some tests, run those tests,

10:35 see errors in those tests, try and fix the errors, kind of do that whole life cycle to ideally give you

10:40 a, you know, output that's actually a lot better. And then also, if you're thinking about this as an

10:44 agent, maybe it's more of a back and forth. It's not just an autocomplete in your editor,

10:47 but it can come back to you and say, you know, I'm sort of uncertain about this part here. What did you

10:51 mean? Or, hmm, like, you know, I wrote these tests, but I'm not sure if it's quite what you wanted. Or maybe,

10:56 you know, it's kind of running in the background and flagging different things that it sees in your

10:58 code base. Like maybe you made some change and it can like detect that you, your doc string is out

11:02 of date and kind of flag that for you. So thinking about it more as like an actual pair programmer.

11:06 Okay. And is it primarily focused on, yeah. Are you thinking to focus mostly on programming or is it

11:11 more broad? Like I'm looking for a great deal on this classic car, go scour the internet and,

11:17 and, and, you know, negotiate it for me. Yeah. Yeah. So, so, you know, the company is generally

11:21 intelligent. So we certainly do want to be able to address all these different use cases over time.

11:25 I think for us right now, one of the domains that we are interested in is code, especially because

11:30 it's so objective. You can know if it's right or wrong, you have tests, that sort of stuff. So it's

11:33 a nice playground for, for ourselves and something that we can build for ourselves to kind of iterate

11:37 on internally, but we're not exactly sure what the final product will be. We're also training our own

11:42 kind of large language models. We might prioritize some stuff around those. So there's lots of

11:45 possibilities. We're not wedded to anything yet. Thankfully, we have the luxury to kind of take a little

11:49 bit of time to figure that out as a, as a research company. Yeah. That's excellent. What about science?

11:53 Yeah. Science is definitely a thing that we're interested in. It's pretty hard. And so, you know,

11:58 do we necessarily want these things like, you know, running around making things in test tubes or

12:02 whatever? I think that's probably a little bit harder than coding and coding is already pretty

12:05 hard. So I think we'll get there. That's some of the stuff that we like personally on the team are

12:09 really excited about to see, you know, how can we use these to uncover new cures for diseases or

12:13 whatever. I'm really excited for that kind of stuff a little further in the future.

12:16 Yeah. That'd be amazing. I was just talking to someone on the expo floor hall here,

12:19 about protein folding. Yeah. Right. That kind of stuff. Yeah. It's kind of been elusive for people.

12:24 We more or less have just tried to brute force it. Yeah. Right. With the folding at home thing.

12:29 Let's just run every computer and just try every possibility, but there's a lot of possibilities.

12:32 Yeah. Yeah. Exactly. All right. So where's Python fit in here? What are some of the tools that you're

12:36 using? Yeah. So Python is, we love Python. We, we basically write everything in Python or bash,

12:41 but you know, mostly Python or Python generates a little bit bash, you know, but it's mostly Python. So yeah,

12:46 we use a lot of PyTorch for our models. And then other than that, you know, let's see,

12:51 what other libraries do we use? I mean, we use tons of Python libraries like numpy and scikit and,

12:55 you know, attrs and just, there's, there's so many like wonderful, you know, things that people have

13:00 built that we just, yeah, that are just so nice to work with. So we love the Python. You can kind of

13:04 take it, open it up, look at all the source and like really understand everything in that full

13:07 stack for us doing research. That's really valuable to be able to know everything that's going on.

13:11 Yeah. You have these Lego block types of things. Like what if we arranged it like this? You don't

13:16 have to write the whole machine learning, but you can click a few pieces together and

13:20 yeah, off it goes.

13:21 Yeah. Yeah. We build on top of Mosaic, for example, or other open source libraries that,

13:25 that people put together for training stuff and kind of adapted for yourself. It's so nice.

13:29 you can just pull things in and so easily change everything.

13:30 Yeah. Awesome. I must've somehow blinked along the way and I, these large language models just

13:36 seem to have come out of nowhere and all of a sudden, you know, AI is one of these things. It's

13:40 kind of where I kind of recommended stuff and now all of a sudden it's mind bogglingly good.

13:45 Yeah. Do things like TensorFlow and stuff work with these large language models or do you need

13:49 other libraries? Yeah. So TensorFlow and PyTorch are probably the two main machine learning libraries

13:55 that people do deep learning systems on top of. Pretty sure that, you know, GPT-3 and GPT-4 were

14:00 probably trained on top of PyTorch. I think a lot of the stuff at Google, like PaLM and Bard and those

14:05 types of things are trained on TensorFlow, but at the end of the day, they're, they're actually very

14:09 similar and they're sort of converging to kind of similar ideas too as well. So it's interesting to

14:13 see, to see them evolve.

14:14 Yeah. Fantastic. All right. Last question to close out our conversation: we're sitting here on

14:19 Startup Row. Well, just outside of Startup Row, I suppose, but, you know, there are a bunch of people out

14:25 here who are working on open source projects and would like to find a way to make

14:30 it their passion, their job, spend more time on it, maybe make it a company. How'd you get here?

14:35 Tell people your journey.

14:36 Yeah. So we got here by a little bit of a different route. A lot of us were working at a previous

14:43 company called Sourceress, which was more of an applied machine learning thing, where we were taking

14:48 machine learning and applying it to the job of recruiting and trying to figure out, you know,

14:51 can we find good people online that might be a good fit for a particular position,

14:54 and reach out to them and get them interested in the job, and that sort of stuff.

14:58 We went through YC with this in 2017, and we raised our Series A, and eventually, you know,

15:03 it was growing. We had a few million in revenue and customers and everything. And then, in 2019,

15:06 we were looking and it felt like, you know, there's so much really interesting stuff happening in

15:10 self-supervised learning and in deep learning and in machine learning. And it feels like, you know,

15:14 recruiting is very important, but is this going to be the most important thing in the world? Is this going to

15:17 really be the thing that changes the world? Or will there be something a little bit larger in

15:21 this more general purpose AI? And the more we thought about it, the more we felt like, you know,

15:24 the AI stuff is probably going to have a huge impact. Like we should really be working on that.

15:27 So we kind of wound down the previous company, a bunch of us moved over and started up Generally

15:31 Intelligent. And then we've been working on stuff ever since then.

15:33 Fantastic. Well, I know you've got some really cool stuff where the agents can sort of look at the

15:39 code they're writing, think about it, evolve. And it's, it looks like a really interesting take.

15:43 So congratulations. And I'll put links to all your work in the show notes so people can check

15:49 it out. Yeah. Sounds good. Yeah. Thank you very much. Yeah. Thanks for being here. It's great to

15:52 chat. Take care. You bet. This portion of Talk Python To Me is brought to you by Sentry. Is your Python

16:00 application fast or does it sometimes suffer from slowdowns and unexpected latency? Does this usually

16:07 only happen in production? It's really tough to track down the problems at that point, isn't it?

16:11 If you've looked at APM (application performance monitoring) products before, they may have felt

16:16 out of place for software teams. Many of them are more focused on legacy problems made for ops and

16:22 infrastructure teams to keep their infrastructure and services up and running. Sentry has just launched

16:28 their new APM service. And Sentry's approach to application monitoring is focused on being actionable,

16:34 affordable, and actually built for developers. Whether it's a slow-running query or a latent payment endpoint

16:40 that's at risk of timing out and causing sales to tank, Sentry removes the complexity and does the

16:46 analysis for you, surfacing the most critical performance issues so you can address them

16:50 immediately. Most legacy APM tools focus on an ingest-everything approach, resulting in high storage

16:57 costs, noisy environments, and an enormous amount of telemetry data most developers will never need to

17:03 analyze. Sentry has taken a different approach, building the most affordable APM solution in the market.

17:09 They've removed the noise and extracted the maximum value out of your performance data while passing

17:13 the savings directly onto you, especially for Talk Python listeners who use the code Talk Python.

17:19 So get started at talkpython.fm/sentry and be sure to use their code Talk Python all lowercase

17:27 so you let them know that you heard about them from us. My thanks to Sentry for keeping this podcast going strong.

17:36 Now we talk with Mo Sarwat from Wherobots. They're building the database platform for geospatial analytics and AI.

17:43 Hey Mo, welcome to Talk Python.

17:44 Thank you so much.

17:45 Yeah, it's good to have you here. Let's start off with a quick introduction. Who are you?

17:49 Absolutely. So my name is Mo and I'm the co-founder and CEO of a company called Wherobots.

17:54 Wherobots' grand vision is to enable every organization to drive value from data via space and time.

18:00 Awesome. I love it. I love it. So yeah, thanks for being here on the show. Let's dive into Wherobots.

18:06 What is the problem you're solving? What are you guys building?

18:09 Think about, again, every single data record that is collected on a daily basis.

18:14 Even like we're here right now, we're talking on this podcast at this specific location at this specific time.

18:20 So if you think about the space and time aspect, it's actually a very important aspect of every single piece of data that is being collected.

18:25 Right. If we're here next week, who knows why we're here? We could be here for a different reason. That might mean something different, right?

18:30 Absolutely. Yeah, that's exactly it. That space and time lens that you can apply to your data can actually also tell you a better story about your data.

18:38 You can drive more value, more insights from your data if you apply that space and time lens.

18:43 And this is exactly what we focus on in our company.

18:48 But more specifically, I mean, we are trying to build like kind of a database infrastructure to enable people to use that space and time lens to drive value from their data.

18:58 Okay. Fantastic. Now, when you talk about space and time and data, are we talking records in a time series database?

19:05 Are we talking regular database or NoSQL? Or could it be even things like the log file from Nginx about the visitors to my website?

19:14 What's the scope?

19:15 The scope is actually very wide. So think about any data you have, whether it's structured, semi-structured, or unstructured.

19:22 As long as it has some kind of geospatial aspect, meaning the record or the document was, let's say, created in a specific location or represents an event that happened in a certain location at a certain time.

19:37 Or it represents an object or an asset that you monitor at different locations at different times.

19:44 Whatever it is, it can be stored in any of these kinds of formats.

19:47 As long as it has this kind of geospatial aspect, you can definitely apply that geospatial or space-time lens to it.

19:55 Right. Okay. So what are some of the questions you might answer with Wherobots?

19:58 The questions vary. It depends on the type of the data. It depends on the use case.

20:03 It's a horizontal technology that enables so many industry verticals, but I'll give a couple of examples.

20:08 Yeah, yeah. Make it concrete for us.

20:10 Absolutely. Think about like a logistics company or a delivery company.

20:14 The most, I mean, well-known delivery company is Amazon, right?

20:18 I mean, you go to the app, you purchase an item or a product, and then there's the whole journey of that product from the supplier to the warehouse to the Amazon driver, all the way until it makes it to your door.

20:30 Everything along the way has a geospatial location attached to it.

20:36 The package is moving around. You're located somewhere. Your house is at a certain location.

20:40 Handling the logistics behind all of that, you're monitoring all these assets in space and time until they reach the door.

20:50 Along this whole journey, there's a lot of data processing and data analytics happening that you have to do through the geospatial, contextual aspect of things.

21:00 So this is one example.

21:02 Another example could be if you're an insurance company insuring homes, and you want to understand the nearby climate conditions and natural disaster conditions around a home.

21:15 The home has a location, and natural disasters and weather change at different locations all the time.

21:21 That will impact how you make decisions about insuring these homes.

21:25 Do I buy it? Do insurers want to insure it? What do I have to pay to do that?

21:29 Exactly.

21:29 That's another example where that space and time lens, the geospatial aspect, impacts an important decision you have to make.

21:39 So that's another example.
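The insurance example is a space-and-time filter: which hazard events happened near this home, and recently enough to matter? Below is a minimal sketch in plain Python, assuming homes and hazard events are simple latitude/longitude records. The `Event` class, the 50 km radius, and the cutoff date are hypothetical. A spatial database would answer this with an indexed query rather than a Python loop over every event.

```python
import math
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Event:
    """A hypothetical hazard record: what happened, where, and when."""
    kind: str
    lat: float
    lon: float
    day: date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_events(home_lat, home_lon, events, radius_km, since):
    """Events within radius_km of the home that happened on or after `since`."""
    return [
        e for e in events
        if e.day >= since and haversine_km(home_lat, home_lon, e.lat, e.lon) <= radius_km
    ]


events = [
    Event("wildfire", 34.20, -118.40, date(2023, 1, 10)),  # about 22 km from the home
    Event("flood", 40.71, -74.00, date(2023, 2, 1)),       # New York, thousands of km away
    Event("wildfire", 34.10, -118.30, date(2020, 8, 1)),   # close by, but too long ago
]
home_lat, home_lon = 34.05, -118.25  # a hypothetical home in Los Angeles
for e in nearby_events(home_lat, home_lon, events, radius_km=50, since=date(2022, 5, 1)):
    print(e.kind, e.day)  # wildfire 2023-01-10
```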

21:40 So these are just a couple of use cases, but there are tons of other use cases, and use cases that may not even exist yet.

21:47 So there's a lot of movement now into climate tech and ag tech.

21:51 And what we're trying to do at Wherobots is build the database infrastructure that enables the next generation of climate tech and agriculture technology.

22:01 So they can ask the questions that they might have, but you already have the machinery to answer them.

22:06 We have machinery to answer them.

22:07 And they build their own secret sauce on top of our infrastructure.

22:11 So kind of a framework platform.

22:13 Absolutely.

22:13 Yeah.

22:14 Got it.

22:14 Yeah.

22:14 So Python, where's Python fit in this story?

22:16 That's a great question.

22:17 So geospatial data or the geospatial aspect of data has existed for so long.

22:24 As you said, we live in the space-time continuum.

22:26 Everything has a space-time aspect, geospatial aspect.

22:28 And that's why developers already have APIs to interact with geospatial data.

22:33 And the language of these APIs varies.

22:36 So there are some people that use SQL to interact with the data, process the data in either SQL databases or any other kind of SQL processing engine, right?

22:45 But a lot of the geospatial developers or people developing with geospatial data, they use Python.

22:51 There are so many libraries that use Python to actually...

22:55 One example of these libraries is a library called GeoPandas.

22:57 It's a fantastic library.

22:58 It's an extension to pandas to wrangle and crunch geospatial data.

23:03 Ask questions about what things are contained in here, what things are outside of here, how far away is it?

23:08 Absolutely.

23:08 So this is what GeoPandas does.
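Those questions (contained in, outside of, how far away) are the classic spatial predicates, and GeoPandas exposes them as vectorized methods such as `GeoSeries.contains()`, `GeoSeries.within()`, and `GeoSeries.distance()`. The sketch below shows the idea behind the containment test in plain Python, using the ray-casting algorithm on a made-up delivery zone. It treats coordinates as flat x/y values and does not handle points lying exactly on an edge, both of which real geometry libraries take care of.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses.

    An odd count means the point is inside. `polygon` is a list of (x, y) vertices
    in order; points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


zone = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical square delivery zone
print(point_in_polygon(5, 5, zone))   # True: inside
print(point_in_polygon(15, 5, zone))  # False: outside
```

Ray casting also works for non-convex shapes, which matters for real boundaries like city limits or flood zones.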

23:09 The only problem is that GeoPandas is a library with great functionality, but again, it doesn't...

23:15 It's not enterprise-ready for the most part.

23:17 It doesn't scale, all that kind of stuff.

23:19 So what we do at Wherobots is provide a SQL API to the user to run spatial queries on the data, but we also provide a spatial Python API.

23:29 If you're using GeoPandas, you can use the same API, do the heavy-lifting, enterprise-scale processing of the data using our platform, and then use the GeoPandas functionality you're familiar with to do the geospatial processing.

23:46 So this is how it fits within Python, and actually, looking at our...

23:50 We have an open source project called Apache Sedona.

23:53 It's an Apache project under the Apache license, and it has all these APIs, SQL and Python, and Python is the most popular.

24:01 The Python package alone is being downloaded over a million times a month on PyPI as we're speaking today.

24:11 So definitely Python fits very well within our...

24:15 Yeah, that's awesome.

24:16 Absolutely, yeah.

24:16 So it sounds like your business, Wherobots, is following something of an open core model, you'd say?

24:23 Yes.

24:24 Let's round out our conversation here with...

24:25 Talking about the business itself, how did you get to Startup Row?

24:28 We follow the open core model.

24:30 You're totally right about that.

24:31 So we have our open source software, Apache Sedona.

24:33 It's available for free under a very permissive license, the Apache License 2.0.

24:37 And it's open source.

24:39 It's also used in operational production in so many use cases.

24:41 There are so many outside contributors.

24:43 I'm the original creator of it, along with my partner, Jia.

24:46 We're both the original creators, but it has grown beyond us now.

24:49 So there are dozens, like 100 contributors now, something like that.

24:53 And we use Sedona as an open core, but we build a whole platform around it.

24:57 So if you want to think about what we do compared to the other data platforms in the market, there are generic data platforms like Snowflake and Databricks.

25:06 There are more specialized data platforms like MongoDB for NoSQL.

25:11 There's Neo4j for graph.

25:12 We are...

25:13 Wherobots is like the data platform for geospatial.

25:16 So this is basically...

25:18 And we use Apache Sedona as an open core to enable us to do all of this.

25:21 Fantastic.

25:22 All right.

25:23 Well, congratulations on being here.

25:25 Yeah.

25:25 I wish you success with the whole project.

25:27 And thanks for coming on the show.

25:28 Thank you so much.

25:29 I appreciate it.

25:30 Looking forward to it.

25:31 Yeah.

25:31 You bet.

25:31 Thank you so much.

25:32 Yeah.

25:32 Bye.

25:32 Time to talk to Neptyne, who have created programmable spreadsheets that are superpowered with Python and AI.

25:40 I got to tell you, this product looks super awesome.

25:43 It looks so much better than things like Google Sheets or Excel.

25:46 And I can't wait to get a chance to play with it.

25:48 Hey, guys.

25:49 Hello.

25:50 Welcome to Talk Python To Me.

25:51 Yeah.

25:52 It's great to have you here.

25:53 First, introduce yourselves.

25:54 Thanks for having us.

25:55 I'm Douwe.

25:56 I've been doing Python professionally for, I don't know, 20 years or so.

26:00 I'm Jack.

26:02 I'm Dawa's co-founder.

26:03 Uh-huh.

26:03 Been doing Python a little less than that, but I met Douwe about five years ago.

26:07 And we founded Neptyne about a year ago.

26:10 Yeah.

26:10 So, you know, let's dive into it.

26:13 Neptyne, what's the product?

26:15 What's the problem you're solving?

26:16 Yeah.

26:17 The proposition that we have is pretty straightforward.

26:19 We build a spreadsheet on top of a Jupyter notebook engine, which basically gives you all the data science superpowers of a notebook in a familiar way.

26:31 It's a familiar spreadsheet environment, which means that you can share your work as a Python programmer much easier with people that are not familiar with notebooks because they have the universal data canvas of a spreadsheet.

26:42 How interesting, because one of the big challenges data scientists often have is they work in Jupyter, and then some executive wants to share it in a presentation, or they want to continue working on it, but they're not developers.

26:55 So what do you do?

26:56 You write an Excel file and you hand that off and then you re-import it somewhere, maybe?

27:01 I don't know.

27:01 Yeah, yeah.

27:02 The typical flow is indeed very much like you write out the CSV, you email that to the person that is going to put it into Excel.

27:10 That person then creates a graph in Excel, screenshots that graph, and sends it to the person who puts it in the presentation, and then the CEO can do something with it.

27:19 It goes either in PowerPoint or it goes in Word.

27:21 Yeah, one of those two, right?

27:23 Probably the picture.

27:24 But that's a bunch of steps that are disassociated from data.

27:28 So that's one problem, right?

27:29 That's the one problem.

27:30 But since no one really sees your product in action while we're talking here, maybe just a bit of an explanation.

27:36 It looks very much like Google Docs, I mean Sheets, like one of the online spreadsheet things.

27:44 It doesn't look like something embedded into notebooks, right?

27:48 Yeah, that's right.

27:49 It is a spreadsheet first and foremost.

27:51 It looks a lot like Google Sheets, but you can run Python in it.

27:54 Yes.

27:55 You can run Python both directly in the spreadsheet cells.

27:57 You can also define other functionality in Python and then run that with your spreadsheet.

28:02 Yeah, I mean, to me, that's where the magic is, right?

28:05 Like Excel or Sheets, the spreadsheets more broadly are super useful.

28:10 But it's always like, how do I do an if statement in this dreaded thing again?

28:14 And how do I do a max with a condition?

28:17 You know, just all the programming aspect of going beyond just having raw data is just like, oh boy, this is.

28:23 And you just showed me an example where like here, you just write range of a thing and boom, it just writes that out.

28:27 Or you write a Python ternary statement and it just runs.
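For readers who haven't seen it, this is the kind of plain Python being contrasted with spreadsheet formulas: a conditional (ternary) expression, a max with a condition, and `range()`. The values are made up, and this is ordinary Python, not the product's own cell syntax, which isn't shown in the conversation.

```python
sales = [120, -5, 87, 430, 0, 95]  # made-up column of values

# Conditional expression: roughly what =IF(A1>100, "High", "Low") does in a spreadsheet.
labels = ["High" if v > 100 else "Low" for v in sales]

# A max with a condition: roughly =MAXIFS(A1:A6, A1:A6, "<100").
largest_under_100 = max(v for v in sales if v < 100)

# range() fills a column with a series of numbers.
row_numbers = list(range(1, len(sales) + 1))

print(labels)             # ['High', 'Low', 'Low', 'High', 'Low', 'Low']
print(largest_under_100)  # 95
print(row_numbers)        # [1, 2, 3, 4, 5, 6]
```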

28:32 Right.

28:33 Yeah, but another common thing in spreadsheets that's hard is data cleaning, right?

28:38 You get some data from somewhere and it's not quite right.

28:41 And most of the time people end up doing this by hand.

28:43 And that's fine the first time you do it.

28:46 The second time and the third time it gets very annoying.

28:49 While if you just write a little bit of Python, you can clean data like that.

28:53 Yeah.

28:53 And then the next time you have the data, you just rerun the script and it's clean again.

28:57 So that's a very powerful way of doing this thing.
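Here is a minimal sketch of that "write it once, rerun it" cleaning step, assuming the messy data arrives as (name, amount) rows of strings with stray whitespace, inconsistent casing, currency symbols, thousands separators, and blanks. The function name `clean_rows` and the column layout are hypothetical. The point is that the same function runs unchanged on every fresh copy of the data.

```python
def clean_rows(rows):
    """Normalize (name, amount) rows: tidy names, parse amounts, drop unusable rows."""
    cleaned = []
    for name, amount in rows:
        name = " ".join(name.split()).title()  # collapse extra whitespace, fix casing
        amount = amount.strip().replace(",", "").replace("$", "")
        if not name or not amount:
            continue  # skip rows missing either field
        try:
            cleaned.append((name, float(amount)))
        except ValueError:
            continue  # skip amounts that still aren't numbers, e.g. "n/a"
    return cleaned


raw = [
    ("  alice   SMITH ", "$1,200.50"),
    ("Bob Jones", " 300 "),
    ("", "15"),
    ("carol white", "n/a"),
]
print(clean_rows(raw))  # [('Alice Smith', 1200.5), ('Bob Jones', 300.0)]
```

Skipping bad rows is a design choice for the sketch; in practice you might prefer to flag them so someone can fix the source.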

29:00 And we have a full Python environment.

29:03 It's not just a lightweight, you know, runs in the browser.

29:05 You can do pip install anything you want.

29:08 So you can connect to any API out there, use any data, export any data.

29:13 It's a complete environment.

29:15 Yeah, how interesting.

29:15 Yeah, there's a little window where you can write straight Python, you know, some function that does arbitrary Python.

29:22 And then you can invoke it like a function in the spreadsheet, right?

29:25 Exactly.

29:26 Exactly.

29:26 And you can talk to things on the internet.

29:28 For example, I could do web scraping there.

29:30 Yes.

29:31 So you can call an API, like a currency API?

29:33 Yeah, exactly.

29:34 Okay.

29:35 But yeah, any REST call you want to make, you just import requests and go for it.
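As an illustration of that kind of call, here is a sketch of a currency-conversion helper. It uses the standard library's `urllib.request` instead of `requests` so it has no dependencies. The endpoint URL and the `{"rates": {...}}` response shape are hypothetical placeholders, not a real service. Parsing is kept separate from fetching so the conversion logic can be checked without a network.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute a real exchange-rate API and its response format.
RATES_URL = "https://api.example.com/latest?base={base}"


def fetch_rates(base="USD"):
    """Download the latest rates for `base` and return the parsed JSON payload."""
    with urllib.request.urlopen(RATES_URL.format(base=base), timeout=10) as resp:
        return json.load(resp)


def convert(amount, target, payload):
    """Convert `amount` from the payload's base currency into `target`."""
    rate = payload["rates"][target]
    return round(amount * rate, 2)


# Offline example with a made-up payload; with a real API you would pass fetch_rates().
sample = {"base": "USD", "rates": {"EUR": 0.9, "JPY": 135.0}}
print(convert(100, "EUR", sample))  # 90.0
```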

29:40 Wow.

29:41 So where does it run?

29:42 Is this PyScript, Pyodide?

29:44 Is this Skulpt?

29:45 Is this Docker on a server?

29:47 It's all running in a Docker container.

29:49 Okay.

29:49 Server side.

29:50 That's how it works.

29:51 And that's kind of, we do that for maximum flexibility, maximum capability.

29:55 So it means that anything you can install, anything you can run in a Jupyter notebook running on Linux, you can run in Neptyne.

30:02 I see.

30:02 So we get full Python 3.11 or 3.10 or whatever it is.

30:05 Yep.

30:06 And, you know, we ship with a bunch of useful packages pre-installed.

30:10 But if you want to install something else, you just open up our dependency management window.

30:14 Okay.

30:15 Install anything else you want to use.

30:16 It's all very manageable, very configurable.

30:19 Well, it looks super good to me.

30:21 What's the user model?

30:23 Do I go and create an account on your site and it's kind of like Google Docs or what's the story?

30:27 Yep.

30:28 Exactly.

30:28 You can try it out.

30:29 You can go to neptyne.com, and in the upper right,

30:31 just click log in.

30:32 You can create an account.

30:33 It's totally free to use the free tier.

30:36 Yeah.

30:36 Give it a shot.

30:37 Awesome.

30:38 All right.

30:39 Final question.

30:39 You know, how did you guys get here in Startup Row?

30:41 You know, everyone wants to build something amazing with open source, but how did you turn that into a business and something you can put your full time into?

30:48 I mean, I guess we're kind of lucky in that when we started, I, you know, I pitched it to a bunch of people that due to no fault of their own got into some money.

31:03 And they, they were willing to back us.

31:05 And then later we joined YC for the winter batch.

31:09 Awesome.

31:09 And in that process, we, you know, got a little bit of publicity and were picked up for Startup Row.

31:16 Just to add to that, based on our experience in Y Combinator, there are lots of open source tools out there that are able to get started on some commercial path just based on the community that they're building, based on the users.

31:28 Right.

31:29 It's a, it's a very good path.

31:31 I feel like this whole open core business model has really taken off in the last couple of years where it used to be a PayPal donate button.

31:38 And now it's a legitimate offering that businesses will buy.

31:41 And it's good.

31:42 I think it's very positive.

31:43 So I'm really impressed with what you guys built.

31:46 I think it's awesome.

31:47 I think people really like it.

31:48 Yeah.

31:49 So good luck.

31:50 Thanks for being here.

31:51 Thank you so much.

31:52 Now up is Nixtla.

31:53 We have Federico Garza and Cristian Challu here to tell us about their time series startup, which makes predictions based on an open source time series ecosystem.

32:02 Hey there.

32:02 Hello.

32:03 Welcome to Talk Python.

32:04 Hello.

32:05 Hello.

32:05 Let's start with introductions.

32:06 Who are y'all?

32:07 So I am Cristian Challu.

32:09 I'm a co-founder of Nixtla.

32:11 Yep.

32:11 Hello.

32:12 I'm Fede.

32:12 I'm CTO and co-founder of Nixtla.

32:14 Nice to meet you both.

32:15 Welcome.

32:16 Welcome to the show.

32:17 Really great to have you here at PyCon.

32:19 And yeah, let's start with the problem y'all are trying to solve.

32:23 Okay.

32:24 Yeah.

32:24 So at Nixtla, what we do is time series forecasting.

32:27 So as you know, time series forecasting is a very relevant task that a lot of companies and practitioners need to solve.

32:36 So essentially predicting future values of something, right?

32:40 It could be demand of a product or the weather.

32:42 So there are many use cases for forecasting.

32:45 It's a very common problem in industry.

32:47 And essentially we want to provide tools to developers, engineers, researchers to be able to do this more efficiently and with good practices.

32:55 And yeah, that's mostly it.

32:58 Right.

32:58 Okay.

32:58 So is this like a Python API?

33:01 Is this a database?

33:02 What is the actual product?

33:06 The product.

33:06 Yeah.

33:07 So we have an ecosystem of Python libraries and we have different libraries for different use cases.

33:14 For example, we have the StatsForecast library, which specializes in statistical and econometric models.

33:21 And we also have more complex models and libraries for deep learning and machine learning applications.

33:30 Yeah.

33:31 Nice.

33:31 And do you train these models yourself on certain data, or where do you get the models from?

33:38 The idea behind the libraries is that you can use whatever your data is.

33:43 The only restriction is that it must be time series data, but you can use whatever data you have.
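To make "predicting future values" concrete: StatsForecast works on time series in a long format with `unique_id`, `ds` (timestamp), and `y` (value) columns. The stdlib sketch below uses that same layout, as plain dicts rather than a DataFrame, and computes a seasonal naive forecast, a common baseline that simply repeats the last observed season. It is not how the library's models work internally. The store name, quarters, and demand numbers are made up.

```python
from collections import defaultdict


def seasonal_naive(rows, season_length, horizon):
    """Forecast each series by repeating its last `season_length` observed values.

    `rows` are dicts with 'unique_id', 'ds', and 'y'; returns {unique_id: [forecasts]}.
    """
    series = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["unique_id"], r["ds"])):
        series[row["unique_id"]].append(row["y"])

    forecasts = {}
    for uid, values in series.items():
        if len(values) < season_length:
            raise ValueError(f"series {uid!r} is shorter than one season")
        last_season = values[-season_length:]
        # Step h ahead reuses the value from the same position in the last season.
        forecasts[uid] = [last_season[h % season_length] for h in range(horizon)]
    return forecasts


quarters = [f"{year}-Q{q}" for year in (2022, 2023) for q in range(1, 5)]
demand = [10, 20, 30, 40, 12, 22, 32, 42]
rows = [{"unique_id": "store_1", "ds": ds, "y": y} for ds, y in zip(quarters, demand)]
print(seasonal_naive(rows, season_length=4, horizon=6))
# {'store_1': [12, 22, 32, 42, 12, 22]}
```

Baselines like this are worth keeping around: a fancier model should beat them before it earns a place in production.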

33:50 Okay.

33:51 Fantastic.

33:52 And where does Python fit in?

33:54 Python's at the heart of so much data processing these days.

33:57 And, you know, I guess give a shout out to all the different Python packages that are already out there.

34:02 Maybe you want to just give a rundown on those and what they're for and then talk about them.

34:06 Yeah.

34:06 So we have like six packages right now.

34:10 They are all libraries on GitHub that you can pip install or install with Conda.

34:14 And essentially they focus on different ways of approaching forecasting.

34:18 And they're essentially libraries built on Python, some of them built on Numba.

34:22 Other methods are in...

34:24 Oh, you guys are using Numba.

34:25 Yeah.

34:26 Oh, okay.

34:26 And it makes a huge difference?

34:28 Yeah.

34:28 It makes a huge difference.

34:29 All right.

34:29 Yeah.

34:30 Tell people really, really quickly.

34:31 What is Numba?

34:32 So Numba is this library which allows you to compile your code just in time.

34:38 So it's a lot faster than using just plain Python.

34:42 And how easy is it to use?

34:44 It's really easy.

34:45 Okay.

34:46 Yeah.

34:46 In fact, we wanted to make our library more efficient and faster.

34:54 And we did it in like two weeks only using Numba.

34:58 So it was really easy to use.
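For a sense of what "compile just in time" looks like in practice, here is a small sketch. Numba's `@njit` decorator compiles a function to machine code the first time it is called. The loop below, a simple exponential smoothing pass chosen as an illustrative forecasting-style kernel and not taken from any library's code, is the kind of plain numeric Python loop it speeds up. The fallback decorator is an assumption added so the example still runs without Numba installed, just without the speedup.

```python
try:
    import numpy as np
    from numba import njit  # compiles the decorated function on its first call

    def as_array(values):
        return np.asarray(values, dtype=np.float64)
except ImportError:
    # Fallback (an assumption for this sketch): run as plain Python, no speedup.
    def njit(func):
        return func

    def as_array(values):
        return [float(v) for v in values]


@njit
def exponential_smoothing(y, alpha):
    """Simple exponential smoothing; returns the final level, used as a flat forecast.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the first value.
    Assumes y has at least one element and 0 < alpha <= 1.
    """
    level = y[0]
    for i in range(1, len(y)):
        level = alpha * y[i] + (1.0 - alpha) * level
    return level


demand = as_array([10, 12, 11, 13])
print(exponential_smoothing(demand, 0.5))  # 12.0
```

The first call pays the compilation cost; later calls with the same argument types reuse the compiled code, which is where loops over long series get their speedup.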

35:00 Yeah.

35:00 Awesome.

35:01 Awesome.

35:01 And another package we use is PyTorch.

35:04 So like our deep learning methods, neural forecasting approaches are built on PyTorch or PyTorch Lightning.

35:11 Yeah.

35:12 Fantastic.

35:13 So would you say that your business model is something of an open core model where it's kind of built on top of these libraries?

35:19 Absolutely.

35:20 Yeah.

35:21 Yeah.

35:21 So for now, we have been focusing on building these libraries, the community.

35:25 We have a very active community on Slack, and people who use our libraries and contribute to our code.

35:29 And we are building services on top of these libraries like enterprise solutions or hosting computation or even simplifying the usage further.

35:39 So for example, APIs where you can just simply pass your data.

35:43 I want to know what is going to happen next on this data.

35:45 So you pass us some historical data and ask us to make predictions?

35:49 Make predictions and then we produce the predictions.

35:51 Okay.

35:52 Yeah.

35:52 This is one of the types.

35:53 Yeah.

35:54 So we are working on these different applications and services.

35:57 Awesome.

35:57 It sounds really cool.

35:58 Thanks.

35:59 So final question, how did you make your way over here to Startup Row at PyCon?

36:03 Like how did you start your company and how did you get here?

36:06 Yeah.

36:06 It has been a long journey.

36:08 I mean, we have been working on these libraries and services for about a year.

36:16 And right now we are focusing on building the startup, right?

36:20 We want to be able to do this full time for a long time and really, yeah, build something that can help people.

36:26 Yeah.

36:27 Are you looking to offer an API, like an OpenAI sort of model, or run people's code as a service?

36:34 Or where are you thinking you're going?

36:35 Yeah.

36:36 Yeah.

36:36 That's definitely one of the options.

36:38 But yeah, we are finishing our funding round.

36:41 And once we finish that, funding helps a lot with software development.

36:46 Funding helps a lot with development.

36:47 And yeah, so we are exploring different avenues, and there are very exciting things to come.

36:52 All right.

36:53 Well, we all wish you the best of luck on your project.

36:56 And thanks for taking the time to talk to us.

36:58 No, thank you for inviting me.

36:59 Yeah, you bet.

36:59 Thanks.

37:00 Bye.

37:00 Next, we'll speak with Piero Molino from Predibase.

37:02 They empower you to rapidly build, iterate, and deploy ML models with their declarative machine learning platform.

37:09 Piero, welcome to Talk Python To Me.

37:10 Thank you very much for having me.

37:12 Yeah, it's fantastic to have you here.

37:13 Quick introduction for everyone.

37:15 Sure.

37:15 So I'm Piero, and I'm the CEO of Predibase.

37:18 I can tell you about Predibase in a second.

37:20 I'm also the author of Ludwig, which is an open source Python package for training machine learning models.

37:26 And yeah.

37:27 Awesome.

37:28 Well, great to meet you.

37:28 Tell us about your company.

37:30 Yeah.

37:30 So Predibase tries to solve the problem of inefficiency in the development process

37:37 of machine learning projects.

37:39 Usually they take anywhere from six months to a year or even more, depending on the organization's,

37:45 you know, their degree of expertise in developing machine learning projects.

37:49 And so by using our platform, companies can go from months to days of development.

37:56 And that makes them substantially faster.

37:58 Each machine learning project becomes cheaper.

38:01 And organizations and teams can do many more machine learning projects.

38:06 Yeah.

38:06 I mean, training is where the time and the money is spent.

38:09 Yeah.

38:10 At least computationally.

38:11 I mean, paying developers is expensive too.

38:12 Right.

38:13 But when people say machine learning or AI takes all this energy,

38:17 it does take energy to answer questions, but it really takes energy to train the models,

38:22 right?

38:22 Yeah.

38:23 Definitely.

38:23 Training models is a huge part.

38:25 Managing the data and putting it in a shape and form that is useful for training the models

38:30 is also another big piece of the reason why these teams take so long to develop models.

38:37 And also, there are usually several people involved in the process.

38:42 There are different stakeholders.

38:44 Some of them are more machine learning oriented.

38:46 Some of them are more engineers.

38:47 Some of them may be analysts or product developers that need to use the models downstream.

38:52 And so, the handoff of the artifacts and of the whole process between these different people

39:00 is also a source of a lot of friction.

39:03 And with the platform that we are building, we are trying also to reduce the friction as

39:06 much as possible.

39:07 Yeah.

39:07 Sounds great.

39:08 Is it about managing that workflow or is it about things like transfer learning and other

39:14 more theoretical ideas?

39:17 Like where exactly are you doing this?

39:20 So, to give you a little bit more of a picture, I would say where we are starting from is

39:25 Ludwig, which is our open source project.

39:27 Ludwig allows people to define machine learning models and pipelines

39:32 in terms of a configuration file.

39:35 So, you don't need to write the low-level PyTorch or TensorFlow code.

39:39 You can just write a configuration that maps to the schema of your data.

39:43 Okay.

39:43 And that's literally all you need to get started.

39:46 So, it makes it substantially easier and faster to get started training models.
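Here is a sketch of the "models as configuration" idea. The dictionary mirrors the general shape of a Ludwig config: `input_features` and `output_features` lists whose entries name a column and its `type`. The column names are made up. With Ludwig installed, a config like this is what you would hand to `ludwig.api.LudwigModel`. To keep the example dependency-free, the code only checks that the config maps onto the columns of a dataset, which is the step described above.

```python
# Declarative model definition, shaped like a Ludwig config (column names are hypothetical).
config = {
    "input_features": [
        {"name": "review_text", "type": "text"},
        {"name": "product_category", "type": "category"},
        {"name": "price", "type": "number"},
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},
    ],
}


def check_config_against_schema(config, columns):
    """Return a list of problems if the config's features don't line up with the data's columns."""
    problems = []
    for section in ("input_features", "output_features"):
        for feature in config.get(section, []):
            if feature["name"] not in columns:
                problems.append(f"{section}: column {feature['name']!r} not found in data")
    if not config.get("output_features"):
        problems.append("config needs at least one output feature to predict")
    return problems


columns = {"review_text", "product_category", "price", "sentiment"}
print(check_config_against_schema(config, columns))  # []
print(check_config_against_schema(config, columns - {"price"}))
# ["input_features: column 'price' not found in data"]
```

The appeal of the declarative approach is that the config is the whole model definition: changing an input feature or its type is a one-line edit rather than a rewrite of training code.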

39:50 Then, if you are more experienced, you can go down and change more than 700 parameters

39:54 that are there and change all the details of training, of the models themselves, the pre-processing,

40:01 so you have full flexibility and control.

40:02 And you can also go all the way down to the Python code, add your own classes, register them

40:09 with a decorator, and they become available in the configuration.
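The "register a class with a decorator, then refer to it by name in the config" pattern can be sketched as a plain registry. The `register_encoder` decorator, the `ENCODER_REGISTRY` dict, and the toy encoders are hypothetical names for illustration, not Ludwig's actual extension API. They only show how a class becomes selectable from a configuration string.

```python
ENCODER_REGISTRY = {}


def register_encoder(name):
    """Class decorator that makes an encoder selectable by `name` in a config."""
    def decorator(cls):
        if name in ENCODER_REGISTRY:
            raise ValueError(f"encoder {name!r} is already registered")
        ENCODER_REGISTRY[name] = cls
        return cls  # return the class unchanged so it can still be used directly
    return decorator


@register_encoder("lowercase_tokens")
class LowercaseTokens:
    def encode(self, text):
        return text.lower().split()


@register_encoder("char_count")
class CharCount:
    def encode(self, text):
        return [len(text)]


def build_encoder(feature_config):
    """Look up the encoder class named in a feature's config and instantiate it."""
    return ENCODER_REGISTRY[feature_config["encoder"]]()


feature = {"name": "review_text", "type": "text", "encoder": "lowercase_tokens"}
print(build_encoder(feature).encode("Great Product"))  # ['great', 'product']
```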

40:11 Very cool.

40:12 This is what we have in the open source.

40:14 Right, right.

40:14 And what we're building on top of it is all the...

40:17 Again, you can think about this for people who may be familiar with Terraform, for instance.

40:22 What Terraform does for infrastructure, defining your infrastructure through a configuration

40:26 file.

40:27 Ludwig does it for machine learning.

40:29 Got it.

40:29 That's a good analogy.

40:30 Okay.

40:30 And so Predibase uses this basic concept of models as configuration,

40:36 really, and builds on top of it all sorts of infrastructure that is needed for organizations

40:41 that are big enterprises to use it in the cloud.

40:44 Okay.

40:44 So, we have, like, we can deploy on cloud environments.

40:48 We abstract away the infrastructure aspect of it, so you can run the training of your models

40:53 and inference on either one small CPU machine or a thousand large GPU machines, and you don't

40:59 need to think about it, basically.

41:00 Oh, cool.

41:00 So, I just say train it, and if you happen to have GPUs available, you might use them.

41:05 Right, absolutely.

41:06 Okay.

41:06 Excellent.

41:07 So, where does Predibase fit into this?

41:11 Like, where's the business side of this product?

41:14 Right, right.

41:14 I would say Predibase makes it easy for teams, really, to develop machine learning products,

41:20 right?

41:20 With Ludwig, you can define your own configuration.

41:23 But it's like, you know, a single-user experience, if you want, right?

41:26 Predibase becomes like a multi-user experience, where, again, you deploy on the cloud, and you

41:32 can connect with data sources.

41:33 In Ludwig, you provide, like, a CSV file or a data frame, a pandas DataFrame.

41:37 With Predibase, you can, like, connect to Snowflake, to Databricks, to MySQL databases,

41:43 to S3 buckets, and do all of those things.

41:45 And also, there's a notion of model repositories, because when you start to train a model, the first

41:50 one is never the last one that you train.

41:52 And so, it's an analogy to Git, really.

41:55 In Git, you have commits, and you have teams doing different commits and collaborating together.

41:59 In our platform, you have multiple models that are configurations, and multiple people

42:03 training new different models, spawning from the previous ones.

42:06 So there's a whole lineage of models that can be compared with each other.
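The Git analogy can be sketched as a small data structure: each trained model records its config and a pointer to the model it was derived from, so any version's lineage can be walked back like a commit history and two versions can be compared. This is a hypothetical illustration of the idea only; ModelVersion, ModelRepository, and their methods are invented names, not Predibase's API, and actual training is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    """One trained model: its config, who trained it, and the version it came from."""
    version: int
    config: dict
    author: str
    parent: Optional["ModelVersion"] = None


@dataclass
class ModelRepository:
    """Hypothetical Git-like store of model versions (not Predibase's API)."""
    versions: list = field(default_factory=list)

    def train(self, config: dict, author: str,
              parent: Optional[ModelVersion] = None) -> ModelVersion:
        # Actual training is omitted; we only record the new version and its parent.
        model = ModelVersion(len(self.versions) + 1, config, author, parent)
        self.versions.append(model)
        return model

    @staticmethod
    def lineage(model: ModelVersion) -> list:
        """Follow parent pointers back to the first model, like a commit log."""
        chain = []
        node = model
        while node is not None:
            chain.append(node.version)
            node = node.parent
        return chain

    @staticmethod
    def compare(a: ModelVersion, b: ModelVersion) -> dict:
        """Return the config keys whose values differ between two versions."""
        keys = sorted(a.config.keys() | b.config.keys())
        return {k: (a.config.get(k), b.config.get(k))
                for k in keys if a.config.get(k) != b.config.get(k)}


repo = ModelRepository()
v1 = repo.train({"epochs": 5}, author="alice")
v2 = repo.train({"epochs": 10}, author="bob", parent=v1)
v3 = repo.train({"epochs": 10, "learning_rate": 0.001}, author="alice", parent=v2)
print(repo.lineage(v3))      # → [3, 2, 1]
print(repo.compare(v1, v3))  # → {'epochs': (5, 10), 'learning_rate': (None, 0.001)}
```

Because each model is described by its configuration, comparing two versions reduces to diffing their configs, which is what makes the commit analogy work.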

42:09 Yeah.

42:09 And then the very last piece is that we make it easy to deploy these models with one click of a button.

42:14 So you go from the data to the deployed model very, very quickly.

42:18 Fantastic.

42:18 It sounds great.

42:19 So final question.

42:20 A lot of people out there working in open source, they'd love to be here on Startup Row,

42:25 talking about their startup based on their project.

42:28 It sounds like what you built is based on the open core model, which seems to be really,

42:32 really successful these days.

42:33 You know, tell us a bit about how you got here.

42:36 Right.

42:36 So basically, I think it started from the open source, really.

42:39 I started developing Ludwig when I was working at Uber.

42:44 And initially, it was my own project, a way for myself to be more efficient when

42:50 working on the next machine learning project without reinventing the wheel every single time.

42:54 And I built that because I'm lazy and I don't want like when I do one thing more than twice,

42:59 then I automate it for myself, really.

43:01 Productive laziness or something like this.

43:04 And so then other people in the company started using it.

43:07 And that convinced me that making it open source, also because it was built on top of other open

43:12 source projects, would have been a great way to both have people contribute to it and improve

43:16 it and also give back to the community.

43:17 Because again, I myself was using a lot of open source projects to build it.

43:21 Right.

43:22 And then from there, I made it so that we donated the project to the Linux Foundation.

43:27 So now it's backed by the Linux Foundation.

43:29 And also the governance is open as opposed to what it was before when I was at Uber.

43:33 And from there, actually, you know, I met a bunch of people, some of my co-founders at

43:38 the company, thanks to the project.

43:40 And we decided that, so for instance, one of them is Professor Chris Ré from Stanford.

43:44 He was developing a similar system that was closed internally at Apple.

43:48 And so we said, this thing worked at Uber, worked at Apple, works in the open source.

43:52 Let's make a company out of this.

43:54 Right.

43:54 Fantastic.

43:55 Yeah.

43:55 Solving some problems for these big teams.

43:57 Right.

43:58 Excellent.

43:58 Well, best of luck on your company.

44:01 Thank you very much, Mike.

44:01 Yeah.

44:02 Thanks for being here.

44:02 Yeah.

44:03 Absolutely.

44:03 A pleasure.

44:04 Thank you so much.

44:04 We'll finish up our stroll down startup lane by talking with the folks at Pynecone.

44:08 We have Nikhil Rao to talk about the pure Python full stack web app platform that they've

44:14 built.

44:14 Nikhil, welcome to Talk Python.

44:16 Yeah.

44:16 Great to be here.

44:17 Thanks for having me.

44:17 It's great to have you here.

44:19 I love going through all the different projects on Startup Row and talking about them and

44:24 shedding a little light on them.

44:25 So happy to have you here.

44:26 Yeah.

44:27 Yeah.

44:27 Give us a quick introduction to yourself.

44:28 Yeah.

44:29 So I'm Nikhil.

44:29 I'm the CEO and co-founder of Pynecone.

44:32 And we're building a way to make web apps in pure Python.

44:35 So we have an open source framework and anyone can install this and basically start creating

44:39 their apps front end and back end using Python.

44:42 Our company went through the recent Y Combinator batch; we just finished the Winter '23 batch.

44:47 And recently we raised our seed round and are starting to hire and pretty much grow our project

44:51 and company from here.

44:52 Okay.

44:52 Well, awesome.

44:53 Congratulations.

44:54 That sounds really cool.

44:55 Give us an idea of, I guess, you know, why did you build this, right?

44:59 We've got Flask.

45:00 We've got Django.

45:00 Yeah.

45:01 I mean, heck, we even have Ruby if you really want it.

45:03 Yeah.

45:03 There's a lot.

45:04 So previous to this, like you mentioned, there's frameworks like Flask and Django.

45:08 And whenever a Python developer wanted to make a web app, they'd use something

45:12 like this, but you always have to pair it with another front end library.

45:14 So you can't just make your front end using Python.

45:16 You still have to end up using JavaScript, HTML, React, stuff like that for your front end.

45:20 And so a lot of people, if you're coming from a Python background, it's a lot of work

45:24 to kind of get started with these.

45:26 It's a different language, different tool set.

45:27 So we really wanted something where Python developers can just use these tools they already know

45:31 and be able to make these web apps without having to go learn something completely different.

45:35 So as opposed to these tools like Flask and Django, we're very focused on unifying the front

45:39 end and back end into one framework.

45:40 So you don't need a separate front end and back end.

45:42 And that allows the user to just focus on the logic of their app and not

45:47 on these technical details of networking and all this other stuff.

45:49 Yeah.

45:50 It sounds interesting.

45:50 I mean, I know many Python people who don't want to do JavaScript.

45:54 Yeah.

45:55 They don't want to do multiple languages.

45:57 But, you know, it's traditionally, at least in the web framework world, you're speaking

46:02 many, many languages.

46:03 You're speaking HTML, CSS, JavaScript is a big one.

46:08 And honestly, I think that there was a period where people were super invested in JavaScript

46:13 and thought that was kind of the right way or the necessary way.

46:17 And that would take away a lot of what's important about the web framework, right?

46:21 Like, well, it doesn't matter if it's Flask or Django, we're just going to return JSON

46:25 anyway because it's all Angular.

46:26 So who cares, right?

46:27 Yeah.

46:27 But I don't think that's where people really, most, many people, at least the people choosing

46:31 Python want to be.

46:33 And so, yeah, how is your stuff different?

46:35 So I think exactly what you said before this, to make a serious web app, you always have to

46:39 go to JavaScript.

46:40 And what we're really trying to do is make everything in Python, including your front end.

46:44 And so basically, yeah, we're trying to integrate the two together.

46:47 So basically, you don't have to go learn these technical details you didn't want before.

46:52 We realized for all the logic of your app, you're using Python anyway.

46:55 Like Python's used in so many industries, databases, ML, AI, infrastructure.

47:00 And when these people want to make a front end, it is possible to build these

47:04 JavaScript front ends.

47:05 But that's a lot of overhead.

47:06 And before our framework, there were different low-code tools to make front

47:11 ends without JavaScript.

47:11 But they all kind of have a limit.

47:13 And they all have a graduation risk is what we found.

47:15 So you can start making your UI.

47:17 Can you make any website with them?

47:19 Right.

47:20 Like Streamlit and Anvil are both notable ones that kind of come to mind.

47:24 But neither of them, I like them both a lot, but neither of them are necessarily like, I'm

47:28 just going to build a general purpose web app.

47:31 They're focused in their particular area.

47:33 Yes, exactly.

47:34 So I've used tools like Streamlit, Gradio in the past.

47:36 And a lot of that was inspiration for Pynecone.

47:38 It's really great because it's super easy to get started with.

47:41 You don't have to go learn these things, but they all have this kind of ceiling you

47:44 hit.

47:44 So they're mostly good for like data science apps, dashboard apps.

47:48 But as you try to expand your app into like a full stack web app, start adding these new

47:51 features, a lot of times you find these frameworks don't really scale with your ideas.

47:55 And your two options are either you have to kind of constrain your idea into what these

47:59 vendors offer you, or you use that for prototyping.

48:02 And when you're making a customer facing production app, you scrap it and go to like a JavaScript

48:06 React world.

48:07 So what we're really trying to do is make something as easy as Anvil or Streamlit

48:11 to get started with for Python developers.

48:12 But as you want to expand to these complex cases, you should be able to stay in our framework

48:16 and we should be able to handle that also.

48:18 Interesting.

48:18 So how does the front end interactivity work if it's Python?

48:21 Yeah.

48:22 And this is also where I think we're a bit different.

48:23 We're trying to really leverage a lot of the web dev ecosystem and not recreate everything

48:27 from scratch.

48:28 So for the front end, we leverage React and Next.js.

48:30 So our front end compiles down to a Next.js app.

48:32 And from this...

48:33 Oh, you're transpiling the Python?

48:35 We transpile the Python to Next.js.

48:36 And this gives you a lot of great features.

48:38 You get single page app features from Next.js, a lot of these performance features.

48:41 And that means from our perspective, we don't have to recreate all this stuff.

48:45 And also, we don't have to create components one by one.

48:48 We just leverage React.
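As a rough illustration of compiling a Python UI description into React code for a Next.js page, here is a toy transpiler. The Component class, the compile_page() helper, and the VStack/Heading/Text tag names are hypothetical; this is not Pynecone's compiler or component API, only the general idea of turning a tree of Python objects into JSX.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical Python-side UI node that compiles to a JSX element."""
    tag: str
    children: list = field(default_factory=list)
    props: dict = field(default_factory=dict)

    def render(self) -> str:
        # String props become quoted JSX attributes; other values are
        # JSON-encoded inside {} so they are valid JavaScript expressions.
        attrs = "".join(
            f' {k}="{v}"' if isinstance(v, str) else f" {k}={{{json.dumps(v)}}}"
            for k, v in self.props.items()
        )
        # Children are either nested Components or plain text.
        # (A real compiler would also escape text and validate tag names.)
        inner = "".join(
            c.render() if isinstance(c, Component) else str(c)
            for c in self.children
        )
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


def compile_page(root: Component) -> str:
    """Wrap the rendered tree in a Next.js-style default-exported page component."""
    return f"export default function Page() {{ return ({root.render()}); }}"


page = Component("VStack", [
    Component("Heading", ["Hello from Python"], {"size": "lg"}),
    Component("Text", ["Count: 0"]),
], {"spacing": 4})

print(compile_page(page))
# → export default function Page() { return (<VStack spacing={4}><Heading size="lg">Hello from Python</Heading><Text>Count: 0</Text></VStack>); }
```

Only the markup is generated here; in the design described in the interview, the app's logic does not get transpiled and stays in Python on the server.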

48:49 And what we do in Pynecone for the front end is just wrap React components and make

48:53 them accessible.

48:53 So even if we don't offer something... With other low-code tools, sometimes if they don't offer a

48:58 component you need, you may be kind of constrained in what you can build.

49:01 We have an easy way for anyone to wrap their own third-party React libraries.

49:05 So we're really trying to make the existing stuff out there accessible rather than recreating

49:09 it.
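The escape hatch of wrapping a third-party React library can be pictured as a thin Python object that records which npm package to import from and which component to render. ReactWrapper and compile_module() are hypothetical names for this sketch, not Pynecone's wrapping API; react-colorful and its HexColorPicker export are used only as an example of an existing React package.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReactWrapper:
    """Hypothetical wrapper around a component exported by an npm package."""
    library: str  # npm package to import from
    tag: str      # name of the exported React component
    props: dict = field(default_factory=dict)

    def import_line(self) -> str:
        return f'import {{ {self.tag} }} from "{self.library}";'

    def render(self) -> str:
        # Each prop value is JSON-encoded so it becomes a valid JS expression.
        attrs = "".join(f" {k}={{{json.dumps(v)}}}" for k, v in self.props.items())
        return f"<{self.tag}{attrs} />"


def compile_module(components: list) -> str:
    """Emit one deduplicated import per wrapped component, then the page body."""
    imports = sorted({c.import_line() for c in components})
    body = "".join(c.render() for c in components)
    page = f"export default function Page() {{ return (<>{body}</>); }}"
    return "\n".join(imports + [page])


picker = ReactWrapper("react-colorful", "HexColorPicker", {"color": "#aabbcc"})
print(picker.import_line())  # → import { HexColorPicker } from "react-colorful";
print(picker.render())       # → <HexColorPicker color={"#aabbcc"} />
```

The wrapper itself contains no UI code: everything visual comes from the existing React package, which is the point of reusing the ecosystem rather than recreating components one by one.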

49:09 Yeah.

49:10 So you can sort of extend it with React if you get boxed in.

49:13 That's your escape hatch.

49:13 Exactly.

49:14 Okay.

49:15 So that's kind of how our front end works.

49:17 And for the back end, we use FastAPI to handle all the state.

49:20 So the user state is all on the back end on the server.

49:22 And this is what allows us to pretty much keep everything in Python.

49:25 So none of the logic is transpiled to JavaScript, only the React.

49:28 And all the logic stays in Python.

49:30 So you can use any of your existing Python libraries, any existing tools.

49:33 You don't have to wait for us to kind of make these integrations.

49:36 So it's kind of leveraging React, but also leveraging Python and kind of bringing them together.
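The server-side state design can be sketched in plain Python: the state object lives on the server, the browser only sends the name of an event handler, and the server runs that handler and returns just the fields that changed so the front end can re-render. The guest says Pynecone uses FastAPI for this layer; the HTTP or WebSocket plumbing is left out here, and CounterState, StateManager, and the delta format are hypothetical, not Pynecone's internals.

```python
import copy


class CounterState:
    """App state as a plain Python class; it never leaves the server."""

    # Only these method names may be triggered by the browser.
    EVENT_HANDLERS = {"increment", "reset"}

    def __init__(self) -> None:
        self.count = 0
        self.history: list = []

    # Event handlers are ordinary Python, so any Python library could be used here.
    def increment(self) -> None:
        self.count += 1
        self.history.append(self.count)

    def reset(self) -> None:
        self.count = 0


class StateManager:
    """Hypothetical server-side store holding one state object per client token."""

    def __init__(self) -> None:
        self.sessions: dict = {}

    def process_event(self, token: str, handler: str) -> dict:
        if handler not in CounterState.EVENT_HANDLERS:
            raise ValueError(f"unknown event handler: {handler}")
        state = self.sessions.setdefault(token, CounterState())
        before = copy.deepcopy(vars(state))
        getattr(state, handler)()  # run the Python event handler on the server
        after = vars(state)
        # Send back only the changed fields, copied so later events can't mutate them.
        return copy.deepcopy({k: v for k, v in after.items() if before.get(k) != v})


manager = StateManager()
print(manager.process_event("tab-1", "increment"))  # → {'count': 1, 'history': [1]}
print(manager.process_event("tab-2", "increment"))  # → {'count': 1, 'history': [1]}
print(manager.process_event("tab-1", "increment"))  # → {'count': 2, 'history': [1, 2]}
print(manager.process_event("tab-1", "reset"))      # → {'count': 0}
```

In a design like this, keeping state on the server is what lets handlers call any Python library directly; the trade-off is that each interaction involves a round trip to the server.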

49:40 What does the deployment look like?

49:41 So we're working on an easy deployment.

49:43 So you can just type pc deploy.

49:45 We'll set up all your infrastructure and you'll get a URL back with your app live.

49:48 But also we're an open source framework.

49:50 So it's also very easy to self-host and self-deploy.

49:52 And so what we're really trying to do is make it accessible and easy, but never kind of lock

49:57 you into our framework.

49:58 I see.

49:58 So I could put like Nginx in front of it or something.

50:01 Exactly.

50:01 Like, so right now we're still working on our hosting deployment.

50:03 So everyone right now who's deployed is hosting on AWS, DigitalOcean, or a tool like this

50:07 with Nginx.

50:08 Yeah.

50:08 So it integrates just like you would deploy a Flask or React app.

50:11 Got it.

50:11 But we're really trying to make an optimized hosting service around this later.

50:14 Yeah, sure.

50:15 It makes sense.

50:16 All right.

50:16 Sounds like a great product.

50:17 Thanks, sir.

50:18 Final question here.

50:20 You know, how'd you get here?

50:21 How'd you start the company?

50:22 How'd you land on Startup Row?

50:24 I mean, you talked about Y Combinator a little.

50:25 Yeah.

50:26 So I talked a little bit.

50:27 We did the Y Combinator batch.

50:28 And really the idea is not only having an open source framework, but having like a business

50:33 model around it and being able to create like these features around it.

50:36 So we're really focused on having an open source framework, similar to

50:41 like how Vercel has Next.js and their hosted version and kind of bringing that to the Python

50:44 community.

50:45 So Python is like one of the fastest growing languages.

50:47 Obviously, like that's why Python is so big.

50:49 And for the web dev part, there's not really a good ecosystem for that.

50:53 So when people want to share their ideas, we're really trying to become that de facto

50:56 way for Python developers to create their apps and share.

50:59 And so, yeah, basically working on our hosting service, growing out our team now and trying

51:03 to build up all this like ecosystem around it so people can easily get their ideas out

51:07 to the world.

51:07 Awesome.

51:08 Well, congratulations and thanks for being here.

51:10 This has been another episode of Talk Python To Me.

51:13 Thank you to our sponsors.

51:15 Be sure to check out what they're offering.

51:17 It really helps support the show.

51:18 Take some stress out of your life.

51:21 Get notified immediately about errors and performance issues in your web or mobile applications with

51:26 Sentry.

51:27 Just visit talkpython.fm/sentry and get started for free.

51:32 And be sure to use the promo code talkpython, all one word.

51:35 Want to level up your Python?

51:37 We have one of the largest catalogs of Python video courses over at Talk Python.

51:41 Our content ranges from true beginners to deeply advanced topics like memory and async.

51:46 And best of all, there's not a subscription in sight.

51:49 Check it out for yourself at training.talkpython.fm.

51:52 Be sure to subscribe to the show.

51:54 Open your favorite podcast app and search for Python.

51:57 We should be right at the top.

51:58 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

52:03 and the direct RSS feed at /rss on talkpython.fm.

52:07 We're live streaming most of our recordings these days.

52:11 If you want to be part of the show and have your comments featured on the air,

52:14 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

52:19 This is your host, Michael Kennedy.

52:20 Thanks so much for listening.

52:22 I really appreciate it.

52:23 Now get out there and write some Python code.

52:25 I'll see you next time.