#489: Anaconda Toolbox for Excel and more with Peter Wang Transcript

Recorded on Tuesday, Nov 26, 2024.

00:00 Peter Wang has been pushing Python forward since the early days of its data science roots.

00:04 We're lucky to have him back on the show.

00:07 We're going to talk about the Anaconda toolbox for Excel, as well as many other trends and topics that are hot in the Python space right now.

00:14 I'm sure you're going to enjoy listening to the two of us exchange our takes on the topics and trends.

00:19 This is Talk Python to Me, episode 489, recorded November 26, 2024.

00:24 Are you ready for your host, please?

00:27 You're listening to Michael Kennedy on Talk Python to Me.

00:30 Live from Portland, Oregon, and this segment was made with Python.

00:34 Welcome to Talk Python to Me, a weekly podcast on Python.

00:40 This is your host, Michael Kennedy.

00:42 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both accounts over at fosstodon.org,

00:50 and keep up with the show and listen to over nine years of episodes at talkpython.fm.

00:56 If you want to be part of our live episodes, you can find the live streams over on YouTube.

01:00 Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about upcoming shows.

01:06 This episode is brought to you by Sentry.

01:08 Don't let those errors go unnoticed.

01:10 Use Sentry like we do here at Talk Python.

01:12 Sign up at talkpython.fm/sentry.

01:15 And this episode is brought to you by Bluehost.

01:18 Do you need a website fast?

01:19 Get Bluehost.

01:20 Their AI builds your WordPress site in minutes, and their built-in tools optimize your growth.

01:26 Don't wait.

01:27 Visit talkpython.fm/bluehost to get started.

01:30 Peter, welcome back to Talk Python to Me.

01:32 Well, thank you so much for having me.

01:34 It's great to be here.

01:35 It's always good to have you.

01:36 I love your perspective on things.

01:38 It's always very thoughtful and fun, and I don't see why this would be any different.

01:42 We're going to have a great time.

01:43 I'm looking forward to the conversation very much.

01:45 I am as well.

01:46 So let's, you know, you've been on the show a number of times, but let's maybe just have you do a quick introduction for yourself.

01:53 Just for people who don't know you.

01:54 Who are you?

01:55 I am, yes.

01:56 So I am the co-founder at Anaconda.

01:58 I also helped create the PyData movement.

02:02 So really taking the Python, scientific Python numerical ecosystem into the world of data analysis and open source data science.

02:09 I guess 12 years ago we started.

02:11 And I've created a number of open source tools, some more popular than others.

02:17 But really, I think maybe my biggest role and contribution to the ecosystem has been as sort of a community steward or as someone who's been, you know, really trying to help drive commercial adoption and really trying to make sure that there are on ramps for businesses, large scale and small scale, to adopt Python and continue using Python.

02:35 And that really, a lot of my perspective there, even though, of course, many of my colleagues in the open source Python community have spent a lot of time as developers in startups and, you know, big high tech companies.

02:48 One of the things that I learned early on as a consultant was that enterprises have a very, very different approach to adopting technology and open source even now can still be challenging to them.

02:59 So there's a role for some of us to play in that commercialization and advocacy for open source.

03:05 And, you know, 15 years ago, I was advocating for Python in, you know, rooms full of Java and .NET, very, very stodgy and angry Java and .NET architects.

03:13 And now, you know, it's sort of.

03:15 How could you possibly come and recommend a dynamic language for our professional software?

03:21 It doesn't even have types.

03:21 It's got this white space stuff.

03:24 It doesn't.

03:24 Do you use tabs or spaces?

03:25 I mean, yeah, I lived.

03:26 I mean, I have, you know, a thousand yard stare thinking about some of those meetings sometimes.

03:30 But we prevailed.

03:31 And now we have the other side.

03:33 We have the flip side of the coin, which is that our little technology ecosystem has now grown up.

03:37 It's crossed the chasm.

03:38 And now everyone's using it, even if they don't like it.

03:41 Even if they kind of hate it.

03:43 They have to use it because now it's a thing everyone does use.

03:46 And so conversely, so the flip side of the coin is defending Python, even when it is ubiquitous.

03:52 So it's really interesting having to engage in these different dynamics of the conversation and then also trying to bring my perspective into conversations as the ecosystem matures and the cast of players is different.

04:05 So it's just been, you know, honestly, I just feel a great deal of gratitude towards the community, towards many of the great people I've met.

04:12 People like you, you know. I mean, the energy it takes to run a series like you've done for as many years as you've done it.

04:20 It is almost 10 years now.

04:21 It's unbelievable.

04:22 It's just such, it's really oftentimes very, very, it's under-recognized work, I would say, as, you know, a community advocate and as someone who elevates so many great voices in the community.

04:33 And so all of us kind of in the Python tribe ecosystem, you know, we're all trying to do our best to kind of make things go, but it's just been really wonderful to see the impact that we've had.

04:44 And I think a lot of the work that we did laid the groundwork for modern machine learning and then, you know, making GPUs accessible and usable.

04:51 And now Python is the language that everyone is doing AI in, you know, whether they're deep into the innards of it or not, they're writing mostly Python code to build these AI tools.

05:00 And that's really been a wonderful and interesting thing to see.

05:02 So anyway, that was a long-winded thing about who am I, but yes, so I am the founder.

05:07 I was a CTO here at Anaconda for a while, sort of founding CTO.

05:10 And then I was CEO for four years.

05:12 And this year I stepped into the chief AI officer role really to try to help us explore and define what is the future of a company that's really been focused on Python coding for data science.

05:23 What does that future look like in a world that is obviously being rapidly transformed by AI?

05:27 Yeah.

05:27 Well, you're leading one of the main data science companies for the primary language for AI in the world.

05:34 So, yeah, it makes a lot of sense.

05:36 That's awesome.

05:37 You know, when we talked, I can't remember how many times ago, but one of the times ago we talked about bringing Python into the enterprise and sort of making it more acceptable for large companies and so on.

05:49 I want to show you a graph here.

05:52 I'm sure you're familiar with the TIOBE index.

05:54 Yeah, yeah, yeah.

05:55 Many people are.

05:55 But I think there's something interesting here that I noticed.

05:58 It just blew my mind.

06:00 And I just like to get your thoughts on it.

06:01 So a while ago, a couple of years ago, Python became the number one language on the TIOBE index, right?

06:07 But what's interesting is not just that, but it is more than double the second most popular language, more than double.

06:18 It's more than twice as popular as C++, which was number two, then Java, then C, then C#, then JavaScript.

06:24 Python is more than double even the most popular one of those.

06:26 But here's, that's not the number that's actually most interesting to me.

06:29 The more interesting number is the derivative of that, which is the rate of change.

06:33 Python is growing at almost 9% year over year, where the others are like half a percent, a third of a percent, 1%, negative 3%, you know?

06:43 What do you think about this?

06:45 How does that sit with you?

06:46 Wild, right?

06:47 Well, I mean, you know, I guess, you know, congratulations, I'm sorry.

06:51 Because Python, we haven't decreased the number of warts by 10% year over year, right?

06:57 I mean, more and more people are learning this language, and there are some idiosyncrasies, and there is a complex ecosystem.

07:02 But, you know, the real story behind this is what they're not showing is the denominator.

07:08 And as I recall, TIOBE is based on a number of things.

07:11 It's sort of a mixed model, right, of web searches, Stack Overflow, and so on, and I don't know how they adjust that given, you know, modern usage of LLMs and ChatGPT and Copilot and things like that.

07:22 But it's a combination of all these different kinds of metrics.

07:25 And I know one of the criticisms of TIOBE has been that, oh, it doesn't tell the whole story, right?

07:31 But it certainly is a swag.

07:33 And the swag has been Python's been on the uptick, on the rise.

07:37 I think with the advent of, I guess, November, this month marks the two-year anniversary of ChatGPT.

07:44 The number of new people coming in to make software or doing deeper things with code than they could have done before is growing.

07:52 And so I think when we look at a thing like this, you know, this is the fun number.

07:57 So I actually, in my PyCon keynote when I announced PyScript, I sort of asked people this question.

08:01 How many software developers are there in the world?

08:03 Actually, how many people in the world know anything about IT?

08:06 Anyone who can, you know, plug in a server, knows how to boot, you know, boot disk, can write a line of SQL code, can, you know, fudge some Python or JavaScript or HTML together.

08:16 Anybody who can do anything related to computers, not stare at the screen dumbfounded.

08:20 What percentage of the population is that?

08:23 And the swag is something like 4% of the population, right?

08:26 Which in a world of 8 billion people, that's a lot of people.

08:29 But that's not a lot of people.

08:31 Imagine a room of 100 people, four of them know how to do anything with computers out of 100, right?

08:36 Maybe one and a half know about farming.

08:39 So that's even more terrifying.

08:40 But we live in a computational world and about 4% of the people of the world's population are involved in IT.

08:46 Now that's not, you know, normalized against developed countries versus developing countries and all this other kind of stuff.

08:51 But at the end of the day, you know, we would admit that there are something on the order of tens of millions of software developers.

08:56 But if we say we're going to open the doors to allow anyone to ask an interactive coding assistant a question and then be able to produce software, to maybe write a Dockerfile or a Kubernetes YAML config, whatever it is.

09:12 You are now opening the floodgates to a 25X number of people that can do code.

09:19 It won't be great code.

09:20 Actually, sometimes it might be better code than a junior dev could write.

09:23 But nonetheless, the point is there was a lot of headroom that we're ignoring.

09:26 And I think a lot of Python's growth can be attributed to that.

09:29 And one of the biggest things that we found going to businesses and talking about using Python, especially for data science, was how many of them did not understand this concept of a data person who would use Python or R to do a data analysis.

09:43 For them, an analyst is someone who used Excel or Tableau.

09:47 And then coders are people who wrote code.

09:50 And they would lump the Python data science Jupyter notebook person in with the Java nerds.

09:56 And those are two very different kinds of people, right?

09:58 So I think this expansion and growth of these people who code Python as a way of getting some stuff done but are not software developers, that's going to push those TIOBE numbers even higher.

10:09 Yeah, I think so as well.

10:11 It's a great observation.

10:12 And there's been all these learn to code movements, trying to get kids to learn coding and stuff.

10:18 And I do think those are valuable.

10:20 I don't think it means we need 10x the number of software developers.

10:23 But I think having programming skills is awesome.

10:25 But I feel like LLMs and ChatGPT and so on, they're going to make a way bigger impact than if you took a Scratch class in fifth grade, you know?

10:34 Yeah.

10:35 Yeah.

10:35 And I, you know, there's a, well, not to get too philosophical on it, I suppose.

10:39 But one of the, oh, can you hear me okay?

10:41 I thought my, how did my, yeah.

10:44 I don't think we need 25x more software engineers, right?

10:48 Just like we teach people math, we don't need 25x more mathematicians.

10:51 And we teach people English, we don't need 25x more authors.

10:54 But everyone should be able to author, to do some math, to think quantitatively about the world.

11:00 And with code, I think the important thing isn't, are you a Scratch coder?

11:04 It's do you understand how computers process information?

11:07 And what is happening to your stuff, right?

11:09 And how you're interacting with these things.

11:11 It's really, I think that's kind of the weird moment we're going through where people really have to look at these kind of programming-like skills or knowledge of computers as a literacy or a life skill.

11:25 Again, it's like learning to drive.

11:26 Doesn't mean you're going to be an F1 race car driver.

11:28 There's only a few dozen of those in the world.

11:30 But everyone needs to know how to drive, right?

11:33 So there's something about this that I think is lost when we're teaching.

11:37 I think what happened is in the last 20 years, software development was in such demand.

11:42 And all of those software businesses took off so much that they made so much money.

11:46 So software engineers made so much money.

11:48 So everyone's like, I'm going to learn how to code.

11:50 So I'm going to get one of these really sweet coding jobs.

11:52 And it turns out that's not actually the thing, right?

11:56 It looked like a gold rush, sort of.

11:58 It was absolutely a gold rush.

11:59 Yeah, absolutely.

12:00 Yeah.

12:00 Awesome.

12:01 Well, let's talk a bit about Anaconda.

12:04 First, maybe, I mean, you talked about your role changing there, but just give us a sense of what Anaconda is.

12:11 I know there's a lot of people out there listening.

12:13 I'll throw out another stat that blew me away.

12:16 The PSF JetBrains Python developer survey asks all the respondents, how long have you been programming as a professional software developer in any language, not just Python?

12:27 And 50% of the respondents said less than two years.

12:30 Right.

12:31 Right.

12:31 So a lot of the people out there listening maybe don't even know what Anaconda is, even though it's very well known within the Python space.

12:38 Give us a bit about what you guys are up to these days.

12:40 Yeah.

12:41 So we started, I think, probably best known as the providers of the eponymous Anaconda distribution and the installers for the various packages.

12:50 And really our bread and butter is building out, you know, the Python packages, maintaining a kind of a curated repository and really selling access to that to commercial businesses that want to actually have a vendor-backed, vendor-supported source of Python packages.

13:07 And for most Python packages, I mean, there's many people in the world who are very happy pip installing off PyPI.

13:12 Things work great.

13:13 But when you get into some of the more numerical or maybe GPU intensive packages, the way it gets built is enormously important and can sometimes be very, very challenging.

13:22 So when we started Anaconda over 10 years ago, it was actually very, very hard to get those pre-built binaries installed.

13:29 Now it's a little bit easier.

13:31 People are rolling a lot of libraries into, you know, these big fat wheels from PyPI.

13:35 And that can work pretty well in some cases.

13:38 In some cases, it fails catastrophically.

13:39 Right.

13:40 But what we do is we provide enterprise support.

13:42 We provide also just a validated and secure open source supply chain around these really important binary artifacts that then become deployed into Docker containers that become then the runtime for machine learning and data processing for, you know, very sensitive customer data and things like that.

13:58 So we have a model that's very similar to what Red Hat did for Linux, right, where you can get Linux for free.

14:04 You can run the free stuff.

14:05 You can go and pick up packages from the free community repositories.

14:07 But if you're really serious about a business runtime environment and you really want to have like, you know, a throat to throttle, so to speak, if something is not quite right, then you go and you go pay Red Hat for the annual subscription.

14:17 And so we have a very similar business model around that.

14:19 We made the Conda package manager.

14:22 But to be very clear, the Conda package manager is an open source project.

14:25 So we put that into the open source and have a community around that.

14:29 The Conda Forge community is the package build community around that.

14:33 We host the repositories there, but those are free for everyone to use.

14:35 So, you know, from a business standpoint, that's our bread and butter, and we've also been working with things like Microsoft Excel and things like Snowflake to really embed polished Python runtimes inside those computational environments.

14:49 That's something we started doing a few years back.

14:50 This year, I think for the most part, people's experiences of Anaconda have not changed too much, just Conda packaging, whatever.

14:57 But there are a lot of things kind of happening behind the scenes, right?

15:00 So we did, for instance, the Python Excel stuff, which we'll be talking about later, is a huge thing that we're very excited about.

15:05 But then we've also been pushing forward on our perspectives on AI.

15:08 So we know that a lot of, like I said earlier, a lot of AI is built on Python.

15:12 A lot of AI relies on having the right packages and software alongside the models and whatnot.

15:17 And we have a particular belief that the models will get smaller and more people want to run these models locally or close to their sensors and their data.

15:26 They're not going to want to upload these into the cloud, or it's not going to be feasible because there's a lot of video streams from many different kinds of sensors.

15:31 And they're going to want to have a solid software stack to run these models that they need to update and fine tune.

15:38 And so really looking at ourselves as a provider of that level of the software stack for people, that's kind of what we're orienting towards.

15:47 So we released a product called AI Navigator, which people can go and use to download models.

15:52 We host the model repository.

15:54 So we have a curated set of quantized models that we're hosting.

15:57 And so people who, you know, one of the things that we got from our enterprise customers quite a bit was, hey, we like using your enterprise software package repository.

16:04 We love that it's, you know, has your stamp of security approval on it.

16:08 But do you have anything like that for models?

16:10 Because we just right now download stuff off Hugging Face, and that's kind of okay.

16:14 But like, you know, like we don't really know who quantized this.

16:17 That's running arbitrary code off the internet.

16:19 How could it go wrong?

16:20 And, you know, not to throw shade on any of that, because open innovation is great.

16:25 But people do feel like, you know, we need something in between there.

16:29 So we have both our AI Navigator, which lets you run models locally.

16:31 It's similar to like if you use like LM Studio or something like that.

16:35 It's a nice way to sort of get all this stuff running locally.

16:36 But then you also get a, we have our model repo, which is kind of right now in an alpha stage, alpha beta stage for people if they want to run that on-prem.

16:47 They can then have an on-prem model repo, and IT can actually have a view on what's getting deployed in various places.

16:53 So it's a very similar kind of model.

16:55 We're just kind of responding to customer needs.

16:56 So there's a couple of products that we're building on the AI side.

17:01 This portion of Talk Python to Me is brought to you by Sentry.

17:03 Code breaks.

17:05 It's a fact of life.

17:06 With Sentry, you can fix it faster.

17:08 As I've told you all before, we use Sentry on many of our apps and APIs here at Talk Python.

17:14 I recently used Sentry to help me track down one of the weirdest bugs I've run into in a long time.

17:19 Here's what happened.

17:20 When signing up for our mailing list, it would crash under uncommon execution paths, like situations where someone was already subscribed or entered an invalid email address or something like this.

17:32 The bizarre part was that our logging of that unusual condition itself was crashing.

17:38 How is it possible for our log to crash?

17:41 It's basically a glorified print statement.

17:44 Well, Sentry to the rescue.

17:46 I'm looking at the crash report right now, and I see way more information than you'd expect to find in any log statement.

17:52 And because it's production, debuggers are out of the question.

17:55 I see the traceback, of course, but also the browser version, client OS, server OS, server OS version, whether it's production or QA, the email and name of the person signing up.

18:07 That's the person who actually experienced the crash.

18:09 Dictionaries of data on the call stack and so much more.

18:12 What was the problem?

18:13 I initialized the logger with the string "info" for the level rather than the enumeration's info member, which was an integer-based enum.

18:23 So the logging statement would crash saying that I could not use less than or equal to between strings and ints.

18:29 Crazy town.

18:30 But with Sentry, I captured it, fixed it, and I even helped the user who experienced that crash.

18:37 Don't fly blind.

18:38 Fix code faster with Sentry.

18:40 Create your Sentry account now at talkpython.fm/sentry.

18:44 And if you sign up with the code TALKPYTHON, all caps, no spaces, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events as well as other features.
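The logging crash from the sponsor segment above is easy to hit in Python. The sketch below uses the standard library's `logging` module, which is an assumption: the episode doesn't say which logging library was involved, and the stdlib's `setLevel("info")` actually rejects a lower-case name with a `ValueError` rather than crashing later. To reproduce a str-versus-int comparison crash with the stdlib, the hypothetical helper `demo_string_level_crash` assigns the `level` attribute directly, bypassing validation; the hypothetical `make_logger` shows the safer pattern of normalizing the name and letting `setLevel` validate it up front.

```python
import logging


def demo_string_level_crash() -> str:
    """Reproduce a str-vs-int level comparison crash with stdlib logging."""
    logger = logging.getLogger("demo.string_level")
    # Assigning the attribute directly skips setLevel's validation, so the
    # string is stored as-is and later compared against integer levels.
    logger.level = "info"
    try:
        logger.info("email already subscribed")  # the rarely taken log line
    except TypeError as exc:
        return f"logging crashed: {exc}"
    return "no crash"


def make_logger(name: str, level) -> logging.Logger:
    """Create a logger whose level is always stored as the integer logging expects."""
    logger = logging.getLogger(name)
    # setLevel accepts ints or registered upper-case names such as "INFO";
    # anything else raises ValueError immediately instead of crashing later.
    logger.setLevel(level.upper() if isinstance(level, str) else level)
    return logger


print(demo_string_level_crash())
print(make_logger("demo.fixed", "info").level == logging.INFO)
```

Validating the level when the logger is built means a bad level name fails the first time the app starts, rather than only on an uncommon code path in production.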

18:58 And there's a lot of other exploratory bits of AI.

19:00 So we've added an AI assistant into our hosted cloud notebooks, and we've just actually added GPU support to cloud notebooks.

19:08 So if you go to anaconda.com and you go create an account, you actually have access to a hosted cloud notebook solution that also has access to GPUs and has a chat assistant.

19:17 But the thing is that that's not, that can be a local experience as well.

19:20 So if you're running Jupyter locally, you have an in-situ, you know, chat assistant right there.

19:25 And that's been trained; it's our own fine-tunes based on how people use Jupyter interactively, which is different from training a model on a corpus of uploaded notebooks.

19:34 Those are actually very different kinds of things.

19:35 So those are kind of the new things we've been working on.

19:37 And then, of course, there's updates to various kinds of technologies.

19:40 Panel, we're doing, you know, dashboard building.

19:42 PyScript continues to get better.

19:44 And so, yeah, I'm probably forgetting a dozen other things.

19:47 But those are kind of the highlights of the things that we've been pushing on.

19:50 Yeah, yeah.

19:51 And you guys also did PyScript, which we'll talk a little bit about later, potentially, if we have some time for it.

19:56 What are your thoughts on this PEP 759 external wheel hosting?

20:01 You refer to wheels as big, heavy, binary things.

20:05 Oh, no, no.

20:06 So to be very clear, wheels can be very lightweight, but they can also be very heavy.

20:11 And in fact, what a lot of people who are not package maintainers, you know, I think most Python users are like,

20:17 the package install part of my workflow is the least interesting and oftentimes the most painful part of my workflow.

20:22 I really don't care.

20:23 I just want it to work, right?

20:24 And unfortunately, you know, that doesn't always go well because you end up having to get dragged into complexity.

20:30 And so we're going to get into some of the complexity here.

20:33 But for those who are interested, I did give a talk at PyBay just a month ago, I think.

20:37 And it's up on YouTube now.

20:39 It's called The Five Demons of Python Packaging.

20:41 And I try to lay out for people why it is that Python is so screwed up in terms of its packaging environment.

20:48 Can't we just do what Cargo does?

20:50 Or can't we just do what npm does?

20:51 Or why doesn't Go have this problem?

20:53 Or what, you know, like yada, yada.

20:55 And so I think there's a, I don't want to argue with people, but I want to share my context, right?

21:02 In that context.

21:03 Oh, this is it, right?

21:03 I'll put it in the show notes for people.

21:06 Yeah.

21:06 Great.

21:06 Yeah.

21:07 My context is that there are, if you go to the very beginning of the slides, I have a picture from Gödel, Escher, Bach, right?

21:13 The cover of the book, Gödel, Escher, Bach.

21:15 And it shows that this block in the middle looks like the letter G, E, or B, depending on which direction you're shining the light.

21:21 Many people may be familiar with this book and its cover.

21:23 And that's how I think of Python packaging.

21:25 I have the great misfortune of having wandered all five or six dimensions in my career.

21:29 But most people experience only two or three of those dimensions.

21:32 So for their perspective, it could just be this simple.

21:34 Or why don't I just do that?

21:35 I just homebrew my dependencies.

21:37 Or I just use RPM to install these base libraries.

21:40 And then I pip install these other things in a Docker.

21:42 And I'm done, right?

21:43 But what we have to do as maintainers, and also as people who cater to a broad user base tens of millions strong, is account for all six of those dimensions.

21:53 And sometimes they're not obvious to people, right?

21:55 So in the case of wheels, to come back to the discussion about wheels and Barry's proposal there, the issue that many maintainers have with wheels, especially if they're maintainers of libraries that have a lot of C dependencies, is that you're stuck between a devil and a rock and a hard place.

22:14 And so you have to decide, do I bundle all these dependencies in?

22:17 In which case, I own the compatibility issues and I own the security issues that come with having bundled in a bunch of C libraries or C++ libraries.

22:27 If I don't bundle them in, then users are kind of stuck trying to install this thing, right?

22:32 And so what we find is that for some libraries, it's okay.

22:36 For other libraries, it's very difficult.

22:39 And the Python ecosystem, I think, ultimately still does not have a great solution to this.

22:43 And wheels, as you know, the way that the PyPA and the pip maintainers and other people we've talked to over the years, they really want to think of the packaging problem as a Python packaging problem.

22:55 And with Conda, because it was built, it was purpose built for the scientific Python ecosystem, we knew from the outset that our problem was bigger than Python.

23:03 In fact, most of our problem was packaging stuff that wasn't Python.

23:06 The Python bits were almost trivial, right?

23:09 But it's the, you know, what version of Clang?

23:11 What C++ ABI do you use?

23:13 What version of this do you link?

23:15 And what flags do you turn on when you invoke GCC here?

23:17 When you go and you compile something and all of that, not only are you including all the C libraries,

23:22 you're also inheriting all of the design, sorry, the configuration options of the build tool chain at compile time.

23:30 It's an enormously complex thing that ends up getting wrapped up into a single wheel, right?

23:35 So with Conda, these are represented as explicit dependencies with feature flags.

23:39 And you can then run a package dependency solver, which of course is, you know, good and bad.

23:44 If it's, you know, the ecosystem is too big, the solve takes too long.

23:47 But then if you don't do that solve, then you just run into stochastic and irreproducible installations.

23:53 So what do you do, right?
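To make the trade-off Peter describes concrete, here is a deliberately tiny, brute-force dependency solver. It is a toy under stated assumptions, not how Conda's real solvers work: the repository contents, the package names `app`, `libfoo`, and `libbar`, and the `solve` function are all hypothetical. It picks one version per package so that every declared constraint holds, and its search space is the product of the candidate counts, which hints at why solves slow down as a channel grows.

```python
from itertools import product

# Hypothetical toy repository: package -> candidate versions, newest first.
# Each candidate is (version, {dependency: set of acceptable versions}).
REPO = {
    "app": [("1.0", {"libfoo": {"2.0", "2.1"}, "libbar": {"1.0"}})],
    "libfoo": [("2.1", {"libbar": {"1.1"}}), ("2.0", {"libbar": {"1.0"}})],
    "libbar": [("1.1", {}), ("1.0", {})],
}


def solve(repo):
    """Brute-force search for one version per package satisfying every constraint.

    Returns a {package: version} dict, or None if no consistent set exists.
    The number of combinations tried grows as the product of candidate counts.
    """
    names = sorted(repo)
    for combo in product(*(repo[name] for name in names)):
        chosen = {name: version for name, (version, _) in zip(names, combo)}
        # Every dependency declared by every chosen candidate must be satisfied.
        if all(
            chosen[dep] in allowed
            for _, deps in combo
            for dep, allowed in deps.items()
        ):
            return chosen
    return None


print(solve(REPO))
```

Here the solver rejects the newest `libfoo` because it needs a `libbar` that `app` doesn't allow, and settles on `{'app': '1.0', 'libbar': '1.0', 'libfoo': '2.0'}`. Skipping the solve and grabbing the newest of everything would have produced an inconsistent environment.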

23:53 So this proposal, but specifically about this proposal that Barry's put out there, this is, I think, meant to address,

24:00 and he works now at NVIDIA.

24:02 And people who've had to deal with GPU packages know that the CUDA libraries can be quite large.

24:07 They get pretty big.

24:08 And they are somewhat of a tax on the hosting infrastructure, on the CDN there for PyPI.

24:13 And so the limits that PyPI has imposed make it difficult for them to kind of have increasingly larger packages and all these like nightly builds that they want to upload.

24:22 So this proposal is, can we split this out?

24:25 These .rim files.

24:26 And so we put a placeholder in place on PyPI, and then put the actual package over here, right?

24:30 Right.

24:31 If NVIDIA wants to ship a 400 meg pip installable package, let them host it.

24:36 Right.

24:37 Something like that, right?

24:38 That's right.

24:38 And so this gets to one of the demons I mentioned, which is that a repository is not the same as a distribution,

24:45 and it's not the same as an FTP site or a Dropbox.

24:48 These are actually very different things.

24:50 And I have not actually commented officially on this on the discourse at all.

24:55 I've had some private conversations with people, but I might as well, since you asked.

24:59 My thought is that this is not a bad idea, but the challenge is we are going to go down a path, essentially,

25:07 of having to solve the CAP theorem.

25:09 Okay.

25:10 What I mean by that is if you view PyPI, like what is PyPI?

25:13 Is it a Dropbox where anyone could just upload whatever?

25:16 Okay.

25:17 But the problem there is if someone installs something today and they try to install again tomorrow or two weeks from now,

25:23 they're going to generally want the same kinds of things.

25:25 And if it's just a Dropbox, then no one is responsible for providing a stable snapshot of what points to what

25:32 and what is what even.

25:34 Because in the space of even an hour or two on something as large as PyPI, you can have updates.

25:40 You can have updates of versions of the different build strings.

25:42 Or people can yank a file because, oh my God, I botched the build on that, and so on and so forth.

25:47 So if you view it just as a Dropbox, that's okay.

25:49 But then all the work of consistency and reproducibility gets foisted onto the install tools, the client-side tools.

25:56 Whether it's uv, whether it's pip, whether it's Poetry, PyM, you name it.

26:01 Hatch, PDM, Poetry, all these wonderful tools.

26:03 Anaconda, too.

26:04 I should put that out there, right?

26:05 Everyone is trying to solve this problem.

26:07 And the question is, is the repository host going to solve the atomic sort of state view?

26:13 Or is the client going to have to piece it together somehow?

26:15 And then the reason I mentioned the CAP theorem is if you say, okay, we should pretend that it's somewhat of a stable repository, not just a Dropbox.

26:23 But as a repository, that means that there's a snapshot of things at a state in time, the metadata.

26:28 Well, okay.

26:29 But if you allow for some of these packages to be hosted over here, now you have a distributed database.

26:34 You have a distributed object database.

26:36 Massively distributed.

26:37 Massively distributed.

26:39 And NVIDIA, obviously, as a trillion-dollar company, can keep up a CDN.

26:44 Okay?

26:44 But they cannot avoid net splits.

26:46 They cannot avoid a backhoe in Reston nuking a fiber line somewhere, right?

26:50 You cannot avoid the kinds of things that happen that cause a distributed network database to suffer consistency problems.

26:59 So now think about someone building a Docker image.

27:01 And when you're looking at something as important as the CUDA-oriented libraries, the RAPIDS libraries, these, like, really important foundational GPU libraries, people are pulling these to the tune of thousands and thousands per second.

27:12 So you have got to make sure you're consistent all the time.

27:14 Otherwise, someone's production Docker build falls back on an old version because they couldn't get the most recent one.

27:20 Or, I mean, in the best case, it times out.

27:22 It complains loudly.

27:22 In the worst case, it fails silently and falls back.

27:25 And whatever client-side tool they're using then says, well, I can't get that, so I'm going to solve this way.

27:29 I'm going to solve this other thing over here.

27:30 They pick up an old version, and now, for whatever reason, your model build deployment doesn't work, right?

27:36 Right.

27:36 Maybe you picked up a regression, the equivalent of a bug that you know is fixed in the new one.

27:41 You asked for the new one, but you didn't get the new one.

27:43 You didn't get the new one.
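A small Python sketch can make the two failure modes concrete: failing loudly versus silently falling back to an older build. Everything here is hypothetical: the package name `gpu-lib`, the `reachable` set standing in for whatever a mirror can serve during a netsplit, and both install functions are illustrations, not the behavior of pip, uv, conda, or any other real tool.

```python
class PackageUnavailable(Exception):
    """The exact version requested could not be fetched right now."""


def fetch(name: str, version: str, reachable: set) -> str:
    # Hypothetical stand-in for a download: succeeds only if a mirror can serve this file.
    if (name, version) not in reachable:
        raise PackageUnavailable(f"{name}=={version} is unreachable")
    return f"{name}-{version}"


def install_strict(name: str, version: str, reachable: set) -> str:
    # Fail loudly: the build stops and a human finds out.
    return fetch(name, version, reachable)


def install_with_silent_fallback(name: str, version: str, fallback: str, reachable: set) -> str:
    # The risky pattern: quietly substitute an older (possibly vulnerable) build.
    try:
        return fetch(name, version, reachable)
    except PackageUnavailable:
        return fetch(name, fallback, reachable)


# A netsplit hides the freshly published 1.1; this mirror can only serve 1.0.
reachable = {("gpu-lib", "1.0")}
silent = install_with_silent_fallback("gpu-lib", "1.1", "1.0", reachable)  # "gpu-lib-1.0"
try:
    install_strict("gpu-lib", "1.1", reachable)
    strict_outcome = "installed"
except PackageUnavailable as err:
    strict_outcome = f"failed loudly: {err}"
```

The fallback version "works", which is exactly the problem: the Docker build succeeds with the old library and nobody is told.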

27:43 And so the point is, the reason I cite the CAP theorem is at some point, a package hosting site like this has to decide, is it just a Dropbox?

27:52 Is it a repository?

27:53 Right.

27:53 And if it's the latter, then it has to offer an atomic and consistent, verifiable view of the metadata.

28:01 That's what we do with conda: we have a thing called repodata.json, and it is a big fat file, and it is basically the snapshot of the repository, and it is transactional.

28:10 So you get the whole thing, and then you can do the solve and say, I want these pieces, and then you go get those pieces.

28:15 So from a design standpoint, architect standpoint, it's not that complicated, but it is a different, it's a design difference.

28:21 Whereas when you think about pip installing off PyPI, it is eager and opportunistic, let's say, right?

28:27 So it goes, tries to install a thing, it grabs that, looks through it and says, oh, I need these things.

28:31 It goes and grabs those things.

28:32 And so over the course of a multi-second or maybe multi-minute install, you don't actually get a consistent view of the repository.
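The snapshot-first approach Peter describes (fetch the whole index once, solve against it, then download only the chosen files) can be sketched in a few lines of Python. This is a minimal illustration, not conda's solver: the index below is a heavily simplified, hypothetical stand-in for repodata.json (the real file carries many more fields), only `>=` constraints are understood, and the strategy is a greedy "newest version wins" pass rather than a real dependency solver.

```python
from __future__ import annotations

import json

# Hypothetical, simplified index snapshot: every file and its dependencies at one point in time.
SNAPSHOT_JSON = """
{
  "packages": {
    "app-2.0-0.tar.bz2": {"name": "app", "version": "2.0", "depends": ["lib >=1.1"]},
    "lib-1.0-0.tar.bz2": {"name": "lib", "version": "1.0", "depends": []},
    "lib-1.1-0.tar.bz2": {"name": "lib", "version": "1.1", "depends": []}
  }
}
"""


def parse_version(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def parse_spec(spec: str) -> tuple[str, tuple[int, ...] | None]:
    # Only "name" or "name >=X.Y" is understood in this sketch.
    name, _, constraint = spec.partition(" ")
    if constraint.startswith(">="):
        return name, parse_version(constraint[2:])
    return name, None


def solve(snapshot: dict, requested: list[str]) -> dict[str, str]:
    """Choose one file per package using only this snapshot; no network calls mid-solve."""
    records = snapshot["packages"]
    chosen: dict[str, str] = {}
    pending = list(requested)
    while pending:
        name, minimum = parse_spec(pending.pop())
        if name in chosen:
            # Already picked: just confirm the earlier choice meets this new lower bound.
            if minimum is not None and parse_version(records[chosen[name]]["version"]) < minimum:
                raise LookupError(f"conflicting requirements for {name!r}")
            continue
        candidates = [
            (parse_version(rec["version"]), filename)
            for filename, rec in records.items()
            if rec["name"] == name
            and (minimum is None or parse_version(rec["version"]) >= minimum)
        ]
        if not candidates:
            raise LookupError(f"no file in this snapshot satisfies {name!r}")
        _, filename = max(candidates)  # greedy: newest matching version
        chosen[name] = filename
        pending.extend(records[filename]["depends"])
    return chosen


snapshot = json.loads(SNAPSHOT_JSON)  # fetched once, as a single consistent view
plan = solve(snapshot, ["app"])       # only now would the chosen files be downloaded
```

The design point is that every decision is made against the same `snapshot`; an eager resolver instead fetches metadata package by package, so updates or yanks landing mid-install can change what it sees.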

28:40 And so this really stresses the question.

28:46 I think what the RIM proposal does is it really forces the question of: is PyPI a repository or is it a hosting site?

28:54 Like SunSITE, yeah, FTP at Sunsite.edu back in the day, and you just go grab whatever and good luck.

28:59 And all of that complexity of consistency is then the responsibility of the client side tools to manage.

29:04 That is the design decision you can make, right?

29:06 Yeah, I hadn't even thought about it from a reproducibility science or even just old applications perspective.

29:12 That's pretty interesting to think about.

29:14 When I read this PEP, I don't recall if it talked about a backup that would allow the PyPA to create some kind of snapshot that says, even if those mirrors went away, this is the reality of things, right?

29:28 I mean, it's one thing to say, well, let's let NVIDIA handle the 100 megabyte download request a thousand times a second.

29:34 But we got a copy of it, and if that fails, we can switch over to our version versus let's let them hold that part.

29:42 And if it fails, it's just gone.

29:44 And there's nothing but an empty RIM.

29:46 There's no wheel.

29:46 Right.

29:47 But consider this.

29:49 If I discover a zero day in one of those things, and by the way, these things are low-level libraries that touch kernel space things and things in the driver, and they touch memory, and there's all these things.

30:00 And there are real implications on this kind of stuff.

30:05 So let's say there's a zero day where I have some hack, and NVIDIA, because they were notified of it, they patched it.

30:10 But you have a fallback.

30:11 And if I somehow managed to block your ability to resolve or to access that, now you're still picking up the old stuff.

30:18 And you're building those into your Docker, right?

30:20 So it's not just availability.

30:22 This is why it's not just availability and fallback.

30:25 It is consistency.

30:26 So that when you push an update, it forces an update, right?

30:29 Right.

30:29 So when I said I thought there'd be some kind of backup fallback, what I had in mind, and I don't even know that this is in the PEP, that it's addressed at all,

30:37 maybe, I don't recall that it is, is that before PyPI allows it to be listed as a new version, they've downloaded a local copy, and they've got the new one.

30:45 They're not going to serve it to you, but they have it.

30:47 You know what I mean?

30:48 Yeah.

30:48 It's going to be interesting to see if this gets accepted and what implications there are.

30:53 But on the other hand, the flip side is it's also scary that pip and PyPI have over $100,000 of donated bandwidth.

31:01 And if that donation ever stops, all of a sudden, out of the blue, we'd also be in a world of hurt, right?

31:06 And so if we could distribute that risk a little, which I think is the purpose of this PEP, that's helpful.

31:12 Yeah.

31:12 Yeah.

31:13 I think it would be interesting to solve that problem.

31:15 I like to solve technical problems at the technical level and economic problems at the economic level, right?

31:22 And so conflating these two, I think, muddles things a bit.

31:28 So the question is, what is a good design for a distributed, high-performance, but consistent and secure repository?

31:37 Well, that is a traditional CDN.

31:41 I mean, it's not really that hard of a problem.

31:42 It's been solved many times by many different kinds of people, right?

31:45 Whether it's Windows Update, Apple's updates, whether it's like what we do with Conda.

31:48 You put a CDN in front of a consistent sort of index, and you force that kind of consistent index, and you do that.
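One common way to put "a CDN in front of a consistent index" is for the index to record a cryptographic hash for every file, so the bytes can come from any edge server or mirror while the client still checks them against the one authoritative snapshot (conda's repodata.json and PyPI's simple index can both carry per-file hashes). The sketch below assumes a hypothetical index that maps filenames to SHA-256 digests; it shows only the client-side verification step, and the filename and byte strings are made up.

```python
from __future__ import annotations

import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(filename: str, data: bytes, index: dict[str, str]) -> bytes:
    """Accept downloaded bytes only if they match the hash in the index snapshot."""
    expected = index.get(filename)
    if expected is None:
        raise LookupError(f"{filename} is not in this index snapshot")
    if sha256_hex(data) != expected:
        # A stale edge cache or a tampered mirror ends up here instead of in your image.
        raise ValueError(f"{filename}: content does not match the index")
    return data


# Hypothetical index entry published atomically by the repository.
FILENAME = "gpu_lib-1.1-py3-none-any.whl"
new_build = b"gpu-lib 1.1 wheel bytes"
index = {FILENAME: sha256_hex(new_build)}

checked = verify(FILENAME, new_build, index)  # bytes served by any CDN edge
try:
    verify(FILENAME, b"gpu-lib 1.0 wheel bytes", index)  # an edge still serving old content
    stale_outcome = "accepted"
except ValueError:
    stale_outcome = "rejected"
```

Availability stays with the CDN; consistency and integrity come from the single index, which is the split Peter describes.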

31:56 And that's okay.

31:56 That's what you do.

31:57 Now, how you fund that is a different question, right?

32:00 And so it could be that we fund it.

32:02 We just put all the money into the PyPA or some PSF delegate organization that goes and just runs a packaging thing, some consortium that people donate into.

32:12 I mean, $100,000 a month.

32:14 Yeah, I mean, look, the Conda repos cost more than that.

32:18 We ship a lot of binaries, right?

32:20 So I think the bandwidth costs maybe more than what you're citing, actually, for PyPI because it's much more popular than Conda.

32:26 But at the end of the day, it is something on the order of a few million dollars a year, I think, in CDN equivalent costs.

32:33 And when you look at how many people use it in the world and depend on it, there is that budget available in the world.

32:39 So the fact that we as a community don't know how to talk to the businesses that depend on it to get those dollars, that's our problem.

32:47 That's not really because businesses spend more than that on marketing parties, launch parties for random crap, right?

32:53 Yeah.

32:54 We have infrastructure that powers the world, and we as a volunteer kind of community of people who love Python can't figure out how to have a conversation with them to get a few more million dollars out of them.

33:02 That's our problem.

33:03 How do we get better at that?

33:04 Yeah.

33:05 Yeah, for sure.

33:06 Well, let's just go back to BitTorrent.

33:07 All right.

33:10 Speaking of companies that have some money to work with, let's talk Excel.

33:14 Let's talk about Excel.

33:16 Yeah.

33:17 Literally, it's the tool that probably processes the most financial numbers in the world, you know?

33:22 You know, it is ironic that you mentioned this now because you brought up TIOBE, and I would say Excel is the dominant programming environment in the world.

33:30 Yeah.

33:30 And Excel didn't make the list.

33:31 Didn't make the list.

33:33 Didn't make the list.

33:34 Maybe it should.

33:34 I should go talk to the TIOBE editors.

33:36 If it's not the most popular programming thing, it probably is the most widely deployed database.

33:43 Yes.

33:44 It is the world's most popular database, the world's most popular computational environment, the world's most popular programming environment.

33:49 Yeah.

33:49 Because it actually was the last one of the things from the 70s and 80s with, like, end user computing and programming for everyone.

33:58 This was one of the last great tools, you know, the spreadsheet.

34:02 So normal people, muggles, could approach this and put computational things together.

34:08 This portion of Talk Python to Me is brought to you by Bluehost.

34:11 Got ideas, but no idea how to build a website?

34:15 Get Bluehost.

34:16 With their AI design tool, you can quickly generate a high-quality, fast-loading WordPress site instantly.

34:22 Once you've nailed the look, just hit enter and your site goes live.

34:26 It's really that simple.

34:27 And it doesn't matter whether you're a hobbyist, entrepreneur, or just starting your side hustle.

34:31 Bluehost has you covered with built-in marketing and e-commerce tools to help you grow and scale your website for the long haul.

34:38 Since you're listening to my show, you probably know Python.

34:41 But sometimes it's better to focus on what you're creating rather than a custom-built website and add another month until you launch your idea.

34:48 When you upgrade to Bluehost Cloud, you get 100% uptime and 24-7 support to ensure your site stays online through heavy traffic.

34:58 Bluehost really makes building your dream website easier than ever.

35:01 So what's stopping you?

35:02 You've already got the vision.

35:03 Make it real.

35:04 Visit talkpython.fm/bluehost right now and get started today.

35:09 And thank you to Bluehost for supporting the show.

35:12 But it has limited expressiveness.

35:15 Well, I don't mean that in a mean way, but just to say that people, you know, you don't really want to see hundreds of lines of Excel.

35:22 Like, that's kind of a terrifying thing, right?

35:23 Because it is very much an immediate-mode data transformation kind of language.

35:28 And so the Anaconda Toolbox for Excel is there to complement the Python support that just recently got added to Excel.

35:36 And that's GA now.

35:37 Yeah, I was going to say, let's start there, actually.

35:39 Sure, sure.

35:40 Because why does this exist?

35:41 Didn't Microsoft just add Python to Excel?

35:44 Yes.

35:45 So that's worth talking about, right?

35:47 So Excel on Windows has support for Python in the formula bar.

35:53 So if you have Windows and you run Excel, you can type =PY in the formula bar and start punching in Python code, multi-line Python code.

36:03 You can refer to cell ranges, and it will spill your data frames into the grid.

36:07 You can chart things.

36:09 It's insane that we're here.

36:11 Like, it's amazing that works, okay?

36:12 Now, the way that works is it analyzes your code.

36:16 It actually looks at the range references and the cell references and everything else, pulls all that data and your code, and it sends it over the wire to Azure Confidential Compute, which is an extra-secure, extra-locked-down version of Azure.

36:27 And it runs the code there on a customized Anaconda environment that we built for that.

36:32 So you have access to a lot of the data tools that you would expect, pandas, matplotlib, things like that.

36:37 So you do all that stuff, and then it comes back, and then it sort of renders in the grid.

36:41 So the computational environment doesn't have access to your full spreadsheet, and it also doesn't have access to the internet.

36:50 So there's no risk of data leakage and loss and things like that or tampering.

36:54 So in any case...

36:55 It probably also doesn't have access to your local file system if you've got other things to pull in or you want to write a Python file and import it as a module.

37:02 None of that, right?

37:03 Exactly.

37:03 Exactly.

37:04 And you've served as a wonderful straight man for me here because that is a bit of a limitation.

37:09 So the pluses are, number one, it's built in, and it runs the full-on Anaconda install on the Azure Confidential Compute.

37:19 So you have access to the bona fide packages and whatnot for doing data analysis.

37:23 The downside is there's no access to the internet.

37:25 There's no access to your local file system, things like that.

37:27 You have to sort of load those into Excel and then punch that data over.

37:30 So to complement that, we created this thing that we call the Anaconda Toolbox.

37:34 And so that is actually a WebAssembly-based plugin for Microsoft Excel, and it runs on the Mac version of Excel as well, and it runs locally.

37:48 So it is an Anaconda WebAssembly sort of environment hosted inside a plugin for Excel.

37:55 So that then has sort of richer access to things, but it is limited in that only those libraries that we cross-compiled into WebAssembly can work in there.

38:06 That being said, you can use a sort of pip install to install pure Python packages into that environment.

38:13 And you can also then, you know, one of the interesting things that we heard from people was that they, look, they could be a Microsoft shop.

38:22 They trust, of course, they trust Azure.

38:23 They trust Microsoft.

38:25 But they just have an IT policy where they're not allowed to sort of, they don't want the risk at all of the data leaving their local machine.

38:31 And this would run everything local.

38:33 It's just running everything local on your laptop or on your workstation directly inside the grid.

38:38 And we're working, I mean, the Excel team knows we're doing this.

38:42 They like seeing some things we're doing here.

38:44 And the scope here is that actually with this, we can explore features and run a little bit ahead of where the built-in Excel capability is.

38:51 So they have, they sort of have a way to sort of almost not quite A-B test, but see what kind of features are good and what features are not as interesting.

38:59 And they can then eventually over time, I think, roll those into the mainline Excel support.

39:03 Because when you have a product like Excel that's relied upon by so many people, you actually are limited in how fast you can move and in what things you can roll in.

39:10 Because once you put a thing in, you really can't take it out, right?

39:13 It's almost like a programming language, yeah.

39:15 It's a really important thing that people make billion-dollar decisions based on the values popping out of those cells.

39:19 So it's really important that that team is able to focus on the stability and have a very refined user experience.

39:26 For us, we can move a little faster, try different kinds of things.

39:29 So one of the things that we're really exploring, and one of the things I'm hoping will come out of this Excel work, is that we will see much more cross-team collaboration between data science kinds of folks and business analysts and other kinds of stakeholders

39:43 who may themselves not be writing Python or not be very familiar with Python.

39:46 But using our toolbox, one of the things we have in there are simple ways to share code snippets and simple ways to share new kinds of data sources.

39:56 And to have really a lot of ways to make simple dashboards and visualizations that then essentially turn your Excel spreadsheet from just a database and a computational environment into a full-on application deployment environment.

40:11 So for data scientists, you know, one thing I've heard a lot over the years from data science folks is that they turn into service providers for their teams where they're just being asked for new Excel outputs every single week.

40:24 Or every couple of weeks, they come back around and say, hey, that was great what you did with all your magic code stuff.

40:29 But can you rerun that analysis?

40:31 But this way and email me the, you know, email me the CSV of the spreadsheet.

40:36 And so with this, you can now deploy your code forward into a spreadsheet.

40:41 That's then the live forward deployed thing.

40:43 It can pull from a Jupyter notebook that you have running and hosted.

40:47 And then you actually are using this to deploy a live version of a self-service data analysis environment for your business stakeholders.

40:56 So create almost an API for your data that Excel is the front end.

41:00 You turn your notebooks into APIs and then your end users are using Excel, which is their native environment, as a way to hit that API to generate really quick visualizations.

41:10 We have a built-in, you know, LLM chat.

41:12 So you can ask for new kinds of plots and tweak them this way and that way.

41:15 And when you like what you've got, you can share it as a snippet to other people.

41:18 So that's a really powerful modality.

41:20 We believe that's a very powerful modality.

41:22 But this is the kind of feature which we want to get out there.

41:25 A lot of people play with it.

41:27 We'll tweak it.

41:28 And then once it gets really dialed in, then that's something which the Excel team might look at and say, oh, yeah, that's kind of what it should look like.

41:34 Now let's go and build this into Excel itself.

41:36 Let's go build this into native SharePoint and Outlook and other kinds of, you know, things.

41:41 So anyway, that's kind of the point of it.

41:44 But we're really super excited about all the kinds of innovation that can happen in this toolbox.

41:49 But we really rely on end users, right, to give us that kind of feedback.

41:53 What do you do at the notebook level to make it hosted and accessible as an API?

41:58 Is that just straight Jupyter or is there something more interesting?

42:01 Yeah, no, it's straight Jupyter.

42:02 But it's hosted, and so for all of this, to get access to the toolbox, you just need to create an Anaconda account.

42:07 It's a free Anaconda Cloud account.

42:08 And then the notebooks are going to be the ones hosted in your Anaconda Cloud account.

42:14 And then the snippets are also there.

42:16 So everything is sort of shared through your Anaconda account, right?

42:19 And so then your friends who are using this can just reference that and drop that in.

42:23 And then they're able to use the code.

42:25 Yeah, that sounds pretty awesome.

42:26 Are you able to, you know, you said it's based on WebAssembly.

42:29 Are you able to create your own packages as WebAssembly things or pip install things?

42:35 You can pip install things.

42:37 Yes, you can pip install things.

42:38 Absolutely.

42:39 That's the, that's also the power of this is that you can kind of go a little nuts.

42:42 It is a little bit Wild West.

42:43 So definitely use with caution.

42:45 There are a lot of footguns, but, but again, as a plugin, we can do more of that exploration

42:49 and find out where, where people, where the comfort level is, both for individual users,

42:54 as well as for organizations.

42:55 Right.

42:56 And, and we do have, I do want to plug that we have a webinar coming up.

43:01 I'm not sure exactly when it is, but we have a webinar coming up

43:05 around using, using the assistant and all that.

43:10 And I should have been prepared with that information before mentioning it.

43:15 No worries.

43:15 I'll tell you what, let's put it in the show notes.

43:17 Send it to me.

43:18 Put the show notes.

43:19 Yeah.

43:19 Yeah.

43:19 Yeah.

43:19 Yeah.

43:20 Awesome.

43:20 So a question from Dennis in the audience, someone who it sounds like is really in the

43:26 know for these things, asking this sort of question.

43:28 Yes.

43:28 Okay.

43:29 So Anaconda Toolbox.

43:30 Let me read it for the people who are not watching.

43:32 Anaconda Toolbox for Excel: which kinds of licenses

43:35 are needed to make use of this combination in an O365 environment?

43:39 Right.

43:39 So Office 365 right now, right?

43:41 Doesn't support the, the built-in Excel stuff.

43:44 'Cause that's, again, just for Excel on Windows.

43:46 And so you can use Anaconda Toolbox inside there and the toolbox.

43:51 It's free, free to use.

43:53 We will have sort of some upgraded and premium features built into that, but you just need

43:57 to create an Anaconda cloud login or create a cloud account.

44:00 And then you should be able to use that.

44:01 So if you just have any version of Excel, you don't need anything special.

44:05 The Office 365 version of Excel, right?

44:07 That's like the web.

44:07 That's the, the web hosted.

44:09 I see.

44:10 Got it.

44:10 Got it.

44:11 Got it.

44:11 Got it.

44:11 And that's why it's.

44:12 I'm assuming, I'm assuming that's, that's what that's referring to.

44:14 I may be misunderstanding the question, but hopefully that made sense for Dennis.

44:17 Yeah.

44:18 I'm not, I'm not sure.

44:19 There's a lot going on with licensing and 365 for sure.

44:22 I bet there's people, I'm sure there are people whose job it is to just know the licensing.

44:26 Oh, sure.

44:27 Yeah.

44:27 In large organizations of a hundred thousand, you know, Windows users.

44:30 One thing to be clear is that for people who are in a corporate environment, your administrators,

44:35 your IT administrators will have to, it depends on what update cycle they're on, that they take

44:40 updates to Office and Excel.

44:42 So even though the Python in Excel feature is generally available now,

44:48 your internal corporate environment may not be picking up that feature until the next

44:52 kind of iteration cycle when they, you know, the refresh cycle they do internally.

44:56 So that's just a caveat that I will put up there.

44:58 Okay.

44:58 And I may have missed this when you're talking about it, but do people need to have the Microsoft

45:03 Python Excel support to work with your toolbox?

45:06 You know, I, that's a question I should know the answer for.

45:09 I'm assuming I'm going to say yes, because you should have that support anyway, because

45:14 you should be using that.

45:15 But, oh, no, no.

45:16 I mean, actually, if you're on Windows, you should do it, but you don't have to have it

45:20 because it is a plugin that works on Mac and it works on the web as well, on, on Excel,

45:24 Office 365 on the web.

45:26 So it runs kind of independently, but we tried to make it as compatible as possible.

45:31 And one thing I forgot to mention, a really huge feature, is that you can build your own

45:35 UDFs, right?

45:37 Like your ability to write a new function and then have that be available.

45:43 You can publish that as a function that isn't available in the, in, in the built-in list

45:48 of Excel functions.

45:49 And you can run that on a row and just drag it, and it applies to everything in a row.

45:53 Like it's a very nice way to do some custom functions.

45:55 Yeah.

45:56 Okay.

45:56 It sounds really neat.

45:58 And I think WebAssembly gives you a lot of flexibility, especially since there's, as you

46:02 said, the web version.

46:04 Is that WebAssembly version based on Pyodide?

46:08 PyOxide?

46:09 Pyodide?

46:09 Yep.

46:10 Pyodide.

46:10 Sorry, I'm mixing these together.

46:12 Pyodide or PyScript or, like, yours? I know you did a lot of work.

46:15 Yes.

46:15 It's PyScript and Pyodide, I believe.

46:17 I don't know exactly which portions are which.

46:20 It definitely relies on Pyodide for the WebAssembly runtime because we're using some of those

46:25 data packages, which are built against Pyodide.

46:27 Okay.

46:28 Yeah.

46:28 That's really great.

46:29 So what do you see people doing?

46:30 I mean, you described some scenarios, but you got any interesting stories of what some things

46:34 people are able to accomplish with this?

46:35 Oh, well, so not the toolbox specifically, but with the Python for Excel support.

46:43 Last week, we were in Chicago at Microsoft Ignite, and a gentleman from KPMG was talking about

46:49 how he turned hundreds of lines of very involved VBA for doing this tax preparation analysis.

46:58 He turned it into just a few dozen lines of Python code.

47:02 Right.

47:04 And so I think that speaks to the power of this in that what I hope to see is that many

47:14 more people in the Excel environment are able to use the great tools we have for data transformation

47:20 in Python.

47:20 But more importantly, that it lowers and reduces the friction between the data science

47:27 teams and their business stakeholders.

47:28 I think data scientists get trapped a lot in becoming kind of the data monkeys for folks.

47:33 Right.

47:33 This Excel spreadsheet is awesome.

47:35 Could you generate me one with the new data from today?

47:37 Because now we've got more traffic.

47:38 Right.

47:38 Or we run the thing or whatever else.

47:40 And what I hope actually that maybe the deeper, more transformational thing is I hope it creates

47:47 more interesting spreadsheets that causes all of these business users to be thinking, to have

47:53 their eyes open about how much more interesting ways there are to think about their quantifiable

47:58 problems, to run what if analyses, to slide a slider bar and say, okay, I want to see this

48:05 weekly, monthly, yearly.

48:06 I want to align to, you know, week starts, or I want to align to calendar years, or whatever

48:11 else.

48:11 Simple snapshotting kinds of things, which when you are, I think, a very, very experienced

48:16 Excel person, you know you can do it, but it is a very involved process.

48:20 It's like writing machine code.

48:22 Right.

48:22 It's very involved.

48:23 And so with this, you can just say, oh, with three lines of Python, I've now transformed

48:29 my data in bulk.

48:30 It's vector oriented and array oriented, data frame oriented.

48:33 And I think the more people think about that and the more they realize how easy it is to

48:37 do pivot tables with just switching two or three parameters in a couple of lines of pandas

48:42 or Polars code, like, mind blown.

48:44 Right.

48:45 As opposed to clicking around on UIs all the time.
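Here is roughly what "switching two or three parameters in a couple of lines of pandas" can look like for the weekly/monthly/yearly pivot Peter mentions. It is a sketch with made-up sales data standing in for a worksheet range; `summarize` is a hypothetical helper, and `pivot_table` with a `pd.Grouper` is one of several ways to do this in pandas.

```python
import pandas as pd

# Hypothetical data standing in for a range in the workbook.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-17", "2024-02-05", "2024-02-20", "2024-03-11"]),
    "region": ["East", "West", "East", "East", "West"],
    "amount": [100.0, 250.0, 80.0, 120.0, 300.0],
})


def summarize(df: pd.DataFrame, freq: str = "MS") -> pd.DataFrame:
    """Total amount per period (rows) and region (columns).

    freq is a pandas offset alias, e.g. "MS" for calendar months or
    "W-MON" for weekly bins anchored on Monday.
    """
    return df.pivot_table(
        index=pd.Grouper(key="date", freq=freq),
        columns="region",
        values="amount",
        aggfunc="sum",
        fill_value=0,
    )


monthly = summarize(sales, "MS")    # one row per calendar month
weekly = summarize(sales, "W-MON")  # same logic, different bucket: one argument changed
```

Changing the grouping is a one-argument edit rather than rebuilding a pivot table through dialogs, which is the contrast drawn above.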

48:47 So I'm really.

48:48 I think it's going to be a gateway.

48:49 Yeah.

48:49 I'm really hoping.

48:50 Yeah.

48:51 Yeah.

48:51 I think the way that Python has kind of brought people who didn't see themselves as data

48:55 scientists into the data science realm and eventually they're like, but I'm just going

48:59 to work in notebooks instead of my other tools or whatever.

49:02 I feel like this could have a similar effect.

49:05 You're like, this is really cool.

49:07 I didn't know we could do this.

49:08 And then, so this is all Python.

49:10 Yeah.

49:10 That's all Python in the back end.

49:11 Well, maybe we should just do that directly.

49:14 You know what I mean?

49:14 Sort of work, get their feet wet and move, move into that direction.

49:18 So, yeah, yeah, I'm trying to get my folks here to get those links across to you.

49:24 So you can put those in the show notes.

49:25 Oh, perfect.

49:25 Yeah.

49:25 So I think it's going to have that kind of interesting effect.

49:29 The reason I brought up this "eight of the biggest Excel mistakes of all time" piece.

49:33 There's all these different articles of, right.

49:36 You know, weird, weird things because, you know, for example, MI5 bugged the wrong

49:41 phones because there was some formatting error that autofilled, or, you know, we had to change

49:46 the name of a gene because it kept getting mangled, and all this random craziness, right?

49:51 Right, right, right, right.

49:52 So I think having Python available, because to me, Excel feels like the world's most insane

49:59 non-visible set of go-to statements.

50:01 I go here, then I go to that cell, then I go to here, then I go over there, and then we do

50:05 this.

50:05 And then, but you can't look at the spreadsheet and know that that's what it's doing.

50:10 Right.

50:10 But with Python in there, you have a much more structured way of thinking about your

50:16 data.

50:16 You're not forced into 17 interconnected go-to statements.

50:20 Yeah.

50:20 And you reminded me of the thing which I meant to say before; I just forgot to mention that

50:24 if you put more of the business logic into the Python code, well, guess what?

50:28 You can check in Python code.

50:30 Yeah.

50:30 You can diff Python code, and it uses variable names and variable references.

50:34 And if you take a little bit of a hygienic approach to this, you set your variable references to

50:39 cell ranges at the top, then you manipulate the variables, and then you put the outputs down

50:42 there.

50:43 And now you really have very clean business logic you can follow.
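The "hygienic" layout Peter describes (named inputs at the top, logic on those names, outputs at the bottom) might look like the sketch below. Inside an Excel `=PY` cell the input table would typically come from something like `xl("A1:C5", headers=True)`; here a hypothetical stand-in DataFrame and two made-up single-cell parameters play that role so the sketch runs anywhere. It also uses whole-column arithmetic instead of a formula dragged down every row.

```python
import pandas as pd

# --- Inputs: bind ranges to named variables once, at the top. ---
# Hypothetical stand-in for a table read from the grid.
orders = pd.DataFrame({
    "sku": ["A", "B", "C", "D"],
    "qty": [10, 3, 7, 0],
    "unit_price": [2.5, 10.0, 4.0, 8.0],
})
discount_rate = 0.1  # hypothetical single-cell input
bulk_threshold = 5   # hypothetical single-cell input

# --- Logic: readable, diffable operations on named variables. ---
gross = orders["qty"] * orders["unit_price"]  # vectorized: every row at once
is_bulk = orders["qty"] >= bulk_threshold
# Keep gross where the order is not bulk; otherwise apply the discount.
net = gross.where(~is_bulk, gross * (1 - discount_rate))

# --- Outputs: one result table to spill back into the grid. ---
result = orders.assign(gross=gross, net=net)
```

Because the business rule lives in a few named lines of Python rather than in cell-to-cell references, it can be checked into version control and diffed like any other code.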

50:47 Not to say that Python doesn't have its warts, of course.

50:50 I'm not some like, you know, whatever.

50:51 I see, I know how complex a Jupyter Notebook can get, for instance.

50:55 But at the end of the day, you're absolutely right that Excel is at the same time, both

51:00 so functional, right?

51:01 It is a data flow language.

51:02 It is immediate mode, dependency graph oriented, you know, in the grid.

51:07 A lot of good things about it.

51:08 Reactive, live, a lot of great things about that model for normals, for normies, for muggles,

51:14 whatever you want to call them.

51:14 People who are not imperative iterated state machine coders like we are, right?

51:19 But there's a place for that too, right?

51:22 Because you're absolutely right.

51:23 People will have these like weird linkages between different bits and pieces when what

51:27 they really wanted was a couple of named references and then some iterative logic or imperative

51:33 logic on those things with a branch statement somewhere.

51:35 You know, a couple of for loops and an if-then statement can go a long way to simplifying very complicated

51:41 data flows.

51:42 Yeah.

51:42 That and vector math rather than copy the formula, drag it down 20,000 rows.

51:48 Right.

51:49 Why is this?

51:49 They warn you now.

51:50 At least they warn you again when like you're off by one and you forgot your minute of value.

51:54 There's like a little yellow triangle.

51:55 Okay.

51:55 That's good.

51:56 That's good.

51:56 Yeah.

51:57 Nice.

51:59 All right.

52:00 Well, let's, you know, let's talk about a couple other things while you're here that you

52:04 all are into.

52:06 We just, there we go.

52:08 We just had Fabio drop in and he did a lot of the work.

52:14 I've had him on the show at least once, maybe twice to talk about PyScript.

52:18 Do you want to give us a PyScript update?

52:19 What is it for people who don't know?

52:21 I mean, they've been just, yeah.

52:21 They've been.

52:22 MicroPython, by the way, this is, this is huge right here.

52:25 Yeah.

52:25 Like it just works.

52:26 Right.

52:27 Yeah.

52:27 Tell us about it.

52:28 Well, so when we first released PyScript, it was, it was a sort of proof of concept.

52:33 Like, oh my God, by doing some interesting things, you can now just be writing

52:37 <py-script> inside HTML and be executing Python first class.

52:41 But we were running on Pyodide, which is a full-on CPython build.

52:46 And it can then import things like NumPy and Matplotlib.

52:49 But, but it was kind of heavy to load.

52:52 It was a lot.

52:53 Right.

52:53 It was more designed for things like JupyterLite.

52:57 These were like, we're going to load up a big notebook and we're going to do awesome.

53:00 Awesome.

53:01 It's zero install.

53:02 Right.

53:02 It's, it's like you literally can just run everything in the browser without installing

53:06 anything because it's really downloading everything into the browser.

53:08 Right.

53:09 Yeah.

53:09 But it's, which is fine if you just start up once, but if you're trying to build a consumer

53:14 facing web app, it might be way too heavy.

53:16 Exactly.

53:16 It's not really a replacement for that.

53:18 So what we did most recently, number one, we added MicroPython support.

53:22 MicroPython is less than 80 kilobytes or so, no, maybe 300 kilobytes.

53:26 Now PyScript and MicroPython together is like 300 kilobytes and it starts in under a second.

53:31 So now you really can use Python instead of JavaScript to go and do things in the DOM.

53:36 So we have full DOM support.

53:38 You know, there's a native web API to manipulate the DOM and it supports web workers and async and

53:44 blocking calls and native storage.

53:46 So you can actually use the MicroPython bit, and that's where the <mpy-script> tag comes in, and

53:53 that loads the MicroPython environment, and you can then be loading the full Pyodide

53:58 PyScript in the backend as an asynchronous sort of web worker.

54:02 So it's really an interesting thing that is, you can do a lot of interesting things with

54:07 it now, I think.

54:08 And we have experimental support for R. So we've not left our friends in the data science community

54:13 behind.

54:14 So that'll be great.

54:15 And, you know, we have some basic LLM and AI demos and there's people working on running

54:21 this on microcontrollers.

54:23 And so you can go manipulate, you know, embedded devices and everything.

54:28 So it's just a fun, fun thing.

54:29 The community is building interesting things. At PyCon,

54:31 Łukasz gave a really interesting talk showing how to do 3D, WebGL kind of stuff

54:36 with MicroPython, or with PyScript.

54:39 So I would say just, yeah, we're, we're sort of right now, you know, we have PyScript.com

54:44 where if you want to build things and share them more easily, that's great.

54:46 But if you just want to use this as a technology, you totally should.

54:50 It's like, just take it and play with it and make little things with it.

54:53 There are weekly community calls, I think, that we do. Just join

54:57 and hop in on the Discord and just get involved.

54:59 I think it's a really fun thing.

55:00 Yeah.

55:01 I really, I'm very excited about it.

55:02 I would love to see something like a Vue or a React where it's just, here's your front

55:08 end, and you get to write it in this language you all love and already know.

55:11 I'm not necessarily hating on JavaScript, but it's not.

55:14 It's okay to hate on JavaScript.

55:15 Yeah.

55:16 I'm not intent.

55:17 My thought more is you're already working in one language.

55:21 Why do I need to do a second language just because of the place in which that language

55:25 executes is so dramatically limited, right?

55:28 It's the whole reason that Node became a thing: we're already doing JavaScript.

55:32 Can we just keep doing JavaScript?

55:33 That's right.

55:34 Well, for Python people, it's just the reverse of that, right?

55:37 Right.

55:37 Right.

55:37 And there are a lot of really nice frameworks for front-end stuff.

55:41 And the nice thing about PyScript is that it is pretty easy to use the JavaScript bridge

55:45 to call to all of those things, right?

55:46 What we don't have is a Python wrapper around those things.

55:49 So you can keep staying in the Python language.

55:51 Yes, exactly.

55:51 You have to use little proxy objects.

55:52 But for the most part, it's pretty nice; some of the examples, the three.js examples,

55:58 and some of these other things show how you can manipulate the JavaScript objects directly.

56:02 And we're just looking for more people to add interesting things to it.

56:06 But I agree with you.

56:07 Something like a Vue.js or React kind of thing that's natively PyScript would,

56:14 I think, really unlock that next phase of community growth and developer excitement.

56:19 Yeah, yeah, yeah.

56:20 It feels like the notebook in the browser deal is almost nailed.

56:25 Right.

56:25 But the, I'm going to replace my React front end with a Python front end, that's probably

56:31 the next frontier.

56:32 Yeah.

56:32 Maybe what I ought to do is put a bounty up there for someone to build a PyScript

56:38 React thing embedded inside a Wasm object or Wasm container on the AT Protocol

56:45 and just put like a $20,000 bounty out there and see who gets there first.

56:49 Because I think that would show us how it could get done.

56:51 There are some things in the decentralized web, these frame objects and whatnot, that are there

56:55 to put little applets on chain.

56:59 And I think we're, maybe we just, people just need a little kick like that.

57:03 Yeah, that would be amazing.

57:05 Right.

57:06 Yeah.

57:07 There was the whole keynote.

57:08 There have been several versions; you know, Carol Willing gave one of them.

57:12 Russell Keith-Magee gave one sort of about how there are a few places that are really important

57:16 in computing that Python doesn't really touch.

57:18 And it was kind of mobile and web front end.

57:20 Right.

57:20 And so that would take down one of those two, which would be pretty amazing.

57:23 And maybe indirectly somehow find its way to mobile if it could be on the web, you know?

57:27 Yeah.

57:27 React Native is how a lot of people are doing the mobile development.

57:30 So if you have a wrapper for React, like that should, that should be there.

57:33 Right.

57:33 Yeah, exactly.

57:34 That'd be wild.

57:35 Yeah.

57:35 Awesome.

57:36 Okay.

57:36 Let's talk.

57:37 You got a few minutes to talk about Bluesky real quick before we wrap things up?

57:40 Yeah, I do.

57:41 Yep.

57:41 You have a really interesting profile here on, I'll certainly put that into the show notes.

57:47 I am new to Bluesky.

57:50 I just created an account last week.

57:52 Finally, I was thinking, you know, already I'm on Mastodon and I'm kind of on Twitter and

57:56 I've just got a lot to, I've got a lot of email addresses.

57:59 I kind of got enough, right?

58:00 But really recently, a lot of people in the tech space have been moving to Bluesky.

58:05 And I'm like, you know what?

58:06 I had two thoughts.

58:07 One, like, I'll just create an account and see what it's like over there.

58:09 Why not?

58:09 That was thought one.

58:10 Thought two is, I'd better go there before somebody steals, like, the podcast handle or my name and

58:16 stuff, you know, like that.

58:17 Because that happened on Mastodon before I got there.

58:19 I'm like, oh, maybe not.

58:21 So, yeah.

58:22 So I now have @mkennedy.codes, so I'm here officially, right?

58:28 Right.

58:28 But yeah.

58:29 So, I'm really, I'm surprisingly delighted by it.

58:33 So, maybe.

58:33 Good.

58:33 Good.

58:34 I've seen you talking about a couple of projects and different things.

58:36 So, yeah.

58:37 You want to riff on that for a minute?

58:38 Yeah.

58:39 So, I know you're writing a lot about it, right?

58:41 You just did an article or something on it.

58:43 Oh, no, I haven't yet.

58:44 I've been meaning to.

58:45 This is the Thanksgiving post-tryptophan induced haze.

58:49 I might go and just bang something out.

58:51 Okay.

58:52 But I should tell my story, I guess, and my interaction with this ecosystem that, you know,

58:57 obviously, I think most people know me for my engagement in Python and Python, PyData,

59:01 Anaconda, all these kinds of things.

59:03 But for about eight years now, I've been involved in the decentralized web community in one form

59:10 or another.

59:10 I go to some of the camps that the Internet Archive puts together.

59:12 And I funded some projects around this area.

59:14 And years ago, I started funding a project called Beaker Browser, which is a decentralized

59:18 web browser; it's a local-first web browser that was eminently modifiable.

59:26 And then you would use some kind of backend data transport like IPFS or Hypercore, Dat,

59:33 whatever kinds of things.

59:35 And so you have a very different approach to building websites, building web apps when you

59:40 have this data decoupled from the view, right?

59:44 Ultimately, I had surmised that a lot of the crappiness of the current internet and social

59:52 media apps and all these kinds of things was due ultimately to a flaw in the web itself,

59:59 which is the client-server model.

01:00:00 It intrinsically dumps user data.

01:00:02 And from a software architecture perspective, what we might say is that we take an MVC architecture

01:00:07 and we split the view model and we put the view model out here, but we put the business

01:00:11 logic in the model over here and it's all owned by a trillion dollar tech company, right?

01:00:15 And so all of us that have to pay...

01:00:17 They observe the heck out of it as the data flows.

01:00:18 They observe the heck out of that and they sell those observations and they profit handsomely

01:00:22 from it.

01:00:23 But then it also leads to all sorts of...

01:00:27 There's a dark shadow to all of that, right?

01:00:30 Which is not good.

01:00:31 From a civil libertarian as well as from an entrepreneur, from a tech innovator perspective,

01:00:34 I don't like any of that.

01:00:36 So my investments into these decentralized web technologies were fueled by this realization

01:00:42 that I had.

01:00:43 That we had to really rebuild internet architecture from the bottom up and how we build applications,

01:00:47 period.

01:00:47 So you can actually draw a thematic line from my investments in things like

01:00:51 Beaker Browser to us supporting, at Anaconda, the development of tools like PyScript and BeeWare,

01:00:57 with Russell Keith-Magee, right?

01:00:58 Empowering end users to build applications permissionlessly and, as easily as possible,

01:01:03 share them with other users.

01:01:04 So we have a lot of innovation and creativity and joy ultimately in this creative space that

01:01:08 computing actually should be.

01:01:10 So ultimately Beaker Browser didn't work out as a company, but the guy who was behind it spent

01:01:17 four years, I mean, I funded him for those four years, doing a lot of experiments.

01:01:21 And what does decentralized Twitter, decentralized Reddit, decentralized Facebook, what do these

01:01:25 kinds of like local first experiences look like?

01:01:26 What is the right design for a protocol to do?

01:01:29 Like, you know, because he worked on Secure Scuttlebutt and these gossip networks, like

01:01:32 Nostr, like what Nostr sort of has.

01:01:33 And then like, what are the things where we have to centralize?

01:01:36 All of those learnings kind of got wrapped up into what is now the AT Protocol and

01:01:41 Bluesky.

01:01:41 So he got hired by Jay, who's the CEO of Bluesky, to go and build what is now the

01:01:46 Bluesky app.

01:01:47 And the AT Protocol is the result of the Beaker Browser guy, Paul Frazee, working closely

01:01:53 with, you know, people like Jeremy and others that are the core devs at Bluesky.

01:01:58 And so that's kind of my connection to this ecosystem.

01:02:01 So I'm user number six or seven, I think user number six on the network.

01:02:06 And I'm a big, huge proponent of this stuff.

01:02:08 So that has like no formal intersection with my work in the Python space.

01:02:14 It's just another aspect of technology that I'm very passionate about.

01:02:16 But both are actually connected thematically in the sense of empowering end users.

01:02:21 Actually, the Python bit, PyData, the Excel bit, and then the Blue Sky bit all connect in the sense of

01:02:27 empowering people, regular humans to use computers to the best, to making their lives better.

01:02:35 Whether it's asking more interesting questions of the world, whether it's being able to then share

01:02:40 and deploy the things they build with other people, whether it's, you know, reaching out to more and more

01:02:45 people, Excel has a billion users or something like that, 700 million users.

01:02:49 Right.

01:02:49 Meet them where they are.

01:02:50 Meet them where they are.

01:02:52 Yeah.

01:02:52 Yeah.

01:02:53 Right.

01:02:53 And then with Bluesky, it's how do we kind of go and like just have a different way of people

01:03:00 connecting to each other on the internet, being able to make interesting things

01:03:03 and really share in a spirit of collaboration.

01:03:06 And so one of the really important things about Bluesky is the way the moderation system works,

01:03:11 all those block lists and all those kinds of things that people have.

01:03:14 You know, I'm an old school internet guy.

01:03:17 And I remember if you were on Usenet, you would have a thing called a K-file.

01:03:20 It's your local kill file.

01:03:21 Right.

01:03:22 Which is people I just don't effing want to hear from because they're obviously idiots.

01:03:25 And your newsreader would respect that.

01:03:29 Right.

01:03:30 And so something that Paul and I talked about years ago was a social K-file would be a way

01:03:35 to have a social group be able to really just disincentivize and lock out the bad faith interactions.

01:03:42 Right.

01:03:42 Now, of course, you could go too far with that and just create a little echo chamber for yourself.

01:03:47 But there is a lot of space between spammers, trolls everywhere, spamming us with like crypto

01:03:52 spam and just like total echo chamber.

01:03:55 There's space in the middle.

01:03:56 Right.

01:03:57 So I think what Bluesky is demonstrating with the shared block lists and things like

01:04:00 that, and people just being able to mute things built into the UI.

01:04:03 Those are the kinds of tools we can make available when the company and the technology isn't incentivized

01:04:09 to create more angry bullshit sort of interactions and engagement.
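As a toy model of the "social kill file" idea described here, the sketch below combines a personal kill file with shared block lists a reader subscribes to, and filters posts on the client side the way Usenet newsreaders honored a local kill file. The handles, list contents, and class names are hypothetical; this is not how Bluesky or the AT Protocol actually implement moderation lists.

```python
# Toy model of a "social kill file": a personal mute set combined with shared
# block lists the reader subscribes to. Handles and lists are hypothetical; this
# is not Bluesky's or the AT Protocol's actual moderation implementation.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    author: str
    text: str


@dataclass
class Reader:
    kill_file: set[str] = field(default_factory=set)  # personal: "don't show me these"
    subscribed_lists: list[set[str]] = field(default_factory=list)  # shared lists

    def blocked(self) -> set[str]:
        # Effective block set = personal kill file plus every subscribed shared list.
        return self.kill_file.union(*self.subscribed_lists)

    def visible(self, posts: list[Post]) -> list[Post]:
        # The filter is applied locally, like a newsreader honoring a kill file.
        blocked = self.blocked()
        return [post for post in posts if post.author not in blocked]


crypto_spam_list = {"spam.example"}
reader = Reader(kill_file={"troll.example"}, subscribed_lists=[crypto_spam_list])
posts = [Post("alice.example", "hi"), Post("troll.example", "bait"),
         Post("spam.example", "buy now")]
print([p.author for p in reader.visible(posts)])
```

Subscribing to a list is just adding one more set to the union, which is what makes sharing a block list cheap for a whole group.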

01:04:13 So the algorithms don't promote that and push the heck out of it to get engagement.

01:04:17 No, the default Bluesky algorithm is just a chronological feed of the accounts you follow.

01:04:21 So if you don't want to see somebody's stuff, don't follow them.

01:04:24 You know, if you go to some of the other feeds, there's more algorithmic stuff there if you want to have it.

01:04:28 But by default, it's just whoever you chose to follow.
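The default feed described here can be sketched in a few lines: keep only posts from accounts the reader follows and sort them newest first, with no engagement ranking. This is a minimal illustration under that assumption; the names and data are hypothetical, and Bluesky's real feed generation involves more machinery than this.

```python
# Minimal sketch of a reverse-chronological "following" feed: only accounts the
# reader follows, newest first, with no engagement ranking. Names are hypothetical;
# Bluesky's real feed generation is more involved.
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Post:
    author: str
    created_at: datetime
    text: str


def following_feed(posts: Iterable[Post], follows: set[str], limit: int = 50) -> list[Post]:
    # Keep only posts from accounts the reader chose to follow.
    chosen = [post for post in posts if post.author in follows]
    # Order purely by timestamp, newest first; nothing is boosted for engagement.
    chosen.sort(key=lambda post: post.created_at, reverse=True)
    return chosen[:limit]


posts = [
    Post("alice.example", datetime(2024, 11, 26, 9, 0, tzinfo=timezone.utc), "morning"),
    Post("stranger.example", datetime(2024, 11, 26, 10, 0, tzinfo=timezone.utc), "viral take"),
    Post("bob.example", datetime(2024, 11, 26, 11, 0, tzinfo=timezone.utc), "lunch"),
]
for post in following_feed(posts, follows={"alice.example", "bob.example"}):
    print(post.author, post.text)
```

The "viral take" from an account you don't follow never appears, which is the whole point: if you don't want to see someone's stuff, you don't follow them.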

01:04:31 And if you don't like what they say and you know them, you should engage with them and say, hey, man, I don't agree with that.

01:04:35 Like, let's talk about that.

01:04:36 Right.

01:04:37 So I think this is...

01:04:38 Or unfollow them and just...

01:04:38 Or just unfollow them.

01:04:39 It's like, you know what?

01:04:40 They talk too much about politician XYZ.

01:04:42 I don't want to hear about that.

01:04:43 Right?

01:04:43 It's a very simple approach.

01:04:45 I need a space where I don't want to hear about that.

01:04:47 Yeah.

01:04:47 One thing I'll give a shout-out to, to help people get started if they're interested, is this cool idea they have called starter packs.

01:04:53 Right.

01:04:53 And I created one called Python Personalities.

01:04:56 And you're on here, Peter.

01:04:57 So if they follow the starter pack, they'll automatically follow you.

01:05:01 But you click follow and you get 60 or 70 nice, engaged Python people to follow.

01:05:09 And if you want to find other starter packs, there's blueskydirectory.com.

01:05:13 That's right.

01:05:14 Blueskydirectory.com.

01:05:15 There are 60,000 or so of these starter packs.

01:05:19 Yeah.

01:05:20 And you can search here.

01:05:20 You can say, I'm interested in Python or I'm interested in motorcycle or whatever it is you're interested in.

01:05:25 Right.

01:05:26 And there's AI.

01:05:27 Yeah.

01:05:27 I saw somebody have a starter pack called Friendly Weather Scientist Using AI.

01:05:34 It's very specific.

01:05:37 But if you want that group, there you are.

01:05:39 All right.

01:05:40 Let's close this out with a real quick bit of real-time feedback from Fabio from PyScript.

01:05:44 Oh, OK.

01:05:45 Yep.

01:05:45 There's a framework called PuePy on top of PyScript that is native and reactive.

01:05:50 Very early, though.

01:05:51 Yes, I had the guy on.

01:05:52 I'm sorry.

01:05:53 I don't remember the first name of the guy.

01:05:55 But I had the guy behind PuePy on.

01:05:57 And it looks really interesting.

01:05:58 I'm hopeful for it.

01:05:59 But I'm not necessarily sure yet that it's the answer.

01:06:01 And then Fabio also says, we're also working on the Invent framework.

01:06:06 That is not really comparable to React or Vue, but it's in that direction.

01:06:09 I just love his YouTube avatar.

01:06:12 You said, well, hold on.

01:06:13 We should go to his YouTube.

01:06:14 Look at that.

01:06:16 Look at that.

01:06:17 This looks like an amazing picture from Halloween.

01:06:19 It's a very serious business profile.

01:06:21 Right.

01:06:22 Yes, Fabio's partner is a world-championship-winning sort of body painter.

01:06:28 And so he has those.

01:06:29 He does the photography.

01:06:30 So that's why that's not just a random thing.

01:06:32 It's kind of related to his family stuff.

01:06:36 Yeah.

01:06:36 But it's kind of hilarious.

01:06:37 Awesome.

01:06:38 All right.

01:06:39 Well, I suppose we're probably out of time, Peter.

01:06:42 But it's always delightful to talk to you and so many things.

01:06:44 Yeah.

01:06:45 Well, thank you so much for having me on.

01:06:47 This is a lot of fun.

01:06:48 Thank you.

01:06:48 Yeah, you bet.

01:06:49 And I'll put the links for all the things we talked about in the show notes.

01:06:53 Let's leave it with a closeout for the Excel thing.

01:06:55 People want to get started with that.

01:06:56 What do they do?

01:06:56 Go to anaconda.com/excel or just go to your Excel.

01:07:00 Again, Excel for Windows.

01:07:01 Just type in =PY.

01:07:02 See if that works.

01:07:03 And then for the Anaconda Toolbox, you can also just go into the Excel plugin finder and you

01:07:10 can find it there.

01:07:10 Or you just Google Anaconda Toolbox and you'll find the install instructions there.

01:07:13 So definitely check it out and give us feedback on it.

01:07:15 Awesome.

01:07:16 Thanks.

01:07:16 See you later.

01:07:17 Thank you.

01:07:17 This has been another episode of Talk Python to Me.

01:07:21 Thank you to our sponsors.

01:07:23 Be sure to check out what they're offering.

01:07:25 It really helps support the show.

01:07:26 Take some stress out of your life.

01:07:28 Get notified immediately about errors and performance issues in your web or mobile applications with

01:07:34 Sentry.

01:07:34 Just visit talkpython.fm/sentry and get started for free.

01:07:39 And be sure to use the promo code talkpython, all one word.

01:07:42 And this episode is brought to you by Bluehost.

01:07:45 Do you need a website fast?

01:07:47 Get Bluehost.

01:07:48 Their AI builds your WordPress site in minutes and their built-in tools optimize your growth.

01:07:53 Don't wait.

01:07:54 Visit talkpython.fm/bluehost to get started.

01:07:58 Want to level up your Python?

01:07:59 We have one of the largest catalogs of Python video courses over at Talk Python.

01:08:03 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:08:08 And best of all, there's not a subscription in sight.

01:08:11 Check it out for yourself at training.talkpython.fm.

01:08:14 Be sure to subscribe to the show.

01:08:16 Open your favorite podcast app and search for Python.

01:08:19 We should be right at the top.

01:08:20 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

01:08:26 direct RSS feed at /rss on talkpython.fm.

01:08:30 We're live streaming most of our recordings these days.

01:08:32 If you want to be part of the show and have your comments featured on the air, be sure to

01:08:36 subscribe to our YouTube channel at talkpython.fm/youtube.

01:08:41 This is your host, Michael Kennedy.

01:08:42 Thanks so much for listening.

01:08:43 I really appreciate it.

01:08:45 Now get out there and write some Python code.

01:08:47 Bye.
