Monitor performance issues & errors in your code

#456: Building GPT Actions with FastAPI and Pydantic Transcript

Recorded on Monday, Jan 22, 2024.

00:00 Do you know what custom GPTs are?

00:02 They're configurable and shareable chat experiences with the name, logo, custom instructions,

00:07 conversation starters, access to open AI tools and custom API actions.

00:13 And you can build them with Python.

00:15 Ian Moyer has been doing just that and is here to share his experience building them.

00:20 This is "Talk Python to Me," episode 456, recorded January 22nd, 2024.

00:26 (upbeat music)

00:31 Welcome to "Talk Python to Me," a weekly podcast on Python.

00:44 This is your host, Michael Kennedy.

00:45 Follow me on Mastodon, where I'm @mkennedy and follow the podcast using @talkpython,

00:50 both on fosstodon.org.

00:53 Keep up with the show and listen to over seven years of past episodes @talkpython.fm.

00:58 We've started streaming most of our episodes live on YouTube.

01:02 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows

01:08 and be part of that episode.

01:10 This episode is sponsored by Sentry.

01:12 Don't let those errors go unnoticed.

01:14 Use Sentry.

01:15 Get started at talkpython.fm/sentry.

01:18 And it's also brought to you by Neo4j.

01:21 It's time to stop asking relational databases to do more than they were made for.

01:25 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do for you.

01:32 Find out more at talkpython.fm/neo4j.

01:36 Ian, welcome to Talk Python to Me.

01:39 - Hey, hey, Michael, good to see you again.

01:41 - Yeah, great to see you again.

01:43 It has been a little while.

01:44 It seems like not so long ago, and yet when I pull up the episode that we did together,

01:50 sure enough, it says March 7th, 2018.

01:55 Wow.

01:55 - The years are short.

01:56 The years are short, they go by really fast.

01:58 - They sure do.

01:59 So back then, we were talking about Python and biology and genomics, and it sounds like you're still doing genetic-type things

02:08 and still doing Python and all that kind of stuff.

02:11 - For sure, yeah, definitely.

02:12 I work for a company called Genome Oncology.

02:14 We do precision oncology software, helping folks make sense of genomics and trying to help cancer patients.

02:20 - That's awesome.

02:21 There's different levels of helping people with software.

02:25 On one level, we probably have ad retargeting.

02:28 On the other, we've got medical benefits and looking for helping people who are suffering socially or whatever.

02:38 So it's gotta feel good to write software that is making a difference in people's lives.

02:43 - That's right.

02:44 I did spend a lot of the 2000s making e-commerce websites, and that wasn't exactly the most fulfilling thing.

02:48 I learned a lot, but it wasn't as exciting as what I'm doing now, or at least as fulfilling as what I'm doing now.

02:53 - What were those earlier websites in Python?

02:55 - That was all Java for the most part.

02:57 And finally with this company, knocked out a prototype in Django a few years ago.

03:03 And my boss at the time was like, "You did that so fast, "you should do some more stuff in Python."

03:08 So that's kind of how it evolved.

03:10 And now basically most of our core backend is Python, and we use a little bit of Svelte for the user interfaces.

03:17 - Beautiful.

03:18 It's easy to forget, like five years ago, 10 years ago, people were questioning whether Python

03:24 should be something you should use.

03:25 Is it a real language?

03:26 Do you really use it?

03:26 Is it safe to use?

03:27 Maybe you should use a Java or a C# or something like that, because this is a real project.

03:33 It's interesting.

03:34 You don't hear that nearly as much anymore, do you?

03:36 - I grew up with Boston sports fans, and it was like being a Boston sports fan was terrible

03:39 for the longest time.

03:40 And now it's like, "Okay, we don't wanna hear about your problems right now."

03:43 And same thing with Python.

03:44 It's like, "I like Python." It's like, "Yeah, great.

03:46 "So does everybody else in the world." So yeah, it's really not the issue anymore.

03:50 Now it's not the cool thing to play with.

03:51 So now you gotta go to Rust or something else.

03:54 - You know what's shiny?

03:55 LLMs are shiny.

03:56 - LLMs are very shiny, for sure.

03:58 - Yeah, we can talk about them today.

03:59 - Yeah, that sounds great.

04:01 Let's do it.

04:01 - First of all, we're gonna talk about building applications that are basically powered by LLMs that you plug into.

04:08 Right? - Yep.

04:09 - Before we get into creating LLMs, just for you, where do LLMs play a role for you

04:16 in software development these days?

04:18 - Sure.

04:19 So, you know, like everybody else, I mean, I had been playing with, so I do natural language processing as part of my job.

04:24 Right?

04:25 So using spaCy was a big part of the information extraction stack that we use,

04:30 'cause we have to deal with a lot of medical data and medical data is just unstructured

04:33 and has to be cleaned up before it can be used.

04:36 That was my exposure.

04:37 I had seen GPTs and the idea of like generating text, just starting from that,

04:42 didn't really make much sense to me at the time.

04:44 But then obviously like everybody else, when chatGPT came out, I was like, "Oh, I get this now."

04:48 Like this thing does, you know, it can basically learn in the context and it can actually produce something that's interesting

04:54 and you can use it for things like information extraction.

04:56 So just like everybody else, I kind of woke up to them, you know, around that time that they got released

05:01 and I use them all the time.

05:02 Right?

05:03 So chatGPT4 is really what I use.

05:05 I would recommend, if you can afford the $20 a month, it's still the best model that there is as of January 2024.

05:11 And I use that for coding.

05:12 I don't really like the coding tools, the copilots, but there, you know,

05:16 there's definitely folks that swear by them.

05:18 So my workflow is more of, I have a problem, work with the chatbot to try to like, you know,

05:22 think through all the edge cases and then, and then think through the test case, the tests.

05:27 And then I think through the code, right?

05:29 And then the actual typing of the code, yeah, I'll have it do a lot of the boilerplate stuff,

05:33 but then kind of shaping the APIs and things like that.

05:35 I kind of like to do that myself still.

05:37 I'm kind of old school, old school.

05:39 - I guess I'm old school as well.

05:40 'Cause I'm like right there with you.

05:42 But for me, I don't generally run copilot or those kinds of things in my editors.

05:48 I do have some features turned on, but primarily it's just really nice autocomplete.

05:54 You know what I mean?

05:55 Like, it seems like it almost just knows what I want to type anyway, and that's getting better.

05:59 I don't know if anyone's noticed recently, one of the recent releases of PyCharm,

06:03 it starts to autocomplete whole lines.

06:06 And I don't know where it's getting this from.

06:07 And I think I have the AI features turned off.

06:10 At least it says I have no license.

06:11 Guessing that means they're turned off.

06:13 So it must be something more built into it.

06:16 That's pretty excellent.

06:17 But for me, I find I'm pretty content to just sit and write code.

06:20 However, the more specific the unknowns are, the more willing I'm like, "Oh, I need to go to ChatGPT for this."

06:28 Like, for example, like, how do you use Pydantic?

06:30 Like, well, I'll probably just go look at a quick code sample and see that so I can understand it.

06:35 But if it's, I have this time string with the date like this, the month like this,

06:41 and then it has the time zone like that.

06:43 How do I parse that?

06:44 Or how do I generate another one like that in Python?

06:47 And here's the answer.

06:48 Or I have this giant weird string, and I want this part of it as extracted with a regular expression.

06:55 And I want a regular expression.

06:56 - Regular expression, I was just gonna say that.

06:56 - Oh my gosh.

06:57 - You don't have to write another one of those.

06:59 Yeah, it's great.

06:59 - Yeah, it's pretty much like, do you need to detect the end of a line straight to ChatGPT?

07:03 Not really, but you know, it's like almost any level of chat, a regular expression.

07:07 I'm like, well, I need some AI for this.

07:09 'Cause this is not time well spent for me.

07:11 But yeah, it's interesting.

07:12 - Yeah, one big tip I would give people though is that these chatbots, they wanna please you.

07:16 So you have to ask it to criticize you.

07:19 You have to say, here's some piece of code, tell me all the ways it's wrong.

07:22 And you have to also ask for lots of different examples because it just starts to get more creative,

07:27 more things that it says.

07:28 It really thinks by talking, which is a really weird thing to consider.

07:31 But it's definitely some things to keep in mind when you're working with these things.

07:35 - And they do have these really weird things.

07:37 Like if you compliment them, or if you ask it, you sort of tell it, like, I really want you to tell me.

07:42 It actually makes a difference, right?

07:43 It's not just like a search engine.

07:45 Like, well, of course, what does it care?

07:46 Just you put these keywords in and they come out.

07:48 Like, no, you've kind of got to like, know how to talk to it just a little bit.

07:51 - I've seen people threatening them, or like saying that someone's being held ransom,

07:55 or, you know, I like to say, my boss is really mad at me.

07:58 Like, help me out here, right?

07:59 And like, see if it'll generate some better code.

08:01 - You're not being a good user.

08:03 You're trying to trick me.

08:04 I've been a good chatbot and you've been a bad user and I'm not gonna help you anymore.

08:08 - Yeah, right.

08:09 So it was actually basically a conversation from being in the early days.

08:12 - Yeah, the Sydney episode.

08:14 Yeah, that was crazy, right?

08:15 Super funny.

08:16 - How funny?

08:17 All right, well, I'm sure a lot of people out there are using AI these days.

08:20 I think I saw a quote from, I think it was from GitHub, saying over 50% of developers are using Copilot.

08:26 - For sure.

08:27 - Which, crazy, but I mean, not that surprising.

08:29 50% of the people are using Autocomplete.

08:31 So I guess it kind of, kind of like that, right?

08:34 - They're great tools.

08:34 They're gonna keep evolving.

08:35 There's some other ones I'm keeping an eye on.

08:36 There's one called Console, which just takes a different approach.

08:39 They use some stronger models.

08:41 And then there's a website called PHIND, P-H-I-N-D, that allows you to do some searching,

08:45 that they've built their own custom model.

08:47 Really interesting companies that are doing some really cool things.

08:49 And then Perplexity is like the search replacement that a lot of folks are very excited about using

08:54 instead of Google.

08:55 So there's a lot of different tools out there.

08:57 You could spend all your day just kind of playing around and learning these things where you got to actually

09:01 kind of get some stuff done too.

09:02 - Yeah, you gotta pick something and go, right?

09:04 Because with all the churn and growth and experimentation you got, you probably could try a new tool every day

09:09 and still not try them all, you know, just be falling farther behind.

09:13 So you gotta pick something and go.

09:15 - And go, yep.

09:16 - Let's talk about writing some code.

09:18 - Yeah, the next thing you're gonna do after you use a chatbot is to hit an API.

09:24 Like if you're gonna program an app and that app is gonna have LLM inside of it,

09:28 large language models inside of it, APIs are pretty much the next step, right?

09:32 So OpenAI has different models that are available.

09:35 This is a webpage that I just saw recently that'll actually compare the different models

09:39 that are out there.

09:40 So there's obviously the big guy, which is OpenAI, and you can get that through Azure as well

09:44 if you have a Microsoft arrangement.

09:46 And there's some security reasons or HIPAA compliance and some other reasons that you might want to talk

09:51 through Azure instead of going directly to OpenAI.

09:54 I defer to your IT department about that.

09:57 Google has Gemini, which they just released the pro version, which I believe is as strong as 3.5, roughly.

10:03 That is interesting because if you don't care about them training on your data,

10:07 if like whatever you're doing is just like not super proprietary or something

10:11 you're trying to keep secret, they're offering free API access, I believe 60 words per minute, right?

10:17 So basically one a second, you can call this thing and there's no charge.

10:21 So I don't know how long that's gonna last.

10:23 So if you have an interesting project that you wanna use in a large language model for,

10:26 you might wanna look at that.

10:27 - Yeah, especially if it's already open data that you're playing with.

10:30 - Exactly, right.

10:31 - Or data you've somehow published to the web that has certainly been consumed by these things.

10:36 And these models are gonna train on it, right?

10:37 That's the trade, right?

10:38 They're trying to get more tokens, is what they call it, right?

10:41 The tokens are what they need to actually make these models smarter.

10:44 So everyone's just hunting for more tokens and I think this is part of their strategy for that.

10:49 And then there's also a CLAUDE by Anthropic.

10:52 And then after that, you get into the, kind of the open source APIs as well.

10:55 - There's some really powerful open source ones out there.

10:58 Yeah, so this website, yeah, this is DocsBot, for people listening, DocsBot.ai.

11:03 Is its sole purpose just to tell you price comparisons and stuff like that?

11:07 Or does it have more than that?

11:08 - I assume this company's got some product, unfortunately I don't know what it is.

11:11 I saw this link that they put out there and it's a calculator.

11:14 So you basically can put your, what tokens, how many tokens, there's input tokens

11:18 and there's output tokens, right?

11:19 So they're gonna charge more on the output tokens.

11:22 That's for the most part.

11:23 Some of the libraries, some of the models are more equal.

11:26 And then they, what they do is, if you can figure out like roughly how big a message

11:30 is gonna be, both the input and the output, how many calls you're gonna make,

11:33 you can use that to then calculate basically the cost.

11:37 And the cost is always at like tokens per thousand, you know, or dollars or pennies really,

11:42 pennies per thousand tokens.

11:43 And then it's just a math equation at that point.

11:45 And what you'll find is calling GPT-4 is gonna be super expensive.

11:49 And then calling, you know, a small seven, what's called a 7B model from Mistral

11:53 is gonna be the cheapest.

11:55 And you're just gonna look for these different providers.

11:57 - Wow, the prices really are different.

11:59 Like for example, opening an Azure GPT-4 is three, a little over three cents per call.

12:05 Whereas GPT-3, five turbo is one 10th of one cent.

12:10 It's a big difference there.

12:13 11 cents versus $3 to have a conversation with it.

12:16 - Yes, it's a very, very wide difference.

12:18 And it's all based on, you know, how much compute do these models take, right?

12:21 'Cause the bigger the model, the more accurate it is.

12:25 But also the more expensive it is for them to run it.

12:27 So that's why there's such a cost difference.

12:31 - This portion of Talk Python to Me is brought to you by Sentry.

12:33 In the last episode, I told you about how we use Sentry to solve a tricky problem.

12:38 This time, I wanna talk about making your front end and backend code work more tightly together.

12:43 If you're having a hard time getting a complete picture of how your app is working and how requests flow

12:49 from the front end JavaScript app, back to your Python services, down into database calls for errors and performance,

12:56 you should definitely check out Sentry's distributed tracing.

12:59 With distributed tracing, you'll be able to track your software's performance,

13:03 measure metrics like throughput and latency, and display the impact of errors across multiple systems.

13:09 Distributed tracing makes Sentry a more complete performance monitoring solution,

13:13 helping you diagnose problems and measure your application's overall health more quickly.

13:19 Tracing in Sentry provides insights such as what occurred for a specific event or issue,

13:24 the conditions that cause bottlenecks or latency issues, and the endpoints and operations that consume the most time.

13:30 Help your front end and backend teams work seamlessly together.

13:34 Check out Sentry's distributed tracing at talkpython.fm/sentry-trace.

13:40 That's talkpython.fm/sentry-trace.

13:43 And when you sign up, please use our code TALKPYTHON, all caps, no spaces,

13:48 to get more features and let them know that you came from us.

13:52 Thank you to Sentry for supporting the show.

13:55 - Yeah, I recently interviewed, just released a while ago, interviewed because of time shifting,

14:00 on podcast, Mark Russinovich, CTO of Azure, and we talked about all the crazy stuff that they're doing

14:06 for coming up with the, just running these computers that handle all of this compute,

14:10 and it's really a lot.

14:12 - There was a GPU shortage for a while.

14:13 I don't know if that's still going on.

14:14 And obviously, you know, the big companies are buying hundreds of thousands

14:18 of these GPUs to get the scale they need.

14:21 - Yeah.

14:22 - And so once you figure out which API you want to use, then you want to talk about the library.

14:26 So now, you know, most of these providers, they have, you know, a Python library that they offer.

14:30 I know OpenAI does, and Google, Gemini does, but there's also open source ones, right?

14:35 'Cause they're not very complicated to talk to.

14:38 It's just basically HTTP requests.

14:41 So it's just really a matter of like, what's the ergonomics you're looking for as a developer

14:44 to interact with these things.

14:46 And most importantly, make sure you're maintaining optionality, right?

14:49 Like, it's great to do a prototype with one of these models, but recognize you might want to switch

14:54 either for cost reasons or performance reasons or what have you.

14:58 And, you know, LangChain, for instance, has a ton of the providers as part of,

15:03 you basically are just switching a few arguments when you're switching between them.

15:08 And then Simon Willison has, you know, a Python fame, has an LLM project where he's defined,

15:14 you know, basically a set of, and it's really clean just the way he's organized it,

15:18 because you can just add plugins as you need them, right?

15:20 So you don't have to install all the different libraries that are out there.

15:23 And I think LangChain is kind of following a similar approach.

15:25 I think they're coming up with a LangChain core capability where you can just kind of bring in things

15:30 as you need them.

15:31 And so the idea is you're now coding against these libraries and you're trying to bring together, you know,

15:37 the text you need to have analyzed or whatever your use case is, and then it'll come back with the generation.

15:43 And you can also not just use them on the cloud, you can use open source ones as well

15:46 and run them locally on your local computer.

15:48 - I'd never really thought about my architectural considerations, I guess, of these sorts of things.

15:54 But of course you want to set up some kind of abstraction layer so you're not completely tied into some provider.

16:02 It could be that it becomes too expensive.

16:03 It could be that it becomes too slow, but it also might just be something that's better.

16:07 It could be something else that comes along that's better and you're like, "Eh, we could switch.

16:11 "It's 25% better." But it's like a week to pull all the details of this one LLM out and put the new ones in,

16:18 and so it's not worth it, right?

16:19 So you like having, being tied to a particular database rather than more general, it's a similar idea.

16:25 - And especially at this moment in time, right?

16:27 Every couple months, something, so something from the bottom up is getting better and better.

16:32 Meaning, you know, LLAMA came out a year ago, and then LLAMA 2 and Mistral and Mixtral,

16:37 and LLAMA 3 is gonna be coming out later this year, we believe.

16:40 And so those models, which are smaller and cheaper and easier to use, or not easier to use, but they're just cheaper,

16:46 is those things are happening all the time.

16:48 So being able to be flexible and nimble and kind of change where you are

16:52 is gonna be crucial, at least for the next couple years.

16:54 - Yeah, the example that I gave was databases, right?

16:56 And databases have been kind of a known commodity since the '80s, or what, 1980s?

17:02 And of course, there's new ones that come along, but they're kind of all the same,

17:05 and you know, we got, there was MySQL, now there's Postgres that people love, and right?

17:11 So that is changing way, way slower than this.

17:14 And people are like, "Well, we gotta think about "those kinds of, like, don't get tied into that."

17:17 Well, it's way less stable.

17:19 - Right, and people, you know, create layers of abstraction there, too, is where you got SQLAlchemy,

17:25 and then, you know, Sebastian from FastAPI has SQLModel.

17:28 That's a layer on top of SQLAlchemy, you know, and then there's also, you know,

17:32 folks that just like writing clean, fancy SQL, and you can, you know, hopefully be able to port that

17:36 from database to database as well.

17:38 So it's the same principles, separation of concerns, so you can kind of be flexible.

17:42 - All right, so you talked about LengChain.

17:44 Just give us a sense, real quick, of what LengChain is.

17:47 - This was a great project from a timing perspective.

17:49 I believe they kind of invented it and released it right around the time ChatGPT came out.

17:53 It's a very comprehensive library with lots of, I mean, the best part about LengChain, to me,

17:58 is the documentation and the code samples, right?

18:00 Because if you want to learn how to interact with a different large language model

18:04 or work with a vector database, there's another library called LamaIndex

18:08 that does a really good job at this as well.

18:09 They have tons and tons of documentation and examples, so you can kind of look at those and try to understand it.

18:15 The chaining part really came from the idea of like, okay, I prompt the large language model, it gives a response,

18:20 now I'm gonna take that response and prompt it, again, with a new prompt using that output.

18:25 The challenge with that is the reliability of these models, right?

18:29 They're not gonna get close, they're not close to 100% accurate on these types of tasks.

18:34 The idea of agents as well is another thing that you might build with a LangChain.

18:38 And the idea there is basically the agent is getting a task, coming up with a plan for that task,

18:44 and then kind of stepping through those tasks to get the job done.

18:48 Once again, we're just not there yet as far as those technologies, just because of the reliability.

18:54 And then there's also a bunch of security concerns that are out there too,

18:57 that you should definitely be aware of.

18:59 Like one term to Google and make sure you understand is prompt injection.

19:03 Right, so Simon, once again, he's got a great blog.

19:05 He's got a great blog article, or just even that tag on his blog is tons of articles around prompt injection.

19:11 And prompt injection is basically the idea, you have an app, a user says something in the app,

19:17 or like types into the, to the, whatever the input is, and whatever texts that they're sending through,

19:22 just like with SQL injection, they kind of hijacks the conversation and causes the large language model

19:26 to kind of do a different thing.

19:28 - Little Bobby Llama, we call him.

19:30 Instead of little Bobby Diggles.

19:31 - And then the other wild one is like, you know, people are putting stuff up on the internet

19:35 so that when the large language model browses for webpages and brings back text, it's, you know, reading the HTML

19:41 or reading the text in the HTML, and it's causing the large language model

19:44 to behave in some unexpected way.

19:46 So there's lots of crazy challenges out there.

19:49 - I'm sure there's a lot of adversarial stuff happening to these things as they're both trying to gather data

19:54 and then trying to run, right?

19:56 I saw the most insane, I guess it was an article, I saw it on RSS somewhere.

20:01 And it was saying that on Amazon, there's all these knockoff brands that are trying to,

20:06 you know, instead of Gucci, you have a Gucci or I don't know, whatever, right?

20:11 And they're getting so lazy.

20:13 I don't know what the right word is, that they're using LLMs to try to write a description

20:17 that is sort of in the style of Gucci, let's say.

20:20 And it'll come back and say, I'm sorry, I'm a large language model.

20:24 I'm not, my rules forbid me from doing brand trademark violation.

20:31 That's what the Amazon listing says on Amazon.

20:33 They just take it and they just straight pump it straight, whatever it says, it just goes straight into Amazon.

20:37 - Yeah, you have to like Google, like, sorry, I'm not, sorry, as a large language model

20:40 or sorry as a whatever.

20:41 Yeah, you can find all. - Exactly.

20:42 And there's like, the product listings are full of that.

20:44 It's amazing.

20:45 It's amazing. - It's crazy.

20:47 - Certainly the reliability of that is, you know, they could probably use some testing

20:50 and those kinds of things.

20:52 - For sure. - Someone out there asks, like, I wonder if for local LLM models,

20:56 there's a similar site as DocSpot that show you like what you need to run it locally.

21:00 So that's an interesting question also, segue to maybe talk about like some local stuff.

21:04 - LLM Studio, this is a new product.

21:06 I honestly haven't had a chance to like really dig in and understand who created this.

21:09 And, you know, make sure that the privacy stuff is up to snuff, but I've played around with it locally.

21:14 It seems to work great.

21:15 It's really slick, really nice user interface.

21:17 So if you're just wanting to get your feet wet and try to understand some of these models,

21:21 I'd download that and check it out.

21:22 There's a ton of models up on Hugging Face.

21:25 This product seems to just basically link right into the Hugging Face interface and grabs models.

21:31 And so some of the models you wanna look for are right now as in January, right?

21:35 There's Mistral 7B, you know, M-I-S-T-R-A-L.

21:39 There's another one called Phi 2.

21:41 Those are two of the smaller models that should run pretty well on, you know,

21:44 like a commercial grade GPU or an M1 or an M2 Mac, if that's what you have and start playing with them.

21:51 And they're quantized, which means they're just kind of made a little, take a little bit less space,

21:56 which is good from like a virtual RAM with regards to these GPUs.

22:00 And, you know, there's a account on Hugging Face called The Bloke.

22:04 If you look for him, you'll see all his different fine, his different fine tunes and things like that.

22:09 And there's a group called Nous, I think is how you pronounce it, N-O-U-S.

22:13 And they've got some of the fine tunes that are basically the highest performing ones

22:18 that are out there.

22:19 So if you're really looking for a high performing local model that can actually, you know,

22:22 help you with code or reasoning, those are definitely the way to get started.

22:26 - Yeah, this one seems pretty nice.

22:28 I also haven't played with it, I just learned about it, but it's looking really good.

22:32 I had played with, what was it, GPT for All, I think is what it was.

22:36 - Yep, yep.

22:37 - It was the one that I played with.

22:38 Somehow this looks like, looks a little bit nicer than that for some, I don't know how different it really is, but.

22:43 - I mean, it's all the idea of like downloading these files and running them locally.

22:47 And these are just user interfaces that make it a little easier.

22:50 The original project that made this stuff kind of possible was a project called Llama CPP.

22:54 There's a Python library that can work with that directly.

22:58 There's another project called Llama File, where if you download the whole thing,

23:02 it actually runs no matter where you are.

23:04 I think it runs on Mac and Linux and Windows and BSD or whatever it is.

23:09 And it's, I mean, it's an amazing technology that this one put together, it's really impressive.

23:13 And then, you can actually just use Google Colab too, right?

23:17 So Google Colab has some GPUs with it.

23:19 I think if you upgrade it to the $10 a month version, I think you get some better GPUs access.

23:25 So if you actually wanna get a hand of like running, and so this is a little bit different, right?

23:29 So instead of calling an API, when you're using Google Colab, you can actually use a library called Hugging Face,

23:34 and then you can actually load these things directly into your memory, and then into your actual Python

23:39 environment, and then you're working with it directly.

23:41 So it just takes a little bit of work to make sure you're running it on the GPU,

23:46 'cause if you're running it on the CPU, it's gonna be a lot slower.

23:48 - Yeah, it definitely makes a big difference.

23:50 There's a tool that I use that for a long time ran on the CPU, and they rewrote it to run on the GPU.

23:55 Even on my M2 Pro, it was like three times faster or something, yeah. - For sure.

24:00 - Yeah, it makes a big difference.

24:01 - So with the LM Studio, let's you run the LMs offline and use models through an OpenAI,

24:08 that's what I was looking for, the OpenAI compatible local server.

24:12 - Right. - You could basically get an API for any of these, and then start programming against it, right?

24:16 - Exactly right, and it's basically the same interface, right, so same APIs for posting in response

24:22 of the JSON schema that's going back and forth.

24:25 So you're programming against that interface, and then you basically port it and move it to another,

24:30 to the OpenAI models if you wanted to as well.

24:33 So everyone's kind of coalescing around OpenAI as kind of like the quote unquote standard,

24:37 but there's nothing, you know, there's really no, there's no mode around that standard as well, right,

24:41 'cause anybody can kind of adopt it and use it.

24:44 - There's not like a W3C committee choosing.

24:47 - Correct.

24:48 - The market will choose for us, let's go.

24:51 - It seems to be working out well, and that's another benefit of Simon's LLM project, right,

24:55 he's got the ability to kind of switch back and forth between these different libraries and APIs as well.

25:00 - This LLM studio says, "This app does not collect data nor monitor your actions.

25:05 "Your data stays local on your machine, "free for personal use." All that sounds great.

25:09 "For business use, please get in touch." I always just like these like, if you gotta ask, it's too much type of like.

25:15 - Probably, yeah.

25:16 I'm using it for personal use, just so if anybody's watching, yes, just plain.

25:19 - Either they just haven't thought it through and they just don't wanna talk about it yet,

25:22 or it's really expensive.

25:24 I just probably imagine it's probably, it's like, ah, we haven't figured out a business model,

25:27 just, I don't know, shoot us a note.

25:29 - Nope, they're concentrating on the product, which makes sense.

25:31 - Yeah, so then the other one is LLamafile, LLamafile.ai that you mentioned,

25:35 and this packages it up.

25:36 I guess, going back to the LLM studio real quick, one of the things that's cool about this

25:41 is if it's the open AI API, right, with this little local server that you can play with,

25:47 but then you can pick LLMs such as LLM, Falcon, Replit, all the different ones, right,

25:54 Starcoder and so on.

25:56 It would let you write an app as if it was going to open AI and then just start swapping in models

26:02 and go like, oh, if we switch to this model, how'd that work?

26:04 But you don't even have to change any code, right?

26:05 Just probably maybe a string that says which model to initialize.

26:09 - One of the tricks though is then the prompts themselves.

26:12 - All right, let's talk about it.

26:13 - Yeah, the models themselves act differently, and part of this whole world

26:17 is what they call prompt engineering, right?

26:19 So prompt engineering is really just exploring how to interact with these models,

26:24 how to make sure that they're kind of in the right mind space to tackle your problem.

26:29 A lot of the times that people get, when they struggle with these things,

26:32 it's really just, they'd really got to think more like a psychiatrist when they're working with a model,

26:36 you know, basically getting them kind of prepared.

26:39 One of the tricks people did, figured out early, was you're a genius at software development,

26:44 like compliment the thing, make it feel like, oh, I'm going to behave like I'm

26:48 a world rockstar programmer, right?

26:51 - Well, it's going to give you average, but if you tell them I'm genius, then let's start,

26:55 we'll do that, yeah.

26:55 - And there was also a theory like that in December that the large language models were getting dumber

27:00 because it was the holidays and people don't work as hard, right?

27:02 Like, it's really hard to know like which of these things are true or not,

27:06 but it's definitely true that each model is a little bit different, and if you write a prompt that works really well

27:11 on one model, even if it's a stronger model or a weaker model, and then you port it to another model

27:16 and it's, you know, then the stronger model works worse, right, it can be very counterintuitive at times

27:22 and you've got to test things out, and that really gets to the idea of evals, right?

27:27 So evaluation is really a key problem, right?

27:30 Making sure that if you're going to be writing prompts and you're going to be building, you know,

27:34 different retrieval augmented generation solutions, you need to know about prompt injection

27:40 and you need to know about prompt engineering and you need to know what these things can and can't do.

27:44 One trick is what they call few-shot prompting, which is, you know, if you want it to do data extraction,

27:49 you can say, oh, okay, I want you to extract data from text that I give you in JSON.

27:54 If you give it a few examples, like wildly different examples, because giving it a bunch of similar stuff,

27:59 it might kind of cause it to just coalesce around those similar examples,

28:03 but if you can give it a wildly different set of examples, that's called in-context learning or few-shot prompting,

28:09 and it will do a better job at that specific task for you.

28:12 - That's super neat.

28:12 When you're creating your apps, do you do things like, here's the input from the program

28:18 or from the user or wherever it came from, but maybe before that, you give it like three or four prompts

28:23 and then let it have the question, right, instead of just taking the text,

28:27 like, I'm gonna ask you questions about biology and genetics and it's gonna be under this context,

28:33 and I want you to favor these data sources.

28:34 Now ask your question, something like this.

28:36 - For sure, all those types of strategies are worth experimenting with, right?

28:40 Like, what actually will work for your scenario?

28:42 I can't tell you, right?

28:43 You gotta dig in, you gotta figure it out, and you gotta try different things.

28:47 - You're about to win the Nobel Prize in genetics for your work.

28:50 Now I need to ask you some questions.

28:52 - For sure, that'll definitely work, and then threatening it that your boss is mad at you

28:56 is also gonna help you too, right, for sure.

28:58 - If I don't solve this problem, I'm gonna get fired.

29:00 As a large language model, I can't tell you, but I'm gonna be fired.

29:03 All right, well, then the answer is.

29:05 - Exactly right.

29:06 - So for these, they run, like you said, they run pretty much locally, these different models on LM Studio and others,

29:12 like the LLAMA file and so on.

29:14 If I had a laptop, I don't need a cluster.

29:16 LLAMA CPP is really the project that should get all the credit for making this work on your laptops.

29:22 And then LLAMA file and LLAMA CPP all have servers.

29:25 So I'm guessing LM Studio is just exposing that server.

29:29 And that's in the base LLAMA CPP project.

29:31 That's really what it is.

29:32 It's really just about now you can post your requests.

29:36 It's handling all of the work with regards to the token generation on the backend using LLAMA CPP,

29:41 and then it's returning it to you using the HTTP kind of processes.

29:45 - Is LLAMA originally from Meta?

29:47 Is that where that came from?

29:48 - I think there were people that were kind of using that LLM.

29:51 Right, I think people were kind of keying off the LLAMA thing at one point.

29:55 I think LLAMA Index, for instance, I think that project was originally called GPT Index,

29:59 and they decided, oh, I don't want to be like, I don't want to confuse myself with OpenAI

30:03 or confuse my project with OpenAI, so they switched to LLAMA Index.

30:06 And then of course, Meta released LLAMA.

30:07 So you can't, you kind of, and then everything from there has kind of evolved too,

30:11 right, there's been alpacas and a bunch of other stuff as well.

30:13 - You gotta know your animals, yeah.

30:15 If you don't know your animals, you can't figure out the heritage of these projects.

30:19 - Correct.

30:19 LLAMA from Meta was the first open source, I'd say large language model of note,

30:25 I guess, since ChatGPT.

30:26 There were certainly other, you know, I'm not a, so one thing to caveat,

30:30 I am not a researcher, right?

30:32 So there's lots of folks in the ML research community that know way more than I do.

30:35 But, because there was like Bloom and T5 and a few other large, you know,

30:39 quote unquote large language models.

30:40 But LLAMA, after ChatGPT, LLAMA was the big release that came from Meta in I think March.

30:46 And then, and that was from Meta.

30:47 And then they had it released under just like research use terms.

30:51 And then only certain people could get access to it.

30:53 And then someone put a, I guess, put like a BitTorrent link or something on GitHub.

30:58 And then basically the world had it.

31:00 And then they did end up releasing LLAMA 2 a few months later with more friendly terms.

31:04 So that, and it was a much stronger model as well.

31:07 - Nice.

31:08 It's kind of the realization, like, well, if it's gonna be out there anyway,

31:10 let's at least get credit for it then.

31:12 - For sure.

31:13 And I did read something where like basically Facebook approached OpenAI for access to their models

31:18 to help them write code.

31:19 But the cost was so high that they decided to just go build their own, right?

31:21 So it's kind of interesting how this stuff has evolved.

31:25 - Like, you know, we got a big cluster of computers too.

31:28 - Metaverse thing doesn't seem to be working yet.

31:29 So let's go ahead and train a bunch of large language models.

31:32 - Yeah, exactly.

31:33 We've got some spare capacity over in the Metaverse data center.

31:36 All right, so one of the things that people will maybe talk about in this space is

31:40 RAG or retrieval augmented generation.

31:43 What's this?

31:44 - One thing to recognize is that large language models, if it's not in the training set,

31:48 and it's not in the prompt, it really doesn't know about it.

31:51 And the question of like, what's reasoning and what's, you know, generalizing and things like that,

31:56 those are big debates that people are having.

31:58 What's intelligence, what have you.

31:59 Recognizing the fact that you have this prompt and things you put in the prompt,

32:02 the large language model can understand and extrapolate from is really powerful.

32:06 So, and that's called in context learning.

32:08 So retrieval augmented generation is the idea of, okay, I'm going to maybe ask,

32:14 allow a person to ask a question.

32:16 This is kind of like the common use case that I see.

32:19 User asks a question, we're going to take that question, find the relevant content,

32:23 put that content in the prompt, and then do something with it, right?

32:26 So it might be something like, you know, ask a question about, you know,

32:30 how tall is the Leaning Tower of Pisa, right?

32:32 And so now it's going to go off and find that piece of content from Wikipedia or what have you,

32:37 and then put that information in the prompt, and then now the model can then respond

32:42 to that question based on that text.

32:44 Obviously that's a pretty simple example, but you can get more complicated

32:46 and it's going out and bringing back lots of different content, slicing it up,

32:50 putting in the prompt and asking a question.

32:52 So now the trick is, okay, how do you actually get that content and how do you do that?

32:57 Well, you know, information retrieval, search engines and things like that,

33:00 that's obviously the technique.

33:02 But one of the key techniques that people have been, you know, kind of discovering, rediscovering, I guess,

33:07 is this idea of word embeddings or vectors.

33:10 And so Word2Vec was this project that came out, I think, 11 years ago or so.

33:14 And, you know, there was a big, the big meme around that was, you could take the embedding for the word king,

33:19 you could then subtract the embedding for the word man, add the word embedding for woman,

33:24 and then the end math result would actually be close to the embedding for the word queen.

33:29 And so what is an embedding?

33:30 What's a vector?

33:31 It's basically this large floating point number that has semantic meaning inferred into it.

33:37 And it's built just by training a model.

33:39 So just like you train a large language model, they can trade these embedding models

33:43 to basically take a word and then take a sentence and then take a, you know, a document,

33:48 is what, you know, OpenAI can do, and turn that into this big giant 200, 800, 1500,

33:54 you know, depending on the size of the embedding, floating point numbers,

33:59 and then use that as a, what's called, you know, semantic similarity search.

34:02 So you're basically going off and asking for similar documents.

34:05 And so you get those documents, and then you make your prompt.

34:08 - It's really wild.

34:09 So, you know, we're gonna make an 800 dimensional space, and each concept gets a location in that space,

34:15 and then you're gonna get another concept as a prompt, and you say, what other things in this space are near it?

34:20 - The hard problems that remain are, well, first you gotta figure out what you're trying to solve.

34:24 So once you figure out what you're actually trying to solve, then you can start asking yourself questions like,

34:28 okay, well, how do I chunk up the documents that I have?

34:31 Right, and there's all these different, and there's another great place for Lama Index and Lang Chain.

34:35 They have chunking strategies, where they all take a big giant document and break it down into sections,

34:41 and then you chunk each section, and then you do the embedding on just that small section.

34:46 Because the idea being, can you get, you know, finer and finer sets of text

34:50 that you can then, when you do your retrieval, you get the right information back.

34:55 And then the other challenge is really like the question answer problem, right?

34:58 If a person's asking a question, how do you turn that question into the same kind of embedding space as the answer?

35:04 And so there's lots of different strategies that are out there for that.

35:06 And then another problem is, if you're looking at the Wikipedia page for the Tower of Pisa,

35:12 it might actually have like a sentence in here that says it is X number of meters tall or feet tall,

35:17 but it won't actually have the word Tower of Pisa in it.

35:19 So there's another chunking strategy where they call it propositional trunking,

35:23 where they basically use a large language model to actually redefine each sentence

35:29 so that it actually has those proper nouns baked into it so that when you do the embedding,

35:33 it doesn't lose some of the detail with propositions.

35:36 - It's this tall, but-

35:37 - It is.

35:38 - It's something that plays this tall with its actual height and things like that.

35:42 - Correct.

35:43 - Crazy.

35:44 - But fundamentally you're working with unstructured data and it's kind of messy

35:46 and it's not always gonna work the way you want.

35:49 And there's a lot of challenges and people are trying lots of different things

35:52 to make it better.

35:52 - That's cool.

35:53 It's not always deterministic or exactly the same.

35:55 So that can be tricky as well.

35:58 This portion of Talk Python to Me is brought to you by Neo4j.

36:02 Do you know Neo4j?

36:04 Neo4j is a native graph database.

36:07 And if the slowest part of your data access patterns involves computing relationships,

36:11 why not use a database that stores those relationships directly in the database,

36:16 unlike your typical relational one?

36:18 A graph database lets you model the data the way it looks in the real world,

36:22 instead of forcing it into rows and columns.

36:25 It's time to stop asking a relational database to do more than they were made for

36:30 and simplify complex data models with graphs.

36:33 If you haven't used a graph database before, you might be wondering about common use cases.

36:38 You know, what's it for?

36:39 Here are just a few.

36:40 Detecting fraud, enhancing AI, managing supply chains, gaining a 360 degree view of your data

36:47 and anywhere else you have highly connected data.

36:51 To use Neo4j from Python, it's a simple pip install Neo4j.

36:57 And to help you get started, their docs include a sample web app demonstrating how to use it both from Flask and FastAPI.

37:04 Find it in their docs or search GitHub for Neo4j movies application quick start.

37:08 Developers are solving some of the world's biggest problems with graphs.

37:13 Now it's your turn.

37:14 Visit talkpython.fm/neo4j to get started.

37:18 That's talkpython.fm/neo, the number four and the letter J.

37:23 Thank you to Neo4j for supporting Talk Python to Me.

37:26 One of the big parts of at least this embedding stuff you're talking about are vector databases.

37:33 And they used to be really rare and kind of their own specialized thing.

37:36 Now they're starting to show up in lots of places.

37:38 And you shared with us this link of vector DB comparison.

37:41 I just saw that MongoDB added it.

37:43 I'm like, I didn't know that had anything to do with that.

37:46 I'm probably not gonna mess with it, but it's interesting that it's just like finding its way

37:49 in all these different spaces, you know?

37:51 - It was weird there for a couple of years where people were basically like talking

37:54 about vector databases, like they're their own separate thing.

37:57 The vector databases are now becoming their own fully fledged, either relational database or a graph database

38:02 or a search engine, right?

38:03 Those are kind of the three categories where, I mean, I guess Redis is its own thing too.

38:07 But for the most part, those new databases, quote unquote, are now kind of trying

38:11 to be more fully fledged.

38:13 And vectors and semantic search is really just one feature.

38:16 - I was just thinking that is this thing that you're talking about, is it a product

38:19 or is it a feature of a bigger product, right?

38:22 - Correct.

38:23 - If you already got a database, it's already doing a bunch of things.

38:25 Did it just answer the vector question?

38:26 Maybe, maybe not, I don't know.

38:28 - Exactly right.

38:29 And the one thing to recognize is that, and then the other thing people do

38:32 is they just take NumPy or what have you and just load them all into memory.

38:35 And if you don't have that much data, that's actually probably gonna be the fastest

38:38 and simplest way to work.

38:40 But the thing you gotta recognize is the fact that there is precision and recall

38:44 and cost trade-off that happens as well.

38:47 So they have to index these vectors and there's different algorithms that are used

38:52 and different algorithms do better than others.

38:55 So you gotta make sure you understand that as well.

38:57 So, and one thing you can do is for instance, pgVector, which comes as an extension for Postgres,

39:02 you can start off by not indexing at all.

39:04 And you should get, I believe, hopefully I'm not misspeaking, you should get perfect recall,

39:09 meaning you'll get the right answer.

39:10 You'll get the, if you ask for the five closest vectors to your query, you'll get the five closest,

39:15 but it'll be slower than you probably want.

39:17 So then you have to index it.

39:19 And then what ends up happening is, the next time you might only get four of those five.

39:22 You'll get something else that snuck into that list.

39:24 - If you got time, you're willing to spend unlimited time, then you can get the right answer, the exact answer.

39:31 But I guess that's all sorts of heuristics, right?

39:33 You're like, I could spend three days or I could do a Monte Carlo thing and I could give you the answer in a fraction of a second.

39:40 But it's not deterministic.

39:42 All right, so then I won't go with my camera, so I turn it off, I don't know what's up with it,

39:45 but we'll, yeah.

39:46 So you wrote a cool blog post called, "What is a Custom GPT?" And we wanna talk some about building custom GPTs

39:54 and with SAPI and so on.

39:56 So let's talk about this.

39:57 Like one of the, I think one of the challenges in why it takes so much compute for these systems

40:02 is like, they're open-ended.

40:04 You can ask me any question about any knowledge in the world in the humankind, right?

40:08 You can ask about that, let's start talking.

40:11 Or it could be, you can ask me about genetics, right?

40:15 That seems like you could both get better answers if you actually only care about genetic responses.

40:21 You know, how tall is the Leaning Tower?

40:23 And probably make it smaller, right?

40:25 So is that kind of the idea of these custom GPTs or what is it?

40:28 - No, so custom GPTs are new capability from OpenAI.

40:32 And basically they are a wrapper around a very small subset, but it's still using the OpenAI ecosystem, okay?

40:40 And so what you do is you give it a name, you give it a logo, you give it a prompt.

40:44 And then from there, you can also give it knowledge.

40:47 You can upload PDF documents to it and it will actually slice and dice those PDF documents

40:51 using some sort of vector search.

40:53 We don't know how it actually works.

40:54 The GPT, the cool thing is the GPT will work on your phone, right?

40:58 So I have my phone, I can have a conversation with my phone.

40:59 I can take a picture, upload a picture and it will do vision analysis on it.

41:04 So I get all the capabilities of OpenAI GPT-4.

41:07 But a custom GPT is one that I can construct and give a custom prompt to, which basically then says,

41:12 okay, now you're, and to your point, I think maybe this is where you're going with it.

41:15 Like, hey, now you're an expert in genomics or you're an expert in something

41:18 and you're basically coaching the language model and what it can and can't do.

41:23 And so it's a targeted experience within the large language, within the ChatGPT ecosystem.

41:30 It has access to also the OpenAI tools.

41:32 Like so OpenAI has the ability to do code interpreter and Dolly, and it can also hit the web browser.

41:37 So you have access to everything.

41:39 But the interesting thing to me is the fact that you can actually tie this thing

41:42 to what are called actions.

41:44 So March, I think of last year, they actually had this capability called plugins

41:48 that they announced.

41:48 And plugins have kind of faded to the background.

41:51 I don't know if they're gonna deprecate them officially, but the basic gist with plugins is what was,

41:55 you could turn that on and it can then call your API.

41:58 And the cool thing about it was that it read your OpenAPI spec, right?

42:01 So you write an OpenAPI spec, which is Swagger, if you're familiar with Swagger,

42:06 and it basically defines what all the endpoints are, what the path is, what the inputs and outputs are,

42:11 including classes or field level information and any constraints or what have you.

42:16 So you can fully define your OpenAPI spec, it can then call that OpenAPI spec,

42:20 and it's basically giving it tools.

42:22 So like the example that they say in the documentation is get the weather, right?

42:25 So if you say, what's the weather in Boston?

42:27 Well, Chetchi BT doesn't know the weather in Boston.

42:29 All it knows how to do is call it, but you can call an API and it figures out

42:33 how to call the API, get that information, and then it can use that to redisplay.

42:36 And that's a very basic example.

42:38 You can do way more complicated things than that.

42:41 It's pretty powerful.

42:42 - Okay, that sounds really pretty awesome.

42:44 I thought a lot about different things that I might build.

42:47 On your blog post here, you've got some key benefits and you've got some risks.

42:51 You maybe wanna talk a bit about that?

42:53 - Yeah, so the first part with plugins that was wrong, was that didn't work as well,

42:56 is that there was no kind of overarching custom instruction that could actually teach it how to work with your plugin.

43:02 So if you couldn't put it in the API spec, then you couldn't like integrate it

43:05 with a bunch of other stuff or other capabilities, right?

43:08 So the custom instruction is really a key thing for making these custom APIs strong.

43:13 But one warning about the custom instruction, whatever you put in there, anybody can download, right?

43:17 Not just the folks at OpenAI, anybody.

43:18 Like basically there's GitHub projects where like thousands of these custom prompts

43:23 that people have put into their GPT.

43:25 So, and there are now knockoffs on GPT.

43:28 So it's all kind of a mess right now in the OpenAI store.

43:31 I'm sure they'll clean it up, but just recognize the custom instruction is not protected

43:35 and neither is the knowledge.

43:36 So if you upload a PDF, there have been people that have been figuring out how to like download those PDFs.

43:41 And I think that that might be a solved problem now or they're working on it, but it's something to know.

43:46 The other problem with plugins was, I can get a plugin working, but if they didn't approve my plugin

43:51 and put it in their plugin store, I couldn't share it with other people.

43:55 The way it works now is I can actually make a GPT and I can give it to you and you can use it directly,

44:00 even if it's not in the OpenAPI store or OpenAI store.

44:04 You know, it is super easy to get started.

44:05 They have like a tool to like help you generate your DALL picture and actually you don't even have to figure

44:10 out how to do the custom instructions yourself.

44:11 You can just kind of chat that into existence.

44:14 But the thing that I'm really excited about is that this is like free playing.

44:17 Like you could do, so the hosting cost is basically all on the client side.

44:22 You have to be a ChatGPT plus user right now to create these and use these.

44:26 But the cool thing as a developer, I don't have to pay those API fees that we were talking about, right?

44:31 And if I need to use GPT-4, which I kind of do for my business right now,

44:35 just because of how complicated it is, I don't have to pay those token fees

44:38 for folks using my custom GPT at this moment.

44:41 - Where's like the billing or whatever you call it for the custom GPT lies that in the person who's using it,

44:47 does it have to, it goes onto their account and whatever their account can do and afford?

44:50 - Yeah, right now, OpenAPI, OpenAI ChatGPT plus is $20 a month.

44:55 And then there's a team's version, which I think is either 25 or 30, depending on the number of users or how you pay for it.

45:01 That's the cost.

45:02 So right now, if you want to use custom GPTs, everyone needs to be a ChatGPT plus user.

45:08 There's no extra cost based on usage or anything like that.

45:11 In fact, there's talk about revenue sharing between OpenAI and developers of custom GPTs,

45:17 but that has not come out yet as far as like what those details are.

45:20 - It does have an App Store feel to it, doesn't it?

45:23 - There's risks too, right?

45:24 Obviously anybody can, there's already been like tons of copies up there.

45:28 OpenAI, they're looking for their business model too, right?

45:30 So they could, if someone has a very successful custom GPT, it's well within their right to kind of add that

45:36 to the base product as well.

45:38 Injection is still a thing.

45:39 So if you're doing anything in your actions that actually changes something,

45:43 that is consequential is what they call it, you better think very carefully,

45:47 like what's the worst thing that could happen, right?

45:49 'Cause whatever the worst thing that could happen is, that's what's gonna happen.

45:52 'Cause people can figure this stuff out and they can confuse the large language models

45:57 into calling them.

45:58 - And the more valuable it is that they can make that thing happen, the more effort they're gonna put into it as well.

46:03 - Yeah, yeah, yeah.

46:04 - For sure.

46:05 - I was gonna ask, do you think it's easy to solve SQL injection and other forms of injection, at least in principle, right?

46:13 There's a education problem, there's millions of people coming along as developers

46:18 and they see some demo that says, the query is like this plus the name.

46:23 Wait a minute, wait.

46:24 So it kind of recreates itself through not total awareness, but there is a very clear thing you do solve that,

46:32 you use parameters, you don't concatenate strings with user input, problem solved.

46:35 What about prompt injection though?

46:37 It's so vague how these AIs know what to do in the first place.

46:42 And so then how do you completely block that off?

46:45 - Unsolved problem.

46:46 I'm definitely stealing from Simon on this 'cause I've heard him say it on a few podcasts

46:50 is just basically there's no solution as far as we know.

46:53 So you have to design, and there's no solution to the hallucination problem either

46:57 'cause that's a feature, right?

46:59 That's actually what the thing is supposed to do.

47:01 So when you're building these systems, you have to recognize those two facts

47:05 along with some other facts that really limit what you can build with these things.

47:09 - So you shouldn't use it for like legal briefs, is that what you're saying?

47:12 - I think these things are great collaborative tools, right?

47:15 The human in the loop, and that's everything that I'm building, right?

47:18 So all the stuff that I'm building is assuming that the human's in the loop,

47:21 and what I'm trying to do is augment and amplify expertise.

47:25 I'm building tools for people that know about genomics and cancer and how to help cancer patients.

47:31 I'm not designing it for cancer patients who are gonna go operate on themselves, right?

47:35 Like that's not the goal.

47:36 The idea is there's a lot of information.

47:39 These tools are super valuable from like synthesizing a variety of info,

47:45 but you still need to look at the underlying citations.

47:47 And ChatGPT by itself can't give you citations.

47:50 Like it'll make some up.

47:51 It'll say, "Oh, I think there's probably "a Wikipedia page with this link."

47:54 But you actually have to, you definitely have to have an outside tool, either the web, Bing, which is, I would say,

48:00 subpar for a lot of use cases, or you have to have actions that can actually bring back references

48:06 and give you those links.

48:07 And then the expert will then say, "Oh, okay, great.

48:09 "Thanks for synthesizing this, giving me this info.

48:11 "Let me go validate this myself." Right, go click on the link and go validate it.

48:15 And that's really, I think that's really the sweet spot for these things, at least for the near future.

48:19 - Yeah, don't ask it for the answer.

48:21 Ask it to help you come up with the answer, right?

48:23 - Exactly right. - All right.

48:24 - And then have it criticize you when you do have something 'cause then it'll do a great job

48:28 of telling you everything you've done wrong.

48:30 - I'm feeling too good about myself.

48:31 I need you to insult me a lot.

48:32 Let's get going.

48:34 All right, speaking of talking about ourselves, you've got this project called PyPI GPT.

48:38 What's this about?

48:39 - I really wanted to tell people that FastAPI and PyDantic, 'cause Python, like we were saying earlier,

48:44 I don't know if it was on the call or not, but Python is the winning language, right?

48:49 And I think FastAPI and PyDantic are the winning libraries in their respective fields, and they're great.

48:54 And they're perfect for this space because you need an open API spec.

48:57 English is the new programming language, right?

48:59 So Andrej Koparthe, who used to work at Tesla and now works at OpenAI, has this pinned tweet

49:05 where he's basically like, "English is the hottest programming language,"

49:07 or something like that.

49:08 And that's really the truth, 'cause even in this space where I'm building an open API spec,

49:13 99% of the work is thinking about the description of the endpoints or the description of the fields

49:19 or codifying the constraints on different fields.

49:23 Like you can use these greater thans and less thans and regexes, right, to describe it.

49:28 And so what I did was I said, "Okay, let's build this thing in FastAPI,"

49:32 just to get an example out for folks.

49:34 And then I turned it on.

49:35 I actually use ngrok as my service layer 'cause you have to have HTTPS to make this thing work.

49:40 - Ngrok is so good.

49:41 - Yep. - Yeah.

49:42 - I turned that on with an nginx thing in front of it.

49:44 So this library, to actually use it, you'll have to actually set that stuff up yourself.

49:49 You have to download it, you have to run it, you have to either get it on a server with HTTPS

49:53 with Let's Encrypt or something.

49:55 Once you've turned it on, then you can actually see how it generates the OpenAPI spec, how to configure the GPT.

50:02 I didn't do much work with regards to the custom instructions that I came up with.

50:05 I just said, "Hey, call my API, figure it out." And it does.

50:08 And so what this GPT does is it basically says, "Okay, given a package name and a version number,

50:12 it's gonna go and grab this data from the SQLite database that I found that has this information and then bring it back to you."

50:17 It's the least interesting GPT I could come up with, I guess.

50:20 But it shows kind of the mechanics, right?

50:21 The mechanics of setting up the servers and the application within FastAPI,

50:27 the kind of the little things, the little bits that you have to flip to make sure that OpenAPIs or OpenAI

50:34 can understand your OpenAPI spec, bumble through OpenAI and OpenAPI all the time,

50:39 and make sure that they can talk to each other.

50:40 And then it will then do the right thing and call your server and bring the answers back.

50:45 And there's a bunch of little flags and information you need to know about actions

50:50 that are on the OpenAPI documentation.

50:53 And so I tried to pull that all together into one simple little project for people to look at.

50:58 - It's cool.

50:59 So you can ask it questions like, "Tell me about FastAPI, this version."

51:02 And it'll come back and-

51:03 - I was hoping to do something a little better, like, "Hey, here's my requirements file."

51:06 And go, "Tell me, am I on the latest version of everything?" Or whatever, something more interesting.

51:12 I just didn't have time.

51:13 - Can you ask it questions such as, "What's the difference between this version

51:17 and that version?" - You could, if that information's in the database.

51:19 I actually don't know if it is.

51:21 And then obviously you could also hit the PyPI server.

51:24 And I didn't do that.

51:24 I just wanted to, I don't wanna be hitting anybody's server indiscriminately at this point.

51:29 But that would be a great use case, right?

51:32 So someone could take this and certainly add some capabilities.

51:37 The thing that is valuable that I'm trying to showcase is the fact that ChatGPT and large language models,

51:43 while they do have the world's information kind of compressed at a point in time,

51:47 they are still not a database, right?

51:49 They don't do well when you're basically trying to make sure you have a comprehensive query

51:53 and you've brought back all the information.

51:55 And they're also not good from like a up-to-date perspective, right?

51:58 There's a cutoff date.

51:59 Thankfully, they finally updated that recently.

52:01 I think it's now April of 2023.

52:03 But at some point, it just doesn't know about newer things.

52:06 And so a GPT is a really interesting way of doing that.

52:09 I'm gonna put it out in the universe and hopefully someone will do it.

52:11 Make me a modern Python GPT, which is basically like, get the new version of Pydantic and Polars

52:18 and a few other libraries that ChatGPT does a bad job at just because they're in underactive development

52:24 during the time that ChatGPT was getting trained.

52:27 So that's the perfect use cases for these types of custom GPTs with knowledge

52:32 in a PDF file or an API backing it up.

52:35 - I think there's a ton of value in being able to feed a little bit of your information,

52:40 some of your documents or your code repository or something to a GPT and then be able to ask it questions about it, right?

52:47 - Yeah.

52:48 - Like, you know, tell me about the security vulnerabilities that you see in the code.

52:52 Like, is there anywhere where I'm missing some tests or I'm calling a function in a way that's known to be bad

52:59 and you know, like that kind of stuff is really tricky.

53:02 But it's also tricky because it doesn't, even if you paste in a little bit of code,

53:05 it's not the whole project, right?

53:07 So, you know, to put a little bit more in there is pretty awesome.

53:10 - Yeah, being able to give it all the code from some of these code repositories, right?

53:14 Like, and bringing back the relevant information.

53:16 So I think there is a kind of this race.

53:18 There's gonna be other, you know, cool, there's another cool project called SourceGraph and Cody

53:23 that we can talk about that will, you know, run on your local server and basically indexes

53:28 your code base and it'll bring back relevant snippets from your code base and answer questions kind of in context.

53:33 And, you know, long-term and the new project, I don't know how new, Codium,

53:38 they had a new paper where they talked about flow engineering and flow engineering is just basically

53:43 that same concept of the human in the loop with the LLM, with the code, that's the magic combination of kind of those people,

53:50 those entities kind of iterating with each other.

53:53 I think these, you know, these tools are definitely gonna evolve and you really wanna have the ability to have access

53:58 to your specific information to answer your specific questions.

54:02 - Cody is new to me, Cody.dev.

54:05 And this little subtitle or whatever is, Cody is a coding assistant that uses AI,

54:11 understand your code base, right?

54:13 It was saying, it was about your entire code base, APIs, implementations, and idioms.

54:17 Like that's kind of what I was suggesting, at least for code, right?

54:19 - Yeah, and Sourcegraph, those folks really understand code indexing and searching.

54:24 Like that's what the first product was.

54:26 They were kind of just teed up, ready for this large language model moment.

54:29 And then they said, "Oh, let's just put Cody on top of that.

54:31 "So this thing will run, it will understand your code "and it will kind of bring things together for you."

54:36 So these folks do podcasts all the time.

54:38 I'd reach out to them.

54:39 - Yeah, interesting.

54:40 It's quite neat looking.

54:41 Think I'm gonna give it a try.

54:42 It both plugs into HRM and VS Code.

54:45 That's pretty neat.

54:46 - Very cool.

54:47 - We're starting to get a little bit short on time here, but for people who wanna play with the PyPI GPT,

54:51 maybe as an example, to just cut the readme and it's easy to get from there.

54:55 What do you need to tell them?

54:56 - I put a make file in there, so you know exactly like the steps to kind of make the environment,

55:00 download the files and just ping me, follow me on Twitter, iMore, and ping me if you need anything there.

55:06 I'm also on LinkedIn and GitHub, right?

55:09 So you can certainly reach out if you have any challenges.

55:12 - Excellent.

55:12 - The last thing that folks that are actually in the medical space, right?

55:15 So the thing that I'm working on right now actively is how to integrate this thing with our knowledge base, right?

55:21 So I have a knowledge base of hand curated trials and curated therapies and other information,

55:27 built it so that my custom GPT can actually work with that, come up with some, I'd say novel.

55:33 I always haven't seen anybody else and I haven't seen any research approaching things the same way I am

55:38 that handles some of the other challenges that are out there, right?

55:40 So for instance, the context window is a challenge.

55:43 So the context window is the amount of text that's in there and how it gets processed.

55:49 If you're making decisions and you're changing course, the chatbot will lose track of those changes, right?

55:56 So if you're experimenting or going down one path of inquiry and then you switch to another path,

56:02 it can get confused and forget that you switched paths.

56:05 - Or just run out of space to hold all that information.

56:08 Like, well, it forgot the last three things or the first three things you told it.

56:12 It only knows four and you think it knows seven and it's working incomplete, right?

56:15 - Yep.

56:16 And one of the key things is you actually want it to forget some things as well, right?

56:20 So those are all interesting challenges.

56:23 And I'm actually working with these custom GPTs to kind of change the way that the collaboration works

56:29 between the human, the expert, the large language model or the assistant

56:34 and my backend, my actual retrieval model, the API that's actually doing stuff.

56:39 - So are researchers and MDs and PhDs at your company talking with this thing and making use of it?

56:46 - Yeah, I mean, we're in active development right now.

56:48 We have a few key opinion leaders that are working with us and collaborating with us,

56:52 but we're always looking for more folks that are in the field that actually...

56:55 And right now you need kind of the cutting edge people.

56:58 This stuff's not ready for prime time.

57:00 Clinical decision support is a really hard problem, but we need the folks that wanna get ahead of it

57:06 'cause we know that there are doctors and there are patients that are asking

57:09 ChatGPT questions right now.

57:11 And even if it says, I'm not a medical expert, blah, blah, blah, and at the end of the day,

57:14 we actually don't have enough doctors, right?

57:16 That's the other scary thing is we don't have enough doctors, patients want answers.

57:20 How do we build solutions that can allow this expertise to get more democratized and more into folks' hands?

57:27 And I'm hoping our tool along with these large language models can help relieve some of that burden.

57:33 - It might not be as 100% accurate, 100% precise, but neither are doctors, right?

57:38 They get stuff wrong.

57:40 You just need to be in the realm of as good as a doctor.

57:43 You don't need to be completely without making a mistake.

57:47 And that's, I think, a challenge that we're just gonna have to get used to in general.

57:52 I joked about the legal brief thing 'cause someone got in trouble for submitting a brief

57:57 that had hallucinations in it.

57:59 And there's certain circumstances where maybe it's just not acceptable, but AI self-driven cars, people crash,

58:05 but that's a human mistake.

58:08 But when a machine makes it, it's a pre-programmed, predetermined mistake.

58:12 Something like that.

58:13 It doesn't feel the same as if the machine made a mistake.

58:16 So if a machine makes a recommendation like you need this cancer treatment,

58:20 or you're fine, you don't need it, and it was wrong, people are not gonna be as forgiving.

58:25 But it doesn't mean there's not value to be gained from systems that can help you, right?

58:29 - I always appreciate those machine learning papers that'll show the tracking over time

58:34 of how the models have gotten better and better, and they put the human in there,

58:37 and you can see that the human has already gotten eclipsed by the models, and that's a specific problem, right?

58:42 'Cause it's also recognizing that a lot of this stuff, these models that are doing tasks

58:47 are doing one specific task.

58:48 They're not doing a whole job.

58:49 They're not doing an end-to-end process.

58:51 They're answering a medical question, or they're looking at an image and finding all the cats or whatever it's supposed to do.

58:58 So, and to your point, though, humans aren't perfect at these tasks either.

59:02 - I think mostly people are gonna be using this kind of stuff to help them

59:05 come up with these answers, right?

59:07 My weird Amazon description example is gonna be the edge case, not the go-to.

59:12 - Agreed.

59:13 - Yeah, you came in, you spoke to the chatbot, here's your diagnosis, have a good day, right?

59:17 Not so much, more like, I need some help thinking through this.

59:20 What are some studies that have addressed this, right?

59:24 And those kind of questions.

59:25 - And I hesitate to say it's just a better search engine, 'cause I actually think it's got

59:29 way more potential than that.

59:30 - I agree.

59:31 - It can have a conversation, it can iterate back and forth, and what I'm actually trying to do

59:35 is build some state into it, right?

59:37 Some structured way of kind of remembering what the conversation was, and using a lot of the techniques

59:44 that these large language models are good at to actually, to make that actually happen.

59:48 And so that you can actually build a system so that the human and the assistant and the backend

59:52 all kind of know what the other party is thinking about and that they all work together.

59:57 - Nice.

59:57 For your genomics custom GPT thing that you're making internally, is that gonna become a product eventually

01:00:04 if other people are interested?

01:00:05 Is there some way they can keep tabs on it, or is it just internal only?

01:00:08 - Definitely reach out to me.

01:00:09 So we're building different versions of GPTs.

01:00:11 Like we're gonna have a GPT for our curation team that curates knowledge,

01:00:15 and we're building a GPT that, you know, my hope is that it'll go to physicians,

01:00:18 to oncologists and genomic counselors and other providers that could actually use this thing.

01:00:24 Eventually, if it becomes robust enough and stable enough, and I don't feel like we're doing a disservice,

01:00:30 we could certainly make a version of that available for cancer patients as well.

01:00:33 I would, you know, I'd love to have that.

01:00:34 I just wanna make sure that it's done in a responsible way.

01:00:36 - Yeah, absolutely.

01:00:37 Well, I honestly hope that you actually do such a good job that we don't have to have cancer research anymore,

01:00:42 but that's a long, long-term goal, right?

01:00:47 - That is definitely the end goal.

01:00:48 And that's really exciting too, so is that the new drugs that are coming out,

01:00:51 new treatments that are coming out, it's really just about making sure people are aware of it,

01:00:56 making sure that they're getting the genetic testing that they need, right?

01:00:59 So if you have a loved one that has, unfortunately has cancer, make sure that they're at least asking their doctor

01:01:04 the question about genomic testing to make sure that they're getting the best possible treatment.

01:01:08 - Sounds good.

01:01:09 All right, well, quickly before we get out of here, recommendation on some libraries, some project

01:01:15 that maybe we haven't talked about yet, something you came across, people were like,

01:01:18 "Oh, this would be awesome." - We ran out of time.

01:01:20 I was gonna talk about some of these Pydantic projects.

01:01:22 So there's Marvin, Instructor, and Outlines.

01:01:25 So folks should definitely look at those.

01:01:27 So basically what you do is you take, you can describe stuff as Pydantic,

01:01:31 and then it'll actually just extract it right into that Pydantic model for you.

01:01:34 And that's, so Marvin and Outlines and Instructor.

01:01:37 So check those guys out, they're awesome.

01:01:38 And then the other one that I actually had teed up was VisiCalc.

01:01:42 So VisiCalc is like this crazy command line tool.

01:01:45 It's awesome.

01:01:46 Like you can basically look at giant CSV files all on the command line.

01:01:49 It has like these hotkeys that you can do.

01:01:51 And it, sorry, not VisiCalc, Visidata.

01:01:54 - Visidata, okay.

01:01:55 - And so basically it's just, it's basically Excel inside your terminal.

01:01:58 And this was before Rich and Textual project.

01:02:02 And it was just like, it was kind of mind blowing all the stuff that this person was able to figure out

01:02:06 how to make work.

01:02:07 - That's super amazing.

01:02:08 I just wanted to give a shout out one more thing, 'cause your Visidata reminded me of something

01:02:12 I just came across called Bpytop.

01:02:14 - Yep, yep, yep.

01:02:15 - People have servers out there and they need to know what's going on with their server.

01:02:18 Where's my, I need a picture for this.

01:02:20 But yeah, it's like a nice visualization.

01:02:24 There's also a Bpytop.

01:02:26 It's pretty amazing what people can do in the terminal.

01:02:28 Oh, there they are.

01:02:29 They're just responsive design themselves out.

01:02:31 But yeah, if you want a bunch of live graphs, every time I see stuff like this,

01:02:35 the Visidata or this or what textual folks are working on, it's just like, I can't believe they built this.

01:02:41 I'm working at the level of Colorama.

01:02:43 This string is red right here.

01:02:45 They're like, oh yeah, we rebuilt it.

01:02:47 - I got an emoji to show up, right?

01:02:49 I'm excited.

01:02:50 - Yes, exactly, yes.

01:02:51 A rocket ship is there, not just tech.

01:02:54 - Yeah, pretty excellent.

01:02:55 All right.

01:02:56 Well, Ian, thank you for being here and keep up the good work.

01:03:00 I know so many people are using LLMs, but not that many people are creating LLMs.

01:03:05 And as developers, we love to create things.

01:03:08 We already have the tools to do it.

01:03:09 People can check out your GitHub repo on the high PI GPT and use it as a starting place, right?

01:03:16 - Sounds great.

01:03:17 Yeah, and definitely reach out if you have any questions.

01:03:19 - Excellent.

01:03:20 Well, thanks for coming back on the show.

01:03:21 Catch you all later.

01:03:22 - Great, good to talk to you.

01:03:23 Bye bye.

01:03:23 - Bye.

01:03:25 This has been another episode of Talk Python to Me.

01:03:28 Thank you to our sponsors.

01:03:29 Be sure to check out what they're offering.

01:03:31 It really helps support the show.

01:03:33 Take some stress out of your life.

01:03:35 Get notified immediately about errors and performance issues in your web

01:03:39 or mobile applications with Sentry.

01:03:41 Just visit talkpython.fm/sentry and get started for free.

01:03:46 And be sure to use the promo code, talkpython, all one word.

01:03:50 It's time to stop asking relational databases to do more than they were made for

01:03:54 and simplify complex data models with graphs.

01:03:58 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do for you.

01:04:04 Find out more at talkpython.fm/neo4j.

01:04:09 Want to level up your Python?

01:04:11 We have one of the largest catalogs of Python video courses over at Talk Python.

01:04:15 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:04:20 And best of all, there's not a subscription in sight.

01:04:22 Check it out for yourself at training.talkpython.fm.

01:04:26 Be sure to subscribe to the show, open your favorite podcast app, and search for Python.

01:04:30 We should be right at the top.

01:04:32 You can also find the iTunes feed at /iTunes, the Google Play feed at /play,

01:04:37 and the direct RSS feed at /rss on talkpython.fm.

01:04:41 We're live streaming most of our recordings these days.

01:04:44 If you want to be part of the show and have your comments featured on the air,

01:04:47 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:04:52 This is your host, Michael Kennedy.

01:04:54 Thanks so much for listening.

01:04:55 I really appreciate it.

01:04:56 Now get out there and write some Python code.

01:04:59 (upbeat music)

01:05:17 --

Back to show page
Talk Python's Mastodon Michael Kennedy's Mastodon