#456: Building GPT Actions with FastAPI and Pydantic Transcript
00:00 Do you know what custom GPTs are?
00:02 They're configurable and shareable chat experiences with the name, logo, custom instructions,
00:07 conversation starters, access to open AI tools and custom API actions.
00:13 And you can build them with Python.
00:15 Ian Moyer has been doing just that and is here to share his experience building them.
00:20 This is "Talk Python to Me," episode 456, recorded January 22nd, 2024.
00:26 (upbeat music)
00:31 Welcome to "Talk Python to Me," a weekly podcast on Python.
00:44 This is your host, Michael Kennedy.
00:45 Follow me on Mastodon, where I'm @mkennedy and follow the podcast using @talkpython,
00:50 both on fosstodon.org.
00:53 Keep up with the show and listen to over seven years of past episodes @talkpython.fm.
00:58 We've started streaming most of our episodes live on YouTube.
01:02 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows
01:08 and be part of that episode.
01:10 This episode is sponsored by Sentry.
01:12 Don't let those errors go unnoticed.
01:14 Use Sentry.
01:15 Get started at talkpython.fm/sentry.
01:18 And it's also brought to you by Neo4j.
01:21 It's time to stop asking relational databases to do more than they were made for.
01:25 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do for you.
01:32 Find out more at talkpython.fm/neo4j.
01:36 Ian, welcome to Talk Python to Me.
01:39 - Hey, hey, Michael, good to see you again.
01:41 - Yeah, great to see you again.
01:43 It has been a little while.
01:44 It seems like not so long ago, and yet when I pull up the episode that we did together,
01:50 sure enough, it says March 7th, 2018.
01:55 Wow.
01:55 - The years are short.
01:56 The years are short, they go by really fast.
01:58 - They sure do.
01:59 So back then, we were talking about Python and biology and genomics, and it sounds like you're still doing genetic-type things
02:08 and still doing Python and all that kind of stuff.
02:11 - For sure, yeah, definitely.
02:12 I work for a company called Genome Oncology.
02:14 We do precision oncology software, helping folks make sense of genomics and trying to help cancer patients.
02:20 - That's awesome.
02:21 There's different levels of helping people with software.
02:25 On one level, we probably have ad retargeting.
02:28 On the other, we've got medical benefits and looking for helping people who are suffering socially or whatever.
02:38 So it's gotta feel good to write software that is making a difference in people's lives.
02:43 - That's right.
02:44 I did spend a lot of the 2000s making e-commerce websites, and that wasn't exactly the most fulfilling thing.
02:48 I learned a lot, but it wasn't as exciting as what I'm doing now, or at least as fulfilling as what I'm doing now.
02:53 - What were those earlier websites in Python?
02:55 - That was all Java for the most part.
02:57 And finally with this company, knocked out a prototype in Django a few years ago.
03:03 And my boss at the time was like, "You did that so fast, "you should do some more stuff in Python."
03:08 So that's kind of how it evolved.
03:10 And now basically most of our core backend is Python, and we use a little bit of Svelte for the user interfaces.
03:17 - Beautiful.
03:18 It's easy to forget, like five years ago, 10 years ago, people were questioning whether Python
03:24 should be something you should use.
03:25 Is it a real language?
03:26 Do you really use it?
03:26 Is it safe to use?
03:27 Maybe you should use a Java or a C# or something like that, because this is a real project.
03:33 It's interesting.
03:34 You don't hear that nearly as much anymore, do you?
03:36 - I grew up with Boston sports fans, and it was like being a Boston sports fan was terrible
03:39 for the longest time.
03:40 And now it's like, "Okay, we don't wanna hear about your problems right now."
03:43 And same thing with Python.
03:44 It's like, "I like Python." It's like, "Yeah, great.
03:46 "So does everybody else in the world." So yeah, it's really not the issue anymore.
03:50 Now it's not the cool thing to play with.
03:51 So now you gotta go to Rust or something else.
03:54 - You know what's shiny?
03:55 LLMs are shiny.
03:56 - LLMs are very shiny, for sure.
03:58 - Yeah, we can talk about them today.
03:59 - Yeah, that sounds great.
04:01 Let's do it.
04:01 - First of all, we're gonna talk about building applications that are basically powered by LLMs that you plug into.
04:08 Right? - Yep.
04:09 - Before we get into creating LLMs, just for you, where do LLMs play a role for you
04:16 in software development these days?
04:18 - Sure.
04:19 So, you know, like everybody else, I mean, I had been playing with, so I do natural language processing as part of my job.
04:24 Right?
04:25 So using spaCy was a big part of the information extraction stack that we use,
04:30 'cause we have to deal with a lot of medical data and medical data is just unstructured
04:33 and has to be cleaned up before it can be used.
04:36 That was my exposure.
04:37 I had seen GPTs and the idea of like generating text, just starting from that,
04:42 didn't really make much sense to me at the time.
04:44 But then obviously like everybody else, when chatGPT came out, I was like, "Oh, I get this now."
04:48 Like this thing does, you know, it can basically learn in the context and it can actually produce something that's interesting
04:54 and you can use it for things like information extraction.
04:56 So just like everybody else, I kind of woke up to them, you know, around that time that they got released
05:01 and I use them all the time.
05:02 Right?
05:03 So chatGPT4 is really what I use.
05:05 I would recommend, if you can afford the $20 a month, it's still the best model that there is as of January 2024.
05:11 And I use that for coding.
05:12 I don't really like the coding tools, the copilots, but there, you know,
05:16 there's definitely folks that swear by them.
05:18 So my workflow is more of, I have a problem, work with the chatbot to try to like, you know,
05:22 think through all the edge cases and then, and then think through the test case, the tests.
05:27 And then I think through the code, right?
05:29 And then the actual typing of the code, yeah, I'll have it do a lot of the boilerplate stuff,
05:33 but then kind of shaping the APIs and things like that.
05:35 I kind of like to do that myself still.
05:37 I'm kind of old school, old school.
05:39 - I guess I'm old school as well.
05:40 'Cause I'm like right there with you.
05:42 But for me, I don't generally run copilot or those kinds of things in my editors.
05:48 I do have some features turned on, but primarily it's just really nice autocomplete.
05:54 You know what I mean?
05:55 Like, it seems like it almost just knows what I want to type anyway, and that's getting better.
05:59 I don't know if anyone's noticed recently, one of the recent releases of PyCharm,
06:03 it starts to autocomplete whole lines.
06:06 And I don't know where it's getting this from.
06:07 And I think I have the AI features turned off.
06:10 At least it says I have no license.
06:11 Guessing that means they're turned off.
06:13 So it must be something more built into it.
06:16 That's pretty excellent.
06:17 But for me, I find I'm pretty content to just sit and write code.
06:20 However, the more specific the unknowns are, the more willing I'm like, "Oh, I need to go to ChatGPT for this."
06:28 Like, for example, like, how do you use Pydantic?
06:30 Like, well, I'll probably just go look at a quick code sample and see that so I can understand it.
06:35 But if it's, I have this time string with the date like this, the month like this,
06:41 and then it has the time zone like that.
06:43 How do I parse that?
06:44 Or how do I generate another one like that in Python?
06:47 And here's the answer.
06:48 Or I have this giant weird string, and I want this part of it as extracted with a regular expression.
06:55 And I want a regular expression.
06:56 - Regular expression, I was just gonna say that.
06:56 - Oh my gosh.
06:57 - You don't have to write another one of those.
06:59 Yeah, it's great.
06:59 - Yeah, it's pretty much like, do you need to detect the end of a line straight to ChatGPT?
07:03 Not really, but you know, it's like almost any level of chat, a regular expression.
07:07 I'm like, well, I need some AI for this.
07:09 'Cause this is not time well spent for me.
07:11 But yeah, it's interesting.
07:12 - Yeah, one big tip I would give people though is that these chatbots, they wanna please you.
07:16 So you have to ask it to criticize you.
07:19 You have to say, here's some piece of code, tell me all the ways it's wrong.
07:22 And you have to also ask for lots of different examples because it just starts to get more creative,
07:27 more things that it says.
07:28 It really thinks by talking, which is a really weird thing to consider.
07:31 But it's definitely some things to keep in mind when you're working with these things.
07:35 - And they do have these really weird things.
07:37 Like if you compliment them, or if you ask it, you sort of tell it, like, I really want you to tell me.
07:42 It actually makes a difference, right?
07:43 It's not just like a search engine.
07:45 Like, well, of course, what does it care?
07:46 Just you put these keywords in and they come out.
07:48 Like, no, you've kind of got to like, know how to talk to it just a little bit.
07:51 - I've seen people threatening them, or like saying that someone's being held ransom,
07:55 or, you know, I like to say, my boss is really mad at me.
07:58 Like, help me out here, right?
07:59 And like, see if it'll generate some better code.
08:01 - You're not being a good user.
08:03 You're trying to trick me.
08:04 I've been a good chatbot and you've been a bad user and I'm not gonna help you anymore.
08:08 - Yeah, right.
08:09 So it was actually basically a conversation from being in the early days.
08:12 - Yeah, the Sydney episode.
08:14 Yeah, that was crazy, right?
08:15 Super funny.
08:16 - How funny?
08:17 All right, well, I'm sure a lot of people out there are using AI these days.
08:20 I think I saw a quote from, I think it was from GitHub, saying over 50% of developers are using Copilot.
08:26 - For sure.
08:27 - Which, crazy, but I mean, not that surprising.
08:29 50% of the people are using Autocomplete.
08:31 So I guess it kind of, kind of like that, right?
08:34 - They're great tools.
08:34 They're gonna keep evolving.
08:35 There's some other ones I'm keeping an eye on.
08:36 There's one called Console, which just takes a different approach.
08:39 They use some stronger models.
08:41 And then there's a website called PHIND, P-H-I-N-D, that allows you to do some searching,
08:45 that they've built their own custom model.
08:47 Really interesting companies that are doing some really cool things.
08:49 And then Perplexity is like the search replacement that a lot of folks are very excited about using
08:54 instead of Google.
08:55 So there's a lot of different tools out there.
08:57 You could spend all your day just kind of playing around and learning these things where you got to actually
09:01 kind of get some stuff done too.
09:02 - Yeah, you gotta pick something and go, right?
09:04 Because with all the churn and growth and experimentation you got, you probably could try a new tool every day
09:09 and still not try them all, you know, just be falling farther behind.
09:13 So you gotta pick something and go.
09:15 - And go, yep.
09:16 - Let's talk about writing some code.
09:18 - Yeah, the next thing you're gonna do after you use a chatbot is to hit an API.
09:24 Like if you're gonna program an app and that app is gonna have LLM inside of it,
09:28 large language models inside of it, APIs are pretty much the next step, right?
09:32 So OpenAI has different models that are available.
09:35 This is a webpage that I just saw recently that'll actually compare the different models
09:39 that are out there.
09:40 So there's obviously the big guy, which is OpenAI, and you can get that through Azure as well
09:44 if you have a Microsoft arrangement.
09:46 And there's some security reasons or HIPAA compliance and some other reasons that you might want to talk
09:51 through Azure instead of going directly to OpenAI.
09:54 I defer to your IT department about that.
09:57 Google has Gemini, which they just released the pro version, which I believe is as strong as 3.5, roughly.
10:03 That is interesting because if you don't care about them training on your data,
10:07 if like whatever you're doing is just like not super proprietary or something
10:11 you're trying to keep secret, they're offering free API access, I believe 60 words per minute, right?
10:17 So basically one a second, you can call this thing and there's no charge.
10:21 So I don't know how long that's gonna last.
10:23 So if you have an interesting project that you wanna use in a large language model for,
10:26 you might wanna look at that.
10:27 - Yeah, especially if it's already open data that you're playing with.
10:30 - Exactly, right.
10:31 - Or data you've somehow published to the web that has certainly been consumed by these things.
10:36 And these models are gonna train on it, right?
10:37 That's the trade, right?
10:38 They're trying to get more tokens, is what they call it, right?
10:41 The tokens are what they need to actually make these models smarter.
10:44 So everyone's just hunting for more tokens and I think this is part of their strategy for that.
10:49 And then there's also a CLAUDE by Anthropic.
10:52 And then after that, you get into the, kind of the open source APIs as well.
10:55 - There's some really powerful open source ones out there.
10:58 Yeah, so this website, yeah, this is DocsBot, for people listening, DocsBot.ai.
11:03 Is its sole purpose just to tell you price comparisons and stuff like that?
11:07 Or does it have more than that?
11:08 - I assume this company's got some product, unfortunately I don't know what it is.
11:11 I saw this link that they put out there and it's a calculator.
11:14 So you basically can put your, what tokens, how many tokens, there's input tokens
11:18 and there's output tokens, right?
11:19 So they're gonna charge more on the output tokens.
11:22 That's for the most part.
11:23 Some of the libraries, some of the models are more equal.
11:26 And then they, what they do is, if you can figure out like roughly how big a message
11:30 is gonna be, both the input and the output, how many calls you're gonna make,
11:33 you can use that to then calculate basically the cost.
11:37 And the cost is always at like tokens per thousand, you know, or dollars or pennies really,
11:42 pennies per thousand tokens.
11:43 And then it's just a math equation at that point.
11:45 And what you'll find is calling GPT-4 is gonna be super expensive.
11:49 And then calling, you know, a small seven, what's called a 7B model from Mistral
11:53 is gonna be the cheapest.
11:55 And you're just gonna look for these different providers.
11:57 - Wow, the prices really are different.
11:59 Like for example, opening an Azure GPT-4 is three, a little over three cents per call.
12:05 Whereas GPT-3, five turbo is one 10th of one cent.
12:10 It's a big difference there.
12:13 11 cents versus $3 to have a conversation with it.
12:16 - Yes, it's a very, very wide difference.
12:18 And it's all based on, you know, how much compute do these models take, right?
12:21 'Cause the bigger the model, the more accurate it is.
12:25 But also the more expensive it is for them to run it.
12:27 So that's why there's such a cost difference.
12:31 - This portion of Talk Python to Me is brought to you by Sentry.
12:33 In the last episode, I told you about how we use Sentry to solve a tricky problem.
12:38 This time, I wanna talk about making your front end and backend code work more tightly together.
12:43 If you're having a hard time getting a complete picture of how your app is working and how requests flow
12:49 from the front end JavaScript app, back to your Python services, down into database calls for errors and performance,
12:56 you should definitely check out Sentry's distributed tracing.
12:59 With distributed tracing, you'll be able to track your software's performance,
13:03 measure metrics like throughput and latency, and display the impact of errors across multiple systems.
13:09 Distributed tracing makes Sentry a more complete performance monitoring solution,
13:13 helping you diagnose problems and measure your application's overall health more quickly.
13:19 Tracing in Sentry provides insights such as what occurred for a specific event or issue,
13:24 the conditions that cause bottlenecks or latency issues, and the endpoints and operations that consume the most time.
13:30 Help your front end and backend teams work seamlessly together.
13:34 Check out Sentry's distributed tracing at talkpython.fm/sentry-trace.
13:40 That's talkpython.fm/sentry-trace.
13:43 And when you sign up, please use our code TALKPYTHON, all caps, no spaces,
13:48 to get more features and let them know that you came from us.
13:52 Thank you to Sentry for supporting the show.
13:55 - Yeah, I recently interviewed, just released a while ago, interviewed because of time shifting,
14:00 on podcast, Mark Russinovich, CTO of Azure, and we talked about all the crazy stuff that they're doing
14:06 for coming up with the, just running these computers that handle all of this compute,
14:10 and it's really a lot.
14:12 - There was a GPU shortage for a while.
14:13 I don't know if that's still going on.
14:14 And obviously, you know, the big companies are buying hundreds of thousands
14:18 of these GPUs to get the scale they need.
14:21 - Yeah.
14:22 - And so once you figure out which API you want to use, then you want to talk about the library.
14:26 So now, you know, most of these providers, they have, you know, a Python library that they offer.
14:30 I know OpenAI does, and Google, Gemini does, but there's also open source ones, right?
14:35 'Cause they're not very complicated to talk to.
14:38 It's just basically HTTP requests.
14:41 So it's just really a matter of like, what's the ergonomics you're looking for as a developer
14:44 to interact with these things.
14:46 And most importantly, make sure you're maintaining optionality, right?
14:49 Like, it's great to do a prototype with one of these models, but recognize you might want to switch
14:54 either for cost reasons or performance reasons or what have you.
14:58 And, you know, LangChain, for instance, has a ton of the providers as part of,
15:03 you basically are just switching a few arguments when you're switching between them.
15:08 And then Simon Willison has, you know, a Python fame, has an LLM project where he's defined,
15:14 you know, basically a set of, and it's really clean just the way he's organized it,
15:18 because you can just add plugins as you need them, right?
15:20 So you don't have to install all the different libraries that are out there.
15:23 And I think LangChain is kind of following a similar approach.
15:25 I think they're coming up with a LangChain core capability where you can just kind of bring in things
15:30 as you need them.
15:31 And so the idea is you're now coding against these libraries and you're trying to bring together, you know,
15:37 the text you need to have analyzed or whatever your use case is, and then it'll come back with the generation.
15:43 And you can also not just use them on the cloud, you can use open source ones as well
15:46 and run them locally on your local computer.
15:48 - I'd never really thought about my architectural considerations, I guess, of these sorts of things.
15:54 But of course you want to set up some kind of abstraction layer so you're not completely tied into some provider.
16:02 It could be that it becomes too expensive.
16:03 It could be that it becomes too slow, but it also might just be something that's better.
16:07 It could be something else that comes along that's better and you're like, "Eh, we could switch.
16:11 "It's 25% better." But it's like a week to pull all the details of this one LLM out and put the new ones in,
16:18 and so it's not worth it, right?
16:19 So you like having, being tied to a particular database rather than more general, it's a similar idea.
16:25 - And especially at this moment in time, right?
16:27 Every couple months, something, so something from the bottom up is getting better and better.
16:32 Meaning, you know, LLAMA came out a year ago, and then LLAMA 2 and Mistral and Mixtral,
16:37 and LLAMA 3 is gonna be coming out later this year, we believe.
16:40 And so those models, which are smaller and cheaper and easier to use, or not easier to use, but they're just cheaper,
16:46 is those things are happening all the time.
16:48 So being able to be flexible and nimble and kind of change where you are
16:52 is gonna be crucial, at least for the next couple years.
16:54 - Yeah, the example that I gave was databases, right?
16:56 And databases have been kind of a known commodity since the '80s, or what, 1980s?
17:02 And of course, there's new ones that come along, but they're kind of all the same,
17:05 and you know, we got, there was MySQL, now there's Postgres that people love, and right?
17:11 So that is changing way, way slower than this.
17:14 And people are like, "Well, we gotta think about "those kinds of, like, don't get tied into that."
17:17 Well, it's way less stable.
17:19 - Right, and people, you know, create layers of abstraction there, too, is where you got SQLAlchemy,
17:25 and then, you know, Sebastian from FastAPI has SQLModel.
17:28 That's a layer on top of SQLAlchemy, you know, and then there's also, you know,
17:32 folks that just like writing clean, fancy SQL, and you can, you know, hopefully be able to port that
17:36 from database to database as well.
17:38 So it's the same principles, separation of concerns, so you can kind of be flexible.
17:42 - All right, so you talked about LangChain.
17:44 Just give us a sense, real quick, of what LangChain is.
17:47 - This was a great project from a timing perspective.
17:49 I believe they kind of invented it and released it right around the time ChatGPT came out.
17:53 It's a very comprehensive library with lots of, I mean, the best part about LangChain, to me,
17:58 is the documentation and the code samples, right?
18:00 Because if you want to learn how to interact with a different large language model
18:04 or work with a vector database, there's another library called LamaIndex
18:08 that does a really good job at this as well.
18:09 They have tons and tons of documentation and examples, so you can kind of look at those and try to understand it.
18:15 The chaining part really came from the idea of like, okay, I prompt the large language model, it gives a response,
18:20 now I'm gonna take that response and prompt it, again, with a new prompt using that output.
18:25 The challenge with that is the reliability of these models, right?
18:29 They're not gonna get close, they're not close to 100% accurate on these types of tasks.
18:34 The idea of agents as well is another thing that you might build with a LangChain.
18:38 And the idea there is basically the agent is getting a task, coming up with a plan for that task,
18:44 and then kind of stepping through those tasks to get the job done.
18:48 Once again, we're just not there yet as far as those technologies, just because of the reliability.
18:54 And then there's also a bunch of security concerns that are out there too,
18:57 that you should definitely be aware of.
18:59 Like one term to Google and make sure you understand is prompt injection.
19:03 Right, so Simon, once again, he's got a great blog.
19:05 He's got a great blog article, or just even that tag on his blog is tons of articles around prompt injection.
19:11 And prompt injection is basically the idea, you have an app, a user says something in the app,
19:17 or like types into the, to the, whatever the input is, and whatever texts that they're sending through,
19:22 just like with SQL injection, they kind of hijacks the conversation and causes the large language model
19:26 to kind of do a different thing.
19:28 - Little Bobby Llama, we call him.
19:30 Instead of little Bobby Diggles.
19:31 - And then the other wild one is like, you know, people are putting stuff up on the internet
19:35 so that when the large language model browses for webpages and brings back text, it's, you know, reading the HTML
19:41 or reading the text in the HTML, and it's causing the large language model
19:44 to behave in some unexpected way.
19:46 So there's lots of crazy challenges out there.
19:49 - I'm sure there's a lot of adversarial stuff happening to these things as they're both trying to gather data
19:54 and then trying to run, right?
19:56 I saw the most insane, I guess it was an article, I saw it on RSS somewhere.
20:01 And it was saying that on Amazon, there's all these knockoff brands that are trying to,
20:06 you know, instead of Gucci, you have a Gucci or I don't know, whatever, right?
20:11 And they're getting so lazy.
20:13 I don't know what the right word is, that they're using LLMs to try to write a description
20:17 that is sort of in the style of Gucci, let's say.
20:20 And it'll come back and say, I'm sorry, I'm a large language model.
20:24 I'm not, my rules forbid me from doing brand trademark violation.
20:31 That's what the Amazon listing says on Amazon.
20:33 They just take it and they just straight pump it straight, whatever it says, it just goes straight into Amazon.
20:37 - Yeah, you have to like Google, like, sorry, I'm not, sorry, as a large language model
20:40 or sorry as a whatever.
20:41 Yeah, you can find all. - Exactly.
20:42 And there's like, the product listings are full of that.
20:44 It's amazing.
20:45 It's amazing. - It's crazy.
20:47 - Certainly the reliability of that is, you know, they could probably use some testing
20:50 and those kinds of things.
20:52 - For sure. - Someone out there asks, like, I wonder if for local LLM models,
20:56 there's a similar site as DocSpot that show you like what you need to run it locally.
21:00 So that's an interesting question also, segue to maybe talk about like some local stuff.
21:04 - LLM Studio, this is a new product.
21:06 I honestly haven't had a chance to like really dig in and understand who created this.
21:09 And, you know, make sure that the privacy stuff is up to snuff, but I've played around with it locally.
21:14 It seems to work great.
21:15 It's really slick, really nice user interface.
21:17 So if you're just wanting to get your feet wet and try to understand some of these models,
21:21 I'd download that and check it out.
21:22 There's a ton of models up on Hugging Face.
21:25 This product seems to just basically link right into the Hugging Face interface and grabs models.
21:31 And so some of the models you wanna look for are right now as in January, right?
21:35 There's Mistral 7B, you know, M-I-S-T-R-A-L.
21:39 There's another one called Phi 2.
21:41 Those are two of the smaller models that should run pretty well on, you know,
21:44 like a commercial grade GPU or an M1 or an M2 Mac, if that's what you have and start playing with them.
21:51 And they're quantized, which means they're just kind of made a little, take a little bit less space,
21:56 which is good from like a virtual RAM with regards to these GPUs.
22:00 And, you know, there's a account on Hugging Face called The Bloke.
22:04 If you look for him, you'll see all his different fine, his different fine tunes and things like that.
22:09 And there's a group called Nous, I think is how you pronounce it, N-O-U-S.
22:13 And they've got some of the fine tunes that are basically the highest performing ones
22:18 that are out there.
22:19 So if you're really looking for a high performing local model that can actually, you know,
22:22 help you with code or reasoning, those are definitely the way to get started.
22:26 - Yeah, this one seems pretty nice.
22:28 I also haven't played with it, I just learned about it, but it's looking really good.
22:32 I had played with, what was it, GPT for All, I think is what it was.
22:36 - Yep, yep.
22:37 - It was the one that I played with.
22:38 Somehow this looks like, looks a little bit nicer than that for some, I don't know how different it really is, but.
22:43 - I mean, it's all the idea of like downloading these files and running them locally.
22:47 And these are just user interfaces that make it a little easier.
22:50 The original project that made this stuff kind of possible was a project called Llama CPP.
22:54 There's a Python library that can work with that directly.
22:58 There's another project called Llama File, where if you download the whole thing,
23:02 it actually runs no matter where you are.
23:04 I think it runs on Mac and Linux and Windows and BSD or whatever it is.
23:09 And it's, I mean, it's an amazing technology that this one put together, it's really impressive.
23:13 And then, you can actually just use Google Colab too, right?
23:17 So Google Colab has some GPUs with it.
23:19 I think if you upgrade it to the $10 a month version, I think you get some better GPUs access.
23:25 So if you actually wanna get a hand of like running, and so this is a little bit different, right?
23:29 So instead of calling an API, when you're using Google Colab, you can actually use a library called Hugging Face,
23:34 and then you can actually load these things directly into your memory, and then into your actual Python
23:39 environment, and then you're working with it directly.
23:41 So it just takes a little bit of work to make sure you're running it on the GPU,
23:46 'cause if you're running it on the CPU, it's gonna be a lot slower.
23:48 - Yeah, it definitely makes a big difference.
23:50 There's a tool that I use that for a long time ran on the CPU, and they rewrote it to run on the GPU.
23:55 Even on my M2 Pro, it was like three times faster or something, yeah. - For sure.
24:00 - Yeah, it makes a big difference.
24:01 - So with the LM Studio, let's you run the LMs offline and use models through an OpenAI,
24:08 that's what I was looking for, the OpenAI compatible local server.
24:12 - Right. - You could basically get an API for any of these, and then start programming against it, right?
24:16 - Exactly right, and it's basically the same interface, right, so same APIs for posting in response
24:22 of the JSON schema that's going back and forth.
24:25 So you're programming against that interface, and then you basically port it and move it to another,
24:30 to the OpenAI models if you wanted to as well.
24:33 So everyone's kind of coalescing around OpenAI as kind of like the quote unquote standard,
24:37 but there's nothing, you know, there's really no, there's no mode around that standard as well, right,
24:41 'cause anybody can kind of adopt it and use it.
24:44 - There's not like a W3C committee choosing.
24:47 - Correct.
24:48 - The market will choose for us, let's go.
24:51 - It seems to be working out well, and that's another benefit of Simon's LLM project, right,
24:55 he's got the ability to kind of switch back and forth between these different libraries and APIs as well.
25:00 - This LLM studio says, "This app does not collect data nor monitor your actions.
25:05 "Your data stays local on your machine, "free for personal use." All that sounds great.
25:09 "For business use, please get in touch." I always just like these like, if you gotta ask, it's too much type of like.
25:15 - Probably, yeah.
25:16 I'm using it for personal use, just so if anybody's watching, yes, just plain.
25:19 - Either they just haven't thought it through and they just don't wanna talk about it yet,
25:22 or it's really expensive.
25:24 I just probably imagine it's probably, it's like, ah, we haven't figured out a business model,
25:27 just, I don't know, shoot us a note.
25:29 - Nope, they're concentrating on the product, which makes sense.
25:31 - Yeah, so then the other one is LLamafile, LLamafile.ai that you mentioned,
25:35 and this packages it up.
25:36 I guess, going back to the LLM studio real quick, one of the things that's cool about this
25:41 is if it's the open AI API, right, with this little local server that you can play with,
25:47 but then you can pick LLMs such as LLM, Falcon, Replit, all the different ones, right,
25:54 Starcoder and so on.
25:56 It would let you write an app as if it was going to open AI and then just start swapping in models
26:02 and go like, oh, if we switch to this model, how'd that work?
26:04 But you don't even have to change any code, right?
26:05 Just probably maybe a string that says which model to initialize.
26:09 - One of the tricks though is then the prompts themselves.
26:12 - All right, let's talk about it.
26:13 - Yeah, the models themselves act differently, and part of this whole world
26:17 is what they call prompt engineering, right?
26:19 So prompt engineering is really just exploring how to interact with these models,
26:24 how to make sure that they're kind of in the right mind space to tackle your problem.
26:29 A lot of the times that people get, when they struggle with these things,
26:32 it's really just, they'd really got to think more like a psychiatrist when they're working with a model,
26:36 you know, basically getting them kind of prepared.
26:39 One of the tricks people did, figured out early, was you're a genius at software development,
26:44 like compliment the thing, make it feel like, oh, I'm going to behave like I'm
26:48 a world rockstar programmer, right?
26:51 - Well, it's going to give you average, but if you tell them I'm genius, then let's start,
26:55 we'll do that, yeah.
26:55 - And there was also a theory like that in December that the large language models were getting dumber
27:00 because it was the holidays and people don't work as hard, right?
27:02 Like, it's really hard to know like which of these things are true or not,
27:06 but it's definitely true that each model is a little bit different, and if you write a prompt that works really well
27:11 on one model, even if it's a stronger model or a weaker model, and then you port it to another model
27:16 and it's, you know, then the stronger model works worse, right, it can be very counterintuitive at times
27:22 and you've got to test things out, and that really gets to the idea of evals, right?
27:27 So evaluation is really a key problem, right?
27:30 Making sure that if you're going to be writing prompts and you're going to be building, you know,
27:34 different retrieval augmented generation solutions, you need to know about prompt injection
27:40 and you need to know about prompt engineering and you need to know what these things can and can't do.
27:44 One trick is what they call few-shot prompting, which is, you know, if you want it to do data extraction,
27:49 you can say, oh, okay, I want you to extract data from text that I give you in JSON.
27:54 If you give it a few examples, like wildly different examples, because giving it a bunch of similar stuff,
27:59 it might kind of cause it to just coalesce around those similar examples,
28:03 but if you can give it a wildly different set of examples, that's called in-context learning or few-shot prompting,
28:09 and it will do a better job at that specific task for you.
28:12 - That's super neat.
28:12 When you're creating your apps, do you do things like, here's the input from the program
28:18 or from the user or wherever it came from, but maybe before that, you give it like three or four prompts
28:23 and then let it have the question, right, instead of just taking the text,
28:27 like, I'm gonna ask you questions about biology and genetics and it's gonna be under this context,
28:33 and I want you to favor these data sources.
28:34 Now ask your question, something like this.
28:36 - For sure, all those types of strategies are worth experimenting with, right?
28:40 Like, what actually will work for your scenario?
28:42 I can't tell you, right?
28:43 You gotta dig in, you gotta figure it out, and you gotta try different things.
28:47 - You're about to win the Nobel Prize in genetics for your work.
28:50 Now I need to ask you some questions.
28:52 - For sure, that'll definitely work, and then threatening it that your boss is mad at you
28:56 is also gonna help you too, right, for sure.
28:58 - If I don't solve this problem, I'm gonna get fired.
29:00 As a large language model, I can't tell you, but I'm gonna be fired.
29:03 All right, well, then the answer is.
29:05 - Exactly right.
29:06 - So for these, they run, like you said, they run pretty much locally, these different models on LM Studio and others,
29:12 like the LLAMA file and so on.
29:14 If I had a laptop, I don't need a cluster.
29:16 LLAMA CPP is really the project that should get all the credit for making this work on your laptops.
29:22 And then LLAMA file and LLAMA CPP all have servers.
29:25 So I'm guessing LM Studio is just exposing that server.
29:29 And that's in the base LLAMA CPP project.
29:31 That's really what it is.
29:32 It's really just about now you can post your requests.
29:36 It's handling all of the work with regards to the token generation on the backend using LLAMA CPP,
29:41 and then it's returning it to you using the HTTP kind of processes.
29:45 - Is LLAMA originally from Meta?
29:47 Is that where that came from?
29:48 - I think there were people that were kind of using that LLM.
29:51 Right, I think people were kind of keying off the LLAMA thing at one point.
29:55 I think LLAMA Index, for instance, I think that project was originally called GPT Index,
29:59 and they decided, oh, I don't want to be like, I don't want to confuse myself with OpenAI
30:03 or confuse my project with OpenAI, so they switched to LLAMA Index.
30:06 And then of course, Meta released LLAMA.
30:07 So you can't, you kind of, and then everything from there has kind of evolved too,
30:11 right, there's been alpacas and a bunch of other stuff as well.
30:13 - You gotta know your animals, yeah.
30:15 If you don't know your animals, you can't figure out the heritage of these projects.
30:19 - Correct.
30:19 LLAMA from Meta was the first open source, I'd say large language model of note,
30:25 I guess, since ChatGPT.
30:26 There were certainly other, you know, I'm not a, so one thing to caveat,
30:30 I am not a researcher, right?
30:32 So there's lots of folks in the ML research community that know way more than I do.
30:35 But, because there was like Bloom and T5 and a few other large, you know,
30:39 quote unquote large language models.
30:40 But LLAMA, after ChatGPT, LLAMA was the big release that came from Meta in I think March.
30:46 And then, and that was from Meta.
30:47 And then they had it released under just like research use terms.
30:51 And then only certain people could get access to it.
30:53 And then someone put a, I guess, put like a BitTorrent link or something on GitHub.
30:58 And then basically the world had it.
31:00 And then they did end up releasing LLAMA 2 a few months later with more friendly terms.
31:04 So that, and it was a much stronger model as well.
31:07 - Nice.
31:08 It's kind of the realization, like, well, if it's gonna be out there anyway,
31:10 let's at least get credit for it then.
31:12 - For sure.
31:13 And I did read something where like basically Facebook approached OpenAI for access to their models
31:18 to help them write code.
31:19 But the cost was so high that they decided to just go build their own, right?
31:21 So it's kind of interesting how this stuff has evolved.
31:25 - Like, you know, we got a big cluster of computers too.
31:28 - Metaverse thing doesn't seem to be working yet.
31:29 So let's go ahead and train a bunch of large language models.
31:32 - Yeah, exactly.
31:33 We've got some spare capacity over in the Metaverse data center.
31:36 All right, so one of the things that people will maybe talk about in this space is
31:40 RAG or retrieval augmented generation.
31:43 What's this?
31:44 - One thing to recognize is that large language models, if it's not in the training set,
31:48 and it's not in the prompt, it really doesn't know about it.
31:51 And the question of like, what's reasoning and what's, you know, generalizing and things like that,
31:56 those are big debates that people are having.
31:58 What's intelligence, what have you.
31:59 Recognizing the fact that you have this prompt and things you put in the prompt,
32:02 the large language model can understand and extrapolate from is really powerful.
32:06 So, and that's called in context learning.
32:08 So retrieval augmented generation is the idea of, okay, I'm going to maybe ask,
32:14 allow a person to ask a question.
32:16 This is kind of like the common use case that I see.
32:19 User asks a question, we're going to take that question, find the relevant content,
32:23 put that content in the prompt, and then do something with it, right?
32:26 So it might be something like, you know, ask a question about, you know,
32:30 how tall is the Leaning Tower of Pisa, right?
32:32 And so now it's going to go off and find that piece of content from Wikipedia or what have you,
32:37 and then put that information in the prompt, and then now the model can then respond
32:42 to that question based on that text.
32:44 Obviously that's a pretty simple example, but you can get more complicated
32:46 and it's going out and bringing back lots of different content, slicing it up,
32:50 putting in the prompt and asking a question.
32:52 So now the trick is, okay, how do you actually get that content and how do you do that?
32:57 Well, you know, information retrieval, search engines and things like that,
33:00 that's obviously the technique.
33:02 But one of the key techniques that people have been, you know, kind of discovering, rediscovering, I guess,
33:07 is this idea of word embeddings or vectors.
33:10 And so Word2Vec was this project that came out, I think, 11 years ago or so.
33:14 And, you know, there was a big, the big meme around that was, you could take the embedding for the word king,
33:19 you could then subtract the embedding for the word man, add the word embedding for woman,
33:24 and then the end math result would actually be close to the embedding for the word queen.
33:29 And so what is an embedding?
33:30 What's a vector?
33:31 It's basically this large floating point number that has semantic meaning inferred into it.
33:37 And it's built just by training a model.
33:39 So just like you train a large language model, they can trade these embedding models
33:43 to basically take a word and then take a sentence and then take a, you know, a document,
33:48 is what, you know, OpenAI can do, and turn that into this big giant 200, 800, 1500,
33:54 you know, depending on the size of the embedding, floating point numbers,
33:59 and then use that as a, what's called, you know, semantic similarity search.
34:02 So you're basically going off and asking for similar documents.
34:05 And so you get those documents, and then you make your prompt.
34:08 - It's really wild.
34:09 So, you know, we're gonna make an 800 dimensional space, and each concept gets a location in that space,
34:15 and then you're gonna get another concept as a prompt, and you say, what other things in this space are near it?
34:20 - The hard problems that remain are, well, first you gotta figure out what you're trying to solve.
34:24 So once you figure out what you're actually trying to solve, then you can start asking yourself questions like,
34:28 okay, well, how do I chunk up the documents that I have?
34:31 Right, and there's all these different, and there's another great place for Lama Index and Lang Chain.
34:35 They have chunking strategies, where they all take a big giant document and break it down into sections,
34:41 and then you chunk each section, and then you do the embedding on just that small section.
34:46 Because the idea being, can you get, you know, finer and finer sets of text
34:50 that you can then, when you do your retrieval, you get the right information back.
34:55 And then the other challenge is really like the question answer problem, right?
34:58 If a person's asking a question, how do you turn that question into the same kind of embedding space as the answer?
35:04 And so there's lots of different strategies that are out there for that.
35:06 And then another problem is, if you're looking at the Wikipedia page for the Tower of Pisa,
35:12 it might actually have like a sentence in here that says it is X number of meters tall or feet tall,
35:17 but it won't actually have the word Tower of Pisa in it.
35:19 So there's another chunking strategy where they call it propositional trunking,
35:23 where they basically use a large language model to actually redefine each sentence
35:29 so that it actually has those proper nouns baked into it so that when you do the embedding,
35:33 it doesn't lose some of the detail with propositions.
35:36 - It's this tall, but-
35:37 - It is.
35:38 - It's something that plays this tall with its actual height and things like that.
35:42 - Correct.
35:43 - Crazy.
35:44 - But fundamentally you're working with unstructured data and it's kind of messy
35:46 and it's not always gonna work the way you want.
35:49 And there's a lot of challenges and people are trying lots of different things
35:52 to make it better.
35:52 - That's cool.
35:53 It's not always deterministic or exactly the same.
35:55 So that can be tricky as well.
35:58 This portion of Talk Python to Me is brought to you by Neo4j.
36:02 Do you know Neo4j?
36:04 Neo4j is a native graph database.
36:07 And if the slowest part of your data access patterns involves computing relationships,
36:11 why not use a database that stores those relationships directly in the database,
36:16 unlike your typical relational one?
36:18 A graph database lets you model the data the way it looks in the real world,
36:22 instead of forcing it into rows and columns.
36:25 It's time to stop asking a relational database to do more than they were made for
36:30 and simplify complex data models with graphs.
36:33 If you haven't used a graph database before, you might be wondering about common use cases.
36:38 You know, what's it for?
36:39 Here are just a few.
36:40 Detecting fraud, enhancing AI, managing supply chains, gaining a 360 degree view of your data
36:47 and anywhere else you have highly connected data.
36:51 To use Neo4j from Python, it's a simple pip install Neo4j.
36:57 And to help you get started, their docs include a sample web app demonstrating how to use it both from Flask and FastAPI.
37:04 Find it in their docs or search GitHub for Neo4j movies application quick start.
37:08 Developers are solving some of the world's biggest problems with graphs.
37:13 Now it's your turn.
37:14 Visit talkpython.fm/neo4j to get started.
37:18 That's talkpython.fm/neo, the number four and the letter J.
37:23 Thank you to Neo4j for supporting Talk Python to Me.
37:26 One of the big parts of at least this embedding stuff you're talking about are vector databases.
37:33 And they used to be really rare and kind of their own specialized thing.
37:36 Now they're starting to show up in lots of places.
37:38 And you shared with us this link of vector DB comparison.
37:41 I just saw that MongoDB added it.
37:43 I'm like, I didn't know that had anything to do with that.
37:46 I'm probably not gonna mess with it, but it's interesting that it's just like finding its way
37:49 in all these different spaces, you know?
37:51 - It was weird there for a couple of years where people were basically like talking
37:54 about vector databases, like they're their own separate thing.
37:57 The vector databases are now becoming their own fully fledged, either relational database or a graph database
38:02 or a search engine, right?
38:03 Those are kind of the three categories where, I mean, I guess Redis is its own thing too.
38:07 But for the most part, those new databases, quote unquote, are now kind of trying
38:11 to be more fully fledged.
38:13 And vectors and semantic search is really just one feature.
38:16 - I was just thinking that is this thing that you're talking about, is it a product
38:19 or is it a feature of a bigger product, right?
38:22 - Correct.
38:23 - If you already got a database, it's already doing a bunch of things.
38:25 Did it just answer the vector question?
38:26 Maybe, maybe not, I don't know.
38:28 - Exactly right.
38:29 And the one thing to recognize is that, and then the other thing people do
38:32 is they just take NumPy or what have you and just load them all into memory.
38:35 And if you don't have that much data, that's actually probably gonna be the fastest
38:38 and simplest way to work.
38:40 But the thing you gotta recognize is the fact that there is precision and recall
38:44 and cost trade-off that happens as well.
38:47 So they have to index these vectors and there's different algorithms that are used
38:52 and different algorithms do better than others.
38:55 So you gotta make sure you understand that as well.
38:57 So, and one thing you can do is for instance, pgVector, which comes as an extension for Postgres,
39:02 you can start off by not indexing at all.
39:04 And you should get, I believe, hopefully I'm not misspeaking, you should get perfect recall,
39:09 meaning you'll get the right answer.
39:10 You'll get the, if you ask for the five closest vectors to your query, you'll get the five closest,
39:15 but it'll be slower than you probably want.
39:17 So then you have to index it.
39:19 And then what ends up happening is, the next time you might only get four of those five.
39:22 You'll get something else that snuck into that list.
39:24 - If you got time, you're willing to spend unlimited time, then you can get the right answer, the exact answer.
39:31 But I guess that's all sorts of heuristics, right?
39:33 You're like, I could spend three days or I could do a Monte Carlo thing and I could give you the answer in a fraction of a second.
39:40 But it's not deterministic.
39:42 All right, so then I won't go with my camera, so I turn it off, I don't know what's up with it,
39:45 but we'll, yeah.
39:46 So you wrote a cool blog post called, "What is a Custom GPT?" And we wanna talk some about building custom GPTs
39:54 and with SAPI and so on.
39:56 So let's talk about this.
39:57 Like one of the, I think one of the challenges in why it takes so much compute for these systems
40:02 is like, they're open-ended.
40:04 You can ask me any question about any knowledge in the world in the humankind, right?
40:08 You can ask about that, let's start talking.
40:11 Or it could be, you can ask me about genetics, right?
40:15 That seems like you could both get better answers if you actually only care about genetic responses.
40:21 You know, how tall is the Leaning Tower?
40:23 And probably make it smaller, right?
40:25 So is that kind of the idea of these custom GPTs or what is it?
40:28 - No, so custom GPTs are new capability from OpenAI.
40:32 And basically they are a wrapper around a very small subset, but it's still using the OpenAI ecosystem, okay?
40:40 And so what you do is you give it a name, you give it a logo, you give it a prompt.
40:44 And then from there, you can also give it knowledge.
40:47 You can upload PDF documents to it and it will actually slice and dice those PDF documents
40:51 using some sort of vector search.
40:53 We don't know how it actually works.
40:54 The GPT, the cool thing is the GPT will work on your phone, right?
40:58 So I have my phone, I can have a conversation with my phone.
40:59 I can take a picture, upload a picture and it will do vision analysis on it.
41:04 So I get all the capabilities of OpenAI GPT-4.
41:07 But a custom GPT is one that I can construct and give a custom prompt to, which basically then says,
41:12 okay, now you're, and to your point, I think maybe this is where you're going with it.
41:15 Like, hey, now you're an expert in genomics or you're an expert in something
41:18 and you're basically coaching the language model and what it can and can't do.
41:23 And so it's a targeted experience within the large language, within the ChatGPT ecosystem.
41:30 It has access to also the OpenAI tools.
41:32 Like so OpenAI has the ability to do code interpreter and Dolly, and it can also hit the web browser.
41:37 So you have access to everything.
41:39 But the interesting thing to me is the fact that you can actually tie this thing
41:42 to what are called actions.
41:44 So March, I think of last year, they actually had this capability called plugins
41:48 that they announced.
41:48 And plugins have kind of faded to the background.
41:51 I don't know if they're gonna deprecate them officially, but the basic gist with plugins is what was,
41:55 you could turn that on and it can then call your API.
41:58 And the cool thing about it was that it read your OpenAPI spec, right?
42:01 So you write an OpenAPI spec, which is Swagger, if you're familiar with Swagger,
42:06 and it basically defines what all the endpoints are, what the path is, what the inputs and outputs are,
42:11 including classes or field level information and any constraints or what have you.
42:16 So you can fully define your OpenAPI spec, it can then call that OpenAPI spec,
42:20 and it's basically giving it tools.
42:22 So like the example that they say in the documentation is get the weather, right?
42:25 So if you say, what's the weather in Boston?
42:27 Well, Chetchi BT doesn't know the weather in Boston.
42:29 All it knows how to do is call it, but you can call an API and it figures out
42:33 how to call the API, get that information, and then it can use that to redisplay.
42:36 And that's a very basic example.
42:38 You can do way more complicated things than that.
42:41 It's pretty powerful.
42:42 - Okay, that sounds really pretty awesome.
42:44 I thought a lot about different things that I might build.
42:47 On your blog post here, you've got some key benefits and you've got some risks.
42:51 You maybe wanna talk a bit about that?
42:53 - Yeah, so the first part with plugins that was wrong, was that didn't work as well,
42:56 is that there was no kind of overarching custom instruction that could actually teach it how to work with your plugin.
43:02 So if you couldn't put it in the API spec, then you couldn't like integrate it
43:05 with a bunch of other stuff or other capabilities, right?
43:08 So the custom instruction is really a key thing for making these custom APIs strong.
43:13 But one warning about the custom instruction, whatever you put in there, anybody can download, right?
43:17 Not just the folks at OpenAI, anybody.
43:18 Like basically there's GitHub projects where like thousands of these custom prompts
43:23 that people have put into their GPT.
43:25 So, and there are now knockoffs on GPT.
43:28 So it's all kind of a mess right now in the OpenAI store.
43:31 I'm sure they'll clean it up, but just recognize the custom instruction is not protected
43:35 and neither is the knowledge.
43:36 So if you upload a PDF, there have been people that have been figuring out how to like download those PDFs.
43:41 And I think that that might be a solved problem now or they're working on it, but it's something to know.
43:46 The other problem with plugins was, I can get a plugin working, but if they didn't approve my plugin
43:51 and put it in their plugin store, I couldn't share it with other people.
43:55 The way it works now is I can actually make a GPT and I can give it to you and you can use it directly,
44:00 even if it's not in the OpenAPI store or OpenAI store.
44:04 You know, it is super easy to get started.
44:05 They have like a tool to like help you generate your DALL picture and actually you don't even have to figure
44:10 out how to do the custom instructions yourself.
44:11 You can just kind of chat that into existence.
44:14 But the thing that I'm really excited about is that this is like free playing.
44:17 Like you could do, so the hosting cost is basically all on the client side.
44:22 You have to be a ChatGPT plus user right now to create these and use these.
44:26 But the cool thing as a developer, I don't have to pay those API fees that we were talking about, right?
44:31 And if I need to use GPT-4, which I kind of do for my business right now,
44:35 just because of how complicated it is, I don't have to pay those token fees
44:38 for folks using my custom GPT at this moment.
44:41 - Where's like the billing or whatever you call it for the custom GPT lies that in the person who's using it,
44:47 does it have to, it goes onto their account and whatever their account can do and afford?
44:50 - Yeah, right now, OpenAPI, OpenAI ChatGPT plus is $20 a month.
44:55 And then there's a team's version, which I think is either 25 or 30, depending on the number of users or how you pay for it.
45:01 That's the cost.
45:02 So right now, if you want to use custom GPTs, everyone needs to be a ChatGPT plus user.
45:08 There's no extra cost based on usage or anything like that.
45:11 In fact, there's talk about revenue sharing between OpenAI and developers of custom GPTs,
45:17 but that has not come out yet as far as like what those details are.
45:20 - It does have an App Store feel to it, doesn't it?
45:23 - There's risks too, right?
45:24 Obviously anybody can, there's already been like tons of copies up there.
45:28 OpenAI, they're looking for their business model too, right?
45:30 So they could, if someone has a very successful custom GPT, it's well within their right to kind of add that
45:36 to the base product as well.
45:38 Injection is still a thing.
45:39 So if you're doing anything in your actions that actually changes something,
45:43 that is consequential is what they call it, you better think very carefully,
45:47 like what's the worst thing that could happen, right?
45:49 'Cause whatever the worst thing that could happen is, that's what's gonna happen.
45:52 'Cause people can figure this stuff out and they can confuse the large language models
45:57 into calling them.
45:58 - And the more valuable it is that they can make that thing happen, the more effort they're gonna put into it as well.
46:03 - Yeah, yeah, yeah.
46:04 - For sure.
46:05 - I was gonna ask, do you think it's easy to solve SQL injection and other forms of injection, at least in principle, right?
46:13 There's a education problem, there's millions of people coming along as developers
46:18 and they see some demo that says, the query is like this plus the name.
46:23 Wait a minute, wait.
46:24 So it kind of recreates itself through not total awareness, but there is a very clear thing you do solve that,
46:32 you use parameters, you don't concatenate strings with user input, problem solved.
46:35 What about prompt injection though?
46:37 It's so vague how these AIs know what to do in the first place.
46:42 And so then how do you completely block that off?
46:45 - Unsolved problem.
46:46 I'm definitely stealing from Simon on this 'cause I've heard him say it on a few podcasts
46:50 is just basically there's no solution as far as we know.
46:53 So you have to design, and there's no solution to the hallucination problem either
46:57 'cause that's a feature, right?
46:59 That's actually what the thing is supposed to do.
47:01 So when you're building these systems, you have to recognize those two facts
47:05 along with some other facts that really limit what you can build with these things.
47:09 - So you shouldn't use it for like legal briefs, is that what you're saying?
47:12 - I think these things are great collaborative tools, right?
47:15 The human in the loop, and that's everything that I'm building, right?
47:18 So all the stuff that I'm building is assuming that the human's in the loop,
47:21 and what I'm trying to do is augment and amplify expertise.
47:25 I'm building tools for people that know about genomics and cancer and how to help cancer patients.
47:31 I'm not designing it for cancer patients who are gonna go operate on themselves, right?
47:35 Like that's not the goal.
47:36 The idea is there's a lot of information.
47:39 These tools are super valuable from like synthesizing a variety of info,
47:45 but you still need to look at the underlying citations.
47:47 And ChatGPT by itself can't give you citations.
47:50 Like it'll make some up.
47:51 It'll say, "Oh, I think there's probably "a Wikipedia page with this link."
47:54 But you actually have to, you definitely have to have an outside tool, either the web, Bing, which is, I would say,
48:00 subpar for a lot of use cases, or you have to have actions that can actually bring back references
48:06 and give you those links.
48:07 And then the expert will then say, "Oh, okay, great.
48:09 "Thanks for synthesizing this, giving me this info.
48:11 "Let me go validate this myself." Right, go click on the link and go validate it.
48:15 And that's really, I think that's really the sweet spot for these things, at least for the near future.
48:19 - Yeah, don't ask it for the answer.
48:21 Ask it to help you come up with the answer, right?
48:23 - Exactly right. - All right.
48:24 - And then have it criticize you when you do have something 'cause then it'll do a great job
48:28 of telling you everything you've done wrong.
48:30 - I'm feeling too good about myself.
48:31 I need you to insult me a lot.
48:32 Let's get going.
48:34 All right, speaking of talking about ourselves, you've got this project called PyPI GPT.
48:38 What's this about?
48:39 - I really wanted to tell people that FastAPI and Pydantic, 'cause Python, like we were saying earlier,
48:44 I don't know if it was on the call or not, but Python is the winning language, right?
48:49 And I think FastAPI and Pydantic are the winning libraries in their respective fields, and they're great.
48:54 And they're perfect for this space because you need an open API spec.
48:57 English is the new programming language, right?
48:59 So Andrej Koparthe, who used to work at Tesla and now works at OpenAI, has this pinned tweet
49:05 where he's basically like, "English is the hottest programming language,"
49:07 or something like that.
49:08 And that's really the truth, 'cause even in this space where I'm building an open API spec,
49:13 99% of the work is thinking about the description of the endpoints or the description of the fields
49:19 or codifying the constraints on different fields.
49:23 Like you can use these greater thans and less thans and regexes, right, to describe it.
49:28 And so what I did was I said, "Okay, let's build this thing in FastAPI,"
49:32 just to get an example out for folks.
49:34 And then I turned it on.
49:35 I actually use ngrok as my service layer 'cause you have to have HTTPS to make this thing work.
49:40 - Ngrok is so good.
49:41 - Yep. - Yeah.
49:42 - I turned that on with an nginx thing in front of it.
49:44 So this library, to actually use it, you'll have to actually set that stuff up yourself.
49:49 You have to download it, you have to run it, you have to either get it on a server with HTTPS
49:53 with Let's Encrypt or something.
49:55 Once you've turned it on, then you can actually see how it generates the OpenAPI spec, how to configure the GPT.
50:02 I didn't do much work with regards to the custom instructions that I came up with.
50:05 I just said, "Hey, call my API, figure it out." And it does.
50:08 And so what this GPT does is it basically says, "Okay, given a package name and a version number,
50:12 it's gonna go and grab this data from the SQLite database that I found that has this information and then bring it back to you."
50:17 It's the least interesting GPT I could come up with, I guess.
50:20 But it shows kind of the mechanics, right?
50:21 The mechanics of setting up the servers and the application within FastAPI,
50:27 the kind of the little things, the little bits that you have to flip to make sure that OpenAPIs or OpenAI
50:34 can understand your OpenAPI spec, bumble through OpenAI and OpenAPI all the time,
50:39 and make sure that they can talk to each other.
50:40 And then it will then do the right thing and call your server and bring the answers back.
50:45 And there's a bunch of little flags and information you need to know about actions
50:50 that are on the OpenAPI documentation.
50:53 And so I tried to pull that all together into one simple little project for people to look at.
50:58 - It's cool.
50:59 So you can ask it questions like, "Tell me about FastAPI, this version."
51:02 And it'll come back and-
51:03 - I was hoping to do something a little better, like, "Hey, here's my requirements file."
51:06 And go, "Tell me, am I on the latest version of everything?" Or whatever, something more interesting.
51:12 I just didn't have time.
51:13 - Can you ask it questions such as, "What's the difference between this version
51:17 and that version?" - You could, if that information's in the database.
51:19 I actually don't know if it is.
51:21 And then obviously you could also hit the PyPI server.
51:24 And I didn't do that.
51:24 I just wanted to, I don't wanna be hitting anybody's server indiscriminately at this point.
51:29 But that would be a great use case, right?
51:32 So someone could take this and certainly add some capabilities.
51:37 The thing that is valuable that I'm trying to showcase is the fact that ChatGPT and large language models,
51:43 while they do have the world's information kind of compressed at a point in time,
51:47 they are still not a database, right?
51:49 They don't do well when you're basically trying to make sure you have a comprehensive query
51:53 and you've brought back all the information.
51:55 And they're also not good from like a up-to-date perspective, right?
51:58 There's a cutoff date.
51:59 Thankfully, they finally updated that recently.
52:01 I think it's now April of 2023.
52:03 But at some point, it just doesn't know about newer things.
52:06 And so a GPT is a really interesting way of doing that.
52:09 I'm gonna put it out in the universe and hopefully someone will do it.
52:11 Make me a modern Python GPT, which is basically like, get the new version of Pydantic and Polars
52:18 and a few other libraries that ChatGPT does a bad job at just because they're in underactive development
52:24 during the time that ChatGPT was getting trained.
52:27 So that's the perfect use cases for these types of custom GPTs with knowledge
52:32 in a PDF file or an API backing it up.
52:35 - I think there's a ton of value in being able to feed a little bit of your information,
52:40 some of your documents or your code repository or something to a GPT and then be able to ask it questions about it, right?
52:47 - Yeah.
52:48 - Like, you know, tell me about the security vulnerabilities that you see in the code.
52:52 Like, is there anywhere where I'm missing some tests or I'm calling a function in a way that's known to be bad
52:59 and you know, like that kind of stuff is really tricky.
53:02 But it's also tricky because it doesn't, even if you paste in a little bit of code,
53:05 it's not the whole project, right?
53:07 So, you know, to put a little bit more in there is pretty awesome.
53:10 - Yeah, being able to give it all the code from some of these code repositories, right?
53:14 Like, and bringing back the relevant information.
53:16 So I think there is a kind of this race.
53:18 There's gonna be other, you know, cool, there's another cool project called SourceGraph and Cody
53:23 that we can talk about that will, you know, run on your local server and basically indexes
53:28 your code base and it'll bring back relevant snippets from your code base and answer questions kind of in context.
53:33 And, you know, long-term and the new project, I don't know how new, Codium,
53:38 they had a new paper where they talked about flow engineering and flow engineering is just basically
53:43 that same concept of the human in the loop with the LLM, with the code, that's the magic combination of kind of those people,
53:50 those entities kind of iterating with each other.
53:53 I think these, you know, these tools are definitely gonna evolve and you really wanna have the ability to have access
53:58 to your specific information to answer your specific questions.
54:02 - Cody is new to me, Cody.dev.
54:05 And this little subtitle or whatever is, Cody is a coding assistant that uses AI,
54:11 understand your code base, right?
54:13 It was saying, it was about your entire code base, APIs, implementations, and idioms.
54:17 Like that's kind of what I was suggesting, at least for code, right?
54:19 - Yeah, and Sourcegraph, those folks really understand code indexing and searching.
54:24 Like that's what the first product was.
54:26 They were kind of just teed up, ready for this large language model moment.
54:29 And then they said, "Oh, let's just put Cody on top of that.
54:31 "So this thing will run, it will understand your code "and it will kind of bring things together for you."
54:36 So these folks do podcasts all the time.
54:38 I'd reach out to them.
54:39 - Yeah, interesting.
54:40 It's quite neat looking.
54:41 Think I'm gonna give it a try.
54:42 It both plugs into HRM and VS Code.
54:45 That's pretty neat.
54:46 - Very cool.
54:47 - We're starting to get a little bit short on time here, but for people who wanna play with the PyPI GPT,
54:51 maybe as an example, to just cut the readme and it's easy to get from there.
54:55 What do you need to tell them?
54:56 - I put a make file in there, so you know exactly like the steps to kind of make the environment,
55:00 download the files and just ping me, follow me on Twitter, iMore, and ping me if you need anything there.
55:06 I'm also on LinkedIn and GitHub, right?
55:09 So you can certainly reach out if you have any challenges.
55:12 - Excellent.
55:12 - The last thing that folks that are actually in the medical space, right?
55:15 So the thing that I'm working on right now actively is how to integrate this thing with our knowledge base, right?
55:21 So I have a knowledge base of hand curated trials and curated therapies and other information,
55:27 built it so that my custom GPT can actually work with that, come up with some, I'd say novel.
55:33 I always haven't seen anybody else and I haven't seen any research approaching things the same way I am
55:38 that handles some of the other challenges that are out there, right?
55:40 So for instance, the context window is a challenge.
55:43 So the context window is the amount of text that's in there and how it gets processed.
55:49 If you're making decisions and you're changing course, the chatbot will lose track of those changes, right?
55:56 So if you're experimenting or going down one path of inquiry and then you switch to another path,
56:02 it can get confused and forget that you switched paths.
56:05 - Or just run out of space to hold all that information.
56:08 Like, well, it forgot the last three things or the first three things you told it.
56:12 It only knows four and you think it knows seven and it's working incomplete, right?
56:15 - Yep.
56:16 And one of the key things is you actually want it to forget some things as well, right?
56:20 So those are all interesting challenges.
56:23 And I'm actually working with these custom GPTs to kind of change the way that the collaboration works
56:29 between the human, the expert, the large language model or the assistant
56:34 and my backend, my actual retrieval model, the API that's actually doing stuff.
56:39 - So are researchers and MDs and PhDs at your company talking with this thing and making use of it?
56:46 - Yeah, I mean, we're in active development right now.
56:48 We have a few key opinion leaders that are working with us and collaborating with us,
56:52 but we're always looking for more folks that are in the field that actually...
56:55 And right now you need kind of the cutting edge people.
56:58 This stuff's not ready for prime time.
57:00 Clinical decision support is a really hard problem, but we need the folks that wanna get ahead of it
57:06 'cause we know that there are doctors and there are patients that are asking
57:09 ChatGPT questions right now.
57:11 And even if it says, I'm not a medical expert, blah, blah, blah, and at the end of the day,
57:14 we actually don't have enough doctors, right?
57:16 That's the other scary thing is we don't have enough doctors, patients want answers.
57:20 How do we build solutions that can allow this expertise to get more democratized and more into folks' hands?
57:27 And I'm hoping our tool along with these large language models can help relieve some of that burden.
57:33 - It might not be as 100% accurate, 100% precise, but neither are doctors, right?
57:38 They get stuff wrong.
57:40 You just need to be in the realm of as good as a doctor.
57:43 You don't need to be completely without making a mistake.
57:47 And that's, I think, a challenge that we're just gonna have to get used to in general.
57:52 I joked about the legal brief thing 'cause someone got in trouble for submitting a brief
57:57 that had hallucinations in it.
57:59 And there's certain circumstances where maybe it's just not acceptable, but AI self-driven cars, people crash,
58:05 but that's a human mistake.
58:08 But when a machine makes it, it's a pre-programmed, predetermined mistake.
58:12 Something like that.
58:13 It doesn't feel the same as if the machine made a mistake.
58:16 So if a machine makes a recommendation like you need this cancer treatment,
58:20 or you're fine, you don't need it, and it was wrong, people are not gonna be as forgiving.
58:25 But it doesn't mean there's not value to be gained from systems that can help you, right?
58:29 - I always appreciate those machine learning papers that'll show the tracking over time
58:34 of how the models have gotten better and better, and they put the human in there,
58:37 and you can see that the human has already gotten eclipsed by the models, and that's a specific problem, right?
58:42 'Cause it's also recognizing that a lot of this stuff, these models that are doing tasks
58:47 are doing one specific task.
58:48 They're not doing a whole job.
58:49 They're not doing an end-to-end process.
58:51 They're answering a medical question, or they're looking at an image and finding all the cats or whatever it's supposed to do.
58:58 So, and to your point, though, humans aren't perfect at these tasks either.
59:02 - I think mostly people are gonna be using this kind of stuff to help them
59:05 come up with these answers, right?
59:07 My weird Amazon description example is gonna be the edge case, not the go-to.
59:12 - Agreed.
59:13 - Yeah, you came in, you spoke to the chatbot, here's your diagnosis, have a good day, right?
59:17 Not so much, more like, I need some help thinking through this.
59:20 What are some studies that have addressed this, right?
59:24 And those kind of questions.
59:25 - And I hesitate to say it's just a better search engine, 'cause I actually think it's got
59:29 way more potential than that.
59:30 - I agree.
59:31 - It can have a conversation, it can iterate back and forth, and what I'm actually trying to do
59:35 is build some state into it, right?
59:37 Some structured way of kind of remembering what the conversation was, and using a lot of the techniques
59:44 that these large language models are good at to actually, to make that actually happen.
59:48 And so that you can actually build a system so that the human and the assistant and the backend
59:52 all kind of know what the other party is thinking about and that they all work together.
59:57 - Nice.
59:57 For your genomics custom GPT thing that you're making internally, is that gonna become a product eventually
01:00:04 if other people are interested?
01:00:05 Is there some way they can keep tabs on it, or is it just internal only?
01:00:08 - Definitely reach out to me.
01:00:09 So we're building different versions of GPTs.
01:00:11 Like we're gonna have a GPT for our curation team that curates knowledge,
01:00:15 and we're building a GPT that, you know, my hope is that it'll go to physicians,
01:00:18 to oncologists and genomic counselors and other providers that could actually use this thing.
01:00:24 Eventually, if it becomes robust enough and stable enough, and I don't feel like we're doing a disservice,
01:00:30 we could certainly make a version of that available for cancer patients as well.
01:00:33 I would, you know, I'd love to have that.
01:00:34 I just wanna make sure that it's done in a responsible way.
01:00:36 - Yeah, absolutely.
01:00:37 Well, I honestly hope that you actually do such a good job that we don't have to have cancer research anymore,
01:00:42 but that's a long, long-term goal, right?
01:00:47 - That is definitely the end goal.
01:00:48 And that's really exciting too, so is that the new drugs that are coming out,
01:00:51 new treatments that are coming out, it's really just about making sure people are aware of it,
01:00:56 making sure that they're getting the genetic testing that they need, right?
01:00:59 So if you have a loved one that has, unfortunately has cancer, make sure that they're at least asking their doctor
01:01:04 the question about genomic testing to make sure that they're getting the best possible treatment.
01:01:08 - Sounds good.
01:01:09 All right, well, quickly before we get out of here, recommendation on some libraries, some project
01:01:15 that maybe we haven't talked about yet, something you came across, people were like,
01:01:18 "Oh, this would be awesome." - We ran out of time.
01:01:20 I was gonna talk about some of these Pydantic projects.
01:01:22 So there's Marvin, Instructor, and Outlines.
01:01:25 So folks should definitely look at those.
01:01:27 So basically what you do is you take, you can describe stuff as Pydantic,
01:01:31 and then it'll actually just extract it right into that Pydantic model for you.
01:01:34 And that's, so Marvin and Outlines and Instructor.
01:01:37 So check those guys out, they're awesome.
01:01:38 And then the other one that I actually had teed up was VisiCalc.
01:01:42 So VisiCalc is like this crazy command line tool.
01:01:45 It's awesome.
01:01:46 Like you can basically look at giant CSV files all on the command line.
01:01:49 It has like these hotkeys that you can do.
01:01:51 And it, sorry, not VisiCalc, Visidata.
01:01:54 - Visidata, okay.
01:01:55 - And so basically it's just, it's basically Excel inside your terminal.
01:01:58 And this was before Rich and Textual project.
01:02:02 And it was just like, it was kind of mind blowing all the stuff that this person was able to figure out
01:02:06 how to make work.
01:02:07 - That's super amazing.
01:02:08 I just wanted to give a shout out one more thing, 'cause your Visidata reminded me of something
01:02:12 I just came across called Bpytop.
01:02:14 - Yep, yep, yep.
01:02:15 - People have servers out there and they need to know what's going on with their server.
01:02:18 Where's my, I need a picture for this.
01:02:20 But yeah, it's like a nice visualization.
01:02:24 There's also a Bpytop.
01:02:26 It's pretty amazing what people can do in the terminal.
01:02:28 Oh, there they are.
01:02:29 They're just responsive design themselves out.
01:02:31 But yeah, if you want a bunch of live graphs, every time I see stuff like this,
01:02:35 the Visidata or this or what textual folks are working on, it's just like, I can't believe they built this.
01:02:41 I'm working at the level of Colorama.
01:02:43 This string is red right here.
01:02:45 They're like, oh yeah, we rebuilt it.
01:02:47 - I got an emoji to show up, right?
01:02:49 I'm excited.
01:02:50 - Yes, exactly, yes.
01:02:51 A rocket ship is there, not just tech.
01:02:54 - Yeah, pretty excellent.
01:02:55 All right.
01:02:56 Well, Ian, thank you for being here and keep up the good work.
01:03:00 I know so many people are using LLMs, but not that many people are creating LLMs.
01:03:05 And as developers, we love to create things.
01:03:08 We already have the tools to do it.
01:03:09 People can check out your GitHub repo on the high PI GPT and use it as a starting place, right?
01:03:16 - Sounds great.
01:03:17 Yeah, and definitely reach out if you have any questions.
01:03:19 - Excellent.
01:03:20 Well, thanks for coming back on the show.
01:03:21 Catch you all later.
01:03:22 - Great, good to talk to you.
01:03:23 Bye bye.
01:03:23 - Bye.
01:03:25 This has been another episode of Talk Python to Me.
01:03:28 Thank you to our sponsors.
01:03:29 Be sure to check out what they're offering.
01:03:31 It really helps support the show.
01:03:33 Take some stress out of your life.
01:03:35 Get notified immediately about errors and performance issues in your web
01:03:39 or mobile applications with Sentry.
01:03:41 Just visit talkpython.fm/sentry and get started for free.
01:03:46 And be sure to use the promo code, talkpython, all one word.
01:03:50 It's time to stop asking relational databases to do more than they were made for
01:03:54 and simplify complex data models with graphs.
01:03:58 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do for you.
01:04:04 Find out more at talkpython.fm/neo4j.
01:04:09 Want to level up your Python?
01:04:11 We have one of the largest catalogs of Python video courses over at Talk Python.
01:04:15 Our content ranges from true beginners to deeply advanced topics like memory and async.
01:04:20 And best of all, there's not a subscription in sight.
01:04:22 Check it out for yourself at training.talkpython.fm.
01:04:26 Be sure to subscribe to the show, open your favorite podcast app, and search for Python.
01:04:30 We should be right at the top.
01:04:32 You can also find the iTunes feed at /itunes, the Google Play feed at /play,
01:04:37 and the direct RSS feed at /rss on talkpython.fm.
01:04:41 We're live streaming most of our recordings these days.
01:04:44 If you want to be part of the show and have your comments featured on the air,
01:04:47 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.
01:04:52 This is your host, Michael Kennedy.
01:04:54 Thanks so much for listening.
01:04:55 I really appreciate it.
01:04:56 Now get out there and write some Python code.
01:04:59 (upbeat music)
01:05:17 --