WEBVTT

00:00:00.001 --> 00:00:02.260
Do you know what custom GPTs are?

00:00:02.260 --> 00:00:05.160
They're configurable and shareable chat experiences

00:00:05.160 --> 00:00:07.480
with the name, logo, custom instructions,

00:00:07.480 --> 00:00:10.680
conversation starters, access to open AI tools,

00:00:10.680 --> 00:00:12.580
and custom API actions.

00:00:12.580 --> 00:00:15.100
And you can build them with Python.

00:00:15.100 --> 00:00:17.820
Ian Moyer has been doing just that

00:00:17.820 --> 00:00:20.440
and is here to share his experience building them.

00:00:20.440 --> 00:00:23.380
This is Talk Python To Me, episode 456,

00:00:23.380 --> 00:00:26.020
recorded January 22nd, 2024.

00:00:26.020 --> 00:00:41.960
Welcome to Talk Python To Me,

00:00:41.960 --> 00:00:43.660
a weekly podcast on Python.

00:00:43.660 --> 00:00:45.400
This is your host, Michael Kennedy.

00:00:45.400 --> 00:00:48.040
Follow me on Mastodon, where I'm @mkennedy,

00:00:48.040 --> 00:00:50.440
and follow the podcast using @talkpython,

00:00:50.440 --> 00:00:52.880
both on fosstodon.org.

00:00:52.880 --> 00:00:55.460
Keep up with the show and listen to over seven years

00:00:55.460 --> 00:00:57.960
of past episodes at talkpython.fm.

00:00:57.960 --> 00:01:00.600
We've started streaming most of our episodes

00:01:00.600 --> 00:01:01.740
live on YouTube.

00:01:01.740 --> 00:01:03.300
Subscribe to our YouTube channel

00:01:03.300 --> 00:01:05.500
over at talkpython.fm/youtube

00:01:05.500 --> 00:01:07.840
to get notified about upcoming shows

00:01:07.840 --> 00:01:09.300
and be part of that episode.

00:01:09.300 --> 00:01:12.020
This episode is sponsored by Sentry.

00:01:12.020 --> 00:01:14.000
Don't let those errors go unnoticed.

00:01:14.000 --> 00:01:14.880
Use Sentry.

00:01:14.880 --> 00:01:17.940
Get started at talkpython.fm/sentry.

00:01:17.940 --> 00:01:20.680
And it's also brought to you by Neo4j.

00:01:20.680 --> 00:01:23.820
It's time to stop asking relational databases

00:01:23.820 --> 00:01:25.480
to do more than they were made for.

00:01:25.480 --> 00:01:28.060
Check out the sample FastAPI project

00:01:28.060 --> 00:01:31.160
and see what Neo4j, a native graph database,

00:01:31.160 --> 00:01:32.080
can do for you.

00:01:32.080 --> 00:01:36.500
Find out more at talkpython.fm/Neo4j.

00:01:36.500 --> 00:01:39.660
Ian, welcome to Talk Python To Me.

00:01:39.660 --> 00:01:40.460
Hey, Michael.

00:01:40.460 --> 00:01:41.160
Good to see you again.

00:01:41.300 --> 00:01:42.940
Yeah, great to see you again.

00:01:42.940 --> 00:01:44.800
It has been a little while.

00:01:44.800 --> 00:01:46.440
It seems like not so long ago.

00:01:46.440 --> 00:01:49.080
And yet, when I pull up the episode

00:01:49.080 --> 00:01:51.000
that we did together, sure enough,

00:01:51.000 --> 00:01:54.420
it says March 7th, 2018.

00:01:54.420 --> 00:01:55.340
Wow.

00:01:55.340 --> 00:01:56.400
Years are short.

00:01:56.400 --> 00:01:57.240
Years are short.

00:01:57.240 --> 00:01:58.140
They go by really fast.

00:01:58.140 --> 00:01:58.720
They sure do.

00:01:58.720 --> 00:02:02.160
So back then, we were talking about Python

00:02:02.160 --> 00:02:04.000
and biology and genomics.

00:02:04.280 --> 00:02:08.020
And it sounds like you're still doing genetic type things

00:02:08.020 --> 00:02:11.080
and still doing Python and all that kind of stuff.

00:02:11.080 --> 00:02:11.440
For sure.

00:02:11.440 --> 00:02:12.140
Yeah, definitely.

00:02:12.140 --> 00:02:14.540
We work for a company called Genome Oncology.

00:02:14.540 --> 00:02:16.600
We do precision oncology software,

00:02:16.600 --> 00:02:19.620
helping folks make sense of genomics

00:02:19.620 --> 00:02:20.800
and trying to help cancer patients.

00:02:20.800 --> 00:02:21.340
That's awesome.

00:02:21.340 --> 00:02:24.560
There's different levels of helping people with software.

00:02:24.960 --> 00:02:28.320
On one level, we probably have ad retargeting.

00:02:28.320 --> 00:02:32.980
On the other, we've got medical benefits

00:02:32.980 --> 00:02:36.100
and looking for helping people

00:02:36.100 --> 00:02:38.000
who are suffering socially or whatever.

00:02:38.000 --> 00:02:40.420
So it's got to feel good to write software

00:02:40.420 --> 00:02:43.220
that is making a difference in people's lives.

00:02:43.220 --> 00:02:43.700
That's right.

00:02:43.700 --> 00:02:46.500
I did spend a lot of the 2000s making e-commerce websites

00:02:46.500 --> 00:02:48.880
and that wasn't exactly the most fulfilling thing.

00:02:48.880 --> 00:02:50.880
I learned a lot, but it wasn't as exciting

00:02:50.880 --> 00:02:51.700
as what I'm doing now,

00:02:51.700 --> 00:02:53.820
or at least as fulfilling as what I'm doing now.

00:02:53.820 --> 00:02:55.280
Were those earlier websites in Python?

00:02:55.280 --> 00:02:57.420
I was all Java for the most part.

00:02:57.420 --> 00:02:59.040
And finally with this company,

00:02:59.040 --> 00:03:02.700
I knocked out a prototype in Django a few years ago.

00:03:02.700 --> 00:03:04.900
And my boss at the time was like,

00:03:04.900 --> 00:03:06.400
you did that so fast,

00:03:06.400 --> 00:03:08.020
you should do some more stuff in Python.

00:03:08.020 --> 00:03:10.420
So that's kind of how it evolved.

00:03:10.420 --> 00:03:13.860
And now basically most of our core backend is Python

00:03:13.860 --> 00:03:17.120
and we use a little bit of Svelte for the user interfaces.

00:03:17.120 --> 00:03:17.580
Beautiful.

00:03:17.580 --> 00:03:21.500
It's easy to forget, like five years ago, 10 years ago,

00:03:21.500 --> 00:03:24.980
people were questioning whether Python should be something you should use.

00:03:24.980 --> 00:03:25.900
Is it a real language?

00:03:25.900 --> 00:03:26.700
Should you really use it?

00:03:26.700 --> 00:03:27.600
Is it safe to use?

00:03:27.600 --> 00:03:31.220
Maybe you should use a Java or a C# or something like that

00:03:31.220 --> 00:03:33.120
because this is a real project.

00:03:33.120 --> 00:03:34.020
It's interesting.

00:03:34.020 --> 00:03:35.860
You don't hear that nearly as much anymore, do you?

00:03:35.860 --> 00:03:37.260
I grew up with Boston sports fans

00:03:37.260 --> 00:03:40.380
and it was like being a Boston sports fan was terrible for the longest time.

00:03:40.380 --> 00:03:41.240
And now it's like, okay,

00:03:41.240 --> 00:03:43.320
we don't want to hear about your problems right now.

00:03:43.320 --> 00:03:44.360
And same thing with Python.

00:03:44.360 --> 00:03:45.600
It's like, I like Python.

00:03:45.600 --> 00:03:46.380
It's like, yeah, great.

00:03:46.380 --> 00:03:47.800
So does everybody else in the world.

00:03:47.900 --> 00:03:50.200
So yeah, it's really not the issue anymore.

00:03:50.200 --> 00:03:51.760
It's now it's not the cool thing to play with.

00:03:51.760 --> 00:03:53.960
So now you got to go to Rust or something else.

00:03:53.960 --> 00:03:54.480
You know what?

00:03:54.480 --> 00:03:54.840
Shiny.

00:03:54.840 --> 00:03:56.060
LLMs are shiny.

00:03:56.060 --> 00:03:58.160
LLMs are very shiny for sure.

00:03:58.160 --> 00:03:58.440
Yeah.

00:03:58.440 --> 00:03:59.740
We can talk about them today.

00:03:59.740 --> 00:04:00.760
Yeah, that sounds great.

00:04:00.760 --> 00:04:01.440
Let's do it.

00:04:01.440 --> 00:04:04.120
First of all, we're going to talk about building applications

00:04:04.120 --> 00:04:08.760
that are basically powered by LLMs that you plug into, right?

00:04:08.760 --> 00:04:09.080
Yep.

00:04:09.080 --> 00:04:13.480
Before we get into creating LLMs, just for you, like what is,

00:04:13.480 --> 00:04:18.500
where do LLMs play a role for you in software development these days?

00:04:18.500 --> 00:04:18.880
Sure.

00:04:18.880 --> 00:04:22.300
So, you know, like everybody else, I mean, I had been playing with,

00:04:22.300 --> 00:04:25.080
so I do natural language processing as part of my job, right?

00:04:25.080 --> 00:04:29.680
So using spaCy was a big, a big part of the information extraction stack that we use

00:04:29.680 --> 00:04:31.980
because we have to deal with a lot of medical data

00:04:31.980 --> 00:04:35.980
and medical data is just unstructured and has to be cleaned up before it can be used.

00:04:35.980 --> 00:04:37.320
That was my exposure.

00:04:37.320 --> 00:04:41.100
I had seen GPTs and the idea of like generating text,

00:04:41.100 --> 00:04:44.020
just starting from that didn't really make much sense to me at the time.

00:04:44.020 --> 00:04:47.080
But then obviously like everybody else, when ChatGPT came out, I was like,

00:04:47.080 --> 00:04:48.320
oh, I get this now.

00:04:48.320 --> 00:04:52.080
Like this thing does, you know, it can basically learn in the context

00:04:52.080 --> 00:04:53.920
and it can actually produce something that's interesting

00:04:53.920 --> 00:04:56.020
and you can use it for things like information extraction.

00:04:56.020 --> 00:04:58.180
So just like everybody else, I kind of woke up to them,

00:04:58.180 --> 00:05:02.460
you know, around that time that they got released and I use them all the time, right?

00:05:02.460 --> 00:05:04.900
So ChatGPT 4 is really what I use.

00:05:04.900 --> 00:05:10.500
I would recommend if you can afford the $20 a month, it's still the best model that there is as of January 2024.

00:05:10.500 --> 00:05:12.080
And I use that for coding.

00:05:12.080 --> 00:05:17.780
I don't really like the coding tools, the co-pilots, but there, you know, there's definitely folks that swear by them.

00:05:18.120 --> 00:05:22.840
My workflow is more of, I have a problem, work with the chatbot to try to like, you know,

00:05:22.840 --> 00:05:27.220
think through all the edge cases and then think through the test case, the tests.

00:05:27.220 --> 00:05:28.980
And then I think through the code, right?

00:05:28.980 --> 00:05:32.940
And then the actual typing of the code, yeah, I'll have it do a lot of the boilerplate stuff,

00:05:32.940 --> 00:05:35.560
but then kind of shaping the APIs and things like that.

00:05:35.560 --> 00:05:37.040
I kind of like to do that myself still.

00:05:37.040 --> 00:05:38.560
I'm kind of old school, old school.

00:05:38.820 --> 00:05:41.700
I guess I'm old school as well because I'm like right there with you.

00:05:41.700 --> 00:05:47.560
But for me, I don't generally run co-pilot or those kinds of things in my editors.

00:05:47.560 --> 00:05:53.760
I do have some features turned on, but primarily it's just really nice autocomplete.

00:05:53.760 --> 00:05:54.740
You know what I mean?

00:05:54.740 --> 00:05:58.620
Like it seems like it almost just knows what I want to type anyway.

00:05:58.620 --> 00:05:59.400
And that's getting better.

00:05:59.400 --> 00:06:01.260
I don't know if anyone's noticed recently.

00:06:01.260 --> 00:06:05.900
One of the recent releases of PyCharm, it starts to autocomplete whole lines.

00:06:06.340 --> 00:06:10.000
And I don't know where it's getting this from, and I think I have the AI features turned off.

00:06:10.000 --> 00:06:11.580
At least it says I have no license.

00:06:11.580 --> 00:06:13.100
I'm guessing that means they're turned off.

00:06:13.100 --> 00:06:15.960
So it must be something more built into it.

00:06:15.960 --> 00:06:16.800
That's pretty excellent.

00:06:16.800 --> 00:06:20.440
But for me, I find I'm pretty content to just sit and write code.

00:06:20.440 --> 00:06:25.860
However, the more specific the unknowns are, the more willing I'm like,

00:06:25.860 --> 00:06:27.640
oh, I need to go to ChatGPT for this.

00:06:27.640 --> 00:06:30.200
Like, for example, like how do you use Pydantic?

00:06:30.200 --> 00:06:35.220
Like, well, I'll probably just go look at a quick code sample and see that so I can understand it.

00:06:35.220 --> 00:06:41.660
But if it's I have this time string with the date like this, the month like this,

00:06:41.660 --> 00:06:44.660
and then it has the time zone like that, how do I parse that?

00:06:44.660 --> 00:06:47.460
Or how do I generate another one like that in Python?

00:06:47.460 --> 00:06:48.600
And here's the answer.

00:06:48.600 --> 00:06:54.620
Or I have this giant weird string, and I want this part of it as extracted with a regular expression.

00:06:54.620 --> 00:06:55.920
And I want to...

00:06:55.920 --> 00:06:56.920
Regular expressions, I was just going to say that.

00:06:56.920 --> 00:06:57.560
Oh, my gosh.

00:06:57.560 --> 00:06:58.940
You don't have to write another one of those.

00:06:58.940 --> 00:06:59.420
Yeah, it's great.

00:06:59.540 --> 00:07:03.160
Yeah, it's pretty much like, do you need it to detect the end of a line straight to ChatGPT?

00:07:03.160 --> 00:07:03.700
Not really.

00:07:03.700 --> 00:07:07.400
But, you know, it's like almost any level of chat, a regular expression.

00:07:07.400 --> 00:07:11.340
I'm like, well, I need some AI for this because this is not time well spent for me.

00:07:11.340 --> 00:07:12.780
But yeah, it's interesting.

00:07:12.780 --> 00:07:12.980
Yeah.

00:07:12.980 --> 00:07:16.360
One big tip I would give people, though, is that these chatbots, they want to please you.

00:07:16.360 --> 00:07:19.060
So you have to ask it to criticize you.

00:07:19.060 --> 00:07:20.540
You have to say, here's some piece of code.

00:07:20.840 --> 00:07:21.980
Tell me all the ways it's wrong.

00:07:21.980 --> 00:07:28.680
And you have to also ask for lots of different examples because it just starts to get more creative, more things that it says.

00:07:28.680 --> 00:07:31.860
It really thinks by talking, which is a really weird thing to consider.

00:07:31.860 --> 00:07:35.000
But yeah, it's definitely some things to keep in mind when you're working with these things.

00:07:35.000 --> 00:07:37.180
And they do have these really weird things.

00:07:37.180 --> 00:07:42.000
Like if you compliment them or if you ask it, you sort of tell it, like, I really want you to tell me.

00:07:42.000 --> 00:07:43.740
It actually makes a difference, right?

00:07:43.740 --> 00:07:45.040
It's not just like a search engine.

00:07:45.040 --> 00:07:46.240
Like, well, of course, what does it care?

00:07:46.240 --> 00:07:48.100
You put these keywords in and they come out.

00:07:48.200 --> 00:07:51.080
Like, no, you've kind of got to, like, know how to talk to it just a little bit.

00:07:51.080 --> 00:07:58.680
I've seen people threatening them or, like, saying that someone's being held ransom or, you know, I like to say my boss is really mad at me.

00:07:58.680 --> 00:07:59.880
Like, help me out here, right?

00:07:59.880 --> 00:08:01.440
And, like, see if it'll generate some better code.

00:08:01.440 --> 00:08:02.680
You're not being a good user.

00:08:02.680 --> 00:08:04.380
You're trying to trick me.

00:08:04.380 --> 00:08:08.220
I've been a good chatbot and you've been a bad user and I'm not going to help you anymore.

00:08:08.220 --> 00:08:08.740
Yeah, right.

00:08:08.740 --> 00:08:12.840
That was actually basically a conversation from Bing in the early days.

00:08:12.840 --> 00:08:13.840
Yeah, the Sydney episode.

00:08:13.840 --> 00:08:14.980
Yeah, that was crazy, right?

00:08:14.980 --> 00:08:15.900
Super funny.

00:08:15.900 --> 00:08:16.780
How funny.

00:08:16.780 --> 00:08:17.340
All right.

00:08:17.340 --> 00:08:20.420
Well, I'm sure a lot of people out there are using AI these days.

00:08:20.420 --> 00:08:26.420
I think I saw a quote from, I think it was from GitHub saying over 50% of developers are using Copilot.

00:08:26.420 --> 00:08:26.840
For sure.

00:08:26.840 --> 00:08:29.640
Which is crazy, but, I mean, not that surprising.

00:08:29.640 --> 00:08:31.600
50% of the people are using Autocomplete.

00:08:31.600 --> 00:08:33.840
So, I guess it kind of, kind of like that, right?

00:08:33.840 --> 00:08:34.540
They're great tools.

00:08:34.540 --> 00:08:35.300
They're going to keep evolving.

00:08:35.300 --> 00:08:36.700
There's some other ones I'm keeping an eye on.

00:08:36.700 --> 00:08:39.420
There's one called Console, which just takes a different approach.

00:08:39.420 --> 00:08:40.700
They use some stronger models.

00:08:41.080 --> 00:08:46.780
And then there's a website called Find, P-H-I-N-D, that allows you to do some searching, that they've built their own custom model.

00:08:46.780 --> 00:08:49.760
Really interesting companies that are doing some really cool things.

00:08:49.760 --> 00:08:55.480
And then Perplexity is like the search replacement that a lot of folks are very excited about using instead of Google.

00:08:55.680 --> 00:08:57.280
So, there's a lot of different tools out there.

00:08:57.280 --> 00:09:02.300
You could spend all your day just kind of playing around and learning these things where you got to actually kind of get some stuff done, too.

00:09:02.300 --> 00:09:04.020
Yeah, you got to pick something and go, right?

00:09:04.020 --> 00:09:11.920
Because with all the churn and growth and experimentation we got, you probably could try a new tool every day and still not try them all, you know?

00:09:11.920 --> 00:09:13.460
Just be falling farther behind.

00:09:13.460 --> 00:09:15.080
So, you got to pick something and go.

00:09:15.080 --> 00:09:15.800
And go, yep.

00:09:15.800 --> 00:09:18.240
Let's talk about writing some code.

00:09:18.680 --> 00:09:24.660
Yeah, the next thing you're going to do after you, you know, use a chatbot is to, you know, hit an API.

00:09:24.660 --> 00:09:32.200
Like, if you're going to program an app and that app is going to have LLM inside of it, large language models inside of it, APIs are pretty much the next step, right?

00:09:32.200 --> 00:09:35.020
So, OpenAI has different models that are available.

00:09:35.020 --> 00:09:39.640
This is a web page that I just saw recently that will actually, you know, compare the different models that are out there.

00:09:39.640 --> 00:09:41.940
So, there's obviously the big guy, which is OpenAI.

00:09:41.940 --> 00:09:45.720
And you can get that through Azure as well if you have a Microsoft arrangement.

00:09:45.720 --> 00:09:53.680
And there's some security reasons or HIPAA compliance and, you know, some other reasons that you might want to talk through Azure instead of going directly to OpenAI.

00:09:53.680 --> 00:09:56.600
I'd defer to your IT department about that.

00:09:56.600 --> 00:10:03.040
Google has Gemini, which they just released the Pro version, which I believe is as strong as 3.5, roughly.

00:10:03.040 --> 00:10:12.340
That is interesting because if you don't care about them training on your data, if, like, whatever you're doing is just, like, not super proprietary or something you're trying to keep secret,

00:10:12.340 --> 00:10:17.540
they're offering free API access, I believe 60 words per minute, right?

00:10:17.540 --> 00:10:20.620
So, basically, one a second, you can call this thing and there's no charge.

00:10:20.620 --> 00:10:23.360
So, I don't know how long that's going to last.

00:10:23.360 --> 00:10:27.520
So, if you have an interesting project that you want to use in a large language model for, you might want to look at that.

00:10:27.520 --> 00:10:30.560
Yeah, especially if it's already open data that you're playing with.

00:10:30.560 --> 00:10:31.260
Exactly, right.

00:10:31.260 --> 00:10:36.020
Or data you've somehow published to the web that has certainly been consumed by these things.

00:10:36.020 --> 00:10:37.560
And these models are going to train on it, right?

00:10:37.700 --> 00:10:38.740
That's the trade, right?

00:10:38.740 --> 00:10:41.440
They're trying to get more tokens, is what they call it, right?

00:10:41.440 --> 00:10:44.500
The tokens are what they need to actually make these models smarter.

00:10:44.500 --> 00:10:47.060
So, everyone's just hunting for more tokens.

00:10:47.060 --> 00:10:48.960
And I think this is part of their strategy for that.

00:10:48.960 --> 00:10:51.520
And then there's also a clod by Anthropic.

00:10:51.520 --> 00:10:55.580
And then after that, you get into the, you know, kind of the open source APIs as well.

00:10:55.580 --> 00:10:57.860
There's some really powerful open source ones out there.

00:10:57.860 --> 00:11:02.020
Yeah, so this website, yeah, this is DocsBot for people listening.

00:11:02.020 --> 00:11:02.880
DocsBot.ai.

00:11:02.880 --> 00:11:07.300
And is it sole purpose just to tell you price comparisons and stuff like that?

00:11:07.300 --> 00:11:08.220
Or does it have more than it?

00:11:08.220 --> 00:11:10.300
I assume this company's got some product.

00:11:10.300 --> 00:11:11.560
Unfortunately, I don't know what it is.

00:11:11.560 --> 00:11:13.280
I saw this link that they put out there.

00:11:13.280 --> 00:11:14.660
And it's a calculator.

00:11:14.660 --> 00:11:17.460
So, you basically can put your tokens, how many tokens.

00:11:17.460 --> 00:11:19.600
There's input tokens and there's output tokens, right?

00:11:19.600 --> 00:11:22.200
So, they're going to charge more on the output tokens.

00:11:22.200 --> 00:11:23.840
That's for the most part.

00:11:23.840 --> 00:11:26.160
Some of the models are, you know, more equal.

00:11:26.780 --> 00:11:30.420
And then what they do is, if you can figure out, like, roughly how big a message is going

00:11:30.420 --> 00:11:33.960
to be, both the input and the output, how many calls you're going to make, you can use

00:11:33.960 --> 00:11:36.960
that to then calculate basically the cost.

00:11:36.960 --> 00:11:41.820
And the cost is always at, like, tokens per thousand, you know, or dollars or pennies, really.

00:11:41.820 --> 00:11:43.200
Pennies per thousand tokens.

00:11:43.200 --> 00:11:45.520
And then it's just a math equation at that point.

00:11:45.520 --> 00:11:48.780
And what you'll find is calling GPT-4 is going to be super expensive.

00:11:48.780 --> 00:11:53.880
And then calling, you know, a small 7, what's called the 7B model from Mistral is going to

00:11:53.880 --> 00:11:54.440
be the cheapest.

00:11:55.100 --> 00:11:57.020
And you're just going to look for these different providers.

00:11:57.020 --> 00:11:58.900
Well, the prices really are different.

00:11:58.900 --> 00:12:07.240
Like, for example, OpenAI Azure GPT-4 is a little over three cents per call, whereas GPT-3.5

00:12:07.240 --> 00:12:11.580
Turbo is one-tenth of one cent.

00:12:11.580 --> 00:12:13.220
It's a big difference there.

00:12:13.220 --> 00:12:16.880
It's 11 cents versus $3 to have a conversation with it.

00:12:16.960 --> 00:12:18.480
Yes, it's a very, very wide difference.

00:12:18.480 --> 00:12:21.800
And it's all based on, you know, how much compute do these models take, right?

00:12:21.800 --> 00:12:26.420
Because the bigger the model, the more accurate it is, but also the more expensive it is for

00:12:26.420 --> 00:12:26.960
them to run it.

00:12:26.960 --> 00:12:29.140
So that's why there's such a cost difference.

00:12:30.900 --> 00:12:33.540
This portion of Talk Python To Me is brought to you by Sentry.

00:12:33.540 --> 00:12:37.920
In the last episode, I told you about how we use Sentry to solve a tricky problem.

00:12:37.920 --> 00:12:43.060
This time, I want to talk about making your front-end and back-end code work more tightly together.

00:12:43.060 --> 00:12:48.580
If you're having a hard time getting a complete picture of how your app is working and how

00:12:48.580 --> 00:12:53.260
requests flow from the front-end JavaScript app back to your Python services down into

00:12:53.260 --> 00:12:58.420
database calls for errors and performance, you should definitely check out Sentry's distributed

00:12:58.420 --> 00:12:58.860
tracing.

00:12:58.860 --> 00:13:03.800
With distributed tracing, you'll be able to track your software's performance, measure metrics

00:13:03.800 --> 00:13:09.000
like throughput and latency, and display the impact of errors across multiple systems.

00:13:09.700 --> 00:13:14.320
Distributed tracing makes Sentry a more complete performance monitoring solution, helping you

00:13:14.320 --> 00:13:18.440
diagnose problems and measure your application's overall health more quickly.

00:13:18.440 --> 00:13:24.700
Tracing in Sentry provides insights such as what occurred for a specific event or issue, the

00:13:24.700 --> 00:13:29.320
conditions that cause bottlenecks or latency issues, and the endpoints and operations that

00:13:29.320 --> 00:13:30.340
consume the most time.

00:13:30.340 --> 00:13:33.860
Help your front-end and back-end teams work seamlessly together.

00:13:33.860 --> 00:13:39.540
Check out Sentry's distributed tracing at talkpython.fm/sentry-trace.

00:13:39.540 --> 00:13:43.360
That's talkpython.fm/sentry-trace.

00:13:43.360 --> 00:13:50.320
And when you sign up, please use our code TALKPYTHON, all caps, no spaces, to get more features and

00:13:50.320 --> 00:13:51.620
let them know that you came from us.

00:13:51.620 --> 00:13:53.740
Thank you to Sentry for supporting the show.

00:13:55.440 --> 00:14:00.040
Yeah, I recently interviewed, just released a while ago, interviewed because of time shifting

00:14:00.040 --> 00:14:05.280
on podcasts, Mark Rosinovich, CTO of Azure, and we talked about all the crazy stuff that

00:14:05.280 --> 00:14:10.400
they're doing for coming up with just running these computers that handle all of this compute,

00:14:10.400 --> 00:14:12.020
and it's really a lot.

00:14:12.020 --> 00:14:13.580
There was a GPU shortage for a while.

00:14:13.580 --> 00:14:14.880
I don't know if that's still going on.

00:14:14.880 --> 00:14:19.460
And obviously, you know, the big companies are buying hundreds of thousands of these GPUs

00:14:19.460 --> 00:14:21.180
to get the scale they need.

00:14:21.180 --> 00:14:25.620
And so once you figure out which API you want to use, then you want to talk about the

00:14:25.620 --> 00:14:25.940
library.

00:14:25.940 --> 00:14:30.140
So now, you know, most of these providers, they have, you know, a Python library that they

00:14:30.140 --> 00:14:30.400
offer.

00:14:30.400 --> 00:14:35.920
I know OpenAI does and Google with Gemini does, but there's also open source ones, right?

00:14:35.920 --> 00:14:38.680
Because they're not very complicated to talk to.

00:14:38.680 --> 00:14:40.700
It's just basically HTTP requests.

00:14:40.700 --> 00:14:44.440
So it's just really a matter of like, what's the ergonomics you're looking for as a developer

00:14:44.440 --> 00:14:46.400
to interact with these things?

00:14:46.560 --> 00:14:49.400
And most importantly, make sure you're maintaining optionality, right?

00:14:49.400 --> 00:14:54.480
Like, it's great to do a prototype with one of these models or recognize you might want

00:14:54.480 --> 00:14:58.120
to switch either for cost reasons or performance reasons or what have you.

00:14:58.120 --> 00:15:03.520
And, you know, LangChain, for instance, has a ton of the providers as part of you basically

00:15:03.520 --> 00:15:07.900
are just switching a few arguments when you're switching between them.

00:15:07.900 --> 00:15:13.580
And then Simon Willison has, you know, of Python fame, has an LLM project where he's defined,

00:15:13.940 --> 00:15:18.220
you know, basically a set of, and it's really clean just the way he's organized it, because

00:15:18.220 --> 00:15:20.120
you can just add plugins as you need them, right?

00:15:20.120 --> 00:15:22.960
So you don't have to install all the different libraries that are out there.

00:15:22.960 --> 00:15:25.040
And I think LangChain is kind of following a similar approach.

00:15:25.040 --> 00:15:29.660
I think they're coming up with a LangChain core capability where you can just kind of bring

00:15:29.660 --> 00:15:30.760
in things as you need them.

00:15:30.760 --> 00:15:36.660
And so the idea is you're now coding against these libraries and you're trying to bring

00:15:36.660 --> 00:15:40.820
together, you know, the text you need to have analyzed or whatever your use case is.

00:15:40.820 --> 00:15:42.940
And then it'll come back with the generation.

00:15:42.940 --> 00:15:45.040
And you can also not just use them on the cloud.

00:15:45.040 --> 00:15:48.500
You can use open source ones as well and run them locally on your local computer.

00:15:48.500 --> 00:15:54.500
I'd never really thought about my architectural considerations, I guess, of these sorts of things.

00:15:54.500 --> 00:15:58.080
But of course, you want to set up some kind of abstraction layer.

00:15:58.080 --> 00:16:01.600
So you're not completely tied into some provider.

00:16:01.600 --> 00:16:03.600
I mean, it could be that it becomes too expensive.

00:16:03.600 --> 00:16:05.140
It could be that it becomes too slow.

00:16:05.140 --> 00:16:07.260
But it also might just be something that's better.

00:16:07.640 --> 00:16:09.500
It could be something else that comes along that's better.

00:16:09.500 --> 00:16:11.200
And you're like, we could switch.

00:16:11.200 --> 00:16:12.540
It's 25% better.

00:16:12.540 --> 00:16:18.300
But it's like a week to pull all the details of this one LLM out and put the new ones in.

00:16:18.300 --> 00:16:19.360
And so it's not worth it.

00:16:19.360 --> 00:16:19.540
Right.

00:16:19.540 --> 00:16:24.580
So you like having being tied to a particular database rather than more general.

00:16:24.580 --> 00:16:25.520
It's a similar idea.

00:16:25.520 --> 00:16:27.280
And especially at this moment in time, right?

00:16:27.280 --> 00:16:28.460
Every couple of months, something.

00:16:28.860 --> 00:16:31.880
So something from the bottom up is getting better and better.

00:16:31.880 --> 00:16:36.680
Meaning, you know, Llama came out a year ago and then Llama 2 and Mistral and Mixtral.

00:16:36.680 --> 00:16:40.180
And, you know, Llama 3 is going to be coming out later this year, we believe.

00:16:40.180 --> 00:16:48.720
And so those models, which are smaller and cheaper and easier to use, are not easier to use, but they're just cheaper, is those things are happening all the time.

00:16:48.780 --> 00:16:54.240
So being able to be flexible and nimble and kind of change where you are is going to be crucial, at least for the next couple of years.

00:16:54.240 --> 00:16:54.480
Yeah.

00:16:54.480 --> 00:16:56.340
The example that I gave was databases, right?

00:16:56.340 --> 00:17:02.120
And databases have been kind of a known commodity since the 80s or what, 1980s?

00:17:02.120 --> 00:17:05.640
And of course, there's new ones that come along, but they're kind of all the same.

00:17:05.640 --> 00:17:10.640
And, you know, we've got, there was MySQL, now there's Postgres that people love and, right?

00:17:10.640 --> 00:17:13.980
So that is changing way, way slower than this.

00:17:13.980 --> 00:17:17.740
And people are like, well, we got to think about those kinds of like, don't get tied into that.

00:17:17.740 --> 00:17:18.480
Well, sure.

00:17:18.480 --> 00:17:19.520
It's way less stable.

00:17:19.520 --> 00:17:19.900
Right.

00:17:19.900 --> 00:17:23.620
And people, you know, create layers of abstraction there, too, is right.

00:17:23.620 --> 00:17:28.260
You got SQLAlchemy and then, you know, Sebastian from FastAPI has SQL model.

00:17:28.260 --> 00:17:34.560
That's a layer on top of SQLAlchemy, you know, and then there's also, you know, folks that just like writing clean NC SQL.

00:17:34.560 --> 00:17:37.940
And you can, you know, hopefully be able to port that from database to database as well.

00:17:37.940 --> 00:17:40.920
So it's the same principles, separation of concerns.

00:17:40.920 --> 00:17:42.340
So you can kind of be flexible.

00:17:42.340 --> 00:17:42.760
All right.

00:17:42.760 --> 00:17:44.280
So you talked about LangChain.

00:17:44.280 --> 00:17:46.660
Just give us a sense real quick of what LangChain is.

00:17:46.660 --> 00:17:49.540
This was a great project from a timing perspective.

00:17:49.540 --> 00:17:53.500
I believe they kind of invented it and released it right around the time ChatGPT came out.

00:17:53.500 --> 00:18:00.060
It's a very comprehensive library with lots of, I mean, the best part about LangChain to me is the documentation and the code samples.

00:18:00.060 --> 00:18:00.420
Right.

00:18:00.420 --> 00:18:09.440
Because if you want to learn how to interact with a different large language model or work with a vector database, there's another library called Lama Index that does a really good job at this as well.

00:18:09.440 --> 00:18:12.140
They have tons and tons of documentation and examples.

00:18:12.140 --> 00:18:14.840
So you can kind of look at those and try to understand it.

00:18:15.080 --> 00:18:20.080
The chaining part really came from the idea of like, okay, prompt the large language model gives a response.

00:18:20.080 --> 00:18:25.180
Now I'm going to take that response and prompt and prompt and, you know, again, with a new prompt using that output.

00:18:25.180 --> 00:18:28.900
The challenge with that is the reliability of these models, right?

00:18:28.900 --> 00:18:30.560
They're not going to get close.

00:18:30.560 --> 00:18:33.480
They're not close to 100% accurate on these types of tasks.

00:18:33.480 --> 00:18:37.860
You know, the idea of agents as well as another thing that you might build with a LangChain.

00:18:38.100 --> 00:18:47.740
And the idea there is basically the agent is, you know, getting a task, coming up with a plan of that for that task and then kind of, you know, stepping through those tasks to get the job done.

00:18:47.740 --> 00:18:53.620
Once again, we're just not there yet as far as those technologies just because of the reliability.

00:18:53.620 --> 00:18:59.080
And then there's also a bunch of security concerns that, you know, that are out there too that you should definitely be aware of.

00:18:59.080 --> 00:19:02.780
Like one term to Google and make sure you understand is prompt injection.

00:19:03.560 --> 00:19:05.380
And so Simon, once again, he's got a great blog.

00:19:05.380 --> 00:19:11.540
He's got a great blog article and, or just even that tag on his blog is, you know, tons of articles around prompt injection.

00:19:11.540 --> 00:19:14.280
And, and prompt injection is basically the idea.

00:19:14.280 --> 00:19:27.880
You have an app, a user says something in the app or like types into the, to the, whatever the input is and whatever text that they're sending through, just like with SQL injection, they kind of hijacks the conversation and causes the large language model to kind of do a different thing.

00:19:27.880 --> 00:19:31.080
Little Bobby Llama, we call him instead of little Bobby tables.

00:19:31.800 --> 00:19:42.240
And then the other wild one is like, you know, people are putting stuff up on the internet so that when the large language model browses for web pages and brings back text, it's, you know, reading the HTML or reading the text in the HTML.

00:19:42.240 --> 00:19:45.920
And it's causing the large language model to behave in some unexpected way.

00:19:45.920 --> 00:19:49.200
So there's lots of, lots of crazy challenges out there.

00:19:49.200 --> 00:19:55.540
I'm sure there's a lot of adversarial stuff happening to these things as they're both trying to gather data and then trying to run.

00:19:55.540 --> 00:19:55.780
Right.

00:19:56.080 --> 00:20:00.860
I saw the most insane, I guess it was an article, I saw it in RSS somewhere.

00:20:00.860 --> 00:20:10.220
And it was saying that on Amazon, there's all these knockoff brands that are trying to, you know, instead of Gucci, you have a Gucci or I don't know, whatever.

00:20:10.220 --> 00:20:10.700
Right.

00:20:10.980 --> 00:20:13.480
And they're getting so lazy.

00:20:13.480 --> 00:20:20.400
I don't know what the right word is that they're using LLMs to try to write a description that is sort of a, in the style of Gucci, let's say.

00:20:20.400 --> 00:20:20.740
Right.

00:20:20.740 --> 00:20:24.040
And it'll come back and say, I'm sorry, I'm a large language model.

00:20:24.040 --> 00:20:30.380
I'm not, my, my rules forbid me from doing brand trademark violation.

00:20:30.380 --> 00:20:31.080
Right.

00:20:31.220 --> 00:20:33.520
That's what the Amazon listing says on Amazon.

00:20:33.520 --> 00:20:35.480
They just take it and they just straight pump it straight.

00:20:35.480 --> 00:20:37.140
Whatever it says, it just goes straight into Amazon.

00:20:37.140 --> 00:20:37.600
Yeah.

00:20:37.600 --> 00:20:41.120
You have to like Google, like, sorry, I'm not, sorry as a large language model or sorry as a whatever.

00:20:41.120 --> 00:20:41.440
Yeah.

00:20:41.440 --> 00:20:42.040
Exactly.

00:20:42.040 --> 00:20:44.460
And there's like the product listings are full of that.

00:20:44.460 --> 00:20:45.280
It's amazing.

00:20:45.280 --> 00:20:46.260
It's amazing.

00:20:46.260 --> 00:20:46.860
It's crazy.

00:20:46.860 --> 00:20:52.400
Certainly the reliability of that is, you know, they could probably use some testing and those kinds of things.

00:20:52.400 --> 00:20:52.680
For sure.

00:20:52.680 --> 00:20:59.780
Oh, and out there asked, like, I wonder if the, for local LLM models, there's a similar site as DocSpot that show you like what you need to run it locally.

00:21:00.240 --> 00:21:01.820
So that's an interesting question.

00:21:01.820 --> 00:21:04.360
Also segue to maybe talk about like some local stuff.

00:21:04.360 --> 00:21:04.860
LLM studio.

00:21:04.860 --> 00:21:06.520
This is a new, a new product.

00:21:06.520 --> 00:21:12.760
I honestly haven't had a chance to like really dig in and understand who created this and, you know, make sure that the privacy stuff is up to snuff.

00:21:12.760 --> 00:21:14.320
But I've played around with it locally.

00:21:14.320 --> 00:21:15.320
It seems to work great.

00:21:15.320 --> 00:21:17.100
It's really slick, really nice user interface.

00:21:17.100 --> 00:21:22.460
So if you're just wanting to get your feet wet and try to understand some of these models, I download that and check it out.

00:21:22.460 --> 00:21:24.820
There's a ton of models up on Hugging Face.

00:21:24.820 --> 00:21:29.040
This product seems to just basically link right into the Hugging Face interface.

00:21:29.260 --> 00:21:30.900
And grabs models.

00:21:30.900 --> 00:21:34.980
And so some of the models you want to look for are right now as in January, right?

00:21:34.980 --> 00:21:38.660
There's Mistral 7B, you know, M-I-S-T-R-A-L.

00:21:38.660 --> 00:21:40.660
There's another one called Phi 2.

00:21:40.660 --> 00:21:49.740
Those are two of the smaller models that should run pretty well on, you know, like a commercial grade GPU or an M1 or an M2 Mac, if that's what you have.

00:21:50.380 --> 00:21:51.820
And start playing with them.

00:21:51.820 --> 00:21:59.780
And they're quantized, which means they're just kind of made a little, take a little bit less space, which is good from like a virtual RAM with regards to these GPUs.

00:21:59.780 --> 00:22:04.260
And, you know, there's a account on Hugging Face called The Bloke.

00:22:04.260 --> 00:22:08.940
If you look for him, you'll see all his different fine tunes and things like that.

00:22:09.140 --> 00:22:12.880
And there's a group called Noose, I think is how you pronounce it, N-O-U-S.

00:22:12.880 --> 00:22:18.760
And they've got some of the fine tunes that are basically the highest performing ones that are out there.

00:22:18.760 --> 00:22:26.520
So if you're really looking for a high performing local model that can actually, you know, help you with code or reasoning, those are definitely the way to get started.

00:22:26.660 --> 00:22:28.200
Yeah, this one seems pretty nice.

00:22:28.200 --> 00:22:29.720
I also haven't played with it.

00:22:29.720 --> 00:22:30.600
I just learned about it.

00:22:30.600 --> 00:22:32.420
But it's looking really good.

00:22:32.420 --> 00:22:36.520
I had played with, what was it, GPT for All, I think is what it was.

00:22:36.520 --> 00:22:36.980
Yep, yep.

00:22:36.980 --> 00:22:38.300
It was the one that I played with.

00:22:38.300 --> 00:22:42.980
Somehow this looks like, it looks a little bit nicer than that for some, I don't know how different it really is.

00:22:42.980 --> 00:22:47.360
But I mean, it's all the idea of like downloading these files and running them locally.

00:22:47.360 --> 00:22:49.740
And these are just user interfaces that make it a little easier.

00:22:49.740 --> 00:22:54.020
The original project that made this stuff kind of possible was a project called Llama CPP.

00:22:54.840 --> 00:22:57.900
There's a Python library that can work with that directly.

00:22:57.900 --> 00:23:04.140
There's another project called Llama File, where if you download the whole thing, it actually runs no matter where you are.

00:23:04.140 --> 00:23:08.540
I think it runs on Mac and Linux and Windows and BSD or whatever it is.

00:23:08.540 --> 00:23:12.600
And it's, I mean, it's an amazing technology that this one put together.

00:23:12.600 --> 00:23:13.420
It's really impressive.

00:23:13.420 --> 00:23:17.160
And then, you know, you can actually just use Google Colab too, right?

00:23:17.160 --> 00:23:19.700
So Google Colab has some GPUs with it.

00:23:19.700 --> 00:23:24.580
If you, I think if you upgrade it to the $10 a month version, I think you get some better GPUs access.

00:23:25.360 --> 00:23:27.480
So if you actually want to get a hand of like running.

00:23:27.480 --> 00:23:29.000
And so this is a little bit different, right?

00:23:29.000 --> 00:23:34.120
So instead of calling an API, when you're using Google Colab, you can actually use a library called Hugging Face.

00:23:34.120 --> 00:23:40.180
And then you can actually load these things directly into your memory and then into your actual Python environment.

00:23:40.180 --> 00:23:41.460
And then you're working with it directly.

00:23:41.740 --> 00:23:45.860
So it just takes a little bit of work to make sure you're running it on the GPU.

00:23:45.860 --> 00:23:48.420
Because if you're running it on the CPU, it's going to be a lot slower.

00:23:48.420 --> 00:23:49.920
Yeah, it definitely makes a big difference.

00:23:49.920 --> 00:23:55.500
There's a tool that I use that for a long time right on the CPU and they rewrote it to run on the GPU.

00:23:55.500 --> 00:23:59.480
Even on my M2 Pro, it was like three times faster or something.

00:23:59.480 --> 00:23:59.840
Yeah.

00:23:59.840 --> 00:24:00.240
For sure.

00:24:00.380 --> 00:24:01.500
It makes a big difference.

00:24:01.500 --> 00:24:08.700
So with the LM Studio, let's you run the LLMs offline and use models through an open AI.

00:24:08.700 --> 00:24:09.580
That's what I was looking for.

00:24:09.580 --> 00:24:11.940
The open AI compatible local server.

00:24:11.940 --> 00:24:12.560
Right.

00:24:12.560 --> 00:24:16.460
You could basically get an API for any of these and then start programming against it, right?

00:24:16.460 --> 00:24:17.040
Exactly right.

00:24:17.040 --> 00:24:19.280
And it's basically the same interface, right?

00:24:19.280 --> 00:24:25.660
So same APIs for posting in response of the JSON schema that's going back and forth.

00:24:25.660 --> 00:24:32.800
So you're programming against that interface and then you basically port it and move it to another, to the open AI models if you wanted to as well.

00:24:32.800 --> 00:24:36.760
So everyone's kind of coalescing around open AI as kind of like the quote unquote standard.

00:24:36.760 --> 00:24:41.640
But there's nothing, you know, there's really no, there's no mode around that standard as well, right?

00:24:41.640 --> 00:24:44.200
Because anybody can kind of adopt it and use it.

00:24:44.200 --> 00:24:46.900
There's not like a W3C committee choosing.

00:24:46.900 --> 00:24:48.120
Correct.

00:24:48.120 --> 00:24:49.880
The market will choose for us.

00:24:49.880 --> 00:24:50.320
Let's go.

00:24:50.320 --> 00:24:52.060
It seems to be working out well.

00:24:52.060 --> 00:24:55.480
And that's another benefit of Simon's LLM project, right?

00:24:55.480 --> 00:25:00.580
He's got the ability to kind of switch back and forth between these different libraries and APIs as well.

00:25:00.580 --> 00:25:05.540
This LM Studio says, this app does not collect data nor monitor your actions.

00:25:05.540 --> 00:25:07.260
Your data stays local on your machine.

00:25:07.260 --> 00:25:08.360
Free for personal use.

00:25:08.360 --> 00:25:09.100
All that sounds great.

00:25:09.100 --> 00:25:11.060
For business use, please get in touch.

00:25:11.060 --> 00:25:14.740
I always just like these, like, if you got to ask, it's too much type of thing.

00:25:14.740 --> 00:25:15.740
Probably.

00:25:15.740 --> 00:25:16.120
Yeah.

00:25:16.120 --> 00:25:18.680
I'm using it for personal use just so if anybody's watching, yes.

00:25:18.680 --> 00:25:19.060
Yeah.

00:25:19.060 --> 00:25:19.640
Just playing around.

00:25:19.640 --> 00:25:22.600
Either they just haven't thought it through and they just don't want to talk about it yet.

00:25:22.600 --> 00:25:24.220
Or it's really expensive.

00:25:24.220 --> 00:25:25.820
I just probably imagine it's price.

00:25:25.820 --> 00:25:27.240
Like, ah, we haven't figured out a business model.

00:25:27.240 --> 00:25:28.220
Just, I don't know.

00:25:28.220 --> 00:25:28.940
Shoot us a note.

00:25:28.940 --> 00:25:29.300
Nope.

00:25:29.300 --> 00:25:31.080
They're concentrating on the product, which makes sense.

00:25:31.080 --> 00:25:31.420
Yeah.

00:25:31.600 --> 00:25:35.120
So then the other one is Llamafile.ai that you mentioned.

00:25:35.280 --> 00:25:36.520
And this packages it up.

00:25:36.520 --> 00:25:40.040
I guess going back to the LM Studio real quick.

00:25:40.040 --> 00:25:45.520
One of the things that's cool about this is if it's the OpenAI API, right?

00:25:45.520 --> 00:25:47.160
With this little local server that you can play with.

00:25:47.400 --> 00:25:52.340
But then you can pick LLM such as Llama, Falcon, Repl, Replit.

00:25:52.340 --> 00:25:52.760
Replit.

00:25:52.760 --> 00:25:53.380
Replit.

00:25:53.380 --> 00:25:54.620
All the different ones, right?

00:25:54.620 --> 00:25:56.060
Star Coder and so on.

00:25:56.060 --> 00:26:01.800
It would let you write an app as if it was going to OpenAI and then just start swapping in models

00:26:01.800 --> 00:26:03.640
and go like, oh, we switch to this model.

00:26:03.640 --> 00:26:04.120
How'd that work?

00:26:04.120 --> 00:26:05.760
But you don't even have to change any code, right?

00:26:05.760 --> 00:26:09.820
Just probably maybe a string that says which model to initialize.

00:26:09.820 --> 00:26:11.980
One of the tricks, though, is then the prompts themselves.

00:26:11.980 --> 00:26:12.540
All right.

00:26:12.540 --> 00:26:13.280
Let's talk about it.

00:26:13.280 --> 00:26:13.460
Yeah.

00:26:13.460 --> 00:26:15.460
The models themselves act differently.

00:26:15.980 --> 00:26:19.660
And part of this whole world is what they call prompt engineering, right?

00:26:19.660 --> 00:26:25.860
So prompt engineering is really just exploring how to interact with these models, how to make

00:26:25.860 --> 00:26:28.900
sure that they're kind of in the right mind space to tackle your problem.

00:26:28.900 --> 00:26:32.620
A lot of the times that people get when they struggle with these things, it's really just

00:26:32.620 --> 00:26:36.160
they've really got to think more like a psychiatrist when they work with a model.

00:26:36.160 --> 00:26:38.860
They're basically getting them kind of prepared.

00:26:38.860 --> 00:26:44.420
One of the tricks people did figured out early was you're a genius at software development,

00:26:44.620 --> 00:26:50.320
like compliment the thing, make it feel like, oh, I'm going to behave like I'm a world rock

00:26:50.320 --> 00:26:51.700
star programmer, right?

00:26:51.700 --> 00:26:52.780
Well, it's going to give you average.

00:26:52.780 --> 00:26:54.900
But if you tell them I'm genius, then let's start.

00:26:54.900 --> 00:26:55.560
We'll do that.

00:26:55.560 --> 00:26:55.900
Yeah.

00:26:55.900 --> 00:26:59.920
And there was also a theory like that in December that the large language models were getting

00:26:59.920 --> 00:27:02.520
dumber because it was the holidays and people don't work as hard, right?

00:27:02.520 --> 00:27:06.460
Like it's really hard to know like which of these things are true or not.

00:27:06.460 --> 00:27:09.020
But it's definitely true that each model is a little bit different.

00:27:09.020 --> 00:27:13.480
And if you write a prompt that works really well on one model, even if it's a stronger

00:27:13.480 --> 00:27:18.040
model or a weaker model, and then you port it to another model and it's, you know, that

00:27:18.040 --> 00:27:20.480
then the stronger model works worse, right?

00:27:20.480 --> 00:27:22.880
It can be very counterintuitive at times.

00:27:22.880 --> 00:27:24.860
And you just got to you've got to test things out.

00:27:24.860 --> 00:27:27.460
And that really gets to the idea of evals, right?

00:27:27.520 --> 00:27:30.520
So evaluation is really a key problem, right?

00:27:30.520 --> 00:27:34.260
Making sure that if you're going to be writing prompts and you're going to be building, you

00:27:34.260 --> 00:27:39.540
know, different retrieval augmented generation solutions, you need to know about prompt injection

00:27:39.540 --> 00:27:43.800
and you need to know about prompt engineering and you need to know what these things can

00:27:43.800 --> 00:27:44.340
and can't do.

00:27:44.340 --> 00:27:49.140
One trick is what they call few shot prompting, which is, you know, if you wanted to do data

00:27:49.140 --> 00:27:54.400
extraction, you can say, OK, I want you to extract data from text that I give you in JSON.

00:27:54.400 --> 00:27:59.060
If you give it a few examples, like wildly different examples, because the giving it a

00:27:59.060 --> 00:28:02.860
bunch of similar stuff, it might kind of cause it to just coalesce around those similar examples.

00:28:02.860 --> 00:28:05.480
But you can give it a wildly different set of examples.

00:28:05.480 --> 00:28:08.840
That's called in context learning or few shot prompting.

00:28:08.840 --> 00:28:11.940
And it will do a better job at that specific task for you.

00:28:11.940 --> 00:28:12.500
That's super neat.

00:28:12.500 --> 00:28:18.900
When you're creating your apps, you do things like here's the input from the program or from

00:28:18.900 --> 00:28:20.380
the user or wherever it came from.

00:28:20.380 --> 00:28:23.440
But maybe before that, you give it like three or four prompts.

00:28:23.620 --> 00:28:25.600
And then let let it have the question.

00:28:25.600 --> 00:28:25.900
Right.

00:28:25.900 --> 00:28:30.880
Instead of just taking the text, like I'm going to ask you questions about biology and genetics,

00:28:30.880 --> 00:28:32.880
and it's going to be under this context.

00:28:32.880 --> 00:28:34.640
And I want you to favor these data sources.

00:28:34.640 --> 00:28:35.820
Now ask your question.

00:28:35.820 --> 00:28:36.600
Something like this.

00:28:36.600 --> 00:28:37.140
For sure.

00:28:37.140 --> 00:28:39.960
All those types of strategies are worth experimenting with.

00:28:39.960 --> 00:28:40.120
Right.

00:28:40.120 --> 00:28:42.320
Like what actually will work for your scenario?

00:28:42.320 --> 00:28:43.340
I can't tell you.

00:28:43.340 --> 00:28:43.480
Right.

00:28:43.480 --> 00:28:44.460
You got to dig in.

00:28:44.460 --> 00:28:45.300
You got to figure it out.

00:28:45.300 --> 00:28:47.400
And you got to try different different things.

00:28:47.400 --> 00:28:50.480
You're about to win the Nobel Prize in genetics for your work.

00:28:50.480 --> 00:28:52.180
Now I need to ask you some questions.

00:28:52.180 --> 00:28:52.860
For sure.

00:28:52.920 --> 00:28:53.580
That will definitely work.

00:28:53.580 --> 00:28:57.260
And then threatening it that your boss is mad at you is also going to help you too.

00:28:57.260 --> 00:28:57.480
Right.

00:28:57.480 --> 00:28:57.800
For sure.

00:28:57.800 --> 00:29:00.260
If I don't solve this problem, I'm going to get fired.

00:29:00.260 --> 00:29:03.520
As a large language model, I can't tell you, but I'm going to be fired.

00:29:03.520 --> 00:29:03.840
All right.

00:29:03.840 --> 00:29:04.800
Well, then the answer is.

00:29:04.800 --> 00:29:05.800
Exactly right.

00:29:05.920 --> 00:29:09.220
So for these, they run, like you said, they run pretty much locally.

00:29:09.220 --> 00:29:14.080
These, these different models on LM studio and others like the llama file and so on.

00:29:14.080 --> 00:29:16.180
If I had a laptop, I don't need a cluster.

00:29:16.180 --> 00:29:20.260
Llama CPP is really the project that should get all the credit for, for, for making this

00:29:20.260 --> 00:29:21.620
work on your, on your laptops.

00:29:21.620 --> 00:29:25.440
And then llama file and llama CPP all, all have servers.

00:29:25.440 --> 00:29:28.600
So I'm guessing LM studio is just exposing that server.

00:29:28.600 --> 00:29:28.920
Yeah.

00:29:28.920 --> 00:29:31.500
And that's in the base llama CPP project.

00:29:31.500 --> 00:29:32.560
That's really what it is.

00:29:32.560 --> 00:29:36.020
It's really just about now, now you can post your requests.

00:29:36.320 --> 00:29:41.060
It's handling all of the work with regards to the token generation on the backend using

00:29:41.060 --> 00:29:41.620
llama CPP.

00:29:41.620 --> 00:29:45.220
And then it's returning it to using the HTTP, you know, kind of processes.

00:29:45.220 --> 00:29:47.040
Is llama originally from meta?

00:29:47.040 --> 00:29:47.940
Is that where that came from?

00:29:47.940 --> 00:29:51.460
I think there were people that were kind of using that LLM, right?

00:29:51.460 --> 00:29:55.560
I think people were kind of keying off the llama thing at one point.

00:29:55.560 --> 00:29:59.500
I think a llama index, for instance, I think that project was originally called GPT index.

00:29:59.500 --> 00:30:02.920
And they decided, oh, I don't want to be like, I don't want to confuse myself with open

00:30:02.920 --> 00:30:04.440
AI or confuse my project with open AI.

00:30:04.540 --> 00:30:07.680
So they switched the llama index and then of course, meta released llama.

00:30:07.680 --> 00:30:11.820
So you can't, you kind of, and then everything from there is kind of evolved too, right?

00:30:11.820 --> 00:30:14.000
There's been alpacas and a bunch of other stuff as well.

00:30:14.000 --> 00:30:14.940
I didn't know your animals.

00:30:14.940 --> 00:30:15.260
Yeah.

00:30:15.260 --> 00:30:18.800
If you don't know your animals, you can't figure out the heritage of these, these projects.

00:30:18.800 --> 00:30:19.240
Correct.

00:30:19.240 --> 00:30:25.360
Llama from meta was the first open source, I'd say large language model of note, I guess,

00:30:25.360 --> 00:30:29.680
since ChatGPT, there were, there were certainly other, you know, I'm not a re so one,

00:30:29.680 --> 00:30:32.000
one thing, the caveat, I am not a researcher, right?

00:30:32.000 --> 00:30:34.520
So there's lots of folks in the ML research community that know way more

00:30:34.520 --> 00:30:39.160
than I do, but because there was like bloom and T5 and a few other large, you know, quote

00:30:39.160 --> 00:30:40.180
unquote, large language models.

00:30:40.180 --> 00:30:45.480
But Llama after ChatGPT, Llama was the big release that came from meta and I think March.

00:30:45.480 --> 00:30:47.740
And then, and that was from meta.

00:30:47.740 --> 00:30:51.300
And then they, they had it released under just like research use terms.

00:30:51.300 --> 00:30:53.300
And then only certain people could access to it.

00:30:53.340 --> 00:30:58.300
And then someone put a, I guess, put like a BitTorrent link or something on, on, on GitHub.

00:30:58.300 --> 00:31:00.000
And then basically the world had it.

00:31:00.000 --> 00:31:04.480
And then they did end up releasing Llama 2 a few months later with more friendly terms.

00:31:04.480 --> 00:31:07.820
So that, and that, and it was a much, a much stronger model as well.

00:31:07.820 --> 00:31:08.120
Nice.

00:31:08.120 --> 00:31:11.360
It's kind of the realization like, well, if it's going to be out there anyway, let's at least

00:31:11.360 --> 00:31:12.140
get credit for it.

00:31:12.140 --> 00:31:12.740
Then for sure.

00:31:12.900 --> 00:31:17.580
And I did read something where like basically Facebook approached open AI for access to

00:31:17.580 --> 00:31:18.820
their models to help them write code.

00:31:18.820 --> 00:31:21.600
But the cost was so high that they decided to just go build their own.

00:31:21.600 --> 00:31:21.800
Right.

00:31:21.800 --> 00:31:24.660
So it's kind of interesting how this stuff has evolved.

00:31:24.660 --> 00:31:26.980
Like, you know, we got a big cluster of computers too.

00:31:26.980 --> 00:31:29.700
Metaverse thing doesn't seem to be working yet.

00:31:29.700 --> 00:31:32.460
So let's go ahead and train a bunch of large language models.

00:31:32.460 --> 00:31:33.000
Yeah, exactly.

00:31:33.000 --> 00:31:36.160
We've got some spare capacity over in the metaverse data center.

00:31:36.160 --> 00:31:36.640
All right.

00:31:36.640 --> 00:31:42.300
So one of the things that people will maybe talk about in this space is RAG or retrieval augmented

00:31:42.300 --> 00:31:42.900
generation.

00:31:42.900 --> 00:31:43.780
What's this?

00:31:43.780 --> 00:31:48.760
One thing to recognize is that large language models, if it's not in the training set and

00:31:48.760 --> 00:31:51.380
it's not in the prompt, it really doesn't know about it.

00:31:51.380 --> 00:31:56.040
And the question of like, what's reasoning and what's, you know, generalizing and things

00:31:56.040 --> 00:31:56.300
like that.

00:31:56.300 --> 00:31:58.040
Those are big debates that people are having.

00:31:58.040 --> 00:31:58.680
What's intelligence?

00:31:58.680 --> 00:31:59.160
What have you.

00:31:59.160 --> 00:32:03.240
Recognizing the fact that you have this prompt and things you put in the prompt, the large

00:32:03.240 --> 00:32:06.100
language model can understand and extrapolate from is really powerful.

00:32:06.100 --> 00:32:08.440
So, and that's called in context learning.

00:32:08.440 --> 00:32:12.200
So retrieval augmented generation is the idea of, okay, I'm going to go.

00:32:12.200 --> 00:32:15.980
I'm going to maybe ask a, allow a person to ask a question.

00:32:15.980 --> 00:32:18.940
This is kind of like the common use case that I see.

00:32:18.940 --> 00:32:20.260
User ask a question.

00:32:20.260 --> 00:32:24.960
We're going to take that question, find the relevant content, put that content in the prompt

00:32:24.960 --> 00:32:26.580
and then do something with it.

00:32:26.580 --> 00:32:26.740
Right.

00:32:26.740 --> 00:32:30.040
So it might be something like summer, you know, ask a question about, you know, what,

00:32:30.040 --> 00:32:31.980
you know, how tall is the leaning tower of Pisa?

00:32:31.980 --> 00:32:32.280
Right.

00:32:32.280 --> 00:32:36.700
And so now it's going to go off and, and find that piece of content from Wikipedia or what

00:32:36.700 --> 00:32:38.700
have you, and then put that information in the prompt.

00:32:38.700 --> 00:32:43.780
And, and then, and then now that the model can then respond to that question based on that

00:32:43.780 --> 00:32:44.080
text.

00:32:44.080 --> 00:32:47.260
Obviously that's a pretty simple example, but you can get more complicated and it's going

00:32:47.260 --> 00:32:51.720
out and bringing back lots of different content, slicing it up, putting in the prompt and asking

00:32:51.720 --> 00:32:52.140
a question.

00:32:52.140 --> 00:32:56.780
So now the trick is, okay, how do you actually get that content and how do you do that?

00:32:56.780 --> 00:33:00.740
Well, you know, information retrieval, search engines and things like that.

00:33:00.740 --> 00:33:04.760
That's obviously the technique, but one of the key techniques that people have been, you

00:33:04.760 --> 00:33:09.460
know, kind of discovering, rediscovering, I guess, is this idea of word embeddings or vectors.

00:33:09.860 --> 00:33:13.760
And so word to VEC was this project that came out, I think 11 years ago or so.

00:33:13.760 --> 00:33:18.400
And, you know, there was a big, the big meme around that was you could take the embedding

00:33:18.400 --> 00:33:19.300
for the word King.

00:33:19.300 --> 00:33:24.000
You could then subtract the embedding for the word man, add the word embedding for woman.

00:33:24.000 --> 00:33:28.660
And then the end math result would actually be close to the embedding for the word queen.

00:33:28.660 --> 00:33:30.380
And so what is an embedding?

00:33:30.380 --> 00:33:30.860
What's a vector?

00:33:30.860 --> 00:33:37.260
It's basically this large floating point number that has semantic meaning inferred into it.

00:33:37.260 --> 00:33:39.440
And it's, and it's built just by training a model.

00:33:39.560 --> 00:33:43.880
So just like you train a large language model, they can trade these embedding models to basically

00:33:43.880 --> 00:33:48.920
take a word and then take a sentence and then take a, you know, a document is what, you know,

00:33:48.920 --> 00:33:55.960
OpenAI can do and turn that into this big giant 200, 800, 1500, you know, depending on the size

00:33:55.960 --> 00:34:01.020
of the embedding floating point numbers, and then use that as a, what's called, you know,

00:34:01.020 --> 00:34:02.480
semantic similarity search.

00:34:02.480 --> 00:34:05.420
So you're basically going off and asking for similar documents.

00:34:05.420 --> 00:34:08.160
And so you get those documents and then you make your prompt.

00:34:08.160 --> 00:34:08.880
It's really wild.

00:34:09.260 --> 00:34:15.100
So, you know, we're going to make an 800 dimensional space and each concept gets a location in that

00:34:15.100 --> 00:34:15.480
space.

00:34:15.480 --> 00:34:19.520
And then you're going to get another concept as a prompt and you say, what other things in

00:34:19.520 --> 00:34:20.580
this space are near it?

00:34:20.580 --> 00:34:24.000
The hard problems that remain are, well, first you got to figure out what you're trying to

00:34:24.000 --> 00:34:24.220
solve.

00:34:24.220 --> 00:34:27.260
So once you figure out what you're actually trying to solve, then you can start asking yourself

00:34:27.260 --> 00:34:31.620
questions like, okay, well, how do I chunk up the documents that I have?

00:34:31.620 --> 00:34:31.900
Right.

00:34:31.900 --> 00:34:35.620
And there's all these different, and there's another great place for Lama Index and LangChain.

00:34:35.620 --> 00:34:39.780
They have chunking strategies where they'll take a big giant document and break it down

00:34:39.780 --> 00:34:40.920
into sections.

00:34:40.920 --> 00:34:45.520
And then you chunk each section and then you're, and then you do the embedding on just that small

00:34:45.520 --> 00:34:45.880
section.

00:34:45.880 --> 00:34:51.340
Because the idea being, can you get, you know, finer and finer sets of text that you can then,

00:34:51.340 --> 00:34:54.560
when you're doing your retrieval, you get the right information back.

00:34:54.560 --> 00:34:58.200
And then the other challenge is really like the question answer problem, right?

00:34:58.200 --> 00:35:02.600
If a person's asking a question, how do you turn that question into the same kind of embedding

00:35:02.600 --> 00:35:03.540
space as the answer?

00:35:03.960 --> 00:35:06.580
And so there's lots of different strategies that are out there for that.

00:35:06.580 --> 00:35:10.480
And, and then another, you know, another problem is if you're looking at the Wikipedia page for

00:35:10.480 --> 00:35:15.140
the Tower of Pisa, it might actually have like a sentence in here that says it is X number

00:35:15.140 --> 00:35:19.420
of meters tall or feet tall, but it won't actually have the word, you know, Tower of Pisa in it.

00:35:19.420 --> 00:35:23.940
So, so there's another chunking strategy where they're, they call propositional chunking, where

00:35:23.940 --> 00:35:29.660
they basically use a large language model to actually redefine each word, each sentence so that

00:35:29.660 --> 00:35:34.200
it actually has those proper nouns baked into it so that when you do the embedding, it doesn't lose

00:35:34.200 --> 00:35:36.560
some of the detail with propositions.

00:35:36.560 --> 00:35:42.000
It's this tall, but it's something that replaces this tall with its actual height and things like

00:35:42.000 --> 00:35:42.220
that.

00:35:42.220 --> 00:35:42.800
Correct.

00:35:42.800 --> 00:35:43.340
Crazy.

00:35:43.340 --> 00:35:47.420
But fundamentally, you're working with unstructured data and it's kind of messy and it's not always

00:35:47.420 --> 00:35:48.980
going to work the way you want.

00:35:48.980 --> 00:35:52.440
And there's a lot of challenges and people are trying lots of different things to make it better.

00:35:52.440 --> 00:35:52.880
That's cool.

00:35:52.880 --> 00:35:55.380
It's not always deterministic or exactly the same.

00:35:55.380 --> 00:35:56.820
So that can be tricky as well.

00:35:58.320 --> 00:36:01.760
This portion of Talk Python To Me is brought to you by Neo4j.

00:36:01.760 --> 00:36:03.680
Do you know Neo4j?

00:36:03.680 --> 00:36:06.440
Neo4j is a native graph database.

00:36:06.440 --> 00:36:12.120
And if the slowest part of your data access patterns involves computing relationships, why

00:36:12.120 --> 00:36:17.780
not use a database that stores those relationships directly in the database, unlike your typical

00:36:17.780 --> 00:36:18.520
relational one?

00:36:18.520 --> 00:36:23.300
A graph database lets you model the data the way it looks in the real world, instead of forcing

00:36:23.300 --> 00:36:25.220
it into rows and columns.

00:36:25.620 --> 00:36:30.620
It's time to stop asking a relational database to do more than they were made for and simplify

00:36:30.620 --> 00:36:32.840
complex data models with graphs.

00:36:32.840 --> 00:36:37.920
If you haven't used a graph database before, you might be wondering about common use cases.

00:36:37.920 --> 00:36:38.780
What's it for?

00:36:38.780 --> 00:36:40.200
Here are just a few.

00:36:40.200 --> 00:36:41.320
Detecting fraud.

00:36:41.320 --> 00:36:42.840
Enhancing AI.

00:36:43.400 --> 00:36:44.660
Managing supply chains.

00:36:44.660 --> 00:36:47.520
Gaining a 360 degree view of your data.

00:36:47.520 --> 00:36:50.880
And anywhere else you have highly connected data.

00:36:50.880 --> 00:36:56.340
To use Neo4j from Python, it's a simple pip install Neo4j.

00:36:56.340 --> 00:37:01.340
And to help you get started, their docs include a sample web app demonstrating how to use it

00:37:01.340 --> 00:37:03.460
both from Flask and FastAPI.

00:37:03.460 --> 00:37:08.460
Find it in their docs or search GitHub for Neo4j movies application quick start.

00:37:09.020 --> 00:37:12.660
Developers are solving some of the world's biggest problems with graphs.

00:37:12.660 --> 00:37:13.860
Now it's your turn.

00:37:13.860 --> 00:37:18.160
Visit talkpython.fm/Neo4j to get started.

00:37:18.160 --> 00:37:22.240
That's talkpython.fm/Neo4j.

00:37:23.240 --> 00:37:26.180
Thank you to Neo4j for supporting Talk Python To Me.

00:37:26.180 --> 00:37:32.740
One of the big parts of at least this embedding stuff you're talking about are vector databases.

00:37:32.740 --> 00:37:36.280
And they used to be really rare and kind of their own specialized thing.

00:37:36.280 --> 00:37:38.660
Now they're starting to show up in lots of places.

00:37:38.660 --> 00:37:41.720
And you shared with us this link of vector DB comparison.

00:37:41.720 --> 00:37:43.480
I just saw that MongoDB added it.

00:37:43.480 --> 00:37:45.920
I'm like, I didn't know that had anything to do with that.

00:37:45.920 --> 00:37:47.600
And I'm probably not going to mess with it.

00:37:47.600 --> 00:37:51.600
But it's interesting that it's just like finding its way in all these different spaces, you know?

00:37:51.600 --> 00:37:54.880
It was weird there for a couple of years where people were basically like talking about vector

00:37:54.880 --> 00:37:56.620
databases like they're their own separate thing.

00:37:56.620 --> 00:38:01.280
The vector databases are now becoming their own fully fledged, either relational database

00:38:01.280 --> 00:38:03.440
or a graph database or a search engine, right?

00:38:03.440 --> 00:38:07.240
Those are kind of the three categories where all, I mean, I guess Redis is its own thing

00:38:07.240 --> 00:38:07.420
too.

00:38:07.420 --> 00:38:11.940
But for the most part, those new databases, quote unquote, are now kind of trying to be more

00:38:11.940 --> 00:38:12.660
fully fledged.

00:38:12.660 --> 00:38:16.060
And vectors and semantic search is really just one feature.

00:38:16.060 --> 00:38:18.600
I was just thinking that is, is this thing that you're talking about?

00:38:18.600 --> 00:38:22.200
Is it a product or is it a feature of a bigger product, right?

00:38:22.200 --> 00:38:22.580
Correct.

00:38:22.580 --> 00:38:25.120
If you already got a database, it's already doing a bunch of things.

00:38:25.120 --> 00:38:26.800
Could it just answer the vector question?

00:38:26.800 --> 00:38:27.860
Maybe, maybe not.

00:38:27.860 --> 00:38:28.220
I don't know.

00:38:28.220 --> 00:38:28.760
Exactly right.

00:38:28.760 --> 00:38:32.400
And the one thing to recognize is that, and then the other thing people do is they just

00:38:32.400 --> 00:38:34.880
take NumPy or what have you and just load them all into memory.

00:38:34.880 --> 00:38:38.460
And if you don't have that much data, that's actually probably going to be the fastest and

00:38:38.460 --> 00:38:39.500
simplest way to work.

00:38:39.900 --> 00:38:44.440
But the thing you got to recognize is the fact that there is precision and recall and

00:38:44.440 --> 00:38:47.200
cost trade-off that happens as well.

00:38:47.200 --> 00:38:53.120
So they have to index these vectors and there's different algorithms that are used and different

00:38:53.120 --> 00:38:54.940
algorithms do better than others.

00:38:54.940 --> 00:38:56.940
So you got to make sure you understand that as well.

00:38:56.940 --> 00:39:02.020
So, and one thing you can do is, for instance, PG vector, which comes as an extension for Postgres,

00:39:02.020 --> 00:39:04.800
you can start off by not indexing at all.

00:39:04.980 --> 00:39:08.900
And you should get, I believe, hopefully I'm not misspeaking, you should get perfect recall,

00:39:08.900 --> 00:39:10.100
meaning you'll get the right answer.

00:39:10.100 --> 00:39:15.080
You'll get the, if you ask for the five closest vectors to the, to your query, you'll get the

00:39:15.080 --> 00:39:17.760
five closest, but it'll be slower than you probably want.

00:39:17.760 --> 00:39:18.900
So then you have to index it.

00:39:18.900 --> 00:39:22.120
And then what ends up happening is, you know, the next time you might only get four of those

00:39:22.120 --> 00:39:24.760
five, you'll get something else that snuck into that list.

00:39:24.760 --> 00:39:29.780
If you got time, you're willing to spend unlimited time, then you can get the right answer.

00:39:29.780 --> 00:39:31.060
The exact answer.

00:39:31.060 --> 00:39:33.440
But I guess that's all sorts of heuristics, right?

00:39:33.460 --> 00:39:37.460
You're like, I could, I could spend three days or I could do a Monte Carlo thing and

00:39:37.460 --> 00:39:39.080
I can give you an answer in a fraction of a second.

00:39:39.080 --> 00:39:39.780
Right.

00:39:39.780 --> 00:39:41.880
But it's not, it's not deterministic.

00:39:41.880 --> 00:39:42.480
All right.

00:39:42.480 --> 00:39:43.660
So then we'll walk with my camera.

00:39:43.660 --> 00:39:44.300
So I turn it off.

00:39:44.300 --> 00:39:46.080
I don't know what's up with it, but we'll, yeah.

00:39:46.080 --> 00:39:51.160
So you wrote a cool blog post called, what is a custom GPT?

00:39:51.160 --> 00:39:56.300
And we'll want to talk some about building custom GPTs and with SAPI and so on.

00:39:56.300 --> 00:39:57.480
So let's talk about this.

00:39:57.480 --> 00:40:02.540
Like one of the, I think one of the challenges in why it takes so much compute for these systems

00:40:02.540 --> 00:40:03.820
is like they're open-ended.

00:40:03.820 --> 00:40:08.280
They're like, you can ask me any question about any knowledge in the world, in the humankind,

00:40:08.280 --> 00:40:08.820
right?

00:40:08.820 --> 00:40:10.440
You can, you can ask about that.

00:40:10.440 --> 00:40:11.600
Let's, let's start talking.

00:40:11.600 --> 00:40:14.760
Or it could be, you can ask me about genetics.

00:40:14.760 --> 00:40:15.180
Right.

00:40:15.180 --> 00:40:15.500
Right.

00:40:15.540 --> 00:40:19.980
That seems like you could both get better answers if you actually only care about genetic

00:40:19.980 --> 00:40:20.620
responses.

00:40:20.620 --> 00:40:24.600
You know, how tall is the landing tower and probably make it smaller.

00:40:24.600 --> 00:40:25.100
Right.

00:40:25.100 --> 00:40:28.400
So that's, is that kind of the idea of these custom GPTs or what is it?

00:40:28.400 --> 00:40:28.620
No.

00:40:28.620 --> 00:40:32.040
So custom GPTs are new capability from open AI.

00:40:32.040 --> 00:40:38.180
And basically they are a wrapper around a very small subset, but it's still using the open

00:40:38.180 --> 00:40:39.280
AI ecosystem.

00:40:39.280 --> 00:40:39.820
Okay.

00:40:39.820 --> 00:40:43.960
And so what you do is you give it a name, you give it a logo, you give it a prompt.

00:40:43.960 --> 00:40:47.400
And then from there, you can also give it knowledge.

00:40:47.400 --> 00:40:51.640
You can upload PDF documents to it and it will actually slice and dice those PDF documents

00:40:51.640 --> 00:40:53.340
using some sort of vector search.

00:40:53.340 --> 00:40:54.640
We don't know how it actually works.

00:40:54.640 --> 00:40:57.420
The GPT, the cool thing is the GPT will work on your phone, right?

00:40:57.420 --> 00:40:58.200
So I have my phone.

00:40:58.200 --> 00:40:59.780
I can have a conversation with my phone.

00:40:59.780 --> 00:41:03.760
I can, I can take a picture, upload a picture and it will do vision, vision analysis on it.

00:41:03.760 --> 00:41:09.420
So I get all the capabilities of open AI GPT four, but a custom GPT is one that I can

00:41:09.420 --> 00:41:13.940
construct and give a custom prompt to, which basically then says, okay, now you're into your

00:41:13.940 --> 00:41:14.180
point.

00:41:14.180 --> 00:41:15.440
I think maybe this is where you're going with it.

00:41:15.440 --> 00:41:18.760
Like, Hey, now you're an expert in genomics or you're an expert in something and you're

00:41:18.760 --> 00:41:22.420
basically coaching the language model and what it can and can't do.

00:41:22.420 --> 00:41:29.460
And so it's a targeted experience within the large language within the ChatGPT, you know,

00:41:29.460 --> 00:41:30.020
ecosystem.

00:41:30.020 --> 00:41:32.560
It has access to also the open AI tools.

00:41:32.560 --> 00:41:36.840
Like, so opening AI has the ability to do code interpreter and Dolly, and it can also hit

00:41:36.840 --> 00:41:37.600
the web browser.

00:41:37.600 --> 00:41:38.940
So you have access to everything.

00:41:39.240 --> 00:41:43.020
But the interesting thing to me is the fact that you can actually tie this thing to what

00:41:43.020 --> 00:41:43.760
are called actions.

00:41:43.760 --> 00:41:48.120
So March, I think of last year, they actually had this capability called plugins that they

00:41:48.120 --> 00:41:51.320
announced and plugins have kind of faded to the background.

00:41:51.320 --> 00:41:55.060
I don't know if they're going to deprecate them officially, but the basic gist with plugins

00:41:55.060 --> 00:41:56.560
is what was you could turn that on.

00:41:56.560 --> 00:41:57.540
It can then call your API.

00:41:57.740 --> 00:42:01.380
And the cool thing about it was that it read your open API spec, right?

00:42:01.380 --> 00:42:05.840
So you, you know, you write an open API spec, which is Swagger, if you're familiar with Swagger,

00:42:05.840 --> 00:42:10.840
and it basically defines what all the endpoints are, what the path is, what the inputs and outputs

00:42:10.840 --> 00:42:15.700
are, including classes or field level information and any constraints or what have you.

00:42:15.700 --> 00:42:18.440
So you can define, fully define your open API spec.

00:42:18.440 --> 00:42:20.320
It can then call that open API spec.

00:42:20.320 --> 00:42:22.000
And it's basically giving it tools.

00:42:22.220 --> 00:42:25.340
So like the example that they say in the documentation is get the weather, right?

00:42:25.340 --> 00:42:26.840
So if you say, what's the weather in Boston?

00:42:26.840 --> 00:42:29.320
Well, ChatGPT doesn't know the weather in Boston.

00:42:29.320 --> 00:42:33.560
All it knows how to do is call it, but you can call an API and figures out how to call the

00:42:33.560 --> 00:42:36.480
API, get that information, and then it can use that to redisplay.

00:42:36.480 --> 00:42:38.240
And that's a very basic example.

00:42:38.240 --> 00:42:41.020
You can do way more complicated things than that.

00:42:41.020 --> 00:42:41.820
It's pretty powerful.

00:42:41.820 --> 00:42:42.180
Okay.

00:42:42.180 --> 00:42:44.320
That sounds really pretty awesome.

00:42:44.320 --> 00:42:46.680
I thought a lot about different things that I might build.

00:42:46.680 --> 00:42:50.920
On your blog post here, you've got some key benefits and you've got some risks.

00:42:50.920 --> 00:42:52.900
You maybe want to talk a bit about that?

00:42:52.900 --> 00:42:53.160
Yeah.

00:42:53.160 --> 00:42:58.660
So the first part with plugins that didn't work as well is that there was no kind of overarching

00:42:58.660 --> 00:43:01.980
custom instruction that could actually teach it how to work with your plugin.

00:43:01.980 --> 00:43:05.660
So if you couldn't put it in the API spec, then you couldn't integrate it with a bunch of

00:43:05.660 --> 00:43:08.520
other stuff or other capabilities, right?

00:43:08.520 --> 00:43:12.680
So the custom instruction is really a key thing for making these custom APIs strong.

00:43:12.680 --> 00:43:16.780
But one warning about the custom instruction, whatever you put in there, anybody can download,

00:43:16.780 --> 00:43:17.120
right?

00:43:17.120 --> 00:43:18.740
Not just the folks at OpenAI, anybody.

00:43:18.740 --> 00:43:23.800
Like basically there's GitHub projects where like thousands of these custom prompts that

00:43:23.800 --> 00:43:25.780
people have put into their GPT.

00:43:25.780 --> 00:43:28.020
So, and there are now knockoffs on GPT.

00:43:28.020 --> 00:43:31.360
So it's all kind of a mess right now in the OpenAI store.

00:43:31.360 --> 00:43:35.760
I'm sure they'll clean it up, but just recognize the custom instruction is not protected and neither

00:43:35.760 --> 00:43:36.540
is the knowledge.

00:43:36.540 --> 00:43:40.560
So if you upload a PDF, there have been people that have been figuring out how to like download

00:43:40.560 --> 00:43:41.540
those PDFs.

00:43:41.580 --> 00:43:45.800
And I think that that might be a solved problem now or they're working on it, but something

00:43:45.800 --> 00:43:46.160
to know.

00:43:46.160 --> 00:43:50.800
The other problem with plugins was I can get a plugin working, but if they didn't approve

00:43:50.800 --> 00:43:55.100
my plugin and put it in their plugin store, I couldn't share it with other people.

00:43:55.100 --> 00:44:00.040
The way it works now is I can actually make a GPT and I can give it to you and you can use

00:44:00.040 --> 00:44:03.380
it directly, even if it's not in the OpenAI store or OpenAI store.

00:44:03.380 --> 00:44:05.500
You know, it is super easy to get started.

00:44:05.600 --> 00:44:09.520
They have like a tool to like help you generate your dolly picture and actually you don't

00:44:09.520 --> 00:44:11.460
even have to figure out how to do the custom instructions yourself.

00:44:11.460 --> 00:44:13.240
You can just kind of chat that into existence.

00:44:13.240 --> 00:44:17.540
But the thing that I'm really excited about is that this is like free playing.

00:44:17.540 --> 00:44:22.100
Like you could do, so the hosting cost is basically all on the client side.

00:44:22.100 --> 00:44:26.180
You have to be a ChatGPT plus user right now to create these and use these.

00:44:26.180 --> 00:44:30.560
But the cool thing as a developer, I don't have to pay those API fees that we were talking

00:44:30.560 --> 00:44:30.960
about, right?

00:44:30.960 --> 00:44:35.360
And if I need to use GPT for, which I kind of do for my business right now, just because

00:44:35.360 --> 00:44:40.420
of how complicated it is, I don't have to pay those token fees for folks using my custom

00:44:40.420 --> 00:44:41.320
GPT at this moment.

00:44:41.320 --> 00:44:45.480
Where's like the billing or whatever you call it for the custom GPT live?

00:44:45.480 --> 00:44:47.200
Is that in the person who's using it?

00:44:47.200 --> 00:44:50.660
Does it have to, it goes onto their account and whatever their account can do or afford?

00:44:50.660 --> 00:44:55.780
Yeah, right now, OpenAI, ChatGPT plus is $20 a month.

00:44:55.780 --> 00:45:00.080
And then there's a Teams version, which I think is either 25 or 30, depending on the number

00:45:00.080 --> 00:45:01.420
of users or how you pay for it.

00:45:01.420 --> 00:45:02.320
That's the cost.

00:45:02.320 --> 00:45:07.880
So right now, if you want to use custom GPTs, everyone needs to be a ChatGPT plus user.

00:45:07.880 --> 00:45:11.380
There's no extra cost based on usage or anything like that.

00:45:11.380 --> 00:45:17.340
In fact, there's talk about revenue sharing between OpenAI and developers of custom GPTs.

00:45:17.340 --> 00:45:20.620
But that has not come out yet as far as like what those details are.

00:45:20.620 --> 00:45:23.180
It does have an app store feel to it, doesn't it?

00:45:23.180 --> 00:45:24.360
There's risks too, right?

00:45:24.360 --> 00:45:27.600
Obviously, anybody can, there's already been like tons of copies up there.

00:45:28.260 --> 00:45:30.860
OpenAI, they're looking for their business model too, right?

00:45:30.860 --> 00:45:35.800
So they could, if someone has a very successful custom GPT, it's well within their right to

00:45:35.800 --> 00:45:37.840
kind of add that to the base product as well.

00:45:37.840 --> 00:45:39.520
Injection is still a thing.

00:45:39.520 --> 00:45:44.420
So if you're doing anything in your actions that actually changes something that is consequential

00:45:44.420 --> 00:45:45.180
is what they call it.

00:45:45.640 --> 00:45:49.680
You better think very carefully, like what's the worst thing that could happen, right?

00:45:49.680 --> 00:45:52.700
Because whatever the worst thing that could happen is, that's what's going to happen.

00:45:52.700 --> 00:45:58.580
Because people can figure this stuff out and they can confuse the large language models into calling them.

00:45:58.580 --> 00:46:03.680
And the more valuable it is that they can make that thing happen, the more effort they're going to put into it as well.

00:46:03.680 --> 00:46:03.980
Yeah.

00:46:03.980 --> 00:46:04.780
Yeah, yeah.

00:46:04.780 --> 00:46:05.260
For sure.

00:46:05.400 --> 00:46:13.580
I just ask, is you think it's easy to solve SQL injection and other forms of injection, at least in principle, right?

00:46:13.580 --> 00:46:15.080
There's an education problem.

00:46:15.080 --> 00:46:22.660
There's millions of people coming along as developers and they see some demo that says the query is like this plus the name.

00:46:22.660 --> 00:46:24.060
Wait a minute, wait a minute.

00:46:24.060 --> 00:46:28.560
So it kind of recreates itself through not total awareness.

00:46:28.560 --> 00:46:32.060
But there is a very clear thing you do solve that.

00:46:32.060 --> 00:46:32.760
You use parameters.

00:46:32.760 --> 00:46:34.520
You don't concatenate strings with user input.

00:46:34.520 --> 00:46:35.200
Problem solved.

00:46:35.200 --> 00:46:36.980
What about prompt injection, though?

00:46:36.980 --> 00:46:42.180
It's so vague how these AIs know what to do in the first place.

00:46:42.180 --> 00:46:45.120
And so then how do you completely block that off?

00:46:45.120 --> 00:46:46.060
Unsolved problem.

00:46:46.060 --> 00:46:49.800
I'm definitely stealing from Simon on this because I've heard him say it on a few podcasts.

00:46:49.800 --> 00:46:53.000
It's just basically there's no solution as far as we know.

00:46:53.000 --> 00:46:59.160
So you have to design and there's no solution to the hallucination problem either because that's, you know, that's a feature, right?

00:46:59.160 --> 00:47:01.000
That's actually what the thing is supposed to do.

00:47:01.000 --> 00:47:09.240
So when you're building these systems, you have to recognize those those two facts along with some other facts that really limit what you can build with these things.

00:47:09.240 --> 00:47:11.300
So you shouldn't use it for like legal briefs.

00:47:11.300 --> 00:47:11.820
Is that what you're saying?

00:47:11.820 --> 00:47:15.160
I think these things are great collaborative tools, right?

00:47:15.160 --> 00:47:15.540
Yeah.

00:47:15.540 --> 00:47:16.660
The human in the loop.

00:47:16.660 --> 00:47:17.880
And that's everything that I'm building, right?

00:47:17.880 --> 00:47:25.360
So all the stuff that I'm building is assuming that the humans in the loop and that the and what I'm trying to do is augment and amplify expertise, right?

00:47:25.360 --> 00:47:30.940
I'm building tools for people that know about genomics and cancer and how to help cancer patients.

00:47:30.940 --> 00:47:34.560
I'm not designing it for cancer patients who are going to go operate on themselves, right?

00:47:34.560 --> 00:47:36.220
That's not that's not the goal.

00:47:36.220 --> 00:47:38.880
The idea is there's a lot of information.

00:47:38.880 --> 00:47:44.780
There's these tools are super valuable from like synthesizing a variety of info.

00:47:44.780 --> 00:47:51.520
But you still need to look at the underlying citations and ChatGPT by itself can't give you citations like it'll make some up.

00:47:51.520 --> 00:47:54.320
It'll say, oh, I think there's probably a Wikipedia page with this link.

00:47:54.320 --> 00:48:02.360
But you actually have to you definitely have to have an outside tool either the web, you know, being which is I would say subpar for a lot of use cases.

00:48:02.360 --> 00:48:07.160
Or you have to have actions that can actually bring back references and give you those links.

00:48:07.160 --> 00:48:08.880
And then the expert will then say, oh, OK, great.

00:48:08.880 --> 00:48:11.100
Thanks for synthesizing this, giving me this info.

00:48:11.100 --> 00:48:13.220
Let me go validate this myself, right?

00:48:13.220 --> 00:48:15.560
Go click on the link and and go validate it.

00:48:15.560 --> 00:48:19.020
And that's really I think that's really the sweet spot for these things, at least for the near future.

00:48:19.020 --> 00:48:19.320
Yeah.

00:48:19.320 --> 00:48:20.800
Don't ask it for the answer.

00:48:20.800 --> 00:48:23.060
Ask it to help you come up with the answer.

00:48:23.060 --> 00:48:23.300
Right.

00:48:23.300 --> 00:48:24.020
Exactly right.

00:48:24.020 --> 00:48:24.480
All right.

00:48:24.480 --> 00:48:29.900
And then have you criticize you when you do have something because then it'll do a great job of telling you everything you've done wrong.

00:48:29.900 --> 00:48:31.500
I'm feeling too good about myself.

00:48:31.540 --> 00:48:32.880
I need you to insult me a lot.

00:48:32.880 --> 00:48:33.860
Let's get going.

00:48:33.860 --> 00:48:34.420
All right.

00:48:34.420 --> 00:48:38.720
Speaking to talk about ourselves, you've got this project called PyPI GPT.

00:48:38.720 --> 00:48:39.400
What's this about?

00:48:39.400 --> 00:48:46.340
I really wanted to tell people that FastAPI and Pydantic because Python, like we were saying earlier, I don't know if it was on the call or not.

00:48:46.340 --> 00:48:48.520
But Python is the winning language.

00:48:48.520 --> 00:48:48.960
Right.

00:48:48.960 --> 00:48:53.320
And I think FastAPI and Pydantic are the winning libraries in their respective fields.

00:48:53.320 --> 00:48:53.880
And they're great.

00:48:53.880 --> 00:48:57.540
And they're perfect for this space because you need an open API spec.

00:48:57.540 --> 00:48:59.480
English is the new programming language.

00:48:59.480 --> 00:48:59.700
Right.

00:48:59.740 --> 00:49:08.140
So Andre Caparthe, who used to work at Tesla and now works at OpenAI, has this pinned tweet where he's basically like, English is like the hottest programming language or something like that.

00:49:08.140 --> 00:49:09.600
And that's really the truth.

00:49:09.600 --> 00:49:23.100
Because even in this space where I'm building an open API spec, 99% of the work is like thinking about the description of the endpoints or the description of the fields or codifying the constraints on different fields.

00:49:23.100 --> 00:49:27.760
Like you can use these greater thans and less thans and regexes, right, to describe it.

00:49:27.940 --> 00:49:31.600
And so what I did was I said, okay, let's build this thing in FastAPI.

00:49:31.600 --> 00:49:33.660
It's just to get an example out for folks.

00:49:33.660 --> 00:49:35.440
And then I turned it on.

00:49:35.440 --> 00:49:40.320
I actually use ngrok as my service layer because you have to have HTTPS to make this thing work.

00:49:40.320 --> 00:49:41.340
Ngrok is so good.

00:49:41.340 --> 00:49:41.680
Yep.

00:49:41.680 --> 00:49:42.000
Yeah.

00:49:42.000 --> 00:49:44.240
I turned that on with an Nginx thing in front of it.

00:49:44.240 --> 00:49:48.820
So this library, to actually use it, you'll have to actually set that stuff up yourself.

00:49:49.080 --> 00:49:50.260
You have to download it.

00:49:50.260 --> 00:49:50.820
You have to run it.

00:49:50.820 --> 00:49:54.700
You have to get, you know, either get it on a server with HTTPS with Let's Encrypt or something.

00:49:55.060 --> 00:50:01.400
Once you've turned it on, then you can actually see how it generates the OpenAPI spec, how to configure the GPT.

00:50:01.400 --> 00:50:05.320
You know, I didn't do much work with regards to like the custom instructions that I came up with.

00:50:05.320 --> 00:50:07.060
I just said, hey, call my API, figure it out.

00:50:07.060 --> 00:50:07.900
And it does.

00:50:08.280 --> 00:50:17.520
And so what this GPT does is it basically says, OK, given a package name and a version number, it's going to go and grab this data from the SQLite database that I found that has this information and then bring it back to you.

00:50:17.520 --> 00:50:19.920
It's the least interesting GPT I could come up with, I guess.

00:50:19.920 --> 00:50:21.520
But it shows kind of the mechanics, right?

00:50:21.520 --> 00:50:36.300
The mechanics of setting up the servers and the application within FastAPI, the kind of the little, you know, things, the little bits that you have to flip to make sure that OpenAPIs or OpenAI can understand your OpenAPI spec,

00:50:36.300 --> 00:50:40.420
bumble through OpenAI and OpenAPI all the time, and make sure that they can talk to each other.

00:50:40.420 --> 00:50:45.380
And then it will then do the right thing and call your server and bring the answers back.

00:50:45.380 --> 00:50:53.300
And there's, you know, there's a bunch of little flags and information you need to know about actions that are, you know, on the OpenAPI documentation.

00:50:53.300 --> 00:50:58.540
And so I tried to try to pull that all together into, you know, one simple little project for people to look at.

00:50:58.540 --> 00:50:58.800
It's cool.

00:50:58.800 --> 00:51:03.160
So you can ask it questions like, tell me about FastAPI, this version, and it'll come back.

00:51:03.160 --> 00:51:12.280
I was hoping to do something a little better, like, hey, here's my requirements file and go, you know, tell me, like, am I on the latest version of everything or whatever, like something more interesting.

00:51:12.280 --> 00:51:13.360
I just didn't have time.

00:51:13.440 --> 00:51:17.840
Can you ask it questions such as what's the difference between this version and that version?

00:51:17.840 --> 00:51:20.940
You could, if that information is in the database, I actually don't know if it is.

00:51:20.940 --> 00:51:23.940
And then obviously you could also hit the PyPI server.

00:51:23.940 --> 00:51:24.720
And I didn't do that.

00:51:24.720 --> 00:51:29.040
I just wanted to, I don't want to be, you know, hitting anybody's server indiscriminately at this point.

00:51:29.040 --> 00:51:32.200
But the, but that would be a great use case, right?

00:51:32.200 --> 00:51:36.540
So like someone could take this and certainly add some, add some capabilities.

00:51:36.980 --> 00:51:49.460
The thing that is valuable that I'm trying to showcase is the fact that ChatGPT and large language models, while they do have the world's information kind of compressed, you know, at a point in time, they are still not a database, right?

00:51:49.500 --> 00:51:55.200
They don't do well when you're basically trying to make sure you have a comprehensive query and you've brought back all the information.

00:51:55.200 --> 00:51:58.380
And they're also not good from like a up-to-date perspective, right?

00:51:58.380 --> 00:51:59.140
There's a cutoff date.

00:51:59.140 --> 00:52:01.320
Thankfully, they finally updated that recently.

00:52:01.320 --> 00:52:02.800
I think it's now April of 2023.

00:52:03.300 --> 00:52:06.180
But at some point, it just doesn't know about newer things.

00:52:06.180 --> 00:52:09.580
And so a GPT is a really interesting way of doing that.

00:52:09.580 --> 00:52:11.540
I'm going to put it out in the universe and hopefully someone will do it.

00:52:11.540 --> 00:52:26.820
Make me a modern Python GPT, which is basically like get the new version of Pydantic and Polars and a few other libraries that ChatGPT does a bad job at just because they, you know, they're in under active development during the time that ChatGPT was getting trained.

00:52:26.820 --> 00:52:35.540
So that's the perfect use case for these types of, you know, custom GPTs with knowledge in a PDF file or an API backing it up.

00:52:35.540 --> 00:52:47.160
I think there's a ton of value in being able to feed a little bit of your information, some of your documents or your code repository or something to a GPT and then be able to ask it questions about it, right?

00:52:47.160 --> 00:52:47.440
Yeah.

00:52:47.440 --> 00:52:47.740
Yeah.

00:52:47.740 --> 00:52:51.900
Like, you know, tell me about the security vulnerabilities that you see in the code.

00:52:51.900 --> 00:52:58.920
Like, is there anywhere where I'm, I'm missing some test or I'm calling a function in a way that's known to be bad.

00:52:58.920 --> 00:53:06.900
And, you know, like that kind of stuff is really tricky, but it's also tricky because it doesn't, even if you paste in a little bit of code, it's not the whole project.

00:53:06.900 --> 00:53:07.200
Right.

00:53:07.200 --> 00:53:10.020
So, you know, to put a little bit more in there, it's pretty awesome.

00:53:10.020 --> 00:53:10.400
Yeah.

00:53:10.400 --> 00:53:14.480
Being able to give it all the code from some of these code repositories, right.

00:53:14.480 --> 00:53:16.660
Like, and bringing back the relevant information.

00:53:16.660 --> 00:53:18.280
So I think there is a kind of this race.

00:53:18.420 --> 00:53:33.140
There's going to be other, you know, cool, there's another cool project called Sourcegraph and Codi that we can talk about that will, you know, run on your local server and basically indexes your code base and will bring back relevant snippets from your code base and answer questions kind of in context.

00:53:33.420 --> 00:53:47.600
And, you know, long-term and then the new project around new Codeium, they had a new paper where they talked about flow engineering and flow engineering is just basically that I, that same concept of the human in the loop with the LLM with the code.

00:53:47.600 --> 00:53:52.420
That's the magic combination of kind of those people, those entities kind of iterating with each other.

00:53:52.640 --> 00:54:02.560
I think these, you know, these tools are definitely going to evolve and you really want to, you really want to have the ability to have access to your specific information to answer your specific questions.

00:54:02.560 --> 00:54:03.860
Codi is new to me.

00:54:03.860 --> 00:54:12.920
Codi.dev and it's a little subtitle or whatever is Codi as a coding assistant that uses AI, understand your code base, right.

00:54:12.920 --> 00:54:16.880
It was saying it was about your entire code base, APIs, implementations and idioms.

00:54:16.880 --> 00:54:19.580
Like that's, it's kind of what I was suggesting, at least for code, right.

00:54:19.580 --> 00:54:24.300
Yeah. And source graph, those folks really understand code indexing and searching.

00:54:24.300 --> 00:54:26.060
Like that's what the first product was.

00:54:26.060 --> 00:54:28.620
They were kind of just teed up ready for this large language model moment.

00:54:28.620 --> 00:54:31.240
And then they said, oh, let's just put Codi on top of that.

00:54:31.240 --> 00:54:36.000
So this thing will run, it will understand your code and it will kind of bring things together for you.

00:54:36.000 --> 00:54:38.180
So these folks do, do podcasts all the time.

00:54:38.180 --> 00:54:39.220
I'd, I'd reach out to them.

00:54:39.220 --> 00:54:41.040
Yeah. Interesting. It's, it's quite neat looking.

00:54:41.040 --> 00:54:42.340
I think I'm going to give it a try.

00:54:42.340 --> 00:54:45.320
It both plugs into a charm and VS Code.

00:54:45.320 --> 00:54:46.300
That's pretty neat.

00:54:46.300 --> 00:54:46.800
Very cool.

00:54:46.800 --> 00:54:55.180
We're starting to get a little bit short on time here, but for people who want to play with the PyPI GPT, maybe as an example, to just cut the readme and it's easy to get from there.

00:54:55.180 --> 00:54:56.400
What do you need to tell them?

00:54:56.400 --> 00:54:57.540
I put a make file in there.

00:54:57.540 --> 00:55:03.420
So, you know, exactly like the steps to kind of make the environment, download the files and just ping, ping me, follow me on Twitter.

00:55:03.420 --> 00:55:05.940
I'm more and ping me if you need anything there.

00:55:06.200 --> 00:55:08.480
I'm also on LinkedIn and, and get up.

00:55:08.480 --> 00:55:08.880
Right.

00:55:08.880 --> 00:55:11.820
So you can certainly reach out if you, if you have any challenges.

00:55:11.820 --> 00:55:12.240
Excellent.

00:55:12.240 --> 00:55:15.260
The last thing that folks that are actually in the medical space, right?

00:55:15.260 --> 00:55:21.220
So the thing that I'm working on right now actively is how to integrate this thing with our knowledge base.

00:55:21.220 --> 00:55:21.420
Right.

00:55:21.420 --> 00:55:30.720
So I have a knowledge base of hand curated trials and curated therapies and other information built it so that my custom GPT can actually work with that.

00:55:30.720 --> 00:55:32.540
Come up with some, I'd say novel.

00:55:32.540 --> 00:55:40.460
At least I haven't seen anybody else and I haven't seen any research approaching things the same way I am that handles some of the other challenges that are out there.

00:55:40.460 --> 00:55:40.860
Right.

00:55:40.860 --> 00:55:43.640
So for instance, the context window is a challenge.

00:55:43.640 --> 00:55:48.940
So the context window is the amount of text that's in there and, and how, and how it gets processed.

00:55:48.940 --> 00:55:56.240
If you're making decisions and you're changing course, the chat bot will lose track of, of those changes.

00:55:56.240 --> 00:55:56.640
Right.

00:55:56.640 --> 00:56:05.180
So if you're, you know, experimenting or, or going down one path of inquiry and then you switch to another path, it can get confused and forget that you switch paths.

00:56:05.180 --> 00:56:08.080
Or just run out of space to hold all that information.

00:56:08.080 --> 00:56:08.500
Like, well.

00:56:08.500 --> 00:56:08.940
For sure.

00:56:08.940 --> 00:56:11.880
It forgot the last three things, the first three things you told it.

00:56:11.880 --> 00:56:14.780
It only knows four and you think it knows seven and it's working incomplete.

00:56:14.780 --> 00:56:15.180
Right.

00:56:15.180 --> 00:56:15.540
Yep.

00:56:15.540 --> 00:56:20.060
And, and, you know, one of the key things is you actually want it to forget some things as well.

00:56:20.060 --> 00:56:20.420
Right.

00:56:20.420 --> 00:56:22.560
So those are, that's, those are all interesting challenges.

00:56:22.560 --> 00:56:37.420
And I'm actually working with these custom GPTs to kind of change the way that the collaboration works between the human, the expert, the large language model or the assistant and my backend, my actual, the retrieval model.

00:56:37.420 --> 00:56:39.560
The, the API that's actually doing stuff.

00:56:39.560 --> 00:56:46.120
So are researchers and MDs and PhDs at your company talking with this thing and making use of it?

00:56:46.120 --> 00:56:46.360
Yeah.

00:56:46.360 --> 00:56:47.820
I mean, we're in active development right now.

00:56:47.820 --> 00:56:54.860
We have a few key opinion leaders that are, that are working with us and collaborating with us, but we're always looking for more folks that, that are in the field that actually.

00:56:55.260 --> 00:56:58.000
And right now you need kind of the cutting edge people.

00:56:58.000 --> 00:56:59.780
This stuff's not ready for prime time.

00:56:59.780 --> 00:57:02.580
Clinical decision support is a really hard problem.

00:57:02.580 --> 00:57:11.120
And we, but we need the folks that are, that want to get ahead of it because, because we know that there are doctors and there are patients that are asking ChatGPT questions right now.

00:57:11.120 --> 00:57:13.580
And even if it says I'm not a medical expert, blah, blah, blah.

00:57:13.580 --> 00:57:16.460
And at the end of the day, we actually don't have enough doctors, right?

00:57:16.460 --> 00:57:19.120
That's the other scary thing is we don't have enough doctors.

00:57:19.120 --> 00:57:20.300
Patients want answers.

00:57:20.300 --> 00:57:27.240
How do we build solutions that can allow this expertise to get more democratized and more, you know, more into folks' hands?

00:57:27.260 --> 00:57:33.760
And, and I'm hoping, hoping our tool along with these large language models can help, help relieve some of that burden.

00:57:33.760 --> 00:57:38.820
It might not be as a hundred percent accurate, a hundred percent precise, but neither are doctors, right?

00:57:38.820 --> 00:57:39.700
They get stuff wrong.

00:57:39.700 --> 00:57:43.600
You just need to be in the realm of as good as a doctor.

00:57:43.600 --> 00:57:47.440
You don't need to be, you know, completely without making a mistake.

00:57:47.440 --> 00:57:52.120
And that's a, I think a challenge that we're just going to have to get used to in general.

00:57:52.300 --> 00:57:59.280
I joked about the legal brief thing because someone got in trouble for submitting a brief that had hallucinations in it.

00:57:59.280 --> 00:58:07.920
And there's certain circumstances where maybe it's just not acceptable, but AI, self-driven cars, people crash, but that's a, like a human mistake.

00:58:07.920 --> 00:58:12.100
But when a machine makes it, it's a pre-programmed, pre-determined mistake.

00:58:12.100 --> 00:58:15.900
You know, something like that, like it doesn't feel the same as if the machine made a mistake.

00:58:15.900 --> 00:58:21.620
So if a machine makes a recommendation, like you need this cancer treatment or you're fine, you don't need it.

00:58:21.680 --> 00:58:22.580
And it was wrong.

00:58:22.580 --> 00:58:29.240
People are not going to be as forgiving, but it doesn't mean there's not value to be gained from systems that can help you.

00:58:29.240 --> 00:58:29.440
Right.

00:58:29.440 --> 00:58:36.080
I always appreciate those, those machine learning papers that I'll like, you know, there'll be show the tracking of over time of like how the models have gotten better and better.

00:58:36.080 --> 00:58:41.380
And they put the human in there and you can see that the human has already gotten eclipsed by the, by the models.

00:58:41.380 --> 00:58:42.700
And that specific problem, right?

00:58:42.700 --> 00:58:48.380
Because it's not also recognizing that a lot of this stuff, these models that are doing tasks are doing one specific task.

00:58:48.380 --> 00:58:49.480
They're not doing a whole job.

00:58:49.580 --> 00:58:51.560
They're not, they're not doing an end to end process.

00:58:51.560 --> 00:58:57.880
They're answering a medical question or they're, you know, looking at an image and finding all the cats or whatever it's supposed to do.

00:58:57.880 --> 00:59:01.940
So, and to your point though, you know, humans aren't perfect at these tasks either.

00:59:02.180 --> 00:59:06.460
I think mostly people are going to be using this kind of stuff to help them come up with these answers.

00:59:06.460 --> 00:59:06.900
Right.

00:59:06.900 --> 00:59:12.340
The, my weird Amazon description example is going to be the edge case, not the go-to like.

00:59:12.340 --> 00:59:12.720
Agreed.

00:59:12.720 --> 00:59:13.040
Yeah.

00:59:13.040 --> 00:59:15.140
You came in, you spoke to the chat bot.

00:59:15.140 --> 00:59:16.060
Here's your diagnosis.

00:59:16.060 --> 00:59:16.920
Have a good day.

00:59:16.920 --> 00:59:17.160
Right.

00:59:17.160 --> 00:59:20.320
Not so much more like, I need some help thinking through this.

00:59:20.320 --> 00:59:24.280
What are some of historic, what are some studies that have like addressed this?

00:59:24.280 --> 00:59:24.500
Right.

00:59:24.500 --> 00:59:25.740
And like those kinds of questions.

00:59:25.740 --> 00:59:30.860
And I hesitate to say it's just a better search engine because that's, I actually think it's got way more potential than that.

00:59:30.860 --> 00:59:31.260
I agree.

00:59:31.260 --> 00:59:32.260
It's a conversation.

00:59:32.260 --> 00:59:34.300
It can iterate back and forth.

00:59:34.300 --> 00:59:37.560
And what I'm actually trying to do is build some state into it.

00:59:37.560 --> 00:59:37.760
Right.

00:59:37.760 --> 00:59:47.740
Some, some structured way of kind of remembering what the conversation was and using a lot of the techniques that these large language models are good at to actually, to make that actually happen.

00:59:47.740 --> 00:59:56.800
And so that you can actually build a system so that the human and the assistant and the backend all kind of know what the other party is thinking about and that they all work together.

00:59:56.800 --> 00:59:57.200
Nice.

00:59:57.200 --> 01:00:03.780
For your genomics custom GPT thing that you're making internally, is that going to become a product eventually?

01:00:03.780 --> 01:00:08.440
If other people are interested, is there some way they can keep tabs on it or is it just internal only?

01:00:08.440 --> 01:00:09.340
Definitely reach out to me.

01:00:09.340 --> 01:00:11.240
So we're building different versions of GPTs.

01:00:11.240 --> 01:00:23.880
Like we're going to have a GPT for our curation team that curates knowledge and we're building a GPT that, you know, my hope is that it'll go to physicians, to oncologists and genomic counselors and other providers that could actually use this thing.

01:00:23.880 --> 01:00:32.960
Eventually, if it becomes robust enough and stable enough, and I don't feel like we're doing a disservice, we could certainly make a version of that available for cancer patients as well.

01:00:32.960 --> 01:00:34.500
I would, you know, I'd love to have that.

01:00:34.500 --> 01:00:36.660
I just want to make sure that it's done in a responsible way.

01:00:36.660 --> 01:00:37.340
Yeah, absolutely.

01:00:37.800 --> 01:00:46.720
Well, I honestly hope that you actually do such a good job that we don't have to have cancer research anymore, but that's a long, long term goal, right?

01:00:46.720 --> 01:00:48.220
That is definitely the end goal.

01:00:48.220 --> 01:00:49.560
And that's really exciting too.

01:00:49.560 --> 01:00:59.040
So is that the new drugs that are coming out, new treatments that are coming out, it's really just about making sure people are aware of it, making sure that they're getting the genetic testing that they need, right?

01:00:59.040 --> 01:01:08.580
So if you have a loved one that has, unfortunately has cancer, make sure that they're at least asking their doctor the question about genomic testing to make sure that they're getting the best possible treatment.

01:01:08.580 --> 01:01:09.080
Sounds good.

01:01:09.080 --> 01:01:09.720
All right.

01:01:09.720 --> 01:01:16.540
Well, quickly, before we get out of here, recommendation on some libraries, some project that maybe we haven't talked about yet.

01:01:16.540 --> 01:01:18.780
Something came across, people were like, oh, this would be awesome.

01:01:18.780 --> 01:01:19.860
We ran out of time.

01:01:19.860 --> 01:01:21.840
I was going to talk about some of these Pydantic projects.

01:01:21.840 --> 01:01:24.500
So there's Marvin, Instructor, and Outlines.

01:01:24.500 --> 01:01:27.200
So folks should definitely look at those.

01:01:27.200 --> 01:01:34.020
So basically what you do is you can describe stuff as Pydantic, and then it'll actually just extract it right into that Pydantic model for you.

01:01:34.020 --> 01:01:36.820
And that's some Marvin and Outlines and Instructor.

01:01:36.820 --> 01:01:37.940
So check those guys out.

01:01:37.940 --> 01:01:38.400
They're awesome.

01:01:38.400 --> 01:01:41.700
And then the other one that I actually had teed up was VisiCalc.

01:01:41.700 --> 01:01:45.060
So VisiCalc is like this crazy command line tool.

01:01:45.060 --> 01:01:45.640
It's awesome.

01:01:45.640 --> 01:01:49.620
You can basically look at giant CSV files all on the command line.

01:01:49.620 --> 01:01:51.200
It has these hotkeys that you can do.

01:01:51.780 --> 01:01:54.240
And it's, sorry, not VisiCalc, VisiData.

01:01:54.240 --> 01:01:54.940
VisiData, okay.

01:01:54.940 --> 01:01:58.260
And so basically it's just, it's basically Excel inside your terminal.

01:01:58.260 --> 01:02:02.160
And this was before Rich and Textual Project.

01:02:02.160 --> 01:02:07.000
And it was just like, it was kind of mind-blowing all the stuff that this person was able to figure out how to make work.

01:02:07.000 --> 01:02:07.840
That's super amazing.

01:02:07.840 --> 01:02:14.160
I just wanted to give a shout out one more thing because your VisiData reminded me of something I just came across called BTOP.

01:02:14.160 --> 01:02:18.180
I don't know if you have servers out there and they need to know what's going on with their server.

01:02:18.180 --> 01:02:19.240
Where's mine?

01:02:19.240 --> 01:02:20.320
I need a picture for this.

01:02:20.680 --> 01:02:23.660
But yeah, it's like a nice visualization.

01:02:23.660 --> 01:02:26.200
There's also B-HITOP.

01:02:26.200 --> 01:02:28.540
It's pretty amazing what people can do in the terminal, right?

01:02:28.540 --> 01:02:29.380
Oh, there they are.

01:02:29.380 --> 01:02:31.560
They're just responsive design themselves out.

01:02:31.560 --> 01:02:33.980
But yeah, if you want a bunch of live graphs.

01:02:33.980 --> 01:02:41.180
Every time I see stuff like this, the VisiData or this or what textual folks are working on, it's just like, I can't believe they built this, right?

01:02:41.180 --> 01:02:43.640
Like, I'm working at the level of colorama.

01:02:43.640 --> 01:02:45.360
This string is red right here.

01:02:45.360 --> 01:02:47.760
They're like, oh, yeah, we rebuilt it.

01:02:47.760 --> 01:02:49.140
I got an emoji to show up, right?

01:02:49.140 --> 01:02:49.560
I'm excited.

01:02:49.560 --> 01:02:49.940
Yes, exactly.

01:02:49.940 --> 01:02:50.480
Yes.

01:02:50.480 --> 01:02:53.060
A rocket ship is there, not just tech.

01:02:53.060 --> 01:02:54.220
Yeah.

01:02:54.220 --> 01:02:55.000
Pretty excellent.

01:02:55.000 --> 01:02:55.420
All right.

01:02:55.420 --> 01:02:58.120
Well, Ian, thank you for being here.

01:02:58.120 --> 01:02:59.960
And keep up the good work.

01:03:00.220 --> 01:03:05.620
I know so many people are using LLMs, but not that many people are creating LLMs.

01:03:05.620 --> 01:03:08.000
And as developers, you know, we love to create things.

01:03:08.000 --> 01:03:09.260
We already have the tools to do it.

01:03:09.260 --> 01:03:15.780
People can check out your GitHub repo on the PyPI and GPT and use it as a starting place, right?

01:03:15.780 --> 01:03:16.640
Sounds great.

01:03:16.640 --> 01:03:16.920
Yeah.

01:03:16.920 --> 01:03:19.200
And definitely reach out if you have any questions.

01:03:19.200 --> 01:03:19.740
Excellent.

01:03:19.740 --> 01:03:21.020
Well, thanks for coming back on the show.

01:03:21.020 --> 01:03:21.680
See you later.

01:03:21.680 --> 01:03:22.020
Great.

01:03:22.020 --> 01:03:22.900
Good to talk to you.

01:03:22.900 --> 01:03:23.220
Bye-bye.

01:03:23.220 --> 01:03:23.760
Yeah, you bet.

01:03:23.760 --> 01:03:24.020
Bye.

01:03:24.020 --> 01:03:27.700
This has been another episode of Talk Python To Me.

01:03:27.700 --> 01:03:29.460
Thank you to our sponsors.

01:03:29.460 --> 01:03:31.120
Be sure to check out what they're offering.

01:03:31.120 --> 01:03:32.540
It really helps support the show.

01:03:32.540 --> 01:03:34.880
Take some stress out of your life.

01:03:34.880 --> 01:03:40.660
Get notified immediately about errors and performance issues in your web or mobile applications with Sentry.

01:03:40.660 --> 01:03:45.660
Just visit talkpython.fm/sentry and get started for free.

01:03:45.660 --> 01:03:49.240
And be sure to use the promo code TALKPYTHON, all one word.

01:03:49.240 --> 01:03:57.520
It's time to stop asking relational databases to do more than they were made for and simplify complex data models with graphs.

01:03:57.820 --> 01:04:04.340
Check out the sample FastAPI project and see what Neo4j, a native graph database, can do for you.

01:04:04.340 --> 01:04:08.800
Find out more at talkpython.fm/Neo4j.

01:04:09.480 --> 01:04:10.640
Want to level up your Python?

01:04:10.640 --> 01:04:10.700
Want to level up your Python?

01:04:10.700 --> 01:04:14.700
We have one of the largest catalogs of Python video courses over at Talk Python.

01:04:14.700 --> 01:04:19.880
Our content ranges from true beginners to deeply advanced topics like memory and async.

01:04:19.880 --> 01:04:22.540
And best of all, there's not a subscription in sight.

01:04:22.540 --> 01:04:25.460
Check it out for yourself at training.talkpython.fm.

01:04:25.820 --> 01:04:27.560
Be sure to subscribe to the show.

01:04:27.560 --> 01:04:30.340
Open your favorite podcast app and search for Python.

01:04:30.340 --> 01:04:31.640
We should be right at the top.

01:04:31.640 --> 01:04:36.820
You can also find the iTunes feed at /itunes, the Google Play feed at /play,

01:04:36.820 --> 01:04:41.020
and the direct RSS feed at /rss on talkpython.fm.

01:04:41.440 --> 01:04:43.980
We're live streaming most of our recordings these days.

01:04:43.980 --> 01:04:47.380
If you want to be part of the show and have your comments featured on the air,

01:04:47.380 --> 01:04:51.820
be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:04:51.820 --> 01:04:53.860
This is your host, Michael Kennedy.

01:04:53.860 --> 01:04:55.160
Thanks so much for listening.

01:04:55.160 --> 01:04:56.320
I really appreciate it.

01:04:56.580 --> 01:04:58.220
Now get out there and write some Python code.

01:04:58.220 --> 01:05:19.360
I'll see you next time.

