WEBVTT

00:00:00.001 --> 00:00:01.860
AI has taken the world by storm.

00:00:01.860 --> 00:00:05.240
It's gone from near zero to amazing in just a few years.

00:00:05.240 --> 00:00:07.880
We have ChatGPT, we have Stable Diffusion.

00:00:07.880 --> 00:00:10.160
What about Jupyter Notebooks and Pandas?

00:00:10.160 --> 00:00:13.540
In this episode, we meet Justin Waugh, the creator of Sketch.

00:00:13.540 --> 00:00:18.100
Sketch adds the ability to have conversational AI interactions

00:00:18.100 --> 00:00:21.620
about your Pandas DataFrames, code, and data

00:00:21.620 --> 00:00:23.020
right inside of your notebook.

00:00:23.020 --> 00:00:26.320
It's pretty powerful, and I know you'll enjoy the conversation.

00:00:26.600 --> 00:00:31.940
This is Talk Python To Me, episode 410, recorded April 2nd, 2023.

00:00:31.940 --> 00:00:47.880
Welcome to Talk Python To Me, a weekly podcast on Python.

00:00:47.880 --> 00:00:49.620
This is your host, Michael Kennedy.

00:00:49.620 --> 00:00:52.280
Follow me on Mastodon, where I'm @mkennedy,

00:00:52.280 --> 00:00:54.740
and follow the podcast using @talkpython,

00:00:55.060 --> 00:00:57.100
both on fosstodon.org.

00:00:57.100 --> 00:00:59.720
Be careful with impersonating accounts on other instances.

00:00:59.720 --> 00:01:00.660
There are many.

00:01:00.660 --> 00:01:03.200
Keep up with the show and listen to over seven years

00:01:03.200 --> 00:01:05.740
of past episodes at talkpython.fm.

00:01:05.740 --> 00:01:09.720
We've started streaming most of our episodes live on YouTube.

00:01:09.720 --> 00:01:12.840
Subscribe to our YouTube channel over at talkpython.fm

00:01:12.840 --> 00:01:15.840
slash YouTube to get notified about upcoming shows

00:01:15.840 --> 00:01:17.280
and be part of that episode.

00:01:17.280 --> 00:01:20.480
This episode is brought to you by Brilliant.org

00:01:20.480 --> 00:01:24.240
and us with our online courses over at Talk Python Training.

00:01:24.240 --> 00:01:27.700
Justin, welcome to Talk Python To Me.

00:01:27.700 --> 00:01:28.660
Thanks for having me.

00:01:28.660 --> 00:01:29.800
It's great to have you here.

00:01:29.800 --> 00:01:30.640
I'm a little suspicious.

00:01:30.640 --> 00:01:32.700
I've got to know, and I don't really know how to test

00:01:32.700 --> 00:01:37.980
whether you're actually Justin or an AI speaking as Justin.

00:01:37.980 --> 00:01:40.080
What's the deal here?

00:01:40.080 --> 00:01:41.260
Yeah, there's no way to know now.

00:01:41.260 --> 00:01:42.340
No, there's not.

00:01:42.340 --> 00:01:44.500
Well, apparently I've recently learned from you

00:01:44.500 --> 00:01:46.140
that I can give you a bunch of Xs

00:01:46.140 --> 00:01:47.740
and other arbitrary characters.

00:01:47.740 --> 00:01:49.640
This is like the test.

00:01:49.640 --> 00:01:52.060
It's like asking the Germans to say squirrel

00:01:52.060 --> 00:01:54.460
in World War II sort of thing.

00:01:54.460 --> 00:01:55.660
Like it's the test.

00:01:55.660 --> 00:01:56.640
It's the tell.

00:01:56.640 --> 00:01:58.040
There's always going to be something.

00:01:58.040 --> 00:01:59.960
It's some sort of adversarial attack.

00:01:59.960 --> 00:02:01.260
Exactly.

00:02:01.260 --> 00:02:03.600
It's only going to get more interesting

00:02:03.600 --> 00:02:05.400
with this kind of stuff for sure.

00:02:05.400 --> 00:02:10.140
So we're going to talk about using generative AI

00:02:10.140 --> 00:02:13.600
and large language models paired with things like Pandas

00:02:13.600 --> 00:02:15.440
or consumed with straight Python

00:02:15.440 --> 00:02:16.720
with a couple of your projects,

00:02:16.720 --> 00:02:18.840
which are super exciting.

00:02:18.840 --> 00:02:21.060
I think it's going to empower a lot of people

00:02:21.060 --> 00:02:24.480
in ways that it hasn't really been done yet.

00:02:24.480 --> 00:02:25.900
So awesome on that.

00:02:25.900 --> 00:02:27.900
But before we get to it, let's start with your story.

00:02:27.900 --> 00:02:30.100
How did you get into programming in Python and AI?

00:02:30.100 --> 00:02:30.760
Let's see.

00:02:30.760 --> 00:02:34.540
I got into programming when I was a kid,

00:02:34.540 --> 00:02:36.760
TI-83, learning to code on that.

00:02:36.760 --> 00:02:39.420
And then sort of just kept it up as a side hobby

00:02:39.420 --> 00:02:40.400
my whole life.

00:02:40.400 --> 00:02:43.300
Didn't ever sort of choose it as my career path

00:02:43.300 --> 00:02:44.040
or anything for a while.

00:02:44.040 --> 00:02:44.800
It chose you.

00:02:44.800 --> 00:02:46.340
Yeah, it chose me.

00:02:46.340 --> 00:02:47.860
It just, I dragged it along with me everywhere.

00:02:47.860 --> 00:02:49.020
It's just like the toolkit.

00:02:49.020 --> 00:02:52.360
I went to undergrad

00:02:52.360 --> 00:02:54.440
for physics and electrical engineering,

00:02:54.440 --> 00:02:57.400
then did a physics PhD, experimental physics.

00:02:57.400 --> 00:03:00.880
During that, I did a lot of non-traditional languages,

00:03:00.880 --> 00:03:02.700
things like LabVIEW, Igor Pro,

00:03:02.700 --> 00:03:06.080
just weird Windows, Windows hotkey

00:03:06.080 --> 00:03:08.320
for like just trying to like automate things.

00:03:08.320 --> 00:03:08.760
Yeah, sure.

00:03:08.760 --> 00:03:11.120
So just was sort of dragging that along.

00:03:11.120 --> 00:03:12.080
But along that path,

00:03:12.080 --> 00:03:14.040
sort of came across GPUs

00:03:14.040 --> 00:03:16.100
and used it for accelerating processing,

00:03:16.100 --> 00:03:17.560
specifically like particle detection.

00:03:17.560 --> 00:03:20.100
So I was doing some electron counting

00:03:20.100 --> 00:03:23.120
in some just detector experiments.

00:03:23.120 --> 00:03:25.920
Is this like CUDA cores on NVIDIA type thing?

00:03:25.920 --> 00:03:26.200
Precisely.

00:03:26.200 --> 00:03:26.920
Stuff like that.

00:03:26.920 --> 00:03:27.120
Okay.

00:03:27.120 --> 00:03:28.280
And was that with Python

00:03:28.280 --> 00:03:29.680
or was that with C++ or what?

00:03:29.680 --> 00:03:31.260
At the time it was C++

00:03:31.260 --> 00:03:32.460
and I made like a DLL

00:03:32.460 --> 00:03:33.640
and then called it from LabVIEW.

00:03:33.640 --> 00:03:35.160
Wow, that's some crazy integration.

00:03:35.160 --> 00:03:37.400
It's like drag and drop programming too

00:03:37.400 --> 00:03:39.480
on the memory GPU.

00:03:39.480 --> 00:03:40.080
Exactly.

00:03:40.080 --> 00:03:41.500
It was all over the place.

00:03:41.500 --> 00:03:42.160
Also had,

00:03:42.160 --> 00:03:43.640
it was a distributed LabVIEW project.

00:03:43.640 --> 00:03:45.160
We had multiple machines

00:03:45.160 --> 00:03:46.900
that were coordinating and doing this

00:03:46.900 --> 00:03:49.100
all just to move some motors

00:03:49.100 --> 00:03:50.460
and measure electrons.

00:03:50.460 --> 00:03:52.820
But it got me into CUDA stuff,

00:03:52.820 --> 00:03:53.740
which then at the time

00:03:53.740 --> 00:03:55.100
was around the time

00:03:55.100 --> 00:03:57.180
that the like AlexNet,

00:03:57.180 --> 00:03:58.780
some of these like very first neural net stuff

00:03:58.780 --> 00:03:59.180
was happening.

00:03:59.180 --> 00:04:01.380
And so those same convolutional kernels

00:04:01.380 --> 00:04:02.640
were the same exact code

00:04:02.640 --> 00:04:03.320
that I was trying to write

00:04:03.320 --> 00:04:04.340
to run like convolutions

00:04:04.340 --> 00:04:04.980
on these images.

00:04:04.980 --> 00:04:05.700
And so it's like,

00:04:05.700 --> 00:04:06.780
oh, look at this like paper.

00:04:06.780 --> 00:04:07.700
Oh, let me go read it.

00:04:07.700 --> 00:04:09.100
It seems like it's got so many citations.

00:04:09.100 --> 00:04:09.720
This is interesting.

00:04:09.720 --> 00:04:11.100
And then like that sent me down

00:04:11.100 --> 00:04:11.860
the rabbit hole of like,

00:04:11.860 --> 00:04:12.640
oh, this AI stuff.

00:04:12.640 --> 00:04:13.060
Oh, okay.

00:04:13.060 --> 00:04:14.820
Let me go deep dive into this.

00:04:14.820 --> 00:04:15.540
And then that just,

00:04:15.540 --> 00:04:16.800
I'd say that like became

00:04:16.800 --> 00:04:18.380
the obsession from then.

00:04:18.380 --> 00:04:19.980
So it's been like eight years

00:04:19.980 --> 00:04:20.740
of doing that.

00:04:20.740 --> 00:04:21.560
Then sort of just

00:04:21.560 --> 00:04:22.960
after I left academia,

00:04:22.960 --> 00:04:24.000
tried my own startup,

00:04:24.000 --> 00:04:26.020
then joined multiple others

00:04:26.020 --> 00:04:26.920
and just sort of have been

00:04:26.920 --> 00:04:27.640
bouncing around

00:04:27.640 --> 00:04:29.000
as the sort of like

00:04:29.000 --> 00:04:30.420
founding engineer,

00:04:30.640 --> 00:04:32.300
early engineer at startups

00:04:32.300 --> 00:04:32.860
for a while now.

00:04:32.860 --> 00:04:34.020
And yeah,

00:04:34.020 --> 00:04:35.400
Python has been the choice

00:04:35.400 --> 00:04:37.340
ever since like late grad school

00:04:37.340 --> 00:04:38.020
and on.

00:04:38.020 --> 00:04:39.500
I would say it sort of like

00:04:39.500 --> 00:04:41.520
came through the pandas

00:04:41.520 --> 00:04:42.240
and NumPy part,

00:04:42.240 --> 00:04:44.980
but then stuck for the scripting,

00:04:44.980 --> 00:04:45.860
like just power,

00:04:45.860 --> 00:04:46.920
just can throw anything together

00:04:46.920 --> 00:04:47.380
at any time.

00:04:47.380 --> 00:04:48.900
So it seems like

00:04:48.900 --> 00:04:50.280
there were two groups

00:04:50.280 --> 00:04:50.860
that were just

00:04:50.860 --> 00:04:52.660
hammering GPUs,

00:04:52.660 --> 00:04:53.840
hammering them,

00:04:53.840 --> 00:04:55.140
crypto miners

00:04:55.140 --> 00:04:56.700
and AI people.

00:04:57.440 --> 00:04:58.640
But the physicists

00:04:58.640 --> 00:04:59.820
and some of those

00:04:59.820 --> 00:05:00.860
people doing large scale

00:05:00.860 --> 00:05:01.600
research like that,

00:05:01.600 --> 00:05:02.440
they were the OG

00:05:02.440 --> 00:05:04.400
graphics card users,

00:05:04.400 --> 00:05:04.640
right?

00:05:04.640 --> 00:05:05.300
Way before

00:05:05.300 --> 00:05:07.300
crypto mining existed

00:05:07.300 --> 00:05:08.540
and really before AI

00:05:08.540 --> 00:05:09.920
was using graphics cards

00:05:09.920 --> 00:05:10.440
all that much.

00:05:10.440 --> 00:05:11.200
When I was like looking

00:05:11.200 --> 00:05:11.760
at some of the code,

00:05:11.760 --> 00:05:12.400
like pre-CUDA,

00:05:12.400 --> 00:05:13.240
there were some like

00:05:13.240 --> 00:05:13.800
quant traders

00:05:13.800 --> 00:05:14.320
that were doing

00:05:14.320 --> 00:05:15.620
some like crazy stuff

00:05:15.620 --> 00:05:17.180
off of shaders.

00:05:17.180 --> 00:05:18.440
Like it wasn't even CUDA yet,

00:05:18.440 --> 00:05:19.120
but it was shaders

00:05:19.120 --> 00:05:19.980
and they were trying to like

00:05:19.980 --> 00:05:22.020
extract the compute power

00:05:22.020 --> 00:05:22.940
out of them from that.

00:05:22.940 --> 00:05:23.760
So...

00:05:23.760 --> 00:05:24.020
Look,

00:05:24.020 --> 00:05:24.720
if we could shave

00:05:24.720 --> 00:05:25.860
one millisecond off this,

00:05:25.920 --> 00:05:27.320
we can short them all day,

00:05:27.320 --> 00:05:28.020
let's do it.

00:05:28.020 --> 00:05:29.240
But yeah.

00:05:29.240 --> 00:05:29.600
Yeah.

00:05:29.600 --> 00:05:30.700
The physicists,

00:05:30.700 --> 00:05:31.020
I mean,

00:05:31.020 --> 00:05:32.020
it's always been like,

00:05:32.020 --> 00:05:32.160
yeah,

00:05:32.160 --> 00:05:33.040
it's always the get

00:05:33.040 --> 00:05:33.720
as much compute

00:05:33.720 --> 00:05:34.140
as you can

00:05:34.140 --> 00:05:34.680
out of the,

00:05:34.680 --> 00:05:35.080
you know,

00:05:35.080 --> 00:05:35.880
devices you have

00:05:35.880 --> 00:05:37.860
because simulations are slow.

00:05:37.860 --> 00:05:38.140
Yeah.

00:05:38.140 --> 00:05:39.020
I remember when I was

00:05:39.020 --> 00:05:39.480
in grad school

00:05:39.480 --> 00:05:40.260
studying math,

00:05:40.260 --> 00:05:41.740
actually senior year,

00:05:41.740 --> 00:05:42.760
regular college,

00:05:42.760 --> 00:05:43.460
my bachelor's,

00:05:43.460 --> 00:05:45.140
the research team

00:05:45.140 --> 00:05:45.860
that I was on

00:05:45.860 --> 00:05:47.740
had gotten a used

00:05:47.740 --> 00:05:49.640
Silicon Graphics computer

00:05:49.640 --> 00:05:51.020
for a quarter million dollars

00:05:51.020 --> 00:05:53.100
and some Onyx workstations

00:05:53.100 --> 00:05:54.100
that we were all given.

00:05:54.100 --> 00:05:54.420
I'm like,

00:05:54.420 --> 00:05:55.360
this thing is so awesome.

00:05:56.000 --> 00:05:56.880
A couple years later,

00:05:56.880 --> 00:05:59.100
like an NVIDIA graphics card

00:05:59.100 --> 00:06:00.840
and like a simple PC

00:06:00.840 --> 00:06:01.720
would crush it.

00:06:01.720 --> 00:06:03.000
Like that's $2,000.

00:06:03.000 --> 00:06:03.560
It's just,

00:06:03.560 --> 00:06:03.980
yeah,

00:06:03.980 --> 00:06:04.860
there's so much power

00:06:04.860 --> 00:06:05.480
in those things

00:06:05.480 --> 00:06:06.680
to be able to harness them

00:06:06.680 --> 00:06:07.100
for whatever,

00:06:07.100 --> 00:06:07.560
I guess.

00:06:07.560 --> 00:06:07.900
Yeah.

00:06:07.900 --> 00:06:08.820
As long as you don't have

00:06:08.820 --> 00:06:09.460
too much branching,

00:06:09.460 --> 00:06:10.240
it works really well.

00:06:10.240 --> 00:06:11.580
Awesome.

00:06:11.580 --> 00:06:14.180
So let's jump in

00:06:14.180 --> 00:06:15.960
and start talking about,

00:06:15.960 --> 00:06:17.460
let's start to talk about

00:06:17.460 --> 00:06:18.000
ChatGPT

00:06:18.000 --> 00:06:20.560
and some of this AI stuff

00:06:20.560 --> 00:06:22.640
before we totally get into

00:06:22.640 --> 00:06:23.880
the projects

00:06:23.880 --> 00:06:24.680
that you're working on,

00:06:24.680 --> 00:06:25.280
which brings

00:06:25.280 --> 00:06:27.000
that type of

00:06:27.000 --> 00:06:28.000
conversational

00:06:28.000 --> 00:06:29.380
generative AI

00:06:29.380 --> 00:06:30.840
to things like Pandas,

00:06:30.840 --> 00:06:31.580
as you said.

00:06:31.580 --> 00:06:33.160
But to me,

00:06:33.160 --> 00:06:34.020
I don't know how,

00:06:34.020 --> 00:06:35.120
maybe you've been more

00:06:35.120 --> 00:06:36.260
on the inside than I have,

00:06:36.260 --> 00:06:37.320
but to me,

00:06:37.320 --> 00:06:38.120
it looks like

00:06:38.120 --> 00:06:40.520
AI has been one of those things

00:06:40.520 --> 00:06:41.220
that's 30 years

00:06:41.220 --> 00:06:42.080
in the future forever,

00:06:42.080 --> 00:06:42.580
right?

00:06:42.580 --> 00:06:43.140
It was like

00:06:43.140 --> 00:06:44.160
the Turing test

00:06:44.160 --> 00:06:44.400
and,

00:06:44.400 --> 00:06:44.980
oh,

00:06:44.980 --> 00:06:45.740
here's a chat,

00:06:45.740 --> 00:06:46.860
I'm going to talk to this thing

00:06:46.860 --> 00:06:48.720
and see if it feels human or not.

00:06:48.720 --> 00:06:49.100
And then,

00:06:49.100 --> 00:06:50.020
you know,

00:06:50.020 --> 00:06:51.660
there was like OCR

00:06:51.660 --> 00:06:53.300
and then all of a sudden

00:06:53.300 --> 00:06:55.000
we got self-driving cars,

00:06:55.000 --> 00:06:55.220
like,

00:06:55.220 --> 00:06:55.600
wait a minute,

00:06:55.600 --> 00:06:56.400
that's actually solving

00:06:56.400 --> 00:06:57.040
real problems.

00:06:57.040 --> 00:06:58.660
And then we got things

00:06:58.660 --> 00:06:59.540
like ChatGPT

00:06:59.540 --> 00:07:00.700
where people are like,

00:07:00.700 --> 00:07:01.000
wait,

00:07:01.000 --> 00:07:02.060
this can do my job.

00:07:02.060 --> 00:07:03.760
It seems like it,

00:07:03.760 --> 00:07:05.220
just in the last couple of years,

00:07:05.220 --> 00:07:07.240
there's been some inflection point

00:07:07.240 --> 00:07:08.420
in this world.

00:07:08.420 --> 00:07:09.220
What do you think?

00:07:09.300 --> 00:07:09.440
Yeah,

00:07:09.440 --> 00:07:10.480
I think there's sort of like

00:07:10.480 --> 00:07:11.580
two key things

00:07:11.580 --> 00:07:12.080
that have sort of happened

00:07:12.080 --> 00:07:12.540
in the past,

00:07:12.540 --> 00:07:13.420
I guess,

00:07:13.420 --> 00:07:14.420
four or five years,

00:07:14.420 --> 00:07:15.020
four years,

00:07:15.020 --> 00:07:15.340
roughly.

00:07:15.340 --> 00:07:16.480
One is the

00:07:16.480 --> 00:07:17.420
"Attention Is All You Need"

00:07:17.420 --> 00:07:18.360
paper from Google,

00:07:18.360 --> 00:07:19.740
sort of this transformer

00:07:19.740 --> 00:07:20.660
architecture came out

00:07:20.660 --> 00:07:21.620
and it's sort of a good,

00:07:21.620 --> 00:07:23.120
very hungry model

00:07:23.120 --> 00:07:23.780
that can just sort of

00:07:23.780 --> 00:07:24.820
absorb a lot of facts

00:07:24.820 --> 00:07:26.120
and just like a nice

00:07:26.120 --> 00:07:27.180
learnable key-value store

00:07:27.180 --> 00:07:28.180
almost that's stuck.

00:07:28.180 --> 00:07:28.820
So,

00:07:28.820 --> 00:07:29.820
and then the other thing

00:07:29.820 --> 00:07:30.000
is,

00:07:30.000 --> 00:07:31.000
the GPUs.

00:07:31.000 --> 00:07:31.760
We were sort of just talking

00:07:31.760 --> 00:07:32.480
about GPU compute,

00:07:32.480 --> 00:07:34.160
but this has just been

00:07:34.160 --> 00:07:34.880
really,

00:07:34.880 --> 00:07:36.920
GPU compute has really been

00:07:36.920 --> 00:07:38.500
growing so fast.

00:07:38.500 --> 00:07:38.900
If you like,

00:07:39.000 --> 00:07:39.720
look at the like

00:07:39.720 --> 00:07:40.640
Moore's law equivalent

00:07:40.640 --> 00:07:41.140
type things,

00:07:41.140 --> 00:07:41.900
like it's just,

00:07:41.900 --> 00:07:43.120
it's faster how much

00:07:43.120 --> 00:07:43.780
we're getting flops

00:07:43.780 --> 00:07:44.360
out of these things

00:07:44.360 --> 00:07:45.100
like faster and faster.

00:07:45.100 --> 00:07:45.420
So,

00:07:45.420 --> 00:07:46.980
it's been really nice.

00:07:46.980 --> 00:07:47.420
I mean,

00:07:47.420 --> 00:07:47.940
obviously there'll be

00:07:47.940 --> 00:07:48.480
a wall eventually,

00:07:48.480 --> 00:07:50.220
but it's been good

00:07:50.220 --> 00:07:51.020
riding this like

00:07:51.020 --> 00:07:52.040
exponential curve for a bit.

00:07:52.040 --> 00:07:52.240
Yeah,

00:07:52.240 --> 00:07:53.640
is the benefit

00:07:53.640 --> 00:07:54.260
that we're getting

00:07:54.260 --> 00:07:55.820
from the faster GPUs,

00:07:55.820 --> 00:07:57.000
is that because

00:07:57.000 --> 00:07:58.280
people are able

00:07:58.280 --> 00:07:59.160
to program it better

00:07:59.160 --> 00:07:59.660
and the frameworks

00:07:59.660 --> 00:08:00.180
are getting better

00:08:00.180 --> 00:08:01.000
or because just

00:08:01.000 --> 00:08:02.300
the raw processing power

00:08:02.300 --> 00:08:03.580
is getting better?

00:08:03.580 --> 00:08:04.320
All of the above.

00:08:04.320 --> 00:08:04.620
Okay.

00:08:04.620 --> 00:08:05.200
I think that

00:08:05.200 --> 00:08:06.620
there was a paper

00:08:06.620 --> 00:08:07.700
that tried to dissect this.

00:08:07.700 --> 00:08:08.460
I wish I knew

00:08:08.460 --> 00:08:08.800
the reference,

00:08:08.800 --> 00:08:09.880
but I believe

00:08:09.880 --> 00:08:10.600
that their argument

00:08:10.600 --> 00:08:11.080
was that it was

00:08:11.080 --> 00:08:11.680
actually more

00:08:11.680 --> 00:08:12.720
the processing power

00:08:12.720 --> 00:08:13.540
was getting better.

00:08:13.540 --> 00:08:14.180
The actual like

00:08:14.180 --> 00:08:14.920
physical silicon

00:08:14.920 --> 00:08:15.660
we're getting better

00:08:15.660 --> 00:08:16.260
at making that

00:08:16.260 --> 00:08:17.120
for specifically

00:08:17.120 --> 00:08:17.920
this type of stuff.

00:08:17.920 --> 00:08:18.640
But like

00:08:18.640 --> 00:08:19.960
on exponentials,

00:08:19.960 --> 00:08:20.320
but yeah.

00:08:20.460 --> 00:08:21.680
The power that

00:08:21.680 --> 00:08:22.520
those things take,

00:08:22.520 --> 00:08:24.520
I have a gaming system

00:08:24.520 --> 00:08:25.140
over there

00:08:25.140 --> 00:08:26.420
and it has a

00:08:26.420 --> 00:08:29.260
GeForce 2070 Super.

00:08:29.260 --> 00:08:29.980
I don't know

00:08:29.980 --> 00:08:30.460
what the Super

00:08:30.460 --> 00:08:31.140
really gets me,

00:08:31.140 --> 00:08:31.960
but it's better

00:08:31.960 --> 00:08:32.720
than the not Super,

00:08:32.720 --> 00:08:33.140
I guess.

00:08:33.140 --> 00:08:33.960
Anyway,

00:08:33.960 --> 00:08:36.280
that one still plugs

00:08:36.280 --> 00:08:37.560
into the wall normal,

00:08:37.560 --> 00:08:39.040
but the newer ones,

00:08:39.040 --> 00:08:40.160
like the 4090s,

00:08:40.160 --> 00:08:40.960
those things,

00:08:40.960 --> 00:08:42.060
the amount of power

00:08:42.060 --> 00:08:42.680
they consume,

00:08:42.680 --> 00:08:44.360
it's like space heater

00:08:44.360 --> 00:08:45.200
level of power.

00:08:45.200 --> 00:08:45.980
Like,

00:08:45.980 --> 00:08:46.800
I don't know,

00:08:46.800 --> 00:08:47.760
800 watts or something

00:08:47.760 --> 00:08:48.620
just for the GPU.

00:08:48.620 --> 00:08:50.720
You're going to

00:08:50.720 --> 00:08:51.500
brown out the house

00:08:51.500 --> 00:08:52.560
if you plug in

00:08:52.560 --> 00:08:53.260
too many of those.

00:08:53.260 --> 00:08:53.660
Yeah.

00:08:53.660 --> 00:08:54.800
Go look at those

00:08:54.800 --> 00:08:57.580
DGX A100 clusters

00:08:57.580 --> 00:08:58.180
and they've got

00:08:58.180 --> 00:08:59.260
like eight of those

00:08:59.260 --> 00:09:00.360
A100s just stacked

00:09:00.360 --> 00:09:00.820
right in there.

00:09:00.820 --> 00:09:02.160
They take really

00:09:02.160 --> 00:09:03.280
beefy power supplies.

00:09:03.280 --> 00:09:04.260
It's built right

00:09:04.260 --> 00:09:05.280
directly attached

00:09:05.280 --> 00:09:06.600
to the power plant,

00:09:06.600 --> 00:09:08.520
electrical power plant.

00:09:08.520 --> 00:09:09.560
Nuts.

00:09:09.560 --> 00:09:09.800
Okay,

00:09:09.800 --> 00:09:10.220
so yeah,

00:09:10.220 --> 00:09:10.740
so those things

00:09:10.740 --> 00:09:11.440
are getting really,

00:09:11.440 --> 00:09:12.000
really massive.

00:09:12.000 --> 00:09:12.840
Here's the paper

00:09:12.840 --> 00:09:14.540
"Attention Is All You Need"

00:09:14.540 --> 00:09:15.620
from Google Research.

00:09:15.620 --> 00:09:17.500
What was the story of that?

00:09:18.080 --> 00:09:19.320
How's that play into things?

00:09:19.320 --> 00:09:19.640
Yeah,

00:09:19.640 --> 00:09:21.040
so this came up

00:09:21.040 --> 00:09:22.120
during like machine translation

00:09:22.120 --> 00:09:22.920
sort of research

00:09:22.920 --> 00:09:23.880
at Google

00:09:23.880 --> 00:09:26.140
and the core thing

00:09:26.140 --> 00:09:28.720
is they present this idea

00:09:28.720 --> 00:09:30.680
of instead of just stacking

00:09:30.680 --> 00:09:31.380
these like layers

00:09:31.380 --> 00:09:31.900
of neural nets

00:09:31.900 --> 00:09:33.200
like we're sort of used to,

00:09:33.200 --> 00:09:34.400
they replace the like

00:09:34.400 --> 00:09:35.460
neural net layer

00:09:35.460 --> 00:09:36.580
with this concept

00:09:36.580 --> 00:09:38.740
of a transformer block.

00:09:38.740 --> 00:09:39.860
A transformer block

00:09:39.860 --> 00:09:41.560
has this concept inside

00:09:41.560 --> 00:09:42.760
that's an attention mechanism.

00:09:42.760 --> 00:09:44.000
The attention mechanism

00:09:44.000 --> 00:09:45.460
is effectively

00:09:45.460 --> 00:09:46.420
three matrices

00:09:46.420 --> 00:09:48.100
that you combine

00:09:48.100 --> 00:09:48.860
in a specific order

00:09:48.860 --> 00:09:50.800
and the sort of logic

00:09:50.800 --> 00:09:51.220
of it

00:09:51.220 --> 00:09:52.560
is that one of the matrices

00:09:52.560 --> 00:09:54.120
takes you from some space

00:09:54.120 --> 00:09:55.400
to keys

00:09:55.400 --> 00:09:56.160
so it's almost like

00:09:56.160 --> 00:09:57.520
it's like identifying labels

00:09:57.520 --> 00:09:58.120
out of your data.

00:09:58.120 --> 00:09:59.640
Another one is

00:09:59.640 --> 00:10:01.060
taking you from your data

00:10:01.060 --> 00:10:01.820
to queries

00:10:01.820 --> 00:10:02.960
and then it like

00:10:02.960 --> 00:10:03.760
dot products those

00:10:03.760 --> 00:10:04.540
to find a weight

00:10:04.540 --> 00:10:05.660
for each one,

00:10:05.660 --> 00:10:06.540
and then another one

00:10:06.540 --> 00:10:08.160
finds the values

00:10:08.160 --> 00:10:08.900
for your things.

00:10:08.900 --> 00:10:09.720
So it takes this

00:10:09.720 --> 00:10:10.880
query and key,

00:10:10.880 --> 00:10:11.900
you get the weights

00:10:11.900 --> 00:10:12.240
for them

00:10:12.240 --> 00:10:13.160
and then you take

00:10:13.160 --> 00:10:13.840
the ones that were

00:10:13.840 --> 00:10:14.660
sort of the closest

00:10:14.660 --> 00:10:15.840
to get those values

00:10:15.840 --> 00:10:16.900
from the third matrix.

00:10:16.900 --> 00:10:18.020
Just doing it

00:10:18.020 --> 00:10:18.380
sort of like

00:10:18.380 --> 00:10:19.280
looks a little bit

00:10:19.280 --> 00:10:21.660
like accessing an element

00:10:21.660 --> 00:10:22.140
in a dictionary

00:10:22.140 --> 00:10:23.300
like key value lookup

00:10:23.300 --> 00:10:26.340
and it's a differentiable

00:10:26.340 --> 00:10:26.940
version of that

00:10:26.940 --> 00:10:28.700
and it did really well

00:10:28.700 --> 00:10:29.600
on their machine learning

00:10:29.600 --> 00:10:29.980
sorry,

00:10:29.980 --> 00:10:31.140
on their machine translation

00:10:31.140 --> 00:10:31.640
stuff.
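
NOTE
A minimal sketch of the query/key/value lookup described above, written as single-head scaled dot-product attention in pure Python.
It is an illustration under stated assumptions, not code from the paper or the episode: the helper names (matmul, softmax, attention)
and the weight matrices w_q, w_k, w_v are hypothetical, and real models add multiple heads, masking, and trained parameters.
```python
# Sketch of single-head scaled dot-product attention (standard library only).
# All function and parameter names are illustrative assumptions.
import math
def matmul(a, b):
    # Multiply two matrices given as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]
def attention(x, w_q, w_k, w_v):
    # Three matrices project the input into queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    # Dot every query with every key, then normalize the scores into weights.
    scores = matmul(q, list(zip(*k)))
    weights = [softmax([s / scale for s in row]) for row in scores]
    # Each output row is a weighted blend of value rows: a soft, differentiable lookup.
    return matmul(weights, v), weights
```
Unlike a dictionary lookup, which returns exactly one value, every value contributes in proportion to how well its key matches the query.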

00:10:31.640 --> 00:10:31.960
This was,

00:10:31.960 --> 00:10:32.920
I think it's

00:10:32.920 --> 00:10:33.900
like one of the

00:10:33.900 --> 00:10:34.540
first big ones

00:10:34.540 --> 00:10:35.280
is this BERT model

00:10:35.280 --> 00:10:37.360
and that paper

00:10:37.360 --> 00:10:37.720
sort of

00:10:37.720 --> 00:10:39.160
the architecture

00:10:39.160 --> 00:10:40.040
of the actual

00:10:40.040 --> 00:10:40.900
neural net code

00:10:40.900 --> 00:10:42.940
is effectively

00:10:42.940 --> 00:10:43.540
unchanged

00:10:43.540 --> 00:10:44.120
from this

00:10:44.120 --> 00:10:45.300
to ChatGPT.

00:10:45.300 --> 00:10:46.220
Like there's

00:10:46.220 --> 00:10:46.880
a lot of stuff

00:10:46.880 --> 00:10:47.240
for like

00:10:47.240 --> 00:10:48.400
milking performance

00:10:48.400 --> 00:10:49.640
and increasing

00:10:49.640 --> 00:10:50.140
stability

00:10:50.140 --> 00:10:50.980
but the actual

00:10:50.980 --> 00:10:51.560
like core

00:10:51.560 --> 00:10:52.280
essence of the

00:10:52.280 --> 00:10:52.860
actual mechanism

00:10:52.860 --> 00:10:53.580
that drives it

00:10:53.580 --> 00:10:54.180
it's the same

00:10:54.180 --> 00:10:55.000
thing since this paper.

00:10:55.000 --> 00:10:55.540
Interesting.

00:10:55.540 --> 00:10:56.640
It's funny that

00:10:56.640 --> 00:10:57.160
Google didn't

00:10:57.160 --> 00:10:57.680
release something

00:10:57.680 --> 00:10:58.040
sooner.

00:10:58.040 --> 00:11:00.220
It's wild

00:11:00.220 --> 00:11:00.820
that they've had

00:11:00.820 --> 00:11:01.880
they keep

00:11:01.880 --> 00:11:02.620
showing off

00:11:02.620 --> 00:11:03.220
that they've got

00:11:03.220 --> 00:11:04.340
like equivalent

00:11:04.340 --> 00:11:05.000
or better things

00:11:05.000 --> 00:11:05.720
at different times

00:11:05.720 --> 00:11:06.320
but then not

00:11:06.320 --> 00:11:06.820
releasing it.

00:11:06.820 --> 00:11:07.840
When DALL-E

00:11:07.840 --> 00:11:08.140
happened

00:11:08.140 --> 00:11:08.940
they had Imagen

00:11:08.940 --> 00:11:09.820
Imagine

00:11:09.820 --> 00:11:10.300
I guess

00:11:10.300 --> 00:11:10.560
I don't know

00:11:10.560 --> 00:11:11.020
how you say it

00:11:11.020 --> 00:11:12.240
and what was

00:11:12.240 --> 00:11:13.000
Parti

00:11:13.000 --> 00:11:13.660
as the two

00:11:13.660 --> 00:11:14.100
they had two

00:11:14.100 --> 00:11:14.460
different

00:11:14.460 --> 00:11:15.640
really good

00:11:15.640 --> 00:11:16.320
way better

00:11:16.320 --> 00:11:16.820
than DALL-E

00:11:16.820 --> 00:11:17.180
way better

00:11:17.180 --> 00:11:17.740
than Stable

00:11:17.740 --> 00:11:18.500
Diffusion models

00:11:18.500 --> 00:11:19.260
like the

00:11:19.260 --> 00:11:19.700
ones that

00:11:19.700 --> 00:11:20.040
were out

00:11:20.040 --> 00:11:20.640
and they like

00:11:20.640 --> 00:11:21.220
showed it

00:11:21.220 --> 00:11:21.860
demoed it

00:11:21.860 --> 00:11:22.360
like but

00:11:22.360 --> 00:11:23.100
never released

00:11:23.100 --> 00:11:23.820
it to be used

00:11:23.820 --> 00:11:25.020
so yeah

00:11:25.020 --> 00:11:25.920
it's one of these

00:11:25.920 --> 00:11:26.640
who knows

00:11:26.640 --> 00:11:26.940
what's going to

00:11:26.940 --> 00:11:27.440
happen with Google

00:11:27.440 --> 00:11:28.000
if they keep

00:11:28.000 --> 00:11:28.460
holding on to

00:11:28.460 --> 00:11:28.860
these things.

00:11:28.860 --> 00:11:29.860
Yeah well I think

00:11:29.860 --> 00:11:30.240
there was some

00:11:30.240 --> 00:11:30.680
hesitation

00:11:30.680 --> 00:11:31.920
I don't know

00:11:31.920 --> 00:11:32.560
hold-ups on

00:11:32.560 --> 00:11:33.140
accuracy

00:11:33.140 --> 00:11:34.180
or weird stuff

00:11:34.180 --> 00:11:34.820
like that.

00:11:34.820 --> 00:11:35.640
Sure.

00:11:35.640 --> 00:11:37.260
Yeah now

00:11:37.260 --> 00:11:37.780
cat's out of the

00:11:37.780 --> 00:11:38.080
bag now

00:11:38.080 --> 00:11:38.580
now it's happening.

00:11:38.720 --> 00:11:39.220
Yeah the cat's

00:11:39.220 --> 00:11:39.720
out of the bag

00:11:39.720 --> 00:11:40.500
and people are

00:11:40.500 --> 00:11:41.480
racing to do

00:11:41.480 --> 00:11:42.100
the best they

00:11:42.100 --> 00:11:43.580
can and it's

00:11:43.580 --> 00:11:44.080
going to have

00:11:44.080 --> 00:11:45.480
interesting consequences

00:11:45.480 --> 00:11:46.460
for us both

00:11:46.460 --> 00:11:47.140
positive and

00:11:47.140 --> 00:11:47.760
negative I think

00:11:47.760 --> 00:11:48.400
but you know

00:11:48.400 --> 00:11:49.480
let's leverage the

00:11:49.480 --> 00:11:50.380
positive once the

00:11:50.380 --> 00:11:50.860
cat's out of the

00:11:50.860 --> 00:11:51.580
bag anyway right?

00:11:51.580 --> 00:11:51.860
Yeah.

00:11:51.860 --> 00:11:52.200
Hopefully.

00:11:52.200 --> 00:11:52.800
Might as well

00:11:52.800 --> 00:11:53.520
like ask it

00:11:53.520 --> 00:11:54.220
questions for

00:11:54.220 --> 00:11:54.600
pandas.

00:11:55.260 --> 00:11:56.260
So let's play a

00:11:56.260 --> 00:11:56.620
little bit with

00:11:56.620 --> 00:11:57.680
ChatGPT and

00:11:57.680 --> 00:11:58.440
maybe another

00:11:58.440 --> 00:11:58.900
one of these

00:11:58.900 --> 00:12:00.260
image type

00:12:00.260 --> 00:12:00.540
things.

00:12:00.540 --> 00:12:02.020
So I came

00:12:02.020 --> 00:12:02.480
in here and I

00:12:02.480 --> 00:12:03.380
stole this example

00:12:03.380 --> 00:12:04.440
from a blog post

00:12:04.440 --> 00:12:05.400
that's pretty nice

00:12:05.400 --> 00:12:06.660
about not using

00:12:06.660 --> 00:12:08.260
deeply nested

00:12:08.260 --> 00:12:08.540
code.

00:12:08.540 --> 00:12:09.120
You can use a

00:12:09.120 --> 00:12:09.860
design pattern

00:12:09.860 --> 00:12:11.140
called a

00:12:11.140 --> 00:12:11.880
guard clause

00:12:11.880 --> 00:12:13.500
that will look

00:12:13.500 --> 00:12:14.080
and say if the

00:12:14.080 --> 00:12:14.860
conditions are not

00:12:14.860 --> 00:12:15.520
right we're going

00:12:15.520 --> 00:12:16.600
to return early

00:12:16.600 --> 00:12:17.760
instead of having

00:12:17.760 --> 00:12:18.800
if something

00:12:18.800 --> 00:12:19.940
if that also

00:12:19.940 --> 00:12:20.840
if something else

00:12:20.840 --> 00:12:21.920
so there's this

00:12:21.920 --> 00:12:22.900
example that is

00:12:22.900 --> 00:12:23.660
written in a poor

00:12:23.660 --> 00:12:25.040
way and it says

00:12:25.040 --> 00:12:25.840
like it's checking

00:12:25.840 --> 00:12:26.760
for a platypus

00:12:26.760 --> 00:12:27.640
so it says if

00:12:27.640 --> 00:12:29.140
self.is_mammal

00:12:29.140 --> 00:12:31.020
if self.has_fur

00:12:31.020 --> 00:12:32.600
if self.has_beak

00:12:32.600 --> 00:12:32.920
etc.

00:12:32.920 --> 00:12:34.440
it's all

00:12:34.440 --> 00:12:35.540
deeply nested

00:12:35.540 --> 00:12:36.440
and just for people

00:12:36.440 --> 00:12:36.960
who haven't played

00:12:36.960 --> 00:12:37.660
with ChatGPT

00:12:37.660 --> 00:12:38.600
like I put that

00:12:38.600 --> 00:12:38.980
in and I said

00:12:38.980 --> 00:12:39.820
sure I told it

00:12:39.820 --> 00:12:40.300
I wanted to call

00:12:40.300 --> 00:12:40.880
this arrow

00:12:40.880 --> 00:12:41.440
because it looks

00:12:41.440 --> 00:12:42.020
like an arrow

00:12:42.020 --> 00:12:43.360
and it says

00:12:43.360 --> 00:12:44.500
it tells me a

00:12:44.500 --> 00:12:44.960
little bit about

00:12:44.960 --> 00:12:45.360
this so I'm

00:12:45.360 --> 00:12:46.000
going to ask it

00:12:46.000 --> 00:12:47.700
please rewrite

00:12:47.700 --> 00:12:49.900
arrow to be

00:12:49.900 --> 00:12:51.440
less nested

00:12:51.440 --> 00:12:52.780
with guard

00:12:52.780 --> 00:12:53.820
clauses right

00:12:53.820 --> 00:12:54.380
this is like a

00:12:54.380 --> 00:12:55.020
machine right

00:12:55.020 --> 00:12:55.640
if I tell it

00:12:55.640 --> 00:12:56.980
this what is it

00:12:56.980 --> 00:12:57.440
going to say

00:12:57.440 --> 00:12:58.380
let's see

00:12:58.380 --> 00:12:59.260
it may fail

00:12:59.260 --> 00:12:59.980
but I think

00:12:59.980 --> 00:13:00.360
it's going to

00:13:00.360 --> 00:13:00.720
get it

00:13:00.720 --> 00:13:01.460
it's thinking

00:13:01.460 --> 00:13:01.900
I put it

00:13:01.900 --> 00:13:02.600
I mistakenly

00:13:02.600 --> 00:13:03.000
put it into

00:13:03.000 --> 00:13:03.520
ChatGPT

00:13:03.520 --> 00:13:04.220
4 which

00:13:04.220 --> 00:13:04.840
takes longer

00:13:04.840 --> 00:13:06.000
I might switch

00:13:06.000 --> 00:13:06.440
it over to

00:13:06.440 --> 00:13:06.900
3 I don't

00:13:06.900 --> 00:13:07.160
know

00:13:07.160 --> 00:13:08.380
but the

00:13:08.380 --> 00:13:09.160
understanding

00:13:09.160 --> 00:13:09.620
of these

00:13:09.620 --> 00:13:10.040
things

00:13:10.040 --> 00:13:11.040
there's a lot

00:13:11.040 --> 00:13:11.360
of hype

00:13:11.360 --> 00:13:11.840
about it

00:13:11.840 --> 00:13:13.240
like I think

00:13:13.240 --> 00:13:13.640
you kind of

00:13:13.640 --> 00:13:14.180
agree with me

00:13:14.180 --> 00:13:14.760
that maybe

00:13:14.760 --> 00:13:15.280
this hype

00:13:15.280 --> 00:13:16.060
is worthwhile

00:13:16.060 --> 00:13:16.740
here we go

00:13:16.740 --> 00:13:18.240
so look

00:13:18.240 --> 00:13:18.560
at this

00:13:18.560 --> 00:13:19.760
it rewrote

00:13:19.760 --> 00:13:20.000
it said

00:13:20.000 --> 00:13:21.200
is_platypus

00:13:21.200 --> 00:13:21.980
if not

00:13:21.980 --> 00:13:22.900
self.is_mammal return

00:13:22.900 --> 00:13:23.240
false

00:13:23.240 --> 00:13:23.760
if not

00:13:23.760 --> 00:13:24.200
has fur

00:13:24.200 --> 00:13:24.900
and there's

00:13:24.900 --> 00:13:25.260
no more

00:13:25.260 --> 00:13:25.540
nesting

00:13:25.540 --> 00:13:25.940
that's pretty

00:13:25.940 --> 00:13:26.380
cool right
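
NOTE
Editor's sketch of the refactoring described above, not the code shown on
screen. The class name Animal and the attribute names are assumptions based
on what is read aloud, and the original has more checks ("etc.") than the
three here. It puts the nested "arrow" version next to the guard clause
version so the early returns are easy to compare.
```python
from dataclasses import dataclass
@dataclass
class Animal:
    # Attribute names are illustrative; the episode only reads them aloud.
    is_mammal: bool = False
    has_fur: bool = False
    has_beak: bool = False
    # "Arrow" style: each condition adds another level of nesting.
    def is_platypus_nested(self):
        if self.is_mammal:
            if self.has_fur:
                if self.has_beak:
                    return True
        return False
    # Guard clause style: return early, so the happy path stays flat.
    def is_platypus(self):
        if not self.is_mammal:
            return False
        if not self.has_fur:
            return False
        if not self.has_beak:
            return False
        return True
platypus = Animal(is_mammal=True, has_fur=True, has_beak=True)
dolphin = Animal(is_mammal=True)
print(platypus.is_platypus(), dolphin.is_platypus())
```
Both methods return the same answer for every combination of the three
flags; only the shape of the code differs.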

00:13:26.380 --> 00:13:26.940
yep

00:13:26.940 --> 00:13:27.960
I mean I'm

00:13:27.960 --> 00:13:28.580
sure you've

00:13:28.580 --> 00:13:29.020
you've played

00:13:29.020 --> 00:13:29.320
with stuff

00:13:29.320 --> 00:13:29.660
like this

00:13:29.660 --> 00:13:29.780
right

00:13:29.780 --> 00:13:30.500
yeah

00:13:30.500 --> 00:13:31.100
big user

00:13:31.100 --> 00:13:31.340
of this

00:13:31.340 --> 00:13:31.820
I mean this

00:13:31.820 --> 00:13:32.120
is kind

00:13:32.120 --> 00:13:32.540
of interesting

00:13:32.540 --> 00:13:32.720
right

00:13:32.720 --> 00:13:33.040
like it

00:13:33.040 --> 00:13:33.540
understood

00:13:33.540 --> 00:13:33.900
there was

00:13:33.900 --> 00:13:34.380
a structure

00:13:34.380 --> 00:13:34.760
and it

00:13:34.760 --> 00:13:35.060
understood

00:13:35.060 --> 00:13:35.460
what these

00:13:35.460 --> 00:13:35.700
were

00:13:35.700 --> 00:13:35.920
and it

00:13:35.920 --> 00:13:36.160
understood

00:13:36.160 --> 00:13:36.460
what I

00:13:36.460 --> 00:13:36.640
said

00:13:36.640 --> 00:13:37.340
but what's

00:13:37.340 --> 00:13:37.600
more

00:13:37.600 --> 00:13:38.080
impressive

00:13:38.080 --> 00:13:38.620
is like

00:13:38.620 --> 00:13:39.540
please

00:13:39.540 --> 00:13:40.320
rewrite

00:13:40.320 --> 00:13:41.240
the program

00:13:41.240 --> 00:13:42.920
to check

00:13:42.920 --> 00:13:43.640
for

00:13:43.640 --> 00:13:45.520
crocodiles

00:13:45.520 --> 00:13:46.420
crocodiles

00:13:46.420 --> 00:13:47.900
and you

00:13:47.900 --> 00:13:48.240
know it

00:13:48.240 --> 00:13:49.240
what is it

00:13:49.240 --> 00:13:49.560
going to do

00:13:49.560 --> 00:13:49.700
here

00:13:49.700 --> 00:13:50.420
let's see

00:13:50.420 --> 00:13:51.420
it says

00:13:51.420 --> 00:13:51.900
sure no

00:13:51.900 --> 00:13:52.360
problem

00:13:52.360 --> 00:13:53.420
writes the

00:13:53.420 --> 00:13:53.700
function

00:13:53.700 --> 00:13:54.360
is_crocodile

00:13:54.360 --> 00:13:55.060
if not

00:13:55.060 --> 00:13:56.400
self.is_reptile

00:13:56.400 --> 00:13:57.200
if not

00:13:57.200 --> 00:13:58.320
self.has_scales

00:13:58.320 --> 00:13:59.000
if not

00:13:59.000 --> 00:14:00.400
self.has_long_snout

00:14:00.400 --> 00:14:01.520
oh my

00:14:01.520 --> 00:14:01.940
gosh

00:14:01.940 --> 00:14:03.020
like it

00:14:03.020 --> 00:14:03.700
not only

00:14:03.700 --> 00:14:04.460
remembered

00:14:04.460 --> 00:14:04.760
oh yeah

00:14:04.760 --> 00:14:05.040
there's

00:14:05.040 --> 00:14:05.680
this new

00:14:05.680 --> 00:14:06.200
version I

00:14:06.200 --> 00:14:06.680
wrote in

00:14:06.680 --> 00:14:06.940
the

00:14:06.940 --> 00:14:07.300
guard

00:14:07.300 --> 00:14:07.660
clause

00:14:07.660 --> 00:14:08.060
format

00:14:08.060 --> 00:14:08.920
but then

00:14:08.920 --> 00:14:09.080
it

00:14:09.080 --> 00:14:09.920
rewrote

00:14:09.920 --> 00:14:10.540
the tests

00:14:10.540 --> 00:14:11.240
I mean

00:14:11.240 --> 00:14:12.060
and then

00:14:12.060 --> 00:14:12.340
it's

00:14:12.340 --> 00:14:12.840
explaining

00:14:12.840 --> 00:14:13.380
to me

00:14:13.380 --> 00:14:14.280
why

00:14:14.280 --> 00:14:14.960
it wrote

00:14:14.960 --> 00:14:15.240
it that

00:14:15.240 --> 00:14:15.500
way

00:14:15.500 --> 00:14:16.700
it's

00:14:16.700 --> 00:14:17.160
just

00:14:17.160 --> 00:14:18.260
it's

00:14:18.260 --> 00:14:18.540
mind

00:14:18.540 --> 00:14:18.840
blowing

00:14:18.840 --> 00:14:19.620
like how

00:14:19.620 --> 00:14:20.820
how much

00:14:20.820 --> 00:14:21.060
you can

00:14:21.060 --> 00:14:21.300
have

00:14:21.300 --> 00:14:21.860
conversations

00:14:21.860 --> 00:14:22.220
with this

00:14:22.220 --> 00:14:22.460
and how

00:14:22.460 --> 00:14:22.620
much

00:14:22.620 --> 00:14:22.760
it

00:14:22.760 --> 00:14:23.340
understands

00:14:23.340 --> 00:14:23.660
things

00:14:23.660 --> 00:14:23.900
like

00:14:23.900 --> 00:14:24.300
code

00:14:24.300 --> 00:14:24.820
or

00:14:24.820 --> 00:14:25.100
physics

00:14:25.100 --> 00:14:25.420
or

00:14:25.420 --> 00:14:25.800
history

00:14:25.800 --> 00:14:26.360
what do

00:14:26.360 --> 00:14:26.520
you think

00:14:26.520 --> 00:14:28.040
yeah it's

00:14:28.040 --> 00:14:28.220
really

00:14:28.220 --> 00:14:28.620
satisfying

00:14:28.620 --> 00:14:29.000
I love

00:14:29.000 --> 00:14:30.060
that it's

00:14:30.060 --> 00:14:30.780
such a

00:14:30.780 --> 00:14:31.260
powerful

00:14:31.260 --> 00:14:32.600
generalist

00:14:32.600 --> 00:14:33.300
at these

00:14:33.300 --> 00:14:33.760
like things

00:14:33.760 --> 00:14:34.100
that are found

00:14:34.100 --> 00:14:34.300
on the

00:14:34.300 --> 00:14:34.500
internet

00:14:34.500 --> 00:14:35.360
so if it

00:14:35.360 --> 00:14:35.800
like if it

00:14:35.800 --> 00:14:36.200
exists and

00:14:36.200 --> 00:14:36.640
it's in

00:14:36.640 --> 00:14:36.940
the training

00:14:36.940 --> 00:14:37.320
data it

00:14:37.320 --> 00:14:37.900
is so

00:14:37.900 --> 00:14:38.300
good at

00:14:38.300 --> 00:14:39.140
synthesizing

00:14:39.140 --> 00:14:39.820
composing

00:14:39.820 --> 00:14:40.360
bridging

00:14:40.360 --> 00:14:40.900
between them

00:14:40.900 --> 00:14:41.620
it's really

00:14:41.620 --> 00:14:42.040
satisfying

00:14:42.040 --> 00:14:42.880
so it's

00:14:42.880 --> 00:14:43.240
really fun

00:14:43.240 --> 00:14:43.620
asking it

00:14:43.620 --> 00:14:44.300
to as you're

00:14:44.300 --> 00:14:44.800
doing rewriting

00:14:44.800 --> 00:14:45.480
changing

00:14:45.480 --> 00:14:45.900
language

00:14:45.900 --> 00:14:46.500
I've been

00:14:46.500 --> 00:14:47.240
getting into

00:14:47.240 --> 00:14:47.580
a lot more

00:14:47.580 --> 00:14:47.940
JavaScript

00:14:47.940 --> 00:14:48.300
because I'm

00:14:48.300 --> 00:14:48.540
doing a

00:14:48.540 --> 00:14:48.800
bunch more

00:14:48.800 --> 00:14:49.200
like front end

00:14:49.200 --> 00:14:49.540
stuff and

00:14:49.540 --> 00:14:50.120
just I

00:14:50.120 --> 00:14:50.640
sometimes will

00:14:50.640 --> 00:14:51.220
write a quick

00:14:51.220 --> 00:14:51.780
one liner in

00:14:51.780 --> 00:14:52.040
Python

00:14:52.040 --> 00:14:52.480
that I know

00:14:52.480 --> 00:14:52.840
how to do

00:14:52.840 --> 00:14:53.240
with a

00:14:53.240 --> 00:14:54.060
list

00:14:54.060 --> 00:14:54.460
comprehension

00:14:54.460 --> 00:14:55.040
and then I'll

00:14:55.040 --> 00:14:55.260
be like

00:14:55.260 --> 00:14:55.820
make this

00:14:55.820 --> 00:14:56.340
for me

00:14:56.340 --> 00:14:56.960
in JavaScript

00:14:56.960 --> 00:14:57.360
because I

00:14:57.360 --> 00:14:57.800
can't figure

00:14:57.800 --> 00:14:58.300
out this

00:14:58.300 --> 00:14:59.100
like how to

00:14:59.100 --> 00:14:59.720
initialize an

00:14:59.720 --> 00:15:00.180
array with

00:15:00.180 --> 00:15:00.920
integers in

00:15:00.920 --> 00:15:01.100
it
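
NOTE
Editor's sketch of the kind of Python one-liner mentioned above. The
variable names are made up for illustration, and the JavaScript line in the
comment is just one common way to build the same array, not necessarily what
the model suggested in the episode.
```python
# A list of the integers 0 through 4, built with a list comprehension.
numbers = [i for i in range(5)]
# One commonly used JavaScript counterpart, shown here only as a comment:
#   const numbers = Array.from({ length: 5 }, (_, i) => i);
# Comprehensions can also transform values while building the list.
squares = [i * i for i in numbers]
print(numbers, squares)
```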

00:15:01.100 --> 00:15:02.880
it's great

00:15:02.880 --> 00:15:03.240
for just

00:15:03.240 --> 00:15:03.660
like really

00:15:03.660 --> 00:15:04.240
quick spot

00:15:04.240 --> 00:15:04.520
checks

00:15:04.520 --> 00:15:05.400
and it also

00:15:05.400 --> 00:15:05.800
seems to

00:15:05.800 --> 00:15:06.400
know a lot

00:15:06.400 --> 00:15:06.780
about like

00:15:06.780 --> 00:15:07.320
really popular

00:15:07.320 --> 00:15:07.660
frameworks

00:15:07.660 --> 00:15:08.640
so you can

00:15:08.640 --> 00:15:09.080
ask it

00:15:09.080 --> 00:15:09.680
things that

00:15:09.680 --> 00:15:10.680
are surprisingly

00:15:10.680 --> 00:15:11.340
detailed about

00:15:11.340 --> 00:15:11.860
like a

00:15:11.860 --> 00:15:12.600
how would you

00:15:12.600 --> 00:15:13.060
do CORS

00:15:13.060 --> 00:15:14.140
with requests

00:15:14.140 --> 00:15:15.640
in FastAPI

00:15:15.640 --> 00:15:16.400
and it can

00:15:16.400 --> 00:15:17.120
help you

00:15:17.120 --> 00:15:17.620
find that

00:15:17.620 --> 00:15:18.100
exact

00:15:18.100 --> 00:15:18.840
middleware

00:15:18.840 --> 00:15:19.160
you know

00:15:19.160 --> 00:15:19.460
it's like

00:15:19.460 --> 00:15:20.060
boilerplate-y

00:15:20.060 --> 00:15:20.540
but it's great

00:15:20.540 --> 00:15:21.320
that it can

00:15:21.320 --> 00:15:21.840
just be a

00:15:21.840 --> 00:15:22.280
source for that

00:15:22.280 --> 00:15:24.940
this portion

00:15:24.940 --> 00:15:25.540
of Talk Python

00:15:25.540 --> 00:15:25.900
to Me is

00:15:25.900 --> 00:15:26.200
brought to

00:15:26.200 --> 00:15:26.460
you by

00:15:26.460 --> 00:15:27.280
brilliant.org

00:15:27.280 --> 00:15:28.160
you're a

00:15:28.160 --> 00:15:28.740
curious person

00:15:28.740 --> 00:15:29.200
who loves

00:15:29.200 --> 00:15:29.500
to learn

00:15:29.500 --> 00:15:30.240
about technology

00:15:30.240 --> 00:15:30.940
I know

00:15:30.940 --> 00:15:31.440
because you're

00:15:31.440 --> 00:15:31.820
listening to

00:15:31.820 --> 00:15:32.180
my show

00:15:32.180 --> 00:15:32.980
that's why

00:15:32.980 --> 00:15:33.240
you would

00:15:33.240 --> 00:15:33.840
also be

00:15:33.840 --> 00:15:34.200
interested

00:15:34.200 --> 00:15:34.880
in this

00:15:34.880 --> 00:15:35.560
episode's

00:15:35.560 --> 00:15:35.940
sponsor

00:15:35.940 --> 00:15:36.940
brilliant.org

00:15:36.940 --> 00:15:38.040
brilliant.org

00:15:38.040 --> 00:15:38.820
is entertaining

00:15:38.820 --> 00:15:39.400
engaging

00:15:39.400 --> 00:15:40.080
and effective

00:15:40.080 --> 00:15:40.900
if you're

00:15:40.900 --> 00:15:41.240
like me

00:15:41.240 --> 00:15:41.660
and feel

00:15:41.660 --> 00:15:42.240
that binging

00:15:42.240 --> 00:15:42.960
yet another

00:15:42.960 --> 00:15:43.700
sitcom series

00:15:43.700 --> 00:15:44.500
is kind

00:15:44.500 --> 00:15:44.800
of missing

00:15:44.800 --> 00:15:45.420
out on life

00:15:45.420 --> 00:15:45.840
then how

00:15:45.840 --> 00:15:46.360
about spending

00:15:46.360 --> 00:15:46.940
30 minutes

00:15:46.940 --> 00:15:47.240
a day

00:15:47.240 --> 00:15:47.740
getting better

00:15:47.740 --> 00:15:48.420
at programming

00:15:48.420 --> 00:15:49.440
or deepening

00:15:49.440 --> 00:15:50.060
your knowledge

00:15:50.060 --> 00:15:50.660
and foundations

00:15:50.660 --> 00:15:51.320
of topics

00:15:51.320 --> 00:15:51.940
you've always

00:15:51.940 --> 00:15:52.320
wanted to

00:15:52.320 --> 00:15:52.860
learn better

00:15:52.860 --> 00:15:54.100
like chemistry

00:15:54.100 --> 00:15:54.920
or biology

00:15:54.920 --> 00:15:55.540
over on

00:15:55.540 --> 00:15:55.900
brilliant

00:15:55.900 --> 00:15:57.080
brilliant

00:15:57.080 --> 00:15:57.880
has thousands

00:15:57.880 --> 00:15:58.460
of lessons

00:15:58.460 --> 00:15:59.460
from foundational

00:15:59.460 --> 00:16:00.240
and advanced

00:16:00.240 --> 00:16:01.120
math to data

00:16:01.120 --> 00:16:01.560
science

00:16:01.560 --> 00:16:02.360
algorithms

00:16:02.360 --> 00:16:03.060
neural networks

00:16:03.060 --> 00:16:03.480
and more

00:16:03.480 --> 00:16:04.100
with new

00:16:04.100 --> 00:16:04.880
lessons added

00:16:04.880 --> 00:16:05.300
monthly

00:16:05.300 --> 00:16:06.460
when you sign up

00:16:06.460 --> 00:16:06.860
for a free

00:16:06.860 --> 00:16:07.220
trial

00:16:07.220 --> 00:16:08.000
they ask a couple

00:16:08.000 --> 00:16:08.500
of questions

00:16:08.500 --> 00:16:09.020
about what

00:16:09.020 --> 00:16:09.580
you're interested

00:16:09.580 --> 00:16:10.480
in as well as

00:16:10.480 --> 00:16:10.880
your background

00:16:10.880 --> 00:16:11.320
knowledge

00:16:11.320 --> 00:16:11.940
then you're

00:16:11.940 --> 00:16:12.540
presented with a

00:16:12.540 --> 00:16:13.080
cool learning

00:16:13.080 --> 00:16:13.660
path to get

00:16:13.660 --> 00:16:14.140
you started

00:16:14.140 --> 00:16:14.720
right where

00:16:14.720 --> 00:16:15.000
you should

00:16:15.000 --> 00:16:15.200
be

00:16:15.200 --> 00:16:16.080
personally

00:16:16.080 --> 00:16:16.780
I'm going

00:16:16.780 --> 00:16:17.380
back to some

00:16:17.380 --> 00:16:18.260
science foundations

00:16:18.260 --> 00:16:19.260
I love chemistry

00:16:19.260 --> 00:16:19.680
and physics

00:16:19.680 --> 00:16:20.300
but haven't

00:16:20.300 --> 00:16:20.720
touched them

00:16:20.720 --> 00:16:21.680
for 20 years

00:16:21.680 --> 00:16:23.100
so I'm looking

00:16:23.100 --> 00:16:23.860
forward to playing

00:16:23.860 --> 00:16:24.660
with PV

00:16:24.660 --> 00:16:25.800
equals nRT

00:16:25.800 --> 00:16:26.420
you know

00:16:26.420 --> 00:16:27.320
the ideal gas

00:16:27.320 --> 00:16:27.560
law

00:16:27.560 --> 00:16:28.360
and all the

00:16:28.360 --> 00:16:29.020
other foundations

00:16:29.020 --> 00:16:29.680
of our world

00:16:29.680 --> 00:16:30.860
with brilliant

00:16:30.860 --> 00:16:31.360
you'll get

00:16:31.360 --> 00:16:31.960
hands-on

00:16:31.960 --> 00:16:32.460
on a whole

00:16:32.460 --> 00:16:33.040
universe of

00:16:33.040 --> 00:16:33.500
concepts

00:16:33.500 --> 00:16:34.100
in math

00:16:34.100 --> 00:16:34.760
science

00:16:34.760 --> 00:16:35.600
computer science

00:16:35.600 --> 00:16:36.540
and solve

00:16:36.540 --> 00:16:37.200
fun problems

00:16:37.200 --> 00:16:37.700
while growing

00:16:37.700 --> 00:16:38.220
your critical

00:16:38.220 --> 00:16:39.200
thinking skills

00:16:39.200 --> 00:16:39.980
of course

00:16:39.980 --> 00:16:40.480
you could just

00:16:40.480 --> 00:16:41.360
visit brilliant.org

00:16:41.360 --> 00:16:41.820
directly

00:16:41.820 --> 00:16:42.700
its URL is

00:16:42.700 --> 00:16:42.980
right there

00:16:42.980 --> 00:16:43.380
in the name

00:16:43.380 --> 00:16:43.780
isn't it

00:16:43.780 --> 00:16:44.600
but please

00:16:44.600 --> 00:16:44.940
use our

00:16:44.940 --> 00:16:45.480
link because

00:16:45.480 --> 00:16:45.800
you'll get

00:16:45.800 --> 00:16:46.580
something extra

00:16:46.580 --> 00:16:47.920
20% off

00:16:47.920 --> 00:16:48.560
an annual

00:16:48.560 --> 00:16:49.020
premium

00:16:49.020 --> 00:16:49.600
subscription

00:16:49.600 --> 00:16:50.740
so sign up

00:16:50.740 --> 00:16:51.240
today at

00:16:51.240 --> 00:16:52.200
talkpython.fm

00:16:52.200 --> 00:16:52.820
slash brilliant

00:16:52.820 --> 00:16:53.660
and start a

00:16:53.660 --> 00:16:54.460
7 day free

00:16:54.460 --> 00:16:54.820
trial

00:16:54.820 --> 00:16:55.380
that's

00:16:55.380 --> 00:16:56.460
talkpython.fm

00:16:56.460 --> 00:16:57.120
slash brilliant

00:16:57.120 --> 00:16:57.900
the link is

00:16:57.900 --> 00:16:58.400
in your podcast

00:16:58.400 --> 00:16:59.100
player show notes

00:16:59.100 --> 00:16:59.980
thank you to

00:16:59.980 --> 00:17:00.620
brilliant.org

00:17:00.620 --> 00:17:01.160
for supporting

00:17:01.160 --> 00:17:01.520
the show

00:17:01.520 --> 00:17:04.900
it's insane

00:17:04.900 --> 00:17:05.840
I don't know

00:17:05.840 --> 00:17:06.440
if I've got it

00:17:06.440 --> 00:17:07.860
in my history

00:17:07.860 --> 00:17:08.100
here

00:17:08.100 --> 00:17:09.460
we're rewriting

00:17:09.460 --> 00:17:10.920
our mobile

00:17:10.920 --> 00:17:11.420
apps for

00:17:11.420 --> 00:17:11.900
Talk Python

00:17:11.900 --> 00:17:12.400
training for

00:17:12.400 --> 00:17:12.920
our courses

00:17:12.920 --> 00:17:13.900
in

00:17:13.900 --> 00:17:14.480
Flutter

00:17:14.480 --> 00:17:15.500
and we're

00:17:15.500 --> 00:17:15.860
having a

00:17:15.860 --> 00:17:16.200
problem

00:17:16.200 --> 00:17:16.900
downloading

00:17:16.900 --> 00:17:17.260
stuff

00:17:17.260 --> 00:17:18.080
concurrently

00:17:18.080 --> 00:17:18.820
using a

00:17:18.820 --> 00:17:19.180
particular

00:17:19.180 --> 00:17:19.840
library

00:17:19.840 --> 00:17:20.100
in

00:17:20.100 --> 00:17:20.420
Flutter

00:17:20.420 --> 00:17:21.940
and so

00:17:21.940 --> 00:17:22.740
I asked

00:17:22.740 --> 00:17:23.000
it

00:17:23.000 --> 00:17:24.060
I said

00:17:24.060 --> 00:17:24.360
hey

00:17:24.360 --> 00:17:25.180
I want

00:17:25.180 --> 00:17:25.520
some help

00:17:25.520 --> 00:17:25.820
with a

00:17:25.820 --> 00:17:26.080
Flutter

00:17:26.080 --> 00:17:26.420
and Dart

00:17:26.420 --> 00:17:26.740
program

00:17:26.740 --> 00:17:27.140
what do you

00:17:27.140 --> 00:17:27.300
want

00:17:27.300 --> 00:17:27.500
it says

00:17:27.500 --> 00:17:28.360
I'm using

00:17:28.360 --> 00:17:28.580
the

00:17:28.580 --> 00:17:29.120
dio

00:17:29.120 --> 00:17:29.720
package

00:17:29.720 --> 00:17:29.940
do you

00:17:29.940 --> 00:17:30.080
know

00:17:30.080 --> 00:17:30.200
it

00:17:30.200 --> 00:17:30.580
oh yes

00:17:30.580 --> 00:17:30.780
I'm

00:17:30.780 --> 00:17:31.100
familiar

00:17:31.100 --> 00:17:31.460
it does

00:17:31.460 --> 00:17:31.860
HTTP

00:17:31.860 --> 00:17:32.240
client

00:17:32.240 --> 00:17:32.500
stuff

00:17:32.500 --> 00:17:32.660
for

00:17:32.660 --> 00:17:32.940
Dart

00:17:32.940 --> 00:17:33.280
okay

00:17:33.280 --> 00:17:34.020
I want

00:17:34.020 --> 00:17:34.160
to

00:17:34.160 --> 00:17:34.560
download

00:17:34.560 --> 00:17:35.100
binary

00:17:35.100 --> 00:17:35.560
video

00:17:35.560 --> 00:17:35.960
files

00:17:35.960 --> 00:17:36.260
and a

00:17:36.260 --> 00:17:36.480
bunch

00:17:36.480 --> 00:17:36.640
of

00:17:36.640 --> 00:17:36.800
them

00:17:36.800 --> 00:17:37.120
given

00:17:37.120 --> 00:17:37.260
a

00:17:37.260 --> 00:17:37.680
URL

00:17:37.680 --> 00:17:38.220
I want

00:17:38.220 --> 00:17:38.380
to do

00:17:38.380 --> 00:17:38.580
them

00:17:38.580 --> 00:17:39.220
concurrently

00:17:39.220 --> 00:17:39.620
with three

00:17:39.620 --> 00:17:39.840
of them

00:17:39.840 --> 00:17:40.020
at a

00:17:40.020 --> 00:17:40.400
time

00:17:40.400 --> 00:17:41.060
write the

00:17:41.060 --> 00:17:41.320
code for

00:17:41.320 --> 00:17:41.400
that

00:17:41.400 --> 00:17:41.740
and boom

00:17:41.740 --> 00:17:42.000
it just

00:17:42.000 --> 00:17:42.680
writes it

00:17:42.680 --> 00:17:43.680
like using

00:17:43.680 --> 00:17:44.420
that library

00:17:44.420 --> 00:17:45.040
I told it

00:17:45.040 --> 00:17:45.580
about not

00:17:45.580 --> 00:17:46.160
just Dart
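
NOTE
Editor's sketch of the "three at a time" idea, written in Python with
asyncio rather than Dart and the dio package, so it is an analogy and not
the code ChatGPT produced. fake_download, download_all, and the URLs are
hypothetical; the download is simulated with a short sleep.
```python
import asyncio
# Hypothetical stand-in for a real HTTP download; it only sleeps briefly.
async def fake_download(url, limiter, stats):
    async with limiter:  # at most max_concurrent tasks run this body at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the network transfer
        stats["active"] -= 1
        return f"saved {url}"
async def download_all(urls, max_concurrent=3):
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    jobs = (fake_download(url, limiter, stats) for url in urls)
    results = await asyncio.gather(*jobs)  # results keep the order of urls
    return results, stats["peak"]
urls = [f"https://example.com/video{i}.mp4" for i in range(7)]
results, peak = asyncio.run(download_all(urls))
print(len(results), "downloads finished; peak concurrency was", peak)
```
The semaphore is what caps concurrency: all seven tasks are started, but
only three can be inside the download section at any moment.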

00:17:46.160 --> 00:17:47.720
so that's

00:17:47.720 --> 00:17:48.320
incredible that

00:17:48.320 --> 00:17:48.800
we can get

00:17:48.800 --> 00:17:49.240
this kind

00:17:49.240 --> 00:17:49.740
of assistance

00:17:49.740 --> 00:17:50.960
for knowledge

00:17:50.960 --> 00:17:51.500
and programming

00:17:51.500 --> 00:17:52.100
like you'll

00:17:52.100 --> 00:17:52.580
never find

00:17:52.580 --> 00:17:53.440
I mean I take

00:17:53.440 --> 00:17:53.660
that back

00:17:53.660 --> 00:17:54.080
you might

00:17:54.080 --> 00:17:54.520
find that

00:17:54.520 --> 00:17:55.040
if there's

00:17:55.040 --> 00:17:55.600
a very

00:17:55.600 --> 00:17:56.200
specific

00:17:56.200 --> 00:17:56.640
stack

00:17:56.640 --> 00:17:57.000
overflow

00:17:57.000 --> 00:17:57.560
question

00:17:57.560 --> 00:17:58.520
or something

00:17:58.520 --> 00:17:58.820
but if

00:17:58.820 --> 00:17:59.200
there's not

00:17:59.200 --> 00:17:59.700
a right-on

00:17:59.700 --> 00:18:00.500
question for it

00:18:00.500 --> 00:18:00.760
you're not

00:18:00.760 --> 00:18:01.140
going to find

00:18:01.140 --> 00:18:01.300
it

00:18:01.300 --> 00:18:02.360
I love

00:18:02.360 --> 00:18:03.080
when you

00:18:03.080 --> 00:18:04.120
know the

00:18:04.120 --> 00:18:04.320
stack

00:18:04.320 --> 00:18:04.600
overflow

00:18:04.600 --> 00:18:05.040
would exist

00:18:05.040 --> 00:18:05.560
for a

00:18:05.560 --> 00:18:05.900
variant

00:18:05.900 --> 00:18:06.220
of your

00:18:06.220 --> 00:18:06.580
question

00:18:06.580 --> 00:18:07.400
but the

00:18:07.400 --> 00:18:08.560
exact one

00:18:08.560 --> 00:18:08.820
doesn't

00:18:08.820 --> 00:18:09.160
exist

00:18:09.160 --> 00:18:09.500
and you

00:18:09.500 --> 00:18:09.800
have to

00:18:09.800 --> 00:18:10.180
go grab

00:18:10.180 --> 00:18:10.640
the three

00:18:10.640 --> 00:18:10.860
of them

00:18:10.860 --> 00:18:11.460
to synthesize

00:18:11.460 --> 00:18:12.240
and it's

00:18:12.240 --> 00:18:12.660
just great

00:18:12.660 --> 00:18:13.000
at that

00:18:13.000 --> 00:18:14.440
it also

00:18:14.440 --> 00:18:15.240
is pretty

00:18:15.240 --> 00:18:15.520
good at

00:18:15.520 --> 00:18:16.140
fixing errors

00:18:16.140 --> 00:18:17.040
sometimes it

00:18:17.040 --> 00:18:17.360
can walk

00:18:17.360 --> 00:18:17.880
itself into

00:18:17.880 --> 00:18:18.800
lying to

00:18:18.800 --> 00:18:18.920
you

00:18:18.920 --> 00:18:19.260
repeatedly

00:18:19.260 --> 00:18:19.560
but

00:18:19.560 --> 00:18:20.240
that's

00:18:20.240 --> 00:18:23.040
so problematic

00:18:23.040 --> 00:18:23.460
yeah

00:18:23.460 --> 00:18:24.420
but you can

00:18:24.420 --> 00:18:24.960
also ask

00:18:24.960 --> 00:18:25.080
it

00:18:25.080 --> 00:18:25.700
here's

00:18:25.700 --> 00:18:25.880
my

00:18:25.880 --> 00:18:26.320
program

00:18:26.320 --> 00:18:26.940
are there

00:18:26.940 --> 00:18:27.400
security

00:18:27.400 --> 00:18:27.900
vulnerabilities

00:18:27.900 --> 00:18:28.400
or do

00:18:28.400 --> 00:18:28.620
you see

00:18:28.620 --> 00:18:28.780
any

00:18:28.780 --> 00:18:29.080
bugs

00:18:29.080 --> 00:18:29.520
and it'll

00:18:29.520 --> 00:18:30.300
find them

00:18:30.300 --> 00:18:30.600
yep

00:18:30.600 --> 00:18:31.200
yeah

00:18:31.200 --> 00:18:31.460
it's

00:18:31.460 --> 00:18:32.100
nuts

00:18:32.100 --> 00:18:33.400
so people

00:18:33.400 --> 00:18:33.740
may be

00:18:33.740 --> 00:18:34.020
wondering

00:18:34.020 --> 00:18:34.540
we haven't

00:18:34.540 --> 00:18:34.880
talked yet

00:18:34.880 --> 00:18:35.320
about your

00:18:35.320 --> 00:18:36.240
project Sketch

00:18:36.240 --> 00:18:36.880
why I'm

00:18:36.880 --> 00:18:37.480
talking so much

00:18:37.480 --> 00:18:38.120
about ChatGPT

00:18:38.120 --> 00:18:39.340
so that is

00:18:39.340 --> 00:18:39.940
kind of

00:18:39.940 --> 00:18:40.860
the style

00:18:40.860 --> 00:18:41.700
of AI

00:18:41.700 --> 00:18:42.580
that your

00:18:42.580 --> 00:18:43.480
project brings

00:18:43.480 --> 00:18:44.260
to pandas

00:18:44.260 --> 00:18:44.580
which we're

00:18:44.580 --> 00:18:44.840
going to get

00:18:44.840 --> 00:18:45.020
to

00:18:45.020 --> 00:18:45.580
but I want

00:18:45.580 --> 00:18:45.840
to touch

00:18:45.840 --> 00:18:46.340
on two

00:18:46.340 --> 00:18:46.780
more really

00:18:46.780 --> 00:18:47.320
quick AI

00:18:47.320 --> 00:18:47.820
things that

00:18:47.820 --> 00:18:48.480
we'll dive

00:18:48.480 --> 00:18:48.860
into it

00:18:48.860 --> 00:18:49.480
the other

00:18:49.480 --> 00:18:50.060
is this

00:18:50.060 --> 00:18:50.760
just around

00:18:50.760 --> 00:18:51.340
images

00:18:51.340 --> 00:18:52.160
just the

00:18:52.160 --> 00:18:52.960
ability to

00:18:52.960 --> 00:18:53.600
ask questions

00:18:53.600 --> 00:18:54.080
you've already

00:18:54.080 --> 00:18:54.480
mentioned

00:18:54.480 --> 00:18:55.280
three

00:18:55.280 --> 00:18:55.920
DALL-E

00:18:55.920 --> 00:18:56.860
imagine

00:18:56.860 --> 00:18:57.420
and then

00:18:57.420 --> 00:18:57.900
the other

00:18:57.900 --> 00:18:58.240
one I don't

00:18:58.240 --> 00:18:58.680
remember from

00:18:58.680 --> 00:18:59.160
Google that

00:18:59.160 --> 00:18:59.720
they haven't

00:18:59.720 --> 00:19:00.340
put out yet

00:19:00.340 --> 00:19:01.500
Midjourney is

00:19:01.500 --> 00:19:01.820
another

00:19:01.820 --> 00:19:02.740
just the

00:19:02.740 --> 00:19:03.220
ability to

00:19:03.220 --> 00:19:03.680
say hey

00:19:03.680 --> 00:19:04.480
I want a

00:19:04.480 --> 00:19:04.880
picture of

00:19:04.880 --> 00:19:05.040
this

00:19:05.040 --> 00:19:05.480
no actually

00:19:05.480 --> 00:19:06.160
change it

00:19:06.160 --> 00:19:06.780
slightly like

00:19:06.780 --> 00:19:07.020
that

00:19:07.020 --> 00:19:07.340
it's

00:19:07.340 --> 00:19:07.820
mind

00:19:07.820 --> 00:19:08.100
blowing

00:19:08.100 --> 00:19:08.580
they're a lot

00:19:08.580 --> 00:19:08.820
of fun

00:19:08.820 --> 00:19:09.300
they're great

00:19:09.300 --> 00:19:10.320
for sparking

00:19:10.320 --> 00:19:10.700
creativity

00:19:10.700 --> 00:19:11.220
or having

00:19:11.220 --> 00:19:11.600
an idea and

00:19:11.600 --> 00:19:11.860
just getting

00:19:11.860 --> 00:19:12.240
to see it

00:19:12.240 --> 00:19:12.540
in front of

00:19:12.540 --> 00:19:12.620
you

00:19:12.620 --> 00:19:13.020
I think

00:19:13.020 --> 00:19:13.400
it's more

00:19:13.400 --> 00:19:13.980
impressive to

00:19:13.980 --> 00:19:14.260
me

00:19:14.260 --> 00:19:15.000
than even

00:19:15.000 --> 00:19:16.420
this ChatGPT

00:19:16.420 --> 00:19:16.940
telling me

00:19:16.940 --> 00:19:17.320
I want a

00:19:17.320 --> 00:19:17.620
GTP

00:19:17.620 --> 00:19:29.140
I want an

00:19:29.140 --> 00:19:29.460
artificial

00:19:29.460 --> 00:19:30.160
intelligence

00:19:30.160 --> 00:19:30.600
panda

00:19:30.600 --> 00:19:31.620
and it

00:19:31.620 --> 00:19:32.100
came up

00:19:32.100 --> 00:19:32.440
and I

00:19:32.440 --> 00:19:32.700
want it

00:19:32.700 --> 00:19:33.480
photorealistic

00:19:33.480 --> 00:19:34.200
in the style

00:19:34.200 --> 00:19:34.680
of National

00:19:34.680 --> 00:19:35.220
Geographic

00:19:35.220 --> 00:19:36.200
and so

00:19:36.200 --> 00:19:36.580
it gave

00:19:36.580 --> 00:19:36.800
me

00:19:36.800 --> 00:19:37.720
this panda

00:19:37.720 --> 00:19:38.180
you can see

00:19:38.180 --> 00:19:39.160
beautiful whiskers

00:19:39.160 --> 00:19:39.860
but just

00:19:39.860 --> 00:19:40.460
behind the

00:19:40.460 --> 00:19:40.720
ear

00:19:40.720 --> 00:19:41.240
you can see

00:19:41.240 --> 00:19:41.900
the fur is

00:19:41.900 --> 00:19:42.180
gone

00:19:42.180 --> 00:19:42.520
and it's

00:19:42.520 --> 00:19:43.240
like

00:19:43.240 --> 00:19:44.780
an android

00:19:44.780 --> 00:19:45.300
type of

00:19:45.300 --> 00:19:45.660
creature

00:19:45.660 --> 00:19:47.200
that is a

00:19:47.200 --> 00:19:48.000
beautiful

00:19:48.000 --> 00:19:48.380
picture

00:19:48.380 --> 00:19:49.760
it's pretty

00:19:49.760 --> 00:19:50.220
accurate

00:19:50.220 --> 00:19:51.380
it's nuts

00:19:51.380 --> 00:19:51.760
that I can

00:19:51.760 --> 00:19:52.280
just go talk

00:19:52.280 --> 00:19:52.560
to these

00:19:52.560 --> 00:19:52.920
systems

00:19:52.920 --> 00:19:53.400
and ask

00:19:53.400 --> 00:19:53.700
them these

00:19:53.700 --> 00:19:54.100
questions

00:19:54.100 --> 00:19:55.200
I find it

00:19:55.200 --> 00:19:55.360
interesting

00:19:55.360 --> 00:19:55.840
comparing

00:19:55.840 --> 00:19:56.660
the ChatGPT

00:19:56.660 --> 00:19:57.720
and the

00:19:57.720 --> 00:19:58.700
Midjourney

00:19:58.700 --> 00:19:58.960
style

00:19:58.960 --> 00:20:02.340
I completely

00:20:02.340 --> 00:20:02.680
get it

00:20:02.680 --> 00:20:03.060
it's very

00:20:03.060 --> 00:20:03.380
visceral

00:20:03.380 --> 00:20:04.140
it's also

00:20:04.140 --> 00:20:05.100
from another

00:20:05.100 --> 00:20:05.660
perspective

00:20:05.660 --> 00:20:06.340
I think

00:20:06.340 --> 00:20:06.620
of the

00:20:06.620 --> 00:20:06.920
weights

00:20:06.920 --> 00:20:07.220
and the

00:20:07.220 --> 00:20:07.460
scale

00:20:07.460 --> 00:20:07.680
of the

00:20:07.680 --> 00:20:07.880
model

00:20:07.880 --> 00:20:08.480
and

00:20:08.480 --> 00:20:09.500
these

00:20:09.500 --> 00:20:10.100
image

00:20:10.100 --> 00:20:10.380
ones

00:20:10.380 --> 00:20:10.580
that

00:20:10.580 --> 00:20:11.120
solve

00:20:11.120 --> 00:20:11.320
all

00:20:11.320 --> 00:20:11.580
images

00:20:11.580 --> 00:20:11.920
are

00:20:11.920 --> 00:20:12.520
so

00:20:12.520 --> 00:20:12.780
much

00:20:12.780 --> 00:20:13.120
smaller

00:20:13.120 --> 00:20:13.440
in

00:20:13.440 --> 00:20:13.760
scale

00:20:13.760 --> 00:20:14.020
than

00:20:14.020 --> 00:20:14.220
these

00:20:14.220 --> 00:20:14.680
language

00:20:14.680 --> 00:20:14.980
ones

00:20:14.980 --> 00:20:15.120
that

00:20:15.120 --> 00:20:15.240
have

00:20:15.240 --> 00:20:15.420
all

00:20:15.420 --> 00:20:15.600
this

00:20:15.600 --> 00:20:15.820
other

00:20:15.820 --> 00:20:16.240
data

00:20:16.240 --> 00:20:16.400
and

00:20:16.400 --> 00:20:20.520
stuff. So it's fascinating how complex language is. Yeah, I know the smarts is so much less,

00:20:20.520 --> 00:20:26.380
but just something about it actually came up with a creative picture that never existed.

00:20:26.380 --> 00:20:31.020
Yeah. Right. You could show this to somebody like, oh, that's an artificial panda. That's

00:20:31.020 --> 00:20:36.960
insane. Right. But it's, but I just gave it like a sentence or two. Yeah. Yeah. Yeah. I don't know.

00:20:36.960 --> 00:20:41.500
Yeah. This, it's a sort of a technical interpreter, but I love it because it's

00:20:41.500 --> 00:20:46.600
like this, it's just phenomenal interpolation. It's like through semantically labeled space. So

00:20:46.600 --> 00:20:51.340
like the words have meaning and it understands the meaning and can move sliders of like, well,

00:20:51.340 --> 00:20:54.780
I've seen lots of these machine things. I understand the concept of gears and this metal and this,

00:20:54.780 --> 00:21:00.720
like the shiny texture and then the fur texture and like, they're very good at texture. It's a,

00:21:00.720 --> 00:21:04.960
yeah, really great how it interprets all of that just to fit the, you know, the small prompt.

00:21:04.960 --> 00:21:08.960
Yeah. There are other angles of which it's frustrating. Like I want it turned, I want it

00:21:08.960 --> 00:21:13.320
in the back of the picture, not the, no, it's always in the center. One more thing really quick.

00:21:13.320 --> 00:21:19.180
And this leads me into my final thing, which is GitHub Copilot. GitHub Copilot is like this in

00:21:19.180 --> 00:21:23.920
your editor, which is kind of insane, right? You can just give it like a comment or a series of

00:21:23.920 --> 00:21:29.720
comments and it will write it. I think ChatGPT is maybe more open-ended and more creative, but this

00:21:29.720 --> 00:21:35.260
is, this is also a pretty interesting way to go. I'm a heavy user of Copilot. I, if there's a,

00:21:35.320 --> 00:21:40.820
there's a weird crutch and I'm like slowly developing like a need to have this in my browser. I was a,

00:21:40.820 --> 00:21:46.400
on a flight recently and was with the internet and Copilot wasn't working. And I felt the, like,

00:21:46.400 --> 00:21:50.880
I felt the difference. I felt like I was like walking through mud instead of just like actually

00:21:50.880 --> 00:21:56.520
running a little bit. And I was like, Oh, I've been disconnected from my distributed mind. I am broken

00:21:56.520 --> 00:22:03.180
partially. Yeah. So incredible. So the last part I guess is like, you know, what are the ethics

00:22:03.180 --> 00:22:08.280
of this? Like I went on very positively about Midjourney, but how much of that is trained on

00:22:08.280 --> 00:22:16.900
copyrighted material? Or there's GitHub Copilot. How much of that is trained on GPL based stuff that was

00:22:16.900 --> 00:22:23.020
in GitHub. But when I use it, I don't have the GPL any longer on my code. I might use it on commercial

00:22:23.020 --> 00:22:29.700
code, but just running it through the AI, does that strip licenses or does it not? There's a

00:22:29.700 --> 00:22:35.200
githubcopilotlitigation.com, which is interesting. I mean, we might be finding out. There's also,

00:22:35.200 --> 00:22:41.160
I think, Getty. I think it's Getty Images. I'm not 100% sure, but I think Getty Images is suing

00:22:41.160 --> 00:22:47.480
one of these image generation companies. I can't remember which one. Maybe Midjourney? I

00:22:47.480 --> 00:22:50.160
don't think it's Midjourney. I think it's Stable Diffusion, but anyway, it doesn't really matter.

00:22:50.160 --> 00:22:53.900
Like there's a bunch of things that are pushing back against this. Like, wait a minute,

00:22:53.900 --> 00:22:58.500
where did you get this data? Did you have rights to use this data in this way? And I mean,

00:22:58.500 --> 00:23:02.100
what are your thoughts on this angle of AI these days?

00:23:02.100 --> 00:23:08.220
Yeah. I know it sounds like I don't worry too much about it in either direction. I think I

00:23:08.220 --> 00:23:14.400
believe in personal ethics. I believe in open source things, availability of things,

00:23:14.400 --> 00:23:19.000
because it just sort of like accelerates collective progress. But that said, I also believe in like

00:23:19.000 --> 00:23:24.540
slightly different like social structures to help support people. Like I'm a, I guess,

00:23:24.580 --> 00:23:27.840
a personal believer in things like UBI or something like that in that direction.

00:23:27.840 --> 00:23:31.940
So when you combine those, I feel like it, you know, things sort of work out kind of well,

00:23:31.940 --> 00:23:34.440
but when we like, but it is still a thing that like,

00:23:34.440 --> 00:23:38.420
copyright exists and that there is this sense of ownership and this is my thing. And I wanted to

00:23:38.420 --> 00:23:44.920
put licenses on it. And, I think that this sort of story started presumably that I wasn't really

00:23:44.920 --> 00:23:49.020
having this conversation, but like when the internet came around and search engines happened and like

00:23:49.020 --> 00:23:54.160
Google could just go and pull up your thing from your page and summarize it in a little blob on the

00:23:54.160 --> 00:23:58.880
page is, was that fair? What if it starts, you know, your shop and it allows you to go buy that

00:23:58.880 --> 00:24:04.100
same product from other shops. Like it, I think that the same things are showing up and in the same way

00:24:04.100 --> 00:24:08.860
that the web, like in the internet sort of, it's sort of, it was a large thing, but then it sort of,

00:24:08.860 --> 00:24:12.720
I don't know if it got quieter, but it sort of became in the background. We sort of found new

00:24:12.720 --> 00:24:17.280
systems. It stopped being piracy and CDs and the music industry is going to struggle. And Hey,

00:24:17.340 --> 00:24:21.600
things like Spotify exist and streaming services exist. And like, I don't know what the next way

00:24:21.600 --> 00:24:21.840
is.

00:24:21.840 --> 00:24:24.680
They're doing better than ever basically. Yeah. Yeah. Yeah. So I think it's just evolution.

00:24:24.680 --> 00:24:29.800
And then like the, some things will change and adopt some things will like fall apart and new

00:24:29.800 --> 00:24:33.740
things will be born. I, that's just a great, it's a good time for lots of opportunity, I guess is the

00:24:33.740 --> 00:24:34.780
part that I'm excited about.

00:24:34.780 --> 00:24:39.280
Yeah. Yeah. Yeah. For sure. I think that's definitely true. It probably, you're probably right. It probably

00:24:39.280 --> 00:24:45.120
will turn out to be, you know, an "old man yells at cloud, cloud doesn't care" sort of story, you know,

00:24:45.120 --> 00:24:49.820
in the end where it's like, on the other hand, if, if somebody came back and said,

00:24:49.820 --> 00:24:54.940
you know, a court came back and said, you know what, actually anything trained on GPL

00:24:54.940 --> 00:25:01.440
and then you use copilot on it, that's GPL. Like that would have instantly mega effects. Right.

00:25:02.100 --> 00:25:06.880
Yeah. I, yeah. And I guess there's also stuff like the, I don't, I didn't actually read the

00:25:06.880 --> 00:25:09.840
article. I only saw the headline and you know, that's the worst thing to do is to repeat a thing,

00:25:09.840 --> 00:25:14.660
which is a headline. But, there was that Italy thing that I saw about, like, I don't know.

00:25:14.660 --> 00:25:20.200
Yeah. That was really clickbaity, but I didn't get a time to look at it yet. So yeah. You probably

00:25:20.200 --> 00:25:25.400
ask ChatGPT to summarize it for you. As long as it can be Bing, I guess, to get that updated.

00:25:25.400 --> 00:25:30.820
Yeah. Yeah. Yeah. Yeah. There's a lot of, there's a lot of things playing in that space,

00:25:30.820 --> 00:25:36.700
right? Some different places. Okay. So yeah, very cool. But as a regular user, I would say,

00:25:36.700 --> 00:25:40.580
you know, regardless of kind of how you feel about this, at least this is my viewpoint right now.

00:25:40.580 --> 00:25:45.480
It's like, regardless of how I feel about which side is right in these kinds of disputes,

00:25:45.480 --> 00:25:50.160
this stuff is out of the bag. It's out there and available and it's a tool. And it's like saying,

00:25:50.160 --> 00:25:55.500
you know, I don't want to use spell check or I don't want to use some kind of like code checking. I just

00:25:55.500 --> 00:26:00.100
want to write like in straight notepad because it's pure, right? Like sure you could do that,

00:26:00.100 --> 00:26:05.320
but there's these tools that will help us be more productive and it's better to embrace them and know

00:26:05.320 --> 00:26:10.400
them than to just like yell at them, I suppose. Yeah. A lot of accelerant you can get.

00:26:10.400 --> 00:26:15.520
really speed up whatever you want to get done. Yeah, absolutely. All right. So speaking of

00:26:15.520 --> 00:26:22.040
speeding up things, let's talk pandas and not even my artificial pandas, but actual programming pandas

00:26:22.040 --> 00:26:30.600
with this project that you all have from Approximate. Yeah, Approximate Labs, called Sketch. So Sketch is

00:26:30.600 --> 00:26:35.980
pretty awesome. Sketch is actually why we're talking today because I first talked about this on Python

00:26:35.980 --> 00:26:41.780
Bytes. I saw this; it was sent over there by Jake Furman to me, and he said, you should check this

00:26:41.780 --> 00:26:49.720
thing out. It's awesome. And, yeah, it's pretty nuts. So tell us about sketch. Yeah. So, even

00:26:49.720 --> 00:26:54.540
though I use Copilot, as I sort of described already, and it's become a crutch, I found in Jupyter

00:26:54.540 --> 00:27:00.720
notebooks when I wanted to work with data, it just didn't, it doesn't actually apply that. So on one side,

00:27:00.720 --> 00:27:04.920
it was sort of like missing the mark at times. And so it was sort of like, how can I get this

00:27:04.920 --> 00:27:09.260
integrated into my flow? The way I actually work in a Jupyter notebook, if maybe I'm working a Jupyter

00:27:09.260 --> 00:27:13.280
notebook on a remote server and I don't want to set up VS Code to do it. So I don't have copilot at all.

00:27:13.280 --> 00:27:17.080
Like there's a bunch of different reasons that I was just like in Jupyter. It's a very different IDE

00:27:17.080 --> 00:27:21.640
experience. It is. Yeah. It's super different, but also you might want to ask questions about the data,

00:27:21.640 --> 00:27:26.720
not the structure of the code that analyzes the data, right? Exactly. Yeah. And so just a bunch of that

00:27:26.720 --> 00:27:31.680
type of stuff. And then also at the other side, I was trying to find something that I could throw together

00:27:31.680 --> 00:27:38.400
that I thought was a strong demonstration of the value Approximate Labs is trying to chase, but wouldn't

00:27:38.400 --> 00:27:42.420
take me too much time to make. So it was a, oh, I could probably just go throw this together pretty quickly.

00:27:42.420 --> 00:27:48.680
I bet this is going to be actually useful and helpful. And so let's just do that. And so I threw it on top of

00:27:48.680 --> 00:27:54.060
the actual library I was using, which was Sketch. I put this on it and then shipped it. So it sort of shifted what the

00:27:54.060 --> 00:27:59.000
project was. Yeah. Yeah. So you also have this other project called Lambda Prompt. And so were

00:27:59.000 --> 00:28:03.820
you trying to play around Lambda Prompt and then like see what you could kind of apply here to leverage

00:28:03.820 --> 00:28:11.440
it? Or is that... The full journey, which I can get into, started with data sketches. I left my last job

00:28:11.440 --> 00:28:17.600
to chase bringing the algorithm, like combining data sketches with AI, but just like the vague,

00:28:17.600 --> 00:28:22.900
like at that level. Tell us what data sketches is real quick. Sure. Yeah. So a data sketch is a

00:28:22.900 --> 00:28:27.740
probabilistic aggregation of data. So if you have, I think the most common one that people have heard of

00:28:27.740 --> 00:28:32.340
is HyperLogLog, and it's used to estimate cardinality. So, estimate the number of unique

00:28:32.340 --> 00:28:39.600
values in a column. A data sketch is a class of algorithms that all sort of like use roughly fixed

00:28:39.600 --> 00:28:46.640
width, usually binary, representations. And then in a single pass, so they're O(n), they'll look at each row

00:28:46.640 --> 00:28:52.180
and hash the row and then update the sketch or not necessarily hash, but they update this sketch

00:28:52.180 --> 00:28:57.080
object. Essentially. Those sketch objects also have another property that they are mergeable. So you

00:28:57.080 --> 00:29:03.060
have this like really fast O(n) to go bring that like to aggregate up and you get this mergeability. So you

00:29:03.060 --> 00:29:09.520
can map reduce it in, you know, trivial speeds. The net result is that this like tight binary packed

00:29:09.520 --> 00:29:15.520
object can be used to approximate measures you were looking for on the original data. So you could look

00:29:15.520 --> 00:29:21.360
at, if you do a few of these, there are, like, theta sketches, you can go and estimate not just the

00:29:21.360 --> 00:29:25.820
unique count, but you can also estimate if this one column would join well with this other column,

00:29:25.820 --> 00:29:30.840
or you can estimate, Oh, if I were to join this column to this column, then this third column that

00:29:30.840 --> 00:29:35.700
was on that other table would actually be correlated to this first column over here. So you get these,

00:29:35.920 --> 00:29:40.000
a bunch of different distributions, you get a whole bunch of these types of properties.

00:29:40.000 --> 00:29:44.880
And each sketch is sort of just, I would say, algorithmically engineered, like very, very

00:29:44.880 --> 00:29:50.580
engineered to be like information theory optimal at solving one of those like measures on the data.

00:29:50.580 --> 00:29:53.740
And so tight packed binary representations.
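
NOTE
Editor's sketch, not from the episode: a minimal illustration of the
single-pass, mergeable idea described above. It uses a k-minimum-values
sketch (a simpler relative of theta sketches, not HyperLogLog itself).
The class name KMVSketch and the default k=256 are assumptions made
purely for illustration.
```python
import hashlib
#
class KMVSketch:
    """K-minimum-values sketch: fixed size, single pass, mergeable."""
    def __init__(self, k=256):
        self.k = k
        self.mins = set()  # the k smallest normalized hash values seen so far
    #
    @staticmethod
    def _hash01(value):
        # Map any value to a stable pseudo-random float in [0, 1).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64
    #
    def update(self, value):
        # One update per row; the sketch never grows beyond k entries.
        self.mins.add(self._hash01(value))
        if len(self.mins) > self.k:
            self.mins.remove(max(self.mins))
    #
    def merge(self, other):
        # The k smallest of the union equals a sketch built over all the data.
        merged = KMVSketch(self.k)
        merged.mins = set(sorted(self.mins | other.mins)[: self.k])
        return merged
    #
    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))  # fewer than k distinct values: exact
        return (self.k - 1) / max(self.mins)
#
left, right = KMVSketch(), KMVSketch()
for i in range(6000):
    left.update(f"user-{i}")
for i in range(4000, 10000):
    right.update(f"user-{i}")
combined = left.merge(right)
print(round(combined.estimate()))  # an estimate near 10000, not an exact count
```
Merging two partial sketches gives the same state as one pass over all
rows, which is what makes the map-reduce style aggregation cheap.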

00:29:53.740 --> 00:29:57.880
All right. So you thought about, well, that's cool, but ChatGPT is cool too.

00:29:57.880 --> 00:29:58.760
Yeah.

00:29:58.760 --> 00:29:59.800
What else?

00:30:00.220 --> 00:30:07.080
The core thing was, so those representations aren't usable by AI right now. And when you actually go and

00:30:07.080 --> 00:30:13.160
use GPT-3 or something like this, you have to figure out a way to build the prompt to get it to do

00:30:13.160 --> 00:30:18.460
what you want. This was especially true in a pre instruction tuning world, you had to really like, you had to

00:30:18.460 --> 00:30:23.360
play the prompt engineer role even more than you have to now. Now you could sort of get away with describing it to

00:30:23.360 --> 00:30:28.880
ChatGPT. And one of the things that you really have to like, play the game of is how do you get all the

00:30:28.880 --> 00:30:35.120
information it's going to need into this prompt in a succinct, but good enough way that it helps it do

00:30:35.120 --> 00:30:41.920
this. And so what sketch was about was, rather than just looking at the context of the data, like the

00:30:41.920 --> 00:30:48.480
metadata, the column names and the code you have, also go get some representation of the

00:30:48.480 --> 00:30:53.400
content of the data, turn that into a string, and then bring that string in as part of the prompt.

00:30:53.400 --> 00:30:59.120
And then when it has that, it should be much better at actually generating code, generating

00:30:59.120 --> 00:31:03.500
answers to questions. And that's what Sketch was: a proof of concept of that, which worked very well.

00:31:03.500 --> 00:31:08.540
It really quickly showed how valuable actual data content context is.

00:31:08.540 --> 00:31:13.300
Yeah, I would say it's resonating with people. It's got 1.5 thousand stars on GitHub.

00:31:13.300 --> 00:31:18.220
And it looks about six months old. So that's pretty good growth there.

00:31:18.440 --> 00:31:22.380
Yeah, January 16th was the day I posted it on Hacker News. And it had three,

00:31:22.380 --> 00:31:24.560
there was an empty repo at that point.

00:31:24.560 --> 00:31:33.860
Okay, three stars. It's like me and my friends. Okay, cool. So this is a tool that basically patches

00:31:33.860 --> 00:31:42.660
pandas to add functionality or functions, literally to pandas data frames that allows you to ask

00:31:42.660 --> 00:31:44.500
questions about it, right?

00:31:44.500 --> 00:31:44.800
Yep.

00:31:44.800 --> 00:31:47.760
So what kind of questions can you ask it? What can it help you with?

00:31:47.880 --> 00:31:53.480
Yeah, so there's two classes of questions you can ask, you can ask it, the ask type questions,

00:31:53.480 --> 00:32:00.240
these are sort of from that summary statistics data. So from the general, you know, representation of your

00:32:00.240 --> 00:32:04.060
data, ask it to like, give you answers about it, like, what are the columns here, you sort of have

00:32:04.060 --> 00:32:10.240
a conversation where it sort of understands the general under like shape of the data, general

00:32:10.240 --> 00:32:15.780
distributions, things like that, number of uniques, and like give that context to it, ask questions of that

00:32:15.780 --> 00:32:21.520
system. And then the other one is ask it how to do something. So you specifically can get it to write

00:32:21.520 --> 00:32:25.160
code to solve a problem you have, you describe the problem you want, and you can ask it to do that.

00:32:25.160 --> 00:32:31.200
Right. I've got this data frame, I want to plot a graph of this versus that, but color by the other

00:32:31.200 --> 00:32:31.500
thing.

00:32:31.500 --> 00:32:37.960
Yep. And in the data space world, what I sort of decided to do is like in the demo here is just sort of

00:32:37.960 --> 00:32:43.040
walk through what are some standard things people want to ask of data, like, like, what are those common

00:32:43.040 --> 00:32:49.460
questions that you hear, like, in Slack between, you know, like, business team and an analyst team. And it's just

00:32:49.460 --> 00:32:54.100
sort of like, Oh, can you do this? Can you get me this? Can you tell me if there's any PII? Is this safe to send?

00:32:54.180 --> 00:32:59.460
Can I send the CSV around? Can you clean up this CSV? Oh, I need to load this into our catalog. Can you

00:32:59.460 --> 00:33:04.000
describe each of these columns and check the data types all the way to can you actually go get me

00:33:04.000 --> 00:33:05.360
analytics or plot this?

00:33:05.360 --> 00:33:13.040
Yeah. Awesome. So and it plugs right into Jupyter Notebooks, so you can just import it and basically

00:33:13.040 --> 00:33:18.980
installing Sketch, which is a pip or Conda type thing, and then you just import it, and it's good to go,

00:33:19.040 --> 00:33:24.660
right? Yep. Using the Pandas extensions API, which allows you to essentially hook into their data

00:33:24.660 --> 00:33:31.560
frame callback and register a, you know, a function. Interesting. So it's not as jammed on from the

00:33:31.560 --> 00:33:35.660
outside. It's a little more, plays a little nicer with Pandas rather than just like, we're going to go

00:33:35.660 --> 00:33:41.900
to the class and just tap on it. Yeah, yeah. I, yeah. Not full monkey patching here. It's a,

00:33:41.900 --> 00:33:46.760
it's like hack supported, I think. I don't, I don't see it used often, but it is somewhere in the docs.
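
NOTE
Editor's sketch, not from the episode: the pandas extension hook being
described is register_dataframe_accessor. The accessor name "notes" and
the method columns_summary are hypothetical and are not Sketch's actual
code; they only show how a library can attach a namespace to every
DataFrame without monkey patching the class.
```python
import pandas as pd
#
@pd.api.extensions.register_dataframe_accessor("notes")  # hypothetical name
class NotesAccessor:
    def __init__(self, pandas_obj):
        self._df = pandas_obj
    #
    def columns_summary(self):
        # One line per column: name, dtype, number of unique values.
        return [
            f"{name}: dtype={self._df[name].dtype}, unique={self._df[name].nunique()}"
            for name in self._df.columns
        ]
#
df = pd.DataFrame({"state": ["OR", "WA", "OR"], "amount": [10.0, 12.5, 7.0]})
print(df.notes.columns_summary())
```
Importing the module that defines the accessor is enough to make
df.notes available, which matches the "just import it" experience.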

00:33:46.900 --> 00:33:52.700
Excellent. But here it is. So what I wanted to do for this is there's a, an example that you can do,

00:33:52.700 --> 00:33:57.260
like if you go to the repo, which obviously I'll link to, there's a video, which I mean,

00:33:57.260 --> 00:34:02.140
mad props to you because I review so many things, especially for the Python Bytes podcast, where

00:34:02.140 --> 00:34:06.940
there's a bunch of news items and new things we're just going to check out. And we'll, we'll find people

00:34:06.940 --> 00:34:13.440
recommending GUI frameworks that haven't, not a single screenshot or other types of things. Like,

00:34:13.680 --> 00:34:17.700
I have no way to judge whether this thing even might look like that. What does it even make?

00:34:17.700 --> 00:34:21.940
I don't even know, but somebody put a lot of effort, but they didn't bother to post an image. And you

00:34:21.940 --> 00:34:27.580
posted a minute and a half animation of it going through this process, which is really, really

00:34:27.580 --> 00:34:34.480
excellent. So people can go and watch that one minute, one minute 30 video. But there's also a

00:34:34.480 --> 00:34:41.120
Colab, "Open in Google Colab", which gives you a running interactive variant here. So you can just

00:34:41.480 --> 00:34:46.640
follow along, right? And play these pieces. It requires me to sign in and run it. That's okay.

00:34:46.640 --> 00:34:51.620
Let me talk people through some of the things it does. And you can tell me what it's doing,

00:34:51.620 --> 00:34:57.580
how it's doing that, like how people might find that advantageous. So import sketch, import pandas

00:34:57.580 --> 00:35:03.440
as pd, standard. And then you can say pandas read_csv and you give it one from a, like a,

00:35:03.440 --> 00:35:08.860
some example CSV that you got on your, one of your GitHub repos, right? Or in your account.

00:35:08.860 --> 00:35:12.340
Yeah. I found one online and then added just random synthetic data to it.

00:35:12.340 --> 00:35:14.380
Yeah. Like, Oh, here's a data dump. No, just kidding.

00:35:14.380 --> 00:35:21.780
So then you need to go to that data frame called sales data. You say dot sketch dot ask as a string,

00:35:21.780 --> 00:35:28.000
what columns might have PII personal identifying information in them?

00:35:28.000 --> 00:35:33.920
Awesome. And so it comes, tell me how that works and what it's doing here.

00:35:33.920 --> 00:35:40.240
So it does, I guess it has to build up the prompt, which is sent to GPT. So to OpenAI's specific

00:35:40.240 --> 00:35:44.680
completion endpoint, the building up the prompt, it looks at the data frame. It does a bunch of

00:35:44.680 --> 00:35:50.740
summarization stats on it. So it calculates uniques and sums and things like that. There's two modes in

00:35:50.740 --> 00:35:55.280
the backend that either does sketches to do those, or it just uses like df.describe type stuff.

00:35:55.340 --> 00:36:00.880
And then it pulls those summary stats together for all the columns, throws it together with my,

00:36:00.880 --> 00:36:05.540
the rest of the prompt I have, you can, we can go find it, but then it sends that prompt.

00:36:05.540 --> 00:36:10.740
Actually, it also grabs some information off of inspect. So it sort of like walks the,

00:36:10.740 --> 00:36:15.480
the stack up to go and check the variable name because the data frame is named sales data.

00:36:15.700 --> 00:36:20.300
So it actually tries to go find that variable name in your call stack so that it can, when it writes

00:36:20.300 --> 00:36:25.740
code, it writes valid code, puts all that together, sends it off to OpenAI, gets code back, uses Python

00:36:25.740 --> 00:36:30.760
AST to parse it, check that it's valid. If it's not valid Python code, or you tried to import something

00:36:30.760 --> 00:36:36.840
that you don't have, it will ask it to rewrite once. So this is sort of like an iterative process. So it

00:36:36.840 --> 00:36:41.240
takes the error or it takes the thing and it sends it back to OpenAI. It's like, Hey, fix this code.

00:36:41.360 --> 00:36:45.860
And then it, or in this case, sorry, ask, it actually just takes this, it sends that exact

00:36:45.860 --> 00:36:51.060
same prompt, but it just changes the last question to, can you answer this question off of the information?
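
NOTE
Editor's sketch, not from the episode: one way the "summary stats plus
question" prompt described above could be assembled. The helper
build_prompt and its prompt wording are hypothetical; Sketch's real
prompt is different, and this version takes the variable name as an
argument instead of walking the call stack. Nothing is sent to any API.
```python
import pandas as pd
#
def build_prompt(df, df_name, question):
    # Summarize each column, then append the user's question.
    lines = [f"DataFrame `{df_name}` has {len(df)} rows and these columns:"]
    for col in df.columns:
        series = df[col]
        parts = [f"dtype={series.dtype}", f"unique={series.nunique()}"]
        if pd.api.types.is_numeric_dtype(series):
            parts.append(f"sum={series.sum()}")
        lines.append(f"- {col}: " + ", ".join(parts))
    lines.append(f"Question: {question}")
    return "\n".join(lines)
#
sales_data = pd.DataFrame({"State": ["OR", "WA", "OR"], "Price": [10, 20, 5]})
prompt = build_prompt(sales_data, "sales_data", "Which columns might contain PII?")
print(prompt)
```
Swapping only the final line (answer directly versus answer with code)
mirrors the ask and howto split described in the conversation.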

00:36:51.060 --> 00:36:58.680
This portion of Talk Python To Me is brought to you by us over at Talk Python Training with our courses.

00:36:58.680 --> 00:37:05.000
And I want to tell you about a brand new one that I'm super excited about. Python web apps that fly

00:37:05.000 --> 00:37:11.080
with CDNs. If you have a Python web app, you want it to go super fast. Static resources,

00:37:11.280 --> 00:37:17.380
turn out to be a huge portion of that equation. Leveraging a CDN could save you up to 75% of your

00:37:17.380 --> 00:37:24.480
server load and make your app way faster for users. And this course is a step-by-step guide on how to do

00:37:24.480 --> 00:37:30.560
it. And using the CDN to make your Python apps faster is way easier than you think. So if you've

00:37:30.560 --> 00:37:35.800
got a Python web app and you would like to have it scaled out globally, if you'd like to have your users

00:37:35.800 --> 00:37:41.120
have a much better experience and maybe even save some money on server hosting and bandwidth,

00:37:41.200 --> 00:37:46.580
check out this course over at talkpython.fm/courses. It'll be right up there at the top.

00:37:46.580 --> 00:37:51.300
And of course the link will be in your show notes. Thank you to everyone who's taken one of our courses.

00:37:51.300 --> 00:37:54.400
It really helps support the podcast. Now back to the show.

00:37:56.300 --> 00:38:02.220
And so that sounds very, very similar to my arrow program: rewrite it with guard clauses, redo it.

00:38:02.220 --> 00:38:07.400
Like you kind of, I gave you this data in this code and I asked you this question and you can have a

00:38:07.400 --> 00:38:10.200
little conversation, but at some point you're like, all right, well, we're going to take what it gives me

00:38:10.200 --> 00:38:12.280
after a couple of rounds at it. Right.

00:38:12.480 --> 00:38:18.020
Yeah. I take the first one that doesn't, that like passes an import check and passes AST linting.

00:38:18.020 --> 00:38:23.840
There was a, the, when you use small models, you run into not valid Python a lot more, but with these

00:38:23.840 --> 00:38:25.160
ones, it's almost always good.
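
NOTE
Editor's sketch, not from the episode: a minimal version of the "passes
an import check and passes AST linting" gate described above, using only
the standard library. The function name check_generated_code is
hypothetical and this is not Sketch's actual implementation.
```python
import ast
import importlib.util
#
def check_generated_code(source):
    """Return a list of problems; an empty list means the code passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for module in modules:
            # Only the top-level package needs to be installed for the check.
            top_level = module.split(".")[0]
            if importlib.util.find_spec(top_level) is None:
                problems.append(f"missing module: {top_level}")
    return problems
#
print(check_generated_code("import json"))
print(check_generated_code("import not_a_real_module_xyz"))
print(check_generated_code("def broken(:"))
```
A non-empty result is the kind of feedback that could be sent back to
the model for the single rewrite attempt mentioned earlier.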

00:38:25.160 --> 00:38:30.520
It's ridiculous. Yeah. Yeah. Yeah. It's crazy. Okay. So it says the columns that might have PII

00:38:30.520 --> 00:38:37.780
in them are credit card, SSN and purchase address. Okay. That's pretty excellent. And then you say,

00:38:37.780 --> 00:38:44.440
all right, sales data dot sketch dot ask. Can you give me friendly name to each column and output this

00:38:44.440 --> 00:38:51.500
as an HTML list, which is parsed as HTML and rendered in Jupyter notebook accurately. Right. So it says

00:38:51.500 --> 00:38:52.500
index. Well, that's an index.

00:38:52.500 --> 00:38:53.560
This one ends up being the same.

00:38:53.560 --> 00:38:57.940
It's not a great, this one is not a great example because it doesn't have to like infer

00:38:57.940 --> 00:39:04.660
because the names are like order space date, right? Instead of order, like maybe lowercase

00:39:04.660 --> 00:39:09.880
O and then like attached a big D or whatever, but it'll give you some more information. You

00:39:09.880 --> 00:39:13.240
can like kind of ask it questions about the type of data, right?

00:39:13.240 --> 00:39:17.560
Yeah, exactly. I found this is really good at if you play the game and you just name all

00:39:17.560 --> 00:39:21.280
your columns, like col1, col2, col3, col4, and you ask it, give me new column

00:39:21.280 --> 00:39:24.000
names for all of these. It gives you something that's pretty reasonable based off of the data.

00:39:24.000 --> 00:39:24.860
So pretty useful.

00:39:24.860 --> 00:39:28.380
Okay. So it's like, oh, these look like addresses. So we'll call that address. And this looks like

00:39:28.380 --> 00:39:31.880
social security numbers and credit scores and whatnot.

00:39:31.880 --> 00:39:35.260
Yep. Yep. So it can really help with that quick first onboarding step.

00:39:35.260 --> 00:39:39.540
Yeah. So everyone heard it here first. Just name all your columns. One, two, three, four,

00:39:39.660 --> 00:39:48.820
and then just get help. Like AI, what do we call these? All right. So the next thing you did in this

00:39:48.820 --> 00:39:54.420
demo notebook was you said sales data dot sketch dot. And this is different than before, I believe,

00:39:54.420 --> 00:40:01.220
because before you were saying ask, and now you can say how to create some derived features from the,

00:40:01.220 --> 00:40:03.680
from the address. Tell us about that.

00:40:03.680 --> 00:40:07.820
Yeah. This is the one that actually is the code writing. It's essentially the exact same prompt,

00:40:07.820 --> 00:40:13.740
but the change is the very end. It says like, return this as Python code that you can execute to do

00:40:13.740 --> 00:40:17.800
this. So instead of answering the question directly, answer the question with code that will answer the

00:40:17.800 --> 00:40:18.040
question.

00:40:18.040 --> 00:40:23.120
Right. Write a Python line of code that will answer this question given this data, something like that.

00:40:23.120 --> 00:40:27.280
Yep. Yep. Something like that. I don't remember exactly anymore. It's been a while, but yeah,

00:40:27.420 --> 00:40:32.040
some I've iterated a little bit until it started working and I was like, okay, cool. And so,

00:40:32.040 --> 00:40:37.540
uh, ask it for that. And then it spits back code. And that was, it sort of, it sounds overly simple,

00:40:37.540 --> 00:40:42.020
but that was it. That was like, that was the moment. And I was just like, oh, I could just ask it to do my

00:40:42.020 --> 00:40:45.780
analytics for me. And it's just all the, every other feature just sort of became like apparently

00:40:45.780 --> 00:40:50.520
solvable with this. And the more I played with it, the more it was just, I don't have to think about,

00:40:50.520 --> 00:40:55.180
I don't even have to go to Google or stack overflow to ask the question, to get the API stuff for me.

00:40:55.280 --> 00:40:59.320
I could, from zero to I have code that's working is one step in Jupyter.

00:40:59.320 --> 00:41:04.200
So you wrote that how to, and you gave it the question and then it wrote the lines of code and you just

00:41:04.200 --> 00:41:09.860
drop that into the next cell and just run it. Right. And so for example, in this example, it said, well,

00:41:09.860 --> 00:41:16.520
we can come up with city state and zip code and by writing a vector transform by passing a lambda,

00:41:16.520 --> 00:41:21.300
that'll pull out, you know, the city from the string that was the full address and so on. Right.

00:41:21.300 --> 00:41:22.260
Yeah. That's pretty neat.

00:41:22.400 --> 00:41:26.240
Yeah. It's fun to see what it does. Again, any of these things are always

00:41:26.240 --> 00:41:30.620
probabilistic, but it also usually serves as a great starting point if, even if it doesn't get it

00:41:30.620 --> 00:41:30.760
right.

00:41:30.760 --> 00:41:35.200
Yeah. Sure. You're like, oh, okay. I see. Maybe that's not exactly right. Cause we have Europeans

00:41:35.200 --> 00:41:40.620
in their city, maybe in their zip code or in different orders sometimes, but it gives you

00:41:40.620 --> 00:41:44.380
something to work with pretty quickly. Right. By asking just a, what can I do?
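
NOTE
Editor's sketch, not from the episode: roughly the kind of code the
derived-features request might produce, with lambdas pulling city, state,
and zip code out of an address string. The column name "Purchase Address",
the sample rows, and the "street, city, ST 12345" layout are assumptions;
as noted in the conversation, real addresses will not always fit.
```python
import pandas as pd
#
sales_data = pd.DataFrame({
    "Purchase Address": [
        "917 1st St, Dallas, TX 75001",
        "682 Chestnut St, Boston, MA 02215",
    ]
})
# Assumes every address looks like "street, city, ST 12345".
address = sales_data["Purchase Address"]
sales_data["City"] = address.apply(lambda a: a.split(", ")[1])
sales_data["State"] = address.apply(lambda a: a.split(", ")[2].split(" ")[0])
sales_data["Zip Code"] = address.apply(lambda a: a.split(", ")[2].split(" ")[1])
print(sales_data[["City", "State", "Zip Code"]])
```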

00:41:44.380 --> 00:41:48.440
And then another one, this one's a little more interesting instead of just saying like, well,

00:41:48.440 --> 00:41:53.340
what other things can we pull out? It's like, this gets towards the analytics side, right? It says,

00:41:53.340 --> 00:42:01.080
get the top five grossing states for the sales data. Right. And it writes a group by, a sum, a sort,

00:42:01.080 --> 00:42:05.800
and then it does a head given five. And that's pretty neat. Tell us about this. I mean, I guess

00:42:05.800 --> 00:42:08.380
it's about the same, right? Just ask more questions.
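
NOTE
A rough sketch of the group, sort, take-five logic mentioned above, written in plain Python rather than pandas so it runs anywhere. The rows and state codes are made up; in pandas this would typically be a groupby followed by a sort and head(5).
```python
from collections import defaultdict
# Hypothetical (state, sale amount) rows standing in for the sales data.
sales = [("CA", 120.0), ("TX", 80.0), ("NY", 95.5), ("CA", 30.0), ("WA", 60.0), ("TX", 45.0), ("MA", 20.0), ("OR", 10.0)]
totals = defaultdict(float)
for state, amount in sales:
    totals[state] += amount  # group by state and sum the sales
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)  # highest first
top_five = ranked[:5]  # the equivalent of head(5)
```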

00:42:08.380 --> 00:42:13.980
They all feel pretty similar to me. I think, I guess I could jump towards like things that

00:42:13.980 --> 00:42:18.980
I wanted to put next, but that weren't reliable enough to like really make the cut.

00:42:18.980 --> 00:42:24.900
I wanted to have it go like in my question was like, go build a model that predicts sales for the

00:42:24.900 --> 00:42:31.160
next six months and then plot it on a 2d plot with a dotted line for the predicted plot. And like,

00:42:31.160 --> 00:42:36.180
it would try, but it would always do something off. And I found I always had to break up the

00:42:36.180 --> 00:42:40.300
like prompted to like smaller, smaller intern level code back. Yeah.

00:42:40.300 --> 00:42:48.620
Yeah. It was fun getting it to train models, but it was also its own like separate thing. I sort of

00:42:48.620 --> 00:42:54.800
didn't play with too much. And there's another part of sketch that I guess is not in this notebook. I

00:42:54.800 --> 00:43:00.340
didn't realize. Yeah. Because you have to use the OpenAI API key, but it's the sketch.apply. And

00:43:00.340 --> 00:43:07.540
that's the, I'll say this one is another just like power tool. This one has like, I don't really talk

00:43:07.540 --> 00:43:11.860
about, I don't even include it in the video because it's not just like as plug and play, you do have to

00:43:11.860 --> 00:43:15.840
go set an environment variable. And so it's like, yeah, that's one step further than I want to,

00:43:15.840 --> 00:43:22.720
I don't, it's not terrible, but it's a step. And so what it does is it lets you apply a completion

00:43:22.720 --> 00:43:28.880
endpoint of whatever you design, row-wise. So every single row, you can go and apply and run something.

00:43:29.020 --> 00:43:35.080
So if every row of your pandas data frame is a, some serialized text from a PDF or something,

00:43:35.080 --> 00:43:39.560
or a file in your directory structure, and you just load it as a data frame, you can do

00:43:39.560 --> 00:43:45.240
df.sketch.apply. And it's almost the exact same as df.apply. But the thing you put in as your function

00:43:45.240 --> 00:43:51.620
is now just a Jinja template that will fill in your column variables for that row and then ask GPT to

00:43:51.620 --> 00:43:58.240
continue completing. So I think I did silly ones, like here's a few states. And then the prompt is

00:43:58.240 --> 00:44:04.300
extract the state for it. Or so I think, right, extract the capital of the state. Yeah. Yeah. So

00:44:04.300 --> 00:44:10.200
just pure information extraction from it, but you can sort of like this grows into a lot more.
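
NOTE
A small sketch of the row-wise idea described above: fill a template with each row's column values, then hand the text to a completion function. The render helper is a tiny stand-in for Jinja, and fake_complete is a placeholder, not a real model call or sketch's actual implementation.
```python
import re
def render(template, row):
    # Fill each {{ column }} placeholder with that row's value.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(row[m.group(1)]), template)
def fake_complete(prompt):
    # Placeholder for a real completion endpoint; it only reports the prompt length.
    return f"<completion for {len(prompt)} chars>"
rows = [{"state": "New York"}, {"state": "Texas"}]  # hypothetical data frame rows
template = "What is the capital of {{ state }}?"
prompts = [render(template, row) for row in rows]
answers = [fake_complete(p) for p in prompts]  # one new value per row
```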

00:44:10.200 --> 00:44:15.540
So does that come out of the data? Or is that coming out of OpenAI where like it sees what is

00:44:15.540 --> 00:44:19.780
the capital of state and it sees New York? It's like, okay, well, all right, Albany.

00:44:20.300 --> 00:44:25.160
Yeah. So this is purely extracting out of the model weights. Essentially, this is not like a factual

00:44:25.160 --> 00:44:29.680
extraction. So this is probably a bad example because it's like it. But the thing that actually,

00:44:29.680 --> 00:44:34.300
actually, the better example I did once was, what is like some interesting colors that are

00:44:34.300 --> 00:44:38.820
good for each state? And it like just came up with a sort of like flaggish colors or sports team colors.

00:44:38.820 --> 00:44:43.780
That was sort of fun when it wrote that as hex. You can also do things like if you have a large text

00:44:43.780 --> 00:44:47.500
document or you can actually, I'll even do the more common one that I think everybody actually wants

00:44:47.500 --> 00:44:52.820
is you have messy data. You have addresses that are like syntactically messy and you can say,

00:44:52.820 --> 00:44:57.760
normalize these addresses to be in this form. And you sort of just write one example, run

00:44:57.760 --> 00:45:03.780
.apply, and you get a new column that is that cleaned up data. Yeah. Incredible. Okay. A couple

00:45:03.780 --> 00:45:11.060
things here. It says I can use, you can directly call OpenAI and not use your endpoint. So at the

00:45:11.060 --> 00:45:16.820
moment it kind of proxies through web service that you all have that somehow checks stuff or what does

00:45:16.820 --> 00:45:21.640
that do? Yeah. It was just a pure ease of use. I wanted people to be able to do pip install and

00:45:21.640 --> 00:45:28.280
import sketch and actually get it because I know how much I use things in, in a Colab or in Jupyter

00:45:28.280 --> 00:45:32.780
notebooks on weird machines and remembering an environment variable, managing secrets. It's like

00:45:32.780 --> 00:45:38.340
this whole overhead that I don't want to deal with. And so I wanted to just offer a lightweight way if you

00:45:38.340 --> 00:45:42.640
just want to be able to use it. But I know that that's not sufficient for secure. If people are going

00:45:42.640 --> 00:45:46.680
to be conscious of these things and want to be able to, you know, not go through my proxy thing that's

00:45:46.680 --> 00:45:47.760
there for help. So sure.

00:45:47.760 --> 00:45:48.300
Offer this up.

00:45:48.300 --> 00:45:53.960
What's next? Do you have a roadmap for this? Are you happy where it is and you're just letting it be or

00:45:53.960 --> 00:45:55.500
do you have grand plans?

00:45:55.500 --> 00:46:00.120
I don't have much of a roadmap for this right now. I'm actually, I guess there's like grand roadmap,

00:46:00.120 --> 00:46:05.040
which is like at the company scale, what we're working on. I would say that if this, we're really trying to

00:46:05.040 --> 00:46:11.760
solve data and with AI just in general. And so these are the types of things we hope to open source and

00:46:11.760 --> 00:46:16.220
just give out there, like actually everything we're hoping to open source. But the starting place is

00:46:16.220 --> 00:46:20.360
going to be a bunch of these like smaller toolkits or just utility things that hopefully save people

00:46:20.360 --> 00:46:26.840
time or very useful. The grand thing we're working towards, I guess, is this more like the, it's the

00:46:26.840 --> 00:46:31.200
full automated data stack. It's like the dream I think that people have wanted where you just ask it

00:46:31.200 --> 00:46:36.540
questions and it goes and pulls the data that you need. It cleans it. It builds up the full pipeline.

00:46:36.540 --> 00:46:40.700
It executes the pipeline. It gets you to the result and it shows you the result. And you look,

00:46:40.700 --> 00:46:45.740
you can inspect all of that, that whole DAG and say, yes, I trust this. So we're working on getting

00:46:45.740 --> 00:46:46.820
full end to end.

00:46:46.820 --> 00:46:51.820
So when I went and asked about that Arrow program, I said, I think this will still do it. I think this

00:46:51.820 --> 00:46:58.080
will probably work again. And it did, which is awesome. Just the way I expected. But, you know,

00:46:58.820 --> 00:47:05.440
AI is not as deterministic as read the number seven. If seven is less than eight, do this,

00:47:05.440 --> 00:47:11.480
right? Like what is the repeatability? What is the sort of experience of doing this? Like I ran it,

00:47:11.480 --> 00:47:16.340
I ran it again. Is it going to be pretty much the same or is it going to have like, what's the mood

00:47:16.340 --> 00:47:18.760
of the AI when it gets to you?

00:47:18.760 --> 00:47:22.760
This is sort of a parameter you can, there's a little bit of a parameter you can set if you want

00:47:22.760 --> 00:47:26.620
to play that game with the temperature parameter on these models at higher and higher temperatures,

00:47:26.620 --> 00:47:31.920
you get more and more random, but it can also truly be out of left field random if you go too

00:47:31.920 --> 00:47:32.420
high temperature.

00:47:32.420 --> 00:47:34.740
Okay. But you get maybe more creative solutions.
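
NOTE
A minimal sketch of how a temperature parameter is commonly described: scores are divided by the temperature before being turned into probabilities, so low values concentrate on the top choice and high values flatten the distribution. The scores are invented, and nothing here claims to match the hosted models' internals.
```python
import math
def softmax_with_temperature(logits, temperature):
    # Divide the scores by the temperature, then normalise into probabilities.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]
logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # close to always picking the top token
hot = softmax_with_temperature(logits, 2.0)  # much flatter, so sampling is more random
```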

00:47:34.740 --> 00:47:38.320
Yeah. You could sometimes get that. And as you move towards zero, it gets more and more

00:47:38.320 --> 00:47:43.900
deterministic. Unfortunately for really trying to do some like good provable, like sort of like

00:47:43.900 --> 00:47:48.260
build chain type things with like hashing and caching and stuff. It's not fully deterministic,

00:47:48.360 --> 00:47:53.360
even at zero temperature, but that's just, I think it's worth thinking about, but at the same time,

00:47:53.360 --> 00:47:59.120
run it once, see the answers that it gives you comment that business out and just like put that

00:47:59.120 --> 00:48:05.420
as markdown, you know, freeze it. Like, memorialize it in markdown because you don't need

00:48:05.420 --> 00:48:10.440
to ask it over and over what columns have PII. Like, well, probably the same ones as last time.

00:48:10.700 --> 00:48:15.080
We're just kind of like, right, these columns, credit card, social security and purchase address,

00:48:15.080 --> 00:48:20.380
they have, have that. And so now, you know, right. Is that a reasonable way to think about it?

00:48:20.380 --> 00:48:24.020
I think, yeah, if you, if you want to get determinism or the performance is a thing that

00:48:24.020 --> 00:48:28.440
you're worried about, yeah, you can always cache. I think however you do it, comments or actually

00:48:28.440 --> 00:48:28.940
with systems.

00:48:28.940 --> 00:48:35.040
Sure. Sure. Sure. Sure. Or that like, like, how do I, you know, how do I do that group by sorting

00:48:35.040 --> 00:48:38.700
business? Like you don't have to ask that over and over once it gives you the answer.
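
NOTE
One way to sketch the "ask once, then reuse the answer" idea is an in-memory cache in front of the model call. The ask function is a hypothetical stand-in for a slow, non-deterministic request; functools.lru_cache is standard library.
```python
import functools
calls = {"count": 0}
@functools.lru_cache(maxsize=None)
def ask(question):
    # Stand-in for a real model call; counts how often it actually runs.
    calls["count"] += 1
    return f"answer to: {question}"
first = ask("Which columns have PII?")
second = ask("Which columns have PII?")  # served from the cache, no second call
```
This cache lives only for the current session; pasting the answer into a markdown cell, as suggested above, is what makes it persist with the notebook.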

00:48:38.700 --> 00:48:43.360
Yeah. Yeah. My workflow, when I use sketch, definitely I asked the question, I copy the

00:48:43.360 --> 00:48:47.220
code and then I go delete the question or ask it a different question for my next problem that I have.

00:48:47.220 --> 00:48:53.300
Yeah. I like, it's not code that it is a little bit like a vestigial when it, when you like save

00:48:53.300 --> 00:48:57.140
your notebook at the end and you sort of want to go back and delete all the questions you asked because

00:48:57.140 --> 00:49:01.500
you don't need to rerun it when you actually just go to execute the notebook later. But yeah,

00:49:01.500 --> 00:49:04.940
that makes a lot of sense. And plus you look smarter if you don't have to show how you got

00:49:04.940 --> 00:49:05.380
the answers.

00:49:05.380 --> 00:49:07.960
Look at this beautiful code that's even commented.

00:49:07.960 --> 00:49:12.480
Yeah, exactly. I guess you could probably ask it to comment your code, right?

00:49:12.480 --> 00:49:17.220
Yeah. You can ask it to describe. There's been some really cool things where people will throw

00:49:17.220 --> 00:49:20.880
like assembly at it and ask it to translate to different like languages so they can interpret

00:49:20.880 --> 00:49:26.300
it. Or you could do really fun things like cross language, cross, I guess I'll say like levels

00:49:26.300 --> 00:49:30.120
of abstraction. You could sort of ask it to describe it like at a very top level, or you can get really

00:49:30.120 --> 00:49:34.120
precise, like for this line, what are all the implications if I change a variable or something like that?

00:49:34.340 --> 00:49:39.000
Yeah, that's really cool. I suppose you could do that here. Can you can you converse with it? You

00:49:39.000 --> 00:49:43.480
can say, okay, you gave me this. Does it I guess what's the word? Does it have like tokens in context

00:49:43.480 --> 00:49:50.020
like ChatGPT does? Can you say, okay, that's cool. But I want as integers, not as strings.

00:49:50.440 --> 00:49:55.060
I don't know. Yeah, I did. I did not include that in this. There was a version that had something like

00:49:55.060 --> 00:50:00.000
that, where I was sort of just keeping the last few calls around. But it quickly became it didn't align

00:50:00.000 --> 00:50:04.680
with the Jupyter IDE experience, because you end up like scrolling up and down. And it you have too much

00:50:04.680 --> 00:50:10.040
power over how you execute in a Jupyter notebook. So your context can change dramatically by just scrolling

00:50:10.040 --> 00:50:16.500
up. And trying to via inspect look across different like, across a Jupyter notebook is just a whole

00:50:16.500 --> 00:50:20.260
other nightmare. So yeah, I didn't try and like extract the code out of the notebook so that it

00:50:20.260 --> 00:50:23.920
could understand the local context. You could go straight to ChatGPT or something like that,

00:50:23.920 --> 00:50:25.980
take what it gave you and start asking it questions.

00:50:26.200 --> 00:50:32.440
Okay, so another question that I had here about this. So in order for that to do its magic,

00:50:32.440 --> 00:50:37.380
like you said, the really important thought or breakthrough or idea you had was like, not just

00:50:37.380 --> 00:50:41.560
the structure of the pandas code or anything like that, but also a little bit about the data.

00:50:41.560 --> 00:50:47.960
What is the privacy implications of me asking this question about my data? Suppose I have

00:50:47.960 --> 00:50:55.820
super duper secret CSV. And should I not .ask or .howto on it? Or what is the story there?

00:50:55.820 --> 00:51:02.540
What's the, if I work with data, how much sharing do I do of something I might not want to share if I

00:51:02.540 --> 00:51:04.100
ask a question about it?

00:51:04.100 --> 00:51:09.340
I'd say the same discretion you'd use if you would copy like a row or a few rows of that data into

00:51:09.340 --> 00:51:12.440
a, into ChatGPT to ask it a question about it.

00:51:12.440 --> 00:51:12.640
Okay.

00:51:12.640 --> 00:51:19.460
Is the level of concern I guess you should have like on the specifically, I am not storing these

00:51:19.460 --> 00:51:24.620
things, but I know is at least it was, it seems like they're going to start getting towards like a 30

00:51:24.620 --> 00:51:28.640
day thing. But, so there's a little bit of, yeah, I mean, you're sending your stuff over the

00:51:28.640 --> 00:51:32.980
wire, like over network, if you do this and to use these language models until they come local,

00:51:32.980 --> 00:51:37.540
until these things like LLaMA and Alpaca get good enough that they're, yeah, they're going to be

00:51:37.540 --> 00:51:41.160
remote. Actually, that could be a fun, sorry. I just now thought that could be a fun thing. Like

00:51:41.160 --> 00:51:45.100
just go get Alpaca working with sketch so that it can be fully local.

00:51:45.340 --> 00:51:48.100
Interesting. Like a privacy preserving type of deal.

00:51:48.100 --> 00:51:52.200
Yeah. I hadn't actually, yeah, that's the, that's the power of these, smaller models that are

00:51:52.200 --> 00:51:56.060
almost good enough. I could probably just like quickly throw that in here and see if it,

00:51:56.060 --> 00:51:57.900
yeah, maybe it's a wider audience.

00:51:57.900 --> 00:52:04.060
You have an option to not go through your API, but directly go to OpenAI. You could have another

00:52:04.060 --> 00:52:07.340
one to pick other, other options, right? Potentially.

00:52:07.340 --> 00:52:14.040
Yep. Yep. Yep. The, interface to these, one thing that I think is not, maybe it's talked

00:52:14.040 --> 00:52:17.980
about it more than other places, but I haven't heard as much like excitement about it is that these,

00:52:17.980 --> 00:52:23.440
uh, the APIs have gotten pretty nice for this whole space. the, they're all like the idea of a

00:52:23.440 --> 00:52:28.160
completion endpoint is pretty straightforward. You send it some amount of text and it will continue

00:52:28.160 --> 00:52:32.980
that text. And it's such a, it's so simple, but it's so generalizable. You could build so many

00:52:32.980 --> 00:52:38.120
tools off of just that one API endpoint essentially. And so combine that with an embedding endpoint and

00:52:38.120 --> 00:52:41.900
you sort of have all you need to, to make complex AI apps.

00:52:41.900 --> 00:52:48.120
It's crazy. Speaking of making AI apps, maybe touch a bit on your, your other project, Lambda Prompt.

00:52:48.120 --> 00:52:50.600
So yeah, Lambda. Yeah.

00:52:50.600 --> 00:52:56.760
But before you get into it, mad props for like Greek letter, like that's a true physicist or

00:52:56.760 --> 00:52:59.080
mathematician that I can appreciate that there.

00:52:59.080 --> 00:53:02.300
Yeah. That was, I was excited to put it everywhere, but then of course,

00:53:02.300 --> 00:53:08.420
these things don't playing, playing games with character sets and websites. I'm the one that

00:53:08.420 --> 00:53:13.100
causes, I both feel the pain, have to clean the data that I also put into these systems.

00:53:13.100 --> 00:53:17.780
So yeah. Yeah. People are like a prompt and why is the a so italicized? I don't get it.

00:53:17.780 --> 00:53:24.780
Yeah. Okay. Yeah. So yeah. Yeah. So this one came, I was, working with, this is pre GPT.

00:53:25.140 --> 00:53:29.080
This is October. I guess it was right around ChatGPT coming out like around that time.

00:53:29.080 --> 00:53:32.480
But I was, I was really just messing around a lot with, completion endpoints as we were talking.

00:53:32.480 --> 00:53:38.140
And I kept rewriting the same request boilerplate over and over. And then I also kept rewriting

00:53:38.140 --> 00:53:43.080
f-strings that I was trying to like send in. And I was just like, ah, Jinja templates solved this

00:53:43.080 --> 00:53:49.020
already. Like there already is formatting for strings in Python. Let me just use that, compose that into a

00:53:49.020 --> 00:53:53.800
function. And there's, let me call these completion endpoints. I don't want to think of them as like

00:53:53.800 --> 00:53:59.100
API endpoint or RPC is a nice mental model, but I want to use them as functions. I want to be able to

00:53:59.100 --> 00:54:05.160
put decorators on them. I want to be able to use them both async or not async in Python. I want to,

00:54:05.160 --> 00:54:10.480
I just want to have this as a thing that I can just call really quickly with one line and just do

00:54:10.480 --> 00:54:15.620
whatever I need to with it. And so I threw this together. It's very simple. Like, honestly, I mean,

00:54:15.620 --> 00:54:20.580
like the hardest part was just getting all the layers of, there's actually two things you

00:54:20.580 --> 00:54:26.140
can make a prompt that then, cause I wrap any function as a prompt. So not just, these

00:54:26.140 --> 00:54:32.000
calls to GPT and then I do tracing on it. So as you like get into the call stack, every input and output

00:54:32.000 --> 00:54:38.180
is you can sort of like get hooked into and trace with some like call traces. So there's a bunch of just

00:54:38.180 --> 00:54:42.520
like weird stuff to make the utility nice, but functionally, as you can see here on it's,

00:54:42.620 --> 00:54:48.000
you just import it, you write a Jinja template with the class, and then you use that object that

00:54:48.000 --> 00:54:53.380
comes back as a function and your Jinja template variables get filled in. And your result is the

00:54:53.380 --> 00:54:57.920
text string that comes back out of a GPT. Interesting. And people probably, some people

00:54:57.920 --> 00:55:02.200
might be thinking like Jinja, okay, well I got to create an HTML file and all that, like not just a

00:55:02.200 --> 00:55:08.040
string that has double curlies for turning stuff into like strings within the string, kind of a different

00:55:08.040 --> 00:55:13.420
way to do f-strings as you were hinting at. Yeah. Yeah. There was a two pieces here. I realized as

00:55:13.420 --> 00:55:18.760
I was doing this also, I think I sort of mentioned with a sketch I do. I really often was taking the

00:55:18.760 --> 00:55:24.400
output of a language model prompt, doing something in Python, or actually I can do a full example of

00:55:24.400 --> 00:55:31.580
the SQL writing like exploration we did, but, we would do these things, that were sort of run

00:55:31.580 --> 00:55:38.220
GPT three to ask it to write the SQL. You take the SQL, you go try and execute it, but it fails for

00:55:38.220 --> 00:55:42.920
whatever reason, or you, and you take that error, you say, Hey, rewrite it. So we talked about that

00:55:42.920 --> 00:55:46.700
sort of pattern, which is sort of like rewriting. Another one of the patterns was increase the

00:55:46.700 --> 00:55:52.200
temperature, ask it to write the SQL. You get like 10 different SQL answers in parallel. And this is where

00:55:52.200 --> 00:55:57.100
the async was like really important for this. Cause I just wanted to use asyncio gather and run all 10

00:55:57.100 --> 00:56:02.020
of these truly in parallel against the OpenAI endpoint, get 10 different answers to the SQL,

00:56:02.020 --> 00:56:08.020
run all 10 queries against your database, then poll on what the most common, like of the ones that

00:56:08.020 --> 00:56:12.820
successfully ran, which ones gave the same answer the most often, then that's probably the correct

00:56:12.820 --> 00:56:19.260
answer. And, just chaining that stuff. It's like very pythonic functions. Like you can really

00:56:19.260 --> 00:56:22.760
just imagine like, Oh, I just need to write a for loop. I just need to run this function, take the

00:56:22.760 --> 00:56:28.460
output feed into another function, very procedural. But when you, all the abstractions in the

00:56:28.460 --> 00:56:34.680
OpenAI API, the things like just everything else, there was nothing else really at the time,

00:56:34.680 --> 00:56:38.320
but even the new ones that have come out like LangChain that have sort of like taken the space by

00:56:38.320 --> 00:56:43.460
storm now are not really just trying to offer the minimal ingredient, which is the function.

00:56:43.460 --> 00:56:47.460
And to me, it was just like, if I can just offer the function, I can write a for loop. I can write,

00:56:47.460 --> 00:56:51.520
I can store a variable and then keep passing it into it. You could do so many different

00:56:51.520 --> 00:56:56.540
emergent behaviors with just starting with the function and then simple Python, scripting

00:56:56.540 --> 00:56:57.040
on top of it.
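
NOTE
A hedged sketch of the "run ten in parallel, keep the most common answer" pattern described above, using asyncio.gather. The fake_sql_answer coroutine and its canned results are invented stand-ins for asking a model for SQL and running it against a database; None marks a query that failed.
```python
import asyncio
from collections import Counter
async def fake_sql_answer(i):
    # Stand-in for "generate SQL at a higher temperature, then execute it".
    await asyncio.sleep(0)
    canned = [42, 42, None, 41, 42, 42, None, 40, 42, 41]
    return canned[i]
async def vote(n=10):
    # Launch all n attempts concurrently and wait for every result.
    results = await asyncio.gather(*(fake_sql_answer(i) for i in range(n)))
    successes = [r for r in results if r is not None]  # drop the failed queries
    return Counter(successes).most_common(1)[0][0]  # the answer seen most often
winner = asyncio.run(vote())
```
Because each attempt is just a function call, the voting step is ordinary Python on top of it, which is the point made in the conversation.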

00:56:57.040 --> 00:57:04.220
And there's some interesting stuff here, Lambda Prompt. So you can start, you can kind of start

00:57:04.220 --> 00:57:09.640
it, set it. I don't know what ChatGPT, you can tell it a few things. I'm going to ask you a question

00:57:09.640 --> 00:57:14.980
about a book. Okay. The book is a choose your own adventure book. Okay. Now here I'm going to like,

00:57:14.980 --> 00:57:19.460
you can prepare it, right? There's probably a more formal term for that, but you can do this here.

00:57:19.460 --> 00:57:25.680
Like you can say, Hey system, you are a type of bot. And then you, that creates you an object that

00:57:25.680 --> 00:57:29.540
you can have a conversation with. And you say, what should we get for lunch? And your type of bot is

00:57:29.540 --> 00:57:33.500
pirate. And then so to say, as a pirate, I would suggest we have some hearty seafood or whatever,

00:57:33.500 --> 00:57:37.660
right? Like that's, that's beyond what you're doing with sketch. I mean, obviously this is not so much

00:57:37.660 --> 00:57:42.900
for code. This is like conversing with Python rather than in Python. I don't know. And your editor.
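
NOTE
A minimal sketch of a templated conversation like the pirate example above: a prepared list of role and text pairs whose placeholders are filled in at call time. The fill helper stands in for Jinja, the role/content dictionaries follow the shape chat-style endpoints commonly accept, and none of this is Lambda Prompt's actual API.
```python
import re
def fill(text, **values):
    # Replace each {{ name }} placeholder with the matching keyword argument.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), text)
conversation = [
    ("system", "You are a {{ type_of_bot }}"),
    ("user", "{{ question }}"),
]
def build_messages(**values):
    # Produce one role/content dictionary per prepared message.
    return [{"role": role, "content": fill(text, **values)} for role, text in conversation]
messages = build_messages(type_of_bot="pirate", question="What should we get for lunch?")
```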

00:57:42.900 --> 00:57:49.280
Yeah. This one was the OpenAI chat API endpoint came out and I was just like, Oh, I should support

00:57:49.280 --> 00:57:53.640
it. So that's what this, I wanted to be able to Jinja template inside of the conversation. So you

00:57:53.640 --> 00:57:59.120
can imagine a conversation that is prepared with like seven steps back and forth, but you want to hard

00:57:59.120 --> 00:58:02.880
code with the conversation, like how the flow of the conversation was going. And you want to template

00:58:02.880 --> 00:58:08.200
it so that like on message three, it put your new context problem on message four, it put the output

00:58:08.200 --> 00:58:13.740
from another prompt that you ran on message five. It is this other data thing. And then you ask it to

00:58:13.740 --> 00:58:18.660
complete this, the intent of like, it's arbitrarily complex, but still something like that

00:58:18.660 --> 00:58:23.500
would be, you know, just three lines or so in Lambda prompt. The idea was that it would offer up a really

00:58:23.500 --> 00:58:28.900
simple API for this. The other thing that's interesting is there's a sync and an async version. So that's,

00:58:28.900 --> 00:58:35.500
that's cool. People can, can check that out. Also a way to make it a hosted as a web service with say

00:58:35.500 --> 00:58:42.480
like FastAPI or something like that. Yeah. And you can make it a decorator if you like, an @prompt

00:58:42.480 --> 00:58:47.520
decorator. Yeah. On any function you can just throw @prompt and it, it wraps it with the same class

00:58:47.520 --> 00:58:54.120
so that all the, all the magic you get from that works. The server bit is I took, so FastAPI has

00:58:54.120 --> 00:59:00.720
that sort of like inspection on the function, part. I did a little bit of middleware to get the

00:59:00.720 --> 00:59:06.780
two happy together. And then all you have to do is import FastAPI and then run, you know, Gunicorn

00:59:06.780 --> 00:59:14.220
that app. And, it's two lines and any prompts you have made become their own independent REST

00:59:14.220 --> 00:59:20.080
endpoint where you can just do a get or a post to it. And it returns the response from calling the prompt.
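
NOTE
A rough sketch of the decorator idea discussed above: wrap any function so each call's input and output is recorded, including when one wrapped function calls another. The prompt decorator and trace list here are hypothetical illustrations, not Lambda Prompt's real implementation.
```python
import functools
trace = []
def prompt(func):
    # Hypothetical decorator: run the function, then record its inputs and output.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        trace.append({"name": func.__name__, "args": args, "kwargs": kwargs, "result": result})
        return result
    return wrapper
@prompt
def shout(text):
    return text.upper()
@prompt
def greet(name):
    return shout(f"hello {name}")  # one wrapped function calling another
output = greet("ada")
```
The inner call finishes first, so it appears first in the trace, followed by the outer call that used its result.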

00:59:20.080 --> 00:59:24.220
But these prompts can also be these chains of prompts. Like one prompt can call another prompt,

00:59:24.220 --> 00:59:28.820
which can call another prompt. And those prompts can call async to not async back to async and things

00:59:28.820 --> 00:59:34.220
like that. And it should work. Pretty sure this one actually, I did test everything as far as I know,

00:59:34.220 --> 00:59:39.060
I'm pretty sure I've got pretty good coverage. So yeah, super cool. All right. We're getting a little

00:59:39.060 --> 00:59:43.760
short on time, but I think people are going to really, really dig this, especially sketch. I think

00:59:43.760 --> 00:59:50.240
there's a lot of folks out there doing pandas that would love an AI buddy to help them

00:59:50.240 --> 00:59:55.220
do things like not just analyze the code, but the data as well.

00:59:55.220 --> 00:59:59.680
Yeah. Just, I think, anybody's, I know it's for me, but it's just like Copilot in

00:59:59.680 --> 01:00:05.060
VS Code IDE, sketch in your Jupyter IDE, takes almost nothing to add. And you,

01:00:05.060 --> 01:00:08.500
whenever you're just sort of sitting there, you think you're about to alt tab to go to Google. You

01:00:08.500 --> 01:00:14.020
could just try the sketch.ask and it's surprising how often that sketch.ask or sketch.howto gets you

01:00:14.020 --> 01:00:17.860
way closer to a solution without even having to leave the, you don't even have to leave your,

01:00:17.860 --> 01:00:23.240
your environment. It's like a whole other level of autocomplete for sure. And super cool. All right.

01:00:23.240 --> 01:00:27.800
Now, before I let you out of here, you got to answer the final two questions. If you're going to write

01:00:27.800 --> 01:00:33.220
some Python code and it's not a Jupyter notebook, what editor are you using? It sounds to me like you may

01:00:33.220 --> 01:00:38.480
have just given a strong hint at what that might be. Yeah. I've switched almost entirely to VS Code.

01:00:38.480 --> 01:00:43.660
and I've been really liking it with the remote development and, like it's just, I work

01:00:43.660 --> 01:00:48.560
across like many machines, both cloud and local and some like five, six different machines are my

01:00:48.560 --> 01:00:53.960
like primary working machines. And I use the remote, VS Code thing. And it just, I have a unified

01:00:53.960 --> 01:00:59.020
environment that gives me terminal, the files and the code all in one and copilot on all of them.

01:00:59.020 --> 01:01:04.920
Yeah. It's wild. All right. And then notable PyPI package. I mean, pip install sketch,

01:01:04.920 --> 01:01:08.460
you can throw that out there if you like. That's pretty awesome. But anything you've run across

01:01:08.460 --> 01:01:12.380
you're like, Oh, this is people should know about this. Yeah. It doesn't have to be popular. Just

01:01:12.380 --> 01:01:17.820
like, Oh, this is cool. In the, I guess these, these two are very popular, but, in the data

01:01:17.820 --> 01:01:25.460
space, I really, I'm a huge fan of, Ray and, also Arrow. Like I use those two tools as like

01:01:25.460 --> 01:01:30.800
my backend bread and butter for everything I do. And so those have just been really great work.

01:01:30.800 --> 01:01:38.320
Apache Arrow. Right. And then Ray, I'm not sure. Yeah. Ray is a distributed, scheduling compute

01:01:38.320 --> 01:01:42.400
framework. It's sort of like a, right. I don't know what they, yeah. I remember seeing about this.

01:01:42.400 --> 01:01:47.920
Yeah. This is a, it is, I'm parsing, we didn't talk about other things, but I'm like parsing Common

01:01:47.920 --> 01:01:53.740
Crawl, which is like 25 petabytes of data. And, Ray is great. It's just the workhorse. It power is

01:01:53.740 --> 01:02:00.720
really useful. Like I find it's so snappy and good, but it offers everything I need in a distributed

01:02:00.720 --> 01:02:05.200
environment. So I can write code that runs on a hundred machines and not have to think about it.

01:02:05.200 --> 01:02:09.640
It works really well. That's, that's pretty nuts. Not as nuts as ChatGPT and Midjourney,

01:02:09.640 --> 01:02:14.500
but it's still pretty nuts. So before we call it a day, do you want to tell people about Approximate

01:02:14.500 --> 01:02:19.680
Labs? It sounds like you guys are making some good progress. I might have some, some jobs for people

01:02:19.680 --> 01:02:23.520
to work in this kind of area as well. Yeah. So, we're, we're working at the intersection

01:02:23.520 --> 01:02:28.100
of, AI and tabular data. So anything related to these training, these large language models,

01:02:28.100 --> 01:02:32.820
and also, tabular data. So things with columns and rows, we are trying to like solve that problem,

01:02:32.820 --> 01:02:37.180
try and bridge the gap here. Cause there's a pretty big gap. We have three main initiatives

01:02:37.180 --> 01:02:41.280
that we're working on, which is we're trying to build up the data set of data sets. So just like The Pile

01:02:41.280 --> 01:02:46.860
or The Stack or LAION-5B, these like big data sets that were used to train all these big

01:02:46.860 --> 01:02:51.240
models. We're making our own on tabular data. We are training models. So this is actually

01:02:51.240 --> 01:02:55.580
training large language models, doing these training, these full transformer models.

01:02:55.580 --> 01:03:00.380
And then we're also building apps like sketch, like UIs, things that are actually there to help

01:03:00.380 --> 01:03:05.840
make data more accessible to people. So anything that helps people get value from data and make it open

01:03:05.840 --> 01:03:11.200
source. That's what we're working on. We just raised our seed round. So we are now officially

01:03:11.200 --> 01:03:15.280
hiring. So, looking for people who are interested in the space and who are enthusiastic

01:03:15.280 --> 01:03:23.220
about these problems. Awesome. Well, very exciting demo libraries, I guess, however you call them.

01:03:23.220 --> 01:03:27.380
But I think this, I think these are neat. People are going to find a lot of cool uses for them. So

01:03:27.380 --> 01:03:32.500
excellent work and congrats on all the success so far. It sounds like you're just starting to take

01:03:32.500 --> 01:03:36.220
off. Yeah. Thank you. All right, Justin, final call to action. People want to get started.

01:03:36.220 --> 01:03:39.300
Let's pick Sketch. People want to get started with Sketch. What do you tell them?

01:03:39.300 --> 01:03:43.800
Just pip install it. Give Sketch a, give Sketch a try, pip install it,

01:03:43.800 --> 01:03:49.080
import it, and then throw it on your data frame. Awesome. And then ask it questions or how-tos.

01:03:49.080 --> 01:03:53.180
Yeah. Yeah. Yep. Whatever you want. If you really, if you really want to, and you,

01:03:53.180 --> 01:03:57.000
you trust the model, like throw some applies and have it clean your data for you. Cool.

01:03:57.000 --> 01:04:00.060
Awesome. All right. Well, thanks for being on the show.

01:04:00.060 --> 01:04:04.180
Coming in here and telling us about all your work, it's great. Yeah. Thank you. Yeah. See you later.

01:04:04.180 --> 01:04:04.740
Thanks for having me.

01:04:05.860 --> 01:04:11.340
This has been another episode of Talk Python To Me. Thank you to our sponsors. Be sure to check out

01:04:11.340 --> 01:04:16.780
what they're offering. It really helps support the show. Stay on top of technology and raise your value

01:04:16.780 --> 01:04:24.320
to employers or just learn something fun in STEM at brilliant.org. Visit talkpython.fm/brilliant

01:04:24.320 --> 01:04:30.620
to get 20% off an annual premium subscription. Want to level up your Python? We have one of the largest

01:04:30.620 --> 01:04:35.540
catalogs of Python video courses over at Talk Python. Our content ranges from true beginners

01:04:35.540 --> 01:04:40.840
to deeply advanced topics like memory and async. And best of all, there's not a subscription in

01:04:40.840 --> 01:04:45.880
sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show,

01:04:45.880 --> 01:04:51.240
open your favorite podcast app and search for Python. We should be right at the top. You can also find

01:04:51.240 --> 01:04:57.360
the iTunes feed at /itunes, the Google Play feed at /play and the direct RSS feed at /rss

01:04:57.360 --> 01:05:03.780
on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of

01:05:03.780 --> 01:05:08.520
the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at

01:05:08.520 --> 01:05:14.300
talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really

01:05:14.300 --> 01:05:16.840
appreciate it. Now get out there and write some Python code.

01:05:16.840 --> 01:05:37.420
I'll see you next time.