Notebooks vs data science-enabled scripts

Episode #217, published Fri, Jun 21, 2019, recorded Wed, May 8, 2019

On this episode, I meet up with Rong Lu and Katherine Kampf from Microsoft while I was at BUILD this year. We cover a bunch of topics around data science and talk about two opposing styles of data science development and related tooling: Notebooks vs Python code files and editors.

The conversation was a lot of fun and I'm looking forward to sharing it with you all.

Episode Deep Dive

Guests introduction and background

Rong Lu is a program manager on the Python team at Microsoft with a background in data mining and previous experience on the C++ and C# teams. She’s currently focused on enhancing the data science workflow in Visual Studio Code (VS Code), especially around bridging the gap between notebooks and traditional Python code files.

Katherine Kampf is also a program manager on the Python team at Microsoft, working on Azure Notebooks. She helps create seamless, cloud-based Jupyter notebook experiences for students, educators, and enterprise data scientists, often focusing on large-scale data operations and integrations with other Azure services.

What to Know If You're New to Python

Below are a few insights from the episode that will help newer Python developers follow along:

Python’s flexibility: It’s a language that can be used for everything from scripting, to full web development, to interactive data science notebooks.
Notebooks are a special environment: Tools like Jupyter Notebooks and Azure Notebooks let you see your code results immediately, making Python exploration and data tasks easier.
Visual Studio Code offers strong Python support: Once you have Python set up, installing the Python extension in VS Code unlocks editing, debugging, and data science features.
If you’re unsure about setup, consider a hosted service: Cloud platforms (e.g., Azure Notebooks) let you try Python and notebooks with zero local installation hassle.

Key points and takeaways

Notebooks vs. Python Code Files in Data Science: One core debate in data science is whether to develop in a Jupyter-like notebook environment or in .py files within a code editor. Notebooks excel at quick exploration, interactive visuals, and teaching scenarios, while .py scripts are more traditional, often easier to manage in version control, and production-friendly. Microsoft’s Python team aims to blend the two by bringing notebook features into VS Code while keeping a smooth path to .py files, which reduces manual copying between exploratory notebooks and final production code.
- Links and tools:
  - Visual Studio Code with the Python extension
  - Azure Notebooks
Bringing Jupyter-Like Interactivity into VS Code: VS Code’s Python extension supports running individual code cells from a .py file, displaying plots and data frames in a results window beside the code, and converting notebooks to and from .py files. This approach keeps the editing workflow code-centric while preserving the quick-execution flow found in notebooks. Users can shift from experimentation to production without wrestling with two separate tooling ecosystems.
- Links and tools:
  - VS Code Jupyter support
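As a rough illustration of the cell workflow described above, here is a minimal sketch of a plain .py file split into cells. The `# %%` comment is the "hash percent percent" marker the guests mention in the transcript; the readings are invented for the example.

```python
# %%
# Cell 1: build a small dataset (values are made up for illustration).
from statistics import mean

readings = [18.5, 21.0, 19.5, 23.0]

# %%
# Cell 2: summarize it. In VS Code, Shift+Enter runs only the current cell.
average = mean(readings)
print(f"average = {average:.1f}")
```

Because the markers are ordinary comments, the same file still runs top to bottom with a regular `python` command and diffs cleanly in Git.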
Azure Notebooks for Zero-Setup Data Science in the Cloud: Azure Notebooks offers a free way to run Jupyter notebooks without installing anything locally. It simplifies environment management (no complicated library installs on your laptop) and allows quick sharing or cloning of projects. Educators especially enjoy it for classroom instruction, where each student can have consistent, pre-configured compute.
- Links and tools:
  - Azure Notebooks
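On the reproducibility side, the conversation points out that a requirements.txt can list packages without pinning versions, so a notebook may break when rerun a year later. A minimal sketch of generating pinned lines from whatever is installed locally follows; the helper name `pin_requirements` and its `lookup` parameter are hypothetical, and only the standard library's `importlib.metadata` is used.

```python
from importlib import metadata


def pin_requirements(names, lookup=metadata.version):
    """Return 'name==version' lines for the given package names.

    `lookup` defaults to importlib.metadata.version; it is a parameter
    only so the function can be exercised without real packages.
    """
    pinned = []
    for name in names:
        try:
            pinned.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Leave a comment line rather than guessing a version.
            pinned.append(f"# {name}: not installed, version unknown")
    return pinned


# Example call; the printed versions depend on the local environment.
print("\n".join(pin_requirements(["pandas", "plotly"])))
```

Writing these lines to requirements.txt records the exact versions the notebook was last run against.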
Collaboration and Version Control Challenges with Notebooks: Jupyter notebooks (in .ipynb format) can be cumbersome to diff and merge, as they include output cells and metadata in a JSON structure. Teams sometimes resort to manual merging or external tools. Microsoft’s solution includes converting notebooks to plain Python files for version control in VS Code or carefully using Git extensions for JupyterLab.
- Links and tools:
  - Git
  - Papermill (mentioned as a useful tool for parameterizing notebooks)
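To make the diff problem concrete, here is a minimal sketch that clears stored outputs from a notebook's JSON before committing it. It assumes the nbformat 4 layout and uses only the standard library. The tiny notebook is hand-written for the example, and `get_weather` appears only inside a source string as a stand-in for a cell whose output changes between runs.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hand-written notebook with one executed code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["print(get_weather())"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["sunny\n"]}
            ],
        }
    ],
}

cleaned = strip_outputs(notebook)
print(json.dumps(cleaned, indent=1, sort_keys=True))
```

With outputs and execution counts cleared, rerunning a cell no longer changes the file, so merges only conflict when the code itself differs.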
Azure Machine Learning (AML) Service: AML provides an end-to-end machine learning pipeline for data storage, training, and model deployment. It simplifies scaling up compute resources or spinning down clusters to save costs, tracking experiment runs, and versioning models. With just a few commands, you can turn a trained model into a web service on Azure.
- Links and tools:
  - Azure Machine Learning
Scaling and Managing Data in the Cloud: Using Data Science Virtual Machines or Azure storage allows data scientists to tackle large datasets without hardware constraints. Projects such as image classification (like pet breed recognition) become more manageable when you can spin up GPU-backed VMs, run experiments, and shut them down to control spending.
- Links and tools:
  - Azure Data Science VMs
Cool Science Meets Notebooks: The LIGO (Laser Interferometer Gravitational-Wave Observatory) project uses Python notebooks, showcasing that cutting-edge physics and research are heavily reliant on Python data tools. Libraries such as NumPy and specialized HPC solutions let scientists process large volumes of signal data and share their findings interactively.
- Links and tools:
  - LIGO Project
New Windows Terminal and Improved Developer Experience: Microsoft unveiled a revamped Windows Terminal, offering tabbed and themeable features to match modern CLI expectations. Along with WSL (Windows Subsystem for Linux), it makes Windows more appealing for cross-platform devs who rely on a robust command line. This shift underscores Microsoft’s drive to embrace developers on Windows for Python and beyond.
- Links and tools:
  - Windows Terminal
  - WSL
Growing Popularity of Python Through Data Science: Both guests highlighted how Python’s meteoric rise is largely driven by AI and machine learning demands. Data scientists find the language simpler to learn than lower-level alternatives, while Python’s ecosystem of libraries (like pandas, scikit-learn, Plotly) cuts down development time and complexity.
- Links and tools:
  - Plotly
  - pandas
Exploratory vs. Production Code: Many data scientists use notebooks to explore datasets and quickly visualize results, but software teams often need a stable, testable, and maintainable codebase. Tools in VS Code allow exporting a final or intermediate notebook to a .py file to integrate with tests, CI/CD, and standard software practices. This bridging lessens friction between experimentation and shipping a reliable product.
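The notebook-to-script step can be sketched with the standard library alone. This is not the VS Code implementation, only an assumed simplification: `notebook_to_script` is a hypothetical helper that reads nbformat 4 style cells, writes each code cell under a `# %%` marker, and turns markdown cells into plain comments.

```python
def notebook_to_script(notebook: dict) -> str:
    """Convert a notebook dict into Python source with '# %%' cell markers."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source.rstrip("\n") + "\n")
        elif cell.get("cell_type") == "markdown":
            # Simplification: keep the prose as ordinary comment lines.
            lines = source.splitlines()
            commented = "\n".join("# " + line if line else "#" for line in lines)
            chunks.append(commented + "\n")
    return "\n".join(chunks)


# Hand-written notebook used only for this example.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data\n", "Invented numbers."]},
        {"cell_type": "code", "source": ["values = [1, 2, 3]\n", "total = sum(values)"]},
        {"cell_type": "code", "source": "print(total)"},
    ]
}

script = notebook_to_script(notebook)
print(script)
```

The resulting text is an ordinary Python module, so it can be imported by tests, linted, and reviewed like any other file in the repository.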

Interesting quotes and stories

"We tried to solve [notebook merging issues] by avoiding using notebook files... so you’re literally working with a Python file." Rong Lu

"Plotly was one of the first things I tried when I joined the [Python] team and it’s super cool to render interactive 3D plots right in VS Code." Rong Lu

"As I walk through the booth area [at conferences], it’s fun to see people from all sorts of unexpected companies, because they’re using Python for development." Rong Lu

Key definitions and terms

Jupyter Notebooks: An interactive coding environment where code cells and outputs (including plots) appear inline, making it easy for exploration and data analysis.
Azure Notebooks: A free, cloud-hosted version of Jupyter notebooks provided by Microsoft for teaching, prototyping, and sharing.
VS Code Python Extension: Microsoft’s open-source extension that adds Python editing, IntelliSense, debugging, and data science features into Visual Studio Code.
Azure Machine Learning (AML) Service: A platform for training, tracking, deploying, and managing machine learning models at scale on Azure’s cloud infrastructure.
Papermill: A tool to parameterize and execute Jupyter Notebooks, allowing you to treat notebooks more like functional units of computation.

Learning resources

For those wanting to build a solid foundation in Python or dive deeper into data-centric workloads, here are a few courses:

Python for Absolute Beginners If you are new to Python or programming, this course covers the essential language constructs and will help you gain confidence writing your first applications.
Data Science Jumpstart with 10 Projects Learn Python’s main data-focused libraries while working on real-world mini-projects, a great practical introduction to data science.
Python 3.11: A Guided Tour Through Code When you’re ready to see the newest Python features in practice, this 2-hour guided tour is perfect for leveling up.

Overall takeaway

Notebooks offer quick exploration and immediate feedback, making them beloved by data scientists and educators, while traditional Python files shine for larger, production-grade applications. Tools from Microsoft, such as the Python extension in VS Code and Azure Notebooks, help unify these worlds by providing robust collaboration, version control strategies, and integration with powerful cloud services like Azure Machine Learning. By leveraging these approaches, teams can move smoothly from experimentation to production, tapping into Python’s rapid growth in data science, research, and beyond.

Links from the show

Rong on Twitter: davorabbit
Katherine on Twitter: @kvkampf
Talk Session: Build an AI-powered Pet Detector with Python, TensorFlow, and Visual Studio Code: microsoft.com
The Scientific Paper Is Obsolete - Here’s what’s next: theatlantic.com
Laser Interferometer Gravitational-Wave Observatory (LIGO): wikipedia.org
Episode #217 deep-dive: talkpython.fm/217
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 On this episode, I meet up with Rong Lu and Katherine Kampf from Microsoft while I was at Build this year.

00:04 We cover a bunch of topics around data science and talk about two opposing styles of data science development and related tooling,

00:11 using notebooks in Jupyter or Python code files and editors like Visual Studio Code and PyCharm.

00:17 The conversation was a lot of fun and I'm looking forward to sharing it with you.

00:20 This is Talk Python To Me, episode 217, recorded May 8th, 2019.

00:25 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:44 This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:48 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.

00:55 This episode is sponsored by Linode and Backlog.

00:58 Please check out what they're offering during their segments. It really helps support the show.

01:01 Rong, Katherine, welcome to Talk Python.

01:04 Yeah, thank you. Thanks so much for having us.

01:06 Yeah, it's great to be here with you and literally here with you at Microsoft Build at the conference doing some live recording.

01:14 So, quite cool. How's the conference going?

01:17 It's been great. I'm a little sad because it's so sunny out, but the stuff inside is great too.

01:21 Yeah, really exciting news happening here and you see a lot of people getting excited.

01:26 Yeah, is this an event that you've been planning for for a long time?

01:30 Yeah, every year.

01:31 We have the back-to-back PyCon to Build.

01:34 It's been a lot of conferencing, a lot of fun announcements, a lot to keep up with, but also met a lot of great people.

01:39 You were also at PyCon, right?

01:41 Yes.

01:41 Yeah, that's pretty awesome. What did you think of PyCon?

01:43 It's great. It was my first PyCon.

01:45 So, pretty cool out there and then see the community and meet people.

01:49 So, it was my first sort of community-run conference.

01:53 I've only been to Microsoft ones in the past, so it was very cool to see that and have more interaction with Python users and Notebooks users.

02:01 Yeah, that's cool. I think both the conferences are pretty special and really nice and I enjoy going to both of them, but they're really different.

02:07 Yeah, definitely.

02:09 For people who haven't gone to Microsoft Build, the experience walking around here is like all this cool stuff and there's like robotics and other kinds of neat things going on, but it's mostly put on by the different Microsoft teams, right?

02:22 Like there's a section of like the Visual Studio for Mac people and you can go talk to the people that work on it and that's great.

02:28 And you go to PyCon and it just doesn't have like that centralized structure, right?

02:33 It's just like a thousand flowers blooming and just, you know, whatever's happening is happening out there.

02:38 So, it's a little more chaotic, but, you know, a little more independent as well.

02:42 I don't know. I think they're both fun and they're both a cool experience.

02:45 There's some nice surprises as well at PyCon.

02:48 Like as I walk through the booth area, like at Build, you kind of expected all these Microsoft teams there, but at PyCon, you see different companies showing up there because they use Python for development.

03:01 Yeah.

03:01 And like really unexpected companies there, but it's fun.

03:06 It's interesting.

03:06 Yeah, it's super interesting.

03:07 Very different.

03:08 Yeah, you know, one of the parts I really like is the first two and a half days, like the opening night and then the first two days are the Expo Hall days.

03:18 You know, we've got a booth.

03:19 You all had a booth.

03:20 Were you at the booth?

03:21 Yeah, I was.

03:22 Many hours at the booth.

03:23 Yes.

03:23 The booth is a blessing and a curse.

03:25 I was at a booth as well.

03:27 I have been for the last couple of years and it's great.

03:30 You meet so many people, but also like you don't get to experience as much of the rest of the stuff going on.

03:34 But then the next day they take that down.

03:37 It's the poster section and then the job fair.

03:39 Right.

03:39 And the job fair, I think it's pretty revealing because you can walk along and see like all these companies are hiring Python developers, data scientists, web developers, whatever.

03:48 And it's super clear and obvious kind of like what they're up to.

03:51 So you can kind of really put your finger on the pulse of the community by just walking through the job fair.

03:57 Yeah, it was super crowded.

03:58 Yeah.

03:59 So the community is super engaging and super fun.

04:04 Yeah.

04:04 Yeah, absolutely.

04:05 I think that's for sure.

04:06 Both of these events were super cool.

04:08 And I was glad we were all at them.

04:10 That's great.

04:11 So let's start with your story.

04:12 And I guess, Rong, we'll start with you.

04:14 How did you get into programming in Python?

04:16 Actually, about 10 years ago when I was in college, I actually majored in data mining.

04:21 And then back then I learned C.

04:24 Okay.

04:24 But after I graduated, I joined Microsoft.

04:26 I mostly worked on C# related developer tools and then moved to C++.

04:31 And just recently, I just moved from C++ team into the Python team.

04:36 It's like, oh, these people are talking about Python.

04:38 I got to see what's going on there.

04:41 And then, of course, my own knowledge from 10 years ago about data mining no longer applies.

04:47 Like so many new things happening in the AI, data science world.

04:50 And this is super exciting.

04:52 Sure.

04:53 And that's why I got into the Python team.

04:55 It was like, hey, maybe I can use some of my own knowledge here.

04:58 But a lot of learning for me, actually, given how things have moved.

05:02 Yeah, I'm sure.

05:03 I mean, it's changing so fast.

05:05 What was your impression of coming into Python from like these more statically typed compiled type languages and all that?

05:13 Definitely feel like the level of experience required to get started is very different coming from a C++ world.

05:21 And how I see like all these users we have and even younger kids, they start to learn about programming.

05:29 Python is the first language they learn and up to like people are just using Python for various different projects for real life.

05:37 And that's super cool.

05:38 And there are a lot of applications in real life that could change people's daily lives using the power of Python.

05:45 Like earrings, predicting different things.

05:51 Yeah.

05:51 Did the white space drive you crazy at first?

05:53 A little.

05:54 And now?

05:55 Now, getting better.

05:57 Good.

05:58 Good.

05:59 But yeah.

06:00 Cool.

06:00 Katherine, how about you?

06:01 I kind of had a similar experience.

06:02 I started programming, just randomly took a course in my high school and then ended up majoring in computer science.

06:08 And that was all in C++.

06:10 And I remember one of my years I ended up in an AI course.

06:14 And the course was mainly around Python.

06:16 And I remember I like hated Python at first.

06:18 I was like, this feels too easy.

06:20 What's going on here?

06:21 Someone's tricking me.

06:22 I don't know what's going on.

06:24 This is no.

06:25 I want my C++.

06:27 And then over the course of the semester, I just fell in love with Python.

06:30 And then when I started at Microsoft, I was exposed more to the big data landscape.

06:34 So using things like PySpark and getting exposure to notebooks there.

06:37 And then, yeah, I've just, Python has very much grown on me.

06:40 My friends from school always laugh because they're like, remember when you hated it and now you're on the Python team?

06:45 Life takes weird turns.

06:47 Yeah.

06:48 Yeah.

06:48 For sure.

06:48 It's like, no, I want to use a real language.

06:50 Right.

06:50 Compiling and linking and headers.

06:53 Come on.

06:54 Space doesn't matter.

06:54 So my cool.

06:55 That's right.

06:56 I'll write it all in one line.

06:57 Yeah.

06:59 That's pretty cool.

06:59 I love it.

07:00 So, Rong, you both work at Microsoft, but you don't work on the same projects or the same team, right?

07:07 We are on the same team.

07:07 Same team.

07:08 Different projects.

07:09 Different projects.

07:09 Okay.

07:09 So, Rong, let's start with you.

07:11 What do you do day to day?

07:12 Yeah.

07:12 So, I am a program manager on the Python team at Microsoft.

07:15 Primarily, I'm focusing on building developer experience around data science and machine learning inside Visual Studio Code.

07:23 Visual Studio Code is our lightweight cross-platform editor.

07:28 And it's been pretty good for Python general development already.

07:31 But now we're adding a lot of new things around how to make it easier for people who do data exploration, analysis, and machine learning.

07:40 How do we bring all that power of Jupyter Notebooks into an editor so you can use it for everything you do?

07:49 So, there are people who prefer an editor versus notebooks.

07:52 That has been an ongoing debate in the community.

07:55 And that's definitely a personal preference.

07:59 But we also see a place where VS Code can be the place where you take your notebook file after you're done with your experimentation and you're ready to turn your code into a production code.

08:11 And this is where you can kind of bridge the two worlds together.

08:15 And VS Code can play a role in that.

08:18 So, that's the new area that we're investing in.

08:20 That's what I've been working on since I joined this team.

08:23 I think it's cool that you're working on that because I feel like there's a bit of a divide.

08:26 Not in a negative way, but there's certainly people that will open up a notebook in Jupyter, JupyterLab, something like that, to do exploratory data.

08:34 And there's people who write production code that's structured with tests and coverage and all of that stuff.

08:41 And going between those, I don't know what people do today, but it feels like, well, you take the notebook and you just copy bits over.

08:47 So, we're trying to hopefully minimize the manual copy and paste between the two.

08:56 So, when you have a notebook ready to turn into production, we can help you import a notebook into Python code, in VS Code directly, so you don't have to do all the work yourself.

09:05 And then, once you're in VS Code, of course, you get all the editing features that you expect.

09:09 There's IntelliSense refactoring, debugging, everything in there, so you can live in a developer tool to finish your developer work.

09:17 Yeah, that sounds awesome.

09:18 And Katherine, how about you?

09:19 You're on the other side of the design.

09:21 Yeah.

09:21 We fight.

09:22 We fight.

09:22 She is on the notebook side.

09:26 Every day, 8 a.m., Rong and I face off to see who's going to win that day.

09:30 So, I'm a program manager on the Azure Notebooks team, and Azure Notebooks is basically an Azure-hosted, free, no-setup-required Jupyter Notebooks environment.

09:39 So, in my day-to-day, I spend a lot of time talking to customers, learning about their experience, whether it be in an editor, like VS Code, or in notebooks, and learning, you know, how we can make that better, and what our next generation of Azure Notebooks should look like to enable every set of scenarios, from educators to enterprise data scientists, and learning about any features or things that they need to be happy on Azure Notebooks.

10:03 Yeah, awesome.

10:04 That sounds like a really fun project.

10:06 It's a blast.

10:06 Yeah, I love it.

10:07 Yeah, cool.

10:08 So, I want to talk a little bit about just kind of notebooks in general and find out where you think they're really useful or, you know, what's sort of coming next in it.

10:17 And I just want to give a quick shout-out to an article on The Atlantic, I think, which I've talked about a few times, but it's called The Scientific Paper is Obsolete, and Here's What's Next.

10:27 And it's got, like, [unintelligible]

11:32 So there's scenarios like that, all the way to, you know, we have a security team at Microsoft who's building their anomaly detector and threat detection, threat hunting, all within notebooks.

11:43 So I think it's just super cool to be a part of and see the vast array of use cases for notebooks.

11:49 Yeah, it sounds like maybe it's not even really shaken out, right?

11:52 Yeah, exactly.

11:53 We kind of know what the use case for code looks like, right?

11:55 Like regular executable files, you know, websites and whatnot.

12:00 But I feel like with notebooks, there's just all these different attempts to make use of them.

12:05 Right.

12:06 You know, could you treat them as like units that can be combined as like a function or something, or like are they a paper that's better?

12:12 Yeah.

12:13 Yeah.

12:13 Which is right.

12:13 Yeah, it's cool to see.

12:14 I mean, we get within our community, it's easy to see the applications and data science, but then it's always great to hear outside perspectives, people coming from different languages or different areas of technology and everything,

12:25 and thinking about how they can use this as a platform for experimentation or sharing work and collaborating.

12:31 So it's very, very inspiring.

12:33 Yeah, yeah.

12:34 That's pretty awesome.

12:34 Yeah.

12:35 Rong?

12:35 Absolutely.

12:35 I think the interactivity is what Jupyter is most powerful at.

12:41 Like the fact that we can turn a static report or paper into something that everybody can go and execute.

12:48 That's super cool.

12:50 Not only we have seen a lot of people using notebooks for teaching, because that's like where organized documentation plus real code.

12:58 Yeah.

12:59 It's super easy for a teaching setting.

13:01 And also we use that like internally as well as we report numbers and build dashboards.

13:07 What if everything is interactive and other people can go and pivot the data independently without having to go into like a pile of code.

13:16 So that's really, really, really powerful.

13:19 And that's why it's so popular in the data science community.

13:22 Absolutely.

13:23 A great thing to have.

13:25 And I think that adoption is going to just keep going, given how easy it is to get started.

13:30 And I think one of the things you mentioned earlier, just reproducibility.

13:33 I think that's one of the things notebooks simultaneously does really well and also poorly in terms of like environment sharing.

13:41 And that's something we've been thinking a lot about just in our platform.

13:43 You know, if I'm sharing a notebook with Rong, how can I make sure that dependencies get so complex?

13:48 How can I make sure that Rong has everything she will need to run this notebook?

13:53 And even with data sets, you know, if you share a report that connected to a database that was once storing, you know, this data from May 2016.

14:02 And now it's June 2019.

14:03 You know, how do you handle those cases?

14:05 Right.

14:05 And the report says, you know, such and such and like mark down below, like in a cell below it.

14:10 But it's like the data is not the same anymore.

14:12 Right.

14:12 That's actually, that's an interesting idea.

14:14 Right.

14:15 The life cycle of it.

14:16 Right.

14:16 Yeah.

14:16 I think that's one thing that Azure Notebooks could potentially help with, because it runs in the cloud and we could potentially pre-configure the environment.

14:25 So you can make sure your notebooks will always produce the same results.

14:29 Yeah.

14:30 So right now you can use like environment setup scripts and import your requirements.txt or environment config.

14:35 But in the future, we definitely want to make that as seamless as possible.

14:38 So I can just share whether it be an Azure notebooks project or a GitHub repository with you that has my requirements.

14:43 So it'll just be in a container and everything's good to go.

14:46 That's pretty cool.

14:46 I mean, even with the requirements, the .txt file, right?

14:50 It might say we need these things to run this notebook, but they might not be pinned.

14:54 Right.

14:54 And so like you run it again a year later, one of those things has become incompatible with like the code that was written or something like that.

15:01 Right.

15:02 It's a challenge.

15:03 And as a data scientist, you probably don't think about like, oh, I better like pin the versions of my dependencies.

15:08 You're like, oh, look, it's a graph.

15:09 It's working.

15:10 Yes.

15:10 Don't change it.

15:12 Don't touch it.

15:12 That's my version.

15:14 But of course, later on, if the package has newer versions, then you have to upgrade your notebook to make sure it still runs and stuff like that.

15:21 It is kind of a pain to manage.

15:23 It's pretty tricky.

15:25 So I want to dig in more to Azure Notebooks, but like let's, you said there's like this battle you all do at 8.

15:30 So I think it's Rong's turn to like say a little bit about like, if not notebooks, if it's going to be more like code editor style of work, but still in the data exploratory way, you talked about this other thing that you're doing.

15:45 What's that look like?

15:45 Yeah, essentially to summarize what we have done is we brought the power of Jupyter Notebooks into VS Code.

15:52 So imagine, so today in Jupyter Notebooks, you can run a piece of code and get your results right away, whether it's a data frame or it's a plot or just text, markdown.

16:04 And imagine you can do all that inside VS Code.

16:06 In a slightly different view, but you get the same results.

16:10 Is it actually the same format, the IPYNB file?

16:14 Or is it something different?

16:16 So actually, so at the moment, the scenario we support is you work with Python scripts.

16:21 So once you have your Jupyter Notebooks ready, you're ready to import, we grab all the code in there and put everything into a Python file.

16:28 So you're literally working with a Python file.

16:31 Right, just a .py file.

16:32 We have converted it into PY.

16:35 Yes.

16:35 So the idea is, hey, this is really an editing centric experience.

16:40 And the fact that you can visualize all those things like plots and data frames and all that, that's like an add-on bonus.

16:46 But you are really working with...

16:48 Yeah, and do you have like a demarcator for the different cells?

16:51 Yes.

16:51 Like a double hash or something?

16:53 Right.

16:53 It's single hash.

16:55 Single hash, okay.

16:55 No, hash percent percent is the magic command.

16:58 Okay.

16:59 Not command, comment style, special comment you can put in Python file.

17:03 And then we can recognize, oh, this is a cell.

17:05 And then we do magic things.

17:07 Okay.

17:08 So we'll visualize, like, kind of visually we'll highlight those cells in the code editor.

17:13 So you kind of get a feel of, like, Jupyter notebooks.

17:16 But then you can also run single cell using shift enter.

17:20 Like, that's the key that people use, shift enter.

17:24 Yeah, you can do the same in VS Code.

17:26 Okay.

17:26 And you get the same results.

17:28 We show all the results in a separate window in VS Code that's separate from your code.

17:34 That's just how the design.

17:36 We decided to go with that design.

17:38 But there's no reason we can't show results in line because we have got some requests around that.

17:44 Right, right.

17:45 So we're trying to bring the two together.

17:47 I think there are very different users who want different things depending on their background.

17:54 People coming from Jupyter notebooks are more familiar with inline results.

17:59 But those who come from a software engineering background, who are used to working with editors and code files,

18:05 they're more like, oh, I just wanted something on the side.

18:08 So I can see all the history, like all the results.

18:11 I can always go back.

18:13 So nothing is getting replaced in line.

18:15 But they want more of an editor experience.

18:17 Right.

18:18 That's more of an editor experience.

18:19 So I think that we just provide different options for people who come from different backgrounds.

18:24 So you can pick if you want a notebook, if you want a notebook style in VS Code, you can do all that.

18:29 This portion of Talk Python To Me is brought to you by Linode.

18:34 Are you looking for hosting that's fast, simple, and incredibly affordable?

18:37 Well, look past that bookstore and check out Linode at talkpython.fm/Linode.

18:43 That's L-I-N-O-D-E.

18:44 Plans start at just $5 a month for a dedicated server with a gig of RAM.

18:49 They have 10 data centers across the globe.

18:51 So no matter where you are or where your users are, there's a data center for you.

18:55 Whether you want to run a Python web app, host a private Git server, or just a file server,

18:59 you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24-7 friendly support,

19:07 even on holidays, and a seven-day money-back guarantee.

19:09 Need a little help with your infrastructure?

19:11 They even offer professional services to help you with architecture, migrations, and more.

19:16 Do you want a dedicated server for free for the next four months?

19:19 Just visit talkpython.fm/Linode.

19:22 Well, you'll have to help me out with this one because I haven't done a lot of teamwork on notebooks,

19:28 but I hear that merging notebooks can be challenging, especially if someone's rerun a cell and it has a new output.

19:36 Oh, yeah.

19:37 Right.

19:37 Because it called an API and it, like, got the weather, but they ran it and the weather's now different.

19:43 Yeah.

19:43 And so it's like a merge conflict and stuff like that.

19:46 This is a problem?

19:47 This is a problem, yeah.

19:48 The way notebooks are stored in .ipynb is just kind of a JSON file.

19:53 So it stores, you know, the output and specific metadata about your file as well.

19:58 So you can go and change one line and you'll suddenly, you know, try to merge it and you'll see eight lines or like 80 lines and everything's moved around.
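
To make the merge problem concrete, here is a minimal sketch of a notebook as JSON, plus one possible way to clear outputs before committing. The notebook content is invented, and `strip_outputs` is a hypothetical helper written for this example, not something Azure Notebooks or Jupyter ships under that name.

```python
import json

# A tiny notebook in the nbformat 4 JSON layout; the cell content is invented.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["print(get_weather())"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["sunny\n"]}
            ],
        }
    ],
}


def strip_outputs(nb):
    """Return a copy of the notebook with outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(nb))  # simple deep copy via a JSON round trip
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Serialize with stable formatting so a diff mostly reflects real code changes.
print(json.dumps(strip_outputs(notebook), indent=1, sort_keys=True))
```

With the outputs removed, rerunning a cell that prints different weather no longer changes the stored file, which is the conflict source described above.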

20:06 So this is, yeah, one of our big focuses in the near future for Azure Notebooks:

20:10 how do you make sure that source control is as easy as possible for notebooks?

20:14 And, you know, we were just talking to a user last week who was like, we tried to use GitHub with notebooks, but then it got too complex.

20:20 So we all just had, you know, they were a team of four.

20:22 They all created their individual notebooks.

20:24 And then in the end, one person was tasked with copying the bits and pieces that were good.

20:27 It's your job to make the final notebook.

20:29 Yeah, you know, they got the short end of the stick.

20:31 Yeah.

20:32 Uh-huh.

20:32 So, you know, we were talking to a bunch of users, as I said, like that to understand not just having GitHub integrated right here,

20:40 because that doesn't solve the issue with the formatting of notebooks and that still doesn't make it perfect.

20:44 But making sure that we have a complete, super easy end-to-end experience for that problem in the notebooks world.

20:50 It sounds tricky.

20:51 And it sounds like the code thing that you guys are doing maybe sidesteps that.

20:55 Right.

20:55 Yeah.

20:56 We tried to solve that by avoiding using notebook files.

21:00 That's why we convert everything into a .py file.

21:03 Now you can use source control and everything around it.

21:06 It's just plain code.

21:07 Right, and the output goes somewhere else.

21:08 Yes.

21:09 And it's not part of the diff or whatever.

21:11 Right, right.

21:12 But you can actually export the results as a notebook file if you want to go back.

21:17 Like if you want to share the results with someone, send someone a notebook file.
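
The notebook-to-script direction can be sketched in a few lines. This is an illustrative conversion under simple assumptions (only code cells are kept, markdown is dropped), not the converter VS Code actually uses; `notebook_to_script` and the sample notebook are made up for the example.

```python
def notebook_to_script(nb):
    """Turn the code cells of a notebook dict into '# %%' delimited Python source."""
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # this sketch ignores markdown cells
        source = cell["source"]
        # nbformat allows source to be a single string or a list of lines
        text = source if isinstance(source, str) else "".join(source)
        chunks.append("# %%\n" + text.rstrip("\n") + "\n")
    return "\n".join(chunks)


# Invented sample notebook: two code cells around one markdown cell.
example = {
    "cells": [
        {"cell_type": "code", "source": ["import math\n", "r = 2.0"], "outputs": []},
        {"cell_type": "markdown", "source": ["Some notes"]},
        {"cell_type": "code", "source": "area = math.pi * r ** 2", "outputs": []},
    ]
}

print(notebook_to_script(example))
```

The result is plain code with no stored outputs, so it diffs and merges like any other Python file.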

21:21 So we joke that Rong and I fight, but we work really closely together because so many scenarios are,

21:27 I explore in notebooks.

21:28 Now I want to build a .py file in VS Code.

21:31 And maybe I make some changes that I actually want to now share that as an updated notebook.

21:35 We work very hard to make that end-to-end complete scenario super easy.

21:40 Yeah, that's cool.

21:41 I guess while we're on the subject, maybe we could talk about like collaboration.

21:45 If I'm working on an Azure notebook, I go over there.

21:48 Can I share that kind of Google Docs style?

21:51 With collaboration?

21:52 Not yet.

21:53 We're thinking about how we could maybe leverage something like VS Code Live Share,

21:56 where you could do live sharing with individual people and watch them edit all within the same document.

22:01 But this is something that has come up in a few notebooks products before and then sometimes been rolled back

22:07 because it's not super easy with the formatting.

22:10 But yeah, so right now in Azure Notebooks, you can share a notebook publicly and you can clone it.

22:15 And we can collaborate that way, but not yet with the live co-editing.

22:18 Yeah, cool.

22:19 That's something you can do in VS Code with the Visual Studio Live Share feature.

22:22 Yeah, maybe people don't know about that.

22:24 I did talk to Dan Taylor in the previous episode, but maybe people didn't hear so.

22:28 Live Share enables real-time co-editing and co-debugging across different machines or across different OS.

22:35 So the idea is any number, I guess up to 30 at the moment, any number of developers can collaborate real-time.

22:45 And it's not just screen sharing.

22:47 It's real session share, being shared.

22:49 So anyone can make changes to the same code base and everybody else can see it and you can run the code and debug and everyone else can see that too.

22:58 So essentially you can encourage a lot of scenarios around like pair programming or just remote troubleshooting.

23:07 And we also see like that being used in classroom settings too.

23:10 Like a teacher is hosting a session with 30 students looking at the same thing.

23:16 And that's pretty interesting.

23:17 That actually generated a lot of interesting scenarios.

23:20 Yeah, I bet it did like office hours, but I'll help you by just being on your system basically, right?

23:26 Yeah, absolutely.

23:27 So we actually extended that functionality beyond the basic support.

23:32 So now we have added the Jupyter support on top of Live Share as well.

23:36 So now you can imagine you're working on your notebook, you want some help and somebody else can connect to your session, you can run a cell.

23:45 And both of your machines are going to see the plots, the data frames and everything the same on the screen.

23:51 So that's pretty cool.

23:52 Yeah, that seems super cool.

23:53 Like one of the problems, of course, especially I think in data science would be the actual data, right?

23:59 Like if you have a gigabyte of data and then you're talking to that from your notebook or from these like notebook-like .py files, it's one thing to say, well, yeah, you can just grab the notebook or grab the Python file out of GitHub.

24:11 But then there's a gig of data, you got to get to them somehow, right?

24:15 So being able to like connect in and just sort of run in place would be cool.

24:19 Yeah, actually, everything happens on the host machine.

24:21 Meaning if the guest machine doesn't have the same setup, same environment setup, that's okay because nothing is required on the guest machine.

24:31 You literally just need VS Code installed.

24:34 And then all the like building and IntelliSense debugging and all the packages, environment settings, everything is streamed from the host machine.

24:43 So you can literally see everything exactly the same.

24:45 Yeah, that's really nice.

24:46 You know, I have some friends who did a lot of pair programming.

24:49 It was always, well, let's set up a virtual machine and like do a shared remote desktop so we can both type or other like weird things.

24:55 Yeah, it's a lot of work to get us set up.

24:58 I'm really excited about this.

24:59 This is cool.

25:00 And how about on Azure Notebooks?

25:01 Can you like, if I have a gig of data, can I like upload that in there?

25:05 Like what's the story with the data backing them?

25:07 Yeah, so you can either store things locally in Azure Notebooks within your project.

25:12 And so if you cloned it, you'd clone that data with it and none of that's going to your local machine.

25:16 So you don't have to worry about those things.

25:18 It's all Azure hosted.

25:19 It happens really quickly.

25:19 The cloud is fast.

25:21 So yeah, you can also, you know, connect to an Azure Blob store.

25:25 So for instance, in the demo Rong and I are doing today, we put a bunch of pet images that we're classifying in a Blob store and we pull them down in Azure Notebooks and then send it out and test it on our web service from there.

25:36 Yeah, that's cool.

25:37 I guess that would be the most cloud natural cloud native sort of solution is we'll put it in like cloud storage and then talk to it or read it or write it.

25:44 Right, right.

25:44 Yeah, okay.

25:45 Yeah, that's pretty awesome.

25:46 So I know that Visual Studio Code is free, right?

25:50 Free open source and so on.

25:51 Yes, everything we do here.

25:53 The Python extension and Jupyter support is part of the Python extension, which is also all free and open source.

25:59 So I don't need to install anything else.

26:01 It's just like when the Python extension updates itself, then magically.

26:05 It's all in there.

26:06 It's in there.

26:06 Okay, super cool.

26:07 What about Notebooks?

26:08 Notebooks is free.

26:09 Okay.

26:10 You can, you know, we let you run for free on our hosted compute.

26:14 You can connect to an Azure VM if you want a more beefy machine, GPUs, etc.

26:20 You can connect to an Azure VM, in which case those would be paid.

26:23 Right, so I would pay for like a fancy VM that has GPUs and then I would talk to it?

26:27 Yes, exactly.

26:28 How does that work, right?

26:29 Like instead of, I wouldn't just go to my VM I set up and just like pip install Jupyter and do stuff there, right?

26:36 Like it sounds like it would be different.

26:37 Yeah, so we actually use something called a data science virtual machine in Azure.

26:41 And with that, Jupyter's already installed there.

26:44 So from Azure Notebooks, you just, we have an easy compute picker that'll automatically populate with the list of your data science virtual machines that you have access to.

26:52 You just click on it, hit run, and everything just works.

26:55 And you're running your notebook on that VM.

26:57 Yeah, that's cool.

26:58 Are there like scientists and companies and stuff using that a lot to do cool stuff?

27:02 Yeah.

27:02 We have a bunch of companies as well as education, research.

27:06 We see a lot of researchers using it, especially because for the free classroom settings, people use that a lot.

27:12 Yeah, that seems pretty cool.

27:14 One of the things I saw on the Azure Notebooks that I thought was kind of cool is a lot of different languages supported.

27:20 Yes.

27:21 Yeah, so what languages are, can I run there?

27:23 I mean, there's Python, obviously.

27:24 So Python 2 and 3, and then F# and R.

27:28 Okay.

27:29 So F# is kind of surprising, right?

27:30 F# is like a functional .NET programming language, which I haven't really done very much with.

27:36 But what I didn't see there was C#.

27:38 And I figured if you're going to have F#, maybe you would also just throw in C# for the fun of it.

27:42 Like, that's kind of interesting.

27:43 Why not?

27:43 It's something we're thinking about.

27:45 We haven't, you know, we see so much of our usage is Python.

27:48 So from that, we really focus on making that our optimal experience.

27:52 And that's where most of our energy goes.

27:53 But we definitely do hear requests every now and again for C#.

27:58 And it's one of those, similar to what I was talking about earlier, where you see so much Python usage with notebooks, but there's so many other scenarios.

28:04 And C# is a perfect example.

28:06 You know, how can you experiment with C# on notebooks?

28:08 So something we love to do, just haven't done yet.

28:11 Yeah, yeah, sure.

28:12 No, I mean, I think definitely focusing on Python and Python 3, that's like the sweet spot, right?

28:17 Of course.

28:17 But just, yeah, the F# being there kind of stood out to me.

28:21 So if I'm building these notebooks or even with VS Code, like, what are some of the other things in Azure that people are using and connecting to?

28:30 Like, I know there's a bunch of other data science, machine learning stuff going on there.

28:33 Oh, yeah.

28:34 What are some of the cool scenarios?

28:35 So the very basic thing we can start with is definitely just beefy machines, right?

28:40 Yeah.

28:40 Give me some GPUs.

28:42 Right, GPUs, faster machines.

28:43 And then, so on top of that, Azure provides what we call data science VMs.

28:49 These are, think of those as VMs pre-installed with, like, data packages pre-installed.

28:54 So when you go there, everything is already there.

28:56 Like, Jupyter Notebooks is installed.

28:57 All those packages are already there.

29:00 So you can definitely connect to that.

29:01 We have Windows.

29:02 We have Linux.

29:03 Different OSs, depending on what you want.

29:05 So that's kind of the fairly common use case of Azure resources.

29:09 Of course, you can use storage if you want to store data there.

29:12 So you can do all that already.

29:15 On top of that, Azure also provides a service that's machine learning specific called Azure Machine Learning Service.

29:22 It is a comprehensive set of services that covers the entire workflow in machine learning.

29:28 So starting with data.

29:29 So Azure Machine Learning Service can help you manage data.

29:33 Like, we provide what we call a data store, which is part of the service.

29:37 So you don't have to manage your storage separately.

29:40 Machine Learning Service knows where to find your data for training, for example.

29:44 I see.

29:44 It's probably, you don't have to, like, use the blob storage API.

29:47 Yeah, and then you figure out how to connect to that thing, right?

29:50 Yeah, exactly.

29:50 So Azure Machine Learning Service is part of the whole world.

29:52 The machine learning service manages the whole world for you.

29:55 And then moving on, of course, for training, you need more compute.

29:58 Like VMs we talked about.

30:00 So Azure Machine Learning Service can help you manage compute as well at scale.

30:04 So you can say, I want 50 machines, up to 50 machines, but scale them back down if I'm not using them.

30:12 Which is the best part because I'm terrible at controlling my costs.

30:15 Right.

30:15 Scale down to zero.

30:16 Yeah, scale down to zero.

30:18 But you can scale up to as many as you want pretty fast.
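
The scale-to-zero idea can be illustrated with a toy rule. This is only a sketch of the concept with an invented policy (one node per pending job, clamped to a range); it is not how Azure Machine Learning Service decides on node counts, and `target_node_count` is a hypothetical name.

```python
def target_node_count(pending_jobs, min_nodes=0, max_nodes=50):
    """Pick a node count: one node per pending job, clamped to the allowed range."""
    return max(min_nodes, min(pending_jobs, max_nodes))


# Idle, moderately busy, and overloaded cases.
for jobs in (0, 12, 80):
    print(jobs, "pending jobs ->", target_node_count(jobs), "nodes")
```

With a minimum of zero, an idle cluster drops to no machines at all, which is what keeps a forgotten cluster from running up a bill.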

30:22 Yeah, that's cool.

30:22 And that's good that if you forget.

30:23 Right.

30:24 Yeah.

30:24 Which I always do.

30:26 Yeah, you won't spend all your money.

30:27 Right, right.

30:28 Especially when they're like, you know, you have like a 16-node GPU cluster.

30:32 Right.

30:32 Keep them running for days.

30:34 That'll be...

30:35 Yeah, it's a big bill.

30:36 Yeah.

30:36 So it also helps you manage your training jobs.

30:39 So you have different training runs and you have different results you want to keep around.

30:45 So Azure Machine Learning Service can help you manage all that.

30:48 So for example, you have one experiment that has like 100 repeated runs.

30:52 Each run has different results.

30:54 And at the end of a run, you generate a different model file.

30:57 So that's all being stored as part of a service.

31:00 So you can always go back, look at a model and go back and say, oh, this is the run I did that got to this model.

31:07 So you can always trace back.

31:09 So that's all managed.
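
As a rough mental model of tracing a model back to its run, here is a small in-memory sketch. The `Run` and `Experiment` classes, the parameter names, and the accuracy numbers are all invented for illustration; the real service has its own SDK and storage.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: int
    params: dict
    accuracy: float
    model_file: str


@dataclass
class Experiment:
    name: str
    runs: list = field(default_factory=list)

    def log_run(self, params, accuracy):
        """Record one training run and the model file it produced."""
        run_id = len(self.runs) + 1
        run = Run(run_id, params, accuracy, f"{self.name}-run{run_id}.pkl")
        self.runs.append(run)
        return run

    def run_for_model(self, model_file):
        """Trace a model file back to the run that produced it."""
        for run in self.runs:
            if run.model_file == model_file:
                return run
        return None


experiment = Experiment("pet-breeds")
experiment.log_run({"learning_rate": 0.1}, accuracy=0.81)
experiment.log_run({"learning_rate": 0.01}, accuracy=0.93)

origin = experiment.run_for_model("pet-breeds-run2.pkl")
print(origin.run_id, origin.params)
```

Keeping the run and its model file linked in one record is what makes the "which run gave me this model" question answerable later.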

31:11 And then of course, once you have the model at the end of training, you can download it to local and do whatever you want with it.

31:17 Right.

31:17 But also you could have the Machine Learning Service manage it for you.

31:21 Imagine if you have multiple models and you have multiple versions that becomes a nightmare to manage.

31:27 So you can have Machine Learning Service to manage all that in a central location.

31:32 Not only that, but also once you have that registered with the service, it's super simple to turn the model into a runnable service.

31:39 Like literally, we can do that in five minutes.

31:41 That's cool.

31:42 Turn that into a service that's running in a single container that Azure can spin up for you fairly quickly.

31:48 Really, five minutes, everything's done.

31:50 Then you can start using this model already.

31:52 So you talked about this pet image.

31:54 Yes.

31:54 So what are you trying to find from these pet images?

31:58 Yeah.

31:58 So the end result is basically you can send our web service an image of a pet, cat or dog, not like an obscure pet.

32:05 Not a lizard.

32:05 Yeah.

32:06 And our model will send back probabilities of different breeds.

32:11 So if you send a golden retriever, I had my roommate wanted to test it with her golden retriever from home.

32:16 So send that.

32:16 And it got 99.7% confident that it was a golden retriever.

32:20 That's awesome.

32:21 Yeah.

32:21 So you could do that like with this thing she's talking about, right?

32:24 Yeah.

32:24 You train it up and you say turn it into a service.

32:26 Yeah.

32:26 And then it's like a HTTP post an image to it or something.

32:29 And then it says, here's a JSON result.
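
The post-an-image, get-JSON-back flow could look something like the sketch below. The URL, the content type, and the shape of the JSON response are all assumptions made for this example; the actual web service from the demo may differ on every one of them.

```python
import json
import urllib.request

# Hypothetical endpoint; a real deployment would give you its own scoring URL.
SCORING_URL = "https://example.com/score"


def build_request(image_bytes, url=SCORING_URL):
    """Prepare an HTTP POST carrying the raw image bytes."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def top_breed(response_body):
    """Pick the most likely breed from a JSON body mapping breed -> probability."""
    probabilities = json.loads(response_body)
    return max(probabilities.items(), key=lambda item: item[1])


# The response shape below is an assumption, not the real service's format.
sample_response = '{"golden_retriever": 0.997, "labrador": 0.002, "beagle": 0.001}'
print(top_breed(sample_response))

# Sending for real would look like this (not run here, since the URL is made up):
# with urllib.request.urlopen(build_request(open("pet.jpg", "rb").read())) as resp:
#     print(top_breed(resp.read()))
```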

32:31 Yeah.

32:32 Exactly.

32:32 So we're going to talk about all that in our build talk.

32:35 Yes.

32:36 Oh, yeah.

32:36 Which is now available online at the time of this podcast.

32:39 Yeah.

32:41 We can put a link to it in the show notes actually thanks to the power of time travel and recording

32:46 and all that.

32:47 This portion of Talk Python To Me is sponsored by Backlog from Nulab.

32:52 Developers know the importance of organization and efficiency when it comes to collaborating on a team.

32:57 And Backlog is the perfect collaborative task management software for your team.

33:02 With Backlog, you can create tasks, track bugs, make changes, give feedback, and have team conversations right next to your code.

33:08 You track progress with features like Gantt and Burndown charts.

33:11 You document your processes right alongside your wikis.

33:15 You can integrate with the tools you use every day like Slack, Jira, and Google Sheets.

33:18 You can automatically register issues from email or web form submissions.

33:22 Take your work on the go using their top rated mobile apps available on Android and iOS.

33:27 Try Backlog for your team for free for 30 days using our special URL at talkpython.fm/backlog.

33:33 That's talkpython.fm/backlog.

33:38 You know, one thing that I saw when I was going through your website that I thought was super cool was this LIGO project.

33:45 So, you know, Python has been at the center of a bunch of super cool science stuff, right?

33:51 Like, Kyle Cranmer and his team did a bunch of stuff with Python around the Higgs boson discovery.

33:57 There was Dr. Katie Bouman and the black hole picture.

34:02 And then the LIGO experiment, which is detecting, like, you know, I still just don't have my mind around, like, general relativity.

34:09 But, like, the space-time, you know, with, like, the curvature of gravity and all that.

34:14 And so, the idea is if, like, there's enough of a disturbance in it, it should have a wave, right?

34:21 And, you know, Einstein predicted that.

34:23 But just recently, the LIGO project detected black holes colliding and actually captured that.

34:30 And, you know, that was Azure Notebooks, apparently.

34:33 Yeah, they do use Azure Notebooks.

34:35 I'm not sure if it was used for the specific black hole collision.

34:38 Yeah, yeah, yeah.

34:39 They do use it, yeah.

34:40 I think it's just so cool that Python and Notebooks, in particular, seem to be showing up around all this, like, super cool science.

34:48 I guess, let me ask both of you, you know, when you talk to people who are coming into the Python world and into the notebook type of space, what are they coming from?

34:58 Is that, like, MATLAB or other tools?

35:00 You know, like, where are they coming from?

35:01 I see a lot of users coming from R and RStudio.

35:05 That's kind of the world.

35:07 But some of them still, I've seen users who use a mix of those two languages.

35:12 They still feel like R has done something really well and they already have things set up there.

35:17 But they also see, like, all these Python packages that make things easy.

35:21 So they wanted to use that as well.

35:23 So I definitely see a lot of that.

35:25 Yeah, I think what's cool about Python is you do kind of land on it from a lot of different paths.

35:30 So I had a user I was talking to last week who was saying, you know, I feel like they're business analysts and they've been using Excel for so many years.

35:37 And they were like, I feel like Python has become the next thing once you've learned Excel.

35:41 And now, you know, you want to do more advanced things or go from Excel into Python.

35:44 But then you'll talk, I was talking to a researcher a couple of weeks ago who also teaches a course.

35:48 And he was saying their course was in MATLAB two years ago.

35:52 Then they moved to R for a year.

35:54 And now they're standardizing across their department on Python.

35:58 Because it is so, they're like, people would start in MATLAB.

36:01 Then they need R for a different course.

36:03 And it was just too much.

36:04 But they felt like Python would cover all their courses, use cases and make their students happy.

36:09 Yeah, that's pretty interesting.

36:10 I'm sure they have a little bit of whiplash going on.

36:12 Yeah, I know.

36:13 But Python's interesting because it's not just useful for this like focused data analysis.

36:19 Right.

36:20 Like if I go and learn MATLAB and I say I'm getting a degree in math or biology or whatever,

36:26 and then I leave and I'm not still doing research in biology, my MATLAB skills are probably not helpful.

36:32 For the most part, right?

36:34 It's not like, well, I'm going to go and like get recruited to work as a MATLAB person.

36:38 Maybe, but not nearly to the extent that Python skills will like open up the door to parallel careers.

36:45 And so I think whenever I think of students or academic programs, I think, you know,

36:49 it's really good that they're doing something like Python, even if it weren't Python,

36:53 something that is like a general skill independent of like the research or the PhD or whatever, right?

36:59 And I know they've standardized like high school, the AP curriculum in the US has standardized on Python

37:05 for being the introductory language, which I thought was really cool because when I took it, it was Java.

37:09 And I kind of never used Java.

37:11 Right.

37:12 Yeah.

37:13 Yeah.

37:13 Python is definitely used in so many different areas.

37:17 Like for one, being able to build web apps, that's huge.

37:20 And in the data science area and beyond those two, we also see a lot of just general purpose scripting using Python.

37:28 Yeah.

37:28 And on top of whatever else they're using, right?

37:31 Yeah.

37:32 Yeah.

37:32 And the IoT stuff as well, right?

37:34 Like it's, you got CircuitPython and MicroPython.

37:36 Oh, yeah.

37:36 That's super cool.

37:37 So I think that's pretty cool.

37:38 Now, one thing I did want to ask you about, because I think this is super positive, but I don't really know why it's so different in data science.

37:46 I feel like in data science, the core projects, NumPy, Jupyter, and whatnot, are well-funded, at least compared to, say, Flask for web development or, you know, SQLAlchemy.

37:58 Like you go and you look at NumPy, and it's got like, I don't remember exactly, like Sloan Foundation funding and DARPA funding maybe.

38:07 But there's like these large groups funding these data science projects.

38:10 Why do you think data science, open source, gets so much support like this, whereas a lot of the rest of Python kind of doesn't?

38:18 I mean, it's not like Instagram isn't worth, you know, billions of dollars and based on Python as well.

38:23 Or, you know, there's a hundred examples like that.

38:25 Do you have thoughts?

38:26 Since I'm relatively new to the area, to the space, I'm not super familiar with those particular projects and how they got funded.

38:34 But I feel like, just my feeling, like the popularity in the space, like interest in the space, just AI and data science in general got a lot of attention, for sure, from all sorts of people.

38:47 And it's not just like, hey, we build this for fun, right?

38:52 But also, we want to take AI, we want to build that into our businesses.

38:56 And that's something serious, that it better make good predictions, right?

38:59 If this is going to impact my business or people's real life, like making predictions for patients and that serious stuff.

39:08 So we need to get it right.

39:09 It better work great.

39:10 All those packages we rely on, right?

39:12 They're fundamental to the prediction and data analysis.

39:16 And that might be why it's easier to get attention and funded because the value to the business, like we see in real life.

39:24 I have a guess for myself as well, but I want to hear your thoughts.

39:27 Yeah.

39:27 I was thinking something similar just in terms of the enterprise use cases.

39:32 And of course, there's all the power of buzzword and data science.

39:36 Is it based on AI?

39:37 We're going to pay for that.

39:38 Yeah, I love that.

39:38 Yes.

39:39 AI-infused everything.

39:41 So I think that likely definitely helps.

39:43 And then like you were referencing a few of those projects where, you know, science and research, which might be more ingrained in grant worlds versus something like Instagram.

39:52 Exactly.

39:53 Like there's already like a grant infrastructure, right?

39:56 Right.

39:56 Like it's super normal to go, I'm going to apply for a $3 million grant for this project.

40:02 Right.

40:02 And it's, I guess, because it lives in that world so much, maybe that's a little bit why.

40:06 Yeah.

40:07 I don't know.

40:07 It's interesting.

40:08 Yeah.

40:08 The other reason I was thinking is the data scientists usually report to like executives.

40:14 That helps too.

40:15 Leverage those one-on-ones.

40:18 Yeah.

40:19 Exactly.

40:19 You know, yeah, we need a little bit of support here.

40:22 Yeah.

40:23 So let's see.

40:24 I got a few other things I want to ask you about before we run out of time here.

40:28 So, you know, not too long ago, GitHub was acquired by Microsoft.

40:31 Like what changes have you seen internally or like in the Azure data science side of things as a result?

40:38 Actually, long before the acquisition, Microsoft internal teams have already started to adopt Git for source control, like in our internal teams.

40:49 That has happened even before this.

40:51 So I would say after the acquisition, there's definitely more use of Git, GitHub.

40:57 And also we see more integrations that we are building with GitHub being a front door to developers.

41:04 And so more integration into like Visual Studio Code, better experience there.

41:09 And we also build new features on GitHub as a result of GitHub being part of Microsoft.

41:15 Yeah, that's cool.

41:16 And you guys did do the virtual file system stuff contribution back to GitHub so that you could actually put Windows in there.

41:22 Because apparently it was broken.

41:24 Katherine, how about you?

41:25 Yeah, it's been super exciting.

41:27 I mean, like Rong was saying, I mean, Git is so ubiquitous that we were using it a lot beforehand.

41:32 But now the acquisition has just opened the door to so many opportunities in terms of product integrations.

41:39 And it kind of has reshifted a lot of our thinking across our different products, particularly in developer division where Rong and I work.

41:45 What should we build into our product first?

41:47 What would be a good integration where we could leverage GitHub for this?

41:51 It's opened up a bunch of new scenarios and hopefully customers will continue to see the value from it.

41:56 Yeah, that's cool.

41:56 Everybody at Microsoft uses Git now.

41:58 Yeah.

41:59 Even as PMs, we write documentation.

42:02 And then when we check it in there, we have to use Git.

42:06 Microsoft Docs are all on Git now, which is fun.

42:09 Yeah, all our documentation runs on Git.

42:11 So that's one single system that we use.

42:13 Yeah, that's super cool.

42:14 So with Azure Notebooks, what's the version control story?

42:18 I know what it is for the Python

42:20 .py files.

42:21 Just check them into GitHub or wherever.

42:23 Yeah, so for Azure Notebooks right now, you can launch a terminal within Azure Notebooks that'll connect you to the container you're running your project in.

42:33 And then from there, you can use the Git command line as you're used to.

42:36 And we also have two views in Azure Notebooks.

42:39 There's the vanilla classic Jupyter UI.

42:41 And then you can also use the JupyterLab UI and install any Git extensions you want to in either of those ecosystems.

42:48 That's cool.

42:48 Yeah, I hadn't really thought about that.

42:49 I guess it is baked into JupyterLab probably, right?

42:52 Yeah.

42:53 Cool.

42:53 So one thing that I did think was pretty neat is on the Azure Notebook page, you have a bunch of featured projects that are kind of cool.

42:59 So maybe I could talk a little bit about something.

43:02 So there's one that talks about getting started with Azure machine learning.

43:06 That's pretty cool.

43:06 But then the second one was the Python Data Science Handbook by Jake VanderPlas.

43:11 Yes.

43:11 That's cool.

43:12 Yeah.

43:12 Maybe you want to talk about those a little bit?

43:13 Like just how people can find them as well?

43:15 Yeah.

43:16 We feature a few projects that are six or so that we kind of rotate in or out depending on new releases or anything like that.

43:23 Jake's book is usually always there because people love that.

43:26 And we love it.

43:27 It's kind of our, it's such a big book of notebooks.

43:30 Yeah.

43:30 If listeners aren't familiar with it,

43:32 It's like a whole O'Reilly book written all in Jupyter Notebook.

43:35 So it's a massive GitHub repository.

43:37 So we'll use it for like stress testing our systems and stuff because it's so big.

43:41 It's kind of become our de facto standard.

43:43 But we also have things like I think University of Cambridge Introduction to Python course is featured there, which is super cool.

43:49 And it's really great for all of our education scenarios that use Azure notebooks to see that.

43:54 And then it also feeds a lot of online course inspiration as well.

43:58 So we'll see a lot of people coming from different edX, Coursera courses.

44:01 So we try to feature that content as well.

44:03 That's cool.

44:04 You see people creating like online courses and books and stuff like that.

44:07 Yeah.

44:08 And something like any cloud hosted notebooks makes such a great case for online courses.

44:13 So you don't have to spend 10 minutes telling people how to install things.

44:17 And then you're not there in person to help them with any installation.

44:20 So being able to use something like Azure notebooks where everything's just ready and you can clone this repo and you're good to go.

44:26 It's definitely has been super helpful.

44:28 Yeah, that's cool.

44:29 Definitely when I've done training and stuff, it's like a lot of it is let's make sure everyone can run Python.

44:34 Like just type Python 3 dash capital V.

44:37 Tell me what you get.

44:39 Are you okay?

44:40 Is it a number?

44:41 Is it 3, 4?

44:44 That's a very important first step.
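As a rough illustration of that first check (this is not something shown in the episode), a course script could verify the interpreter version itself instead of relying on everyone typing python3 -V. The minimum version and the function name below are assumptions chosen for the example.

```python
import sys

# Hypothetical minimum version for a course; pick whatever your material needs.
MIN_VERSION = (3, 6)


def check_python(version_info=None, minimum=MIN_VERSION):
    """Return (ok, message) describing whether the interpreter is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Keep only major, minor, micro so the message reads like "3.7.3".
    found = tuple(version_info[:3])
    ok = found[:2] >= tuple(minimum)
    found_text = ".".join(str(part) for part in found)
    needed_text = ".".join(str(part) for part in minimum)
    if ok:
        return True, f"Python {found_text} found, good to go."
    return False, f"Python {found_text} found, but {needed_text} or newer is needed."


if __name__ == "__main__":
    ok, message = check_python()
    print(message)
```

Passing a tuple such as (2, 7, 18) in place of the real interpreter version makes it easy to see what a student on an old install would be told.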

44:46 Right.

44:47 Yeah.

44:48 Yeah.

44:48 So we try to limit the time spent there and maximize time spent on code.

44:51 Yeah.

44:52 That's cool.

44:52 Yeah.

44:53 Did you hear about the announcement of Python shipping with the next version of Windows?

44:57 The next major release in Windows?

44:59 Steve Dower talked about this at PyCon.

45:03 And basically, you know, Steve got Python 3.7 in the Windows Store.

45:08 Yes.

45:09 And that was cool.

45:09 So if you need to go install it, that would set it up.

45:11 Yeah.

45:12 So apparently now Python is going to come as a little shim in Windows.

45:16 So if you type Python.

45:17 Right.

45:18 Yeah.

45:18 It'll pop up that store thing and say click here to install it and then type it again.

45:22 Yes.

45:22 I know Steve worked on that hard.

45:24 Finally made it happen.

45:26 Yeah.

45:26 That's awesome.

45:27 I really think that's going to be a huge, huge deal.

45:29 I mean.

45:30 Yeah.

45:30 So many people get lost in that first initial hurdle.

45:33 So hopefully it'll help.

45:34 Yeah.

45:35 I mean, it still doesn't solve all the, you know, pip install this.

45:37 Right.

45:38 First step.

45:40 Yeah.

45:40 We're getting closer.

45:41 At least there's one less bit of friction there.

45:44 Yeah.

45:44 Definitely.

45:44 Yeah.

45:45 Yeah.

45:45 So let's see.

45:46 I guess, you know, maybe you could each chime in about what was your favorite thing that

45:50 you saw or coolest thing here at the conference?

45:53 I'm super excited about the new terminal that ships in Windows.

45:57 Yeah.

45:57 That sounds pretty cool.

45:58 Yeah.

45:58 I haven't used it myself, but it looks pretty cool.

46:01 And I've already heard comments like, hey, I want to switch to Windows now for development

46:09 because now I can do everything on one machine.

46:12 Like it's pretty cool.

46:14 I'm looking forward to using it.

46:15 Yeah.

46:16 That thing has not changed in a really long time.

46:18 Yeah.

46:19 I know.

46:19 And I'm glad they focused on that.

46:21 Yeah.

46:21 Because the command line is the center of the developer workflow.

46:25 And we definitely think that's important too.

46:29 Yeah.

46:30 I feel that's actually an interesting point.

46:31 I wonder how much of it is Microsoft doing a lot of open source and other

46:37 outreach to platforms outside of Windows, bringing in people that kind of live in that space,

46:42 which is often more command-line, terminal-driven, back-end work, and them sort of going,

46:47 all right, why is it such a bad experience here on Windows?

46:49 Yeah.

46:49 On the command prompt and like sort of get, I wonder like there's probably some cycle there.

46:53 Oh yeah, absolutely.

46:54 They have definitely gone through like multiple iterations.

46:58 Yeah.

46:59 But because it used to work very differently than the other platforms.

47:02 Yeah.

47:03 But now it is important that we get that right and make that better.

47:06 So make Windows a great development environment again.

47:10 Very cool.

47:10 Yeah.

47:11 That's a good one.

47:11 How about you?

47:11 Yeah.

47:12 So I thought the terminal and the WSL, Windows Subsystem for Linux, announcements were all really

47:17 great.

47:17 I also really like, we talked a bit about Azure Machine Learning, and originally that had its

47:23 own APIs, which they'll still maintain, but they're also introducing compatibility with MLflow,

47:28 which is a Databricks project from the Spark community around managing ML life cycles.

47:34 And I thought that was another cool combination where we're enabling the open source community

47:39 on Azure platforms.

47:40 So I was excited about that as well.

47:42 All right.

47:42 Big things that are like work with that API.

47:44 Yeah.

47:45 Yeah.

47:45 So you can use the MLflow APIs and it'll work perfectly with your Azure Machine Learning workspace

47:50 and track the model life cycle.

47:52 Very nice.

47:52 Yeah.

47:53 That's really, really cool.

47:54 All right.

47:55 Well, one more question and then I'll ask you the two sort of closing questions, of course.

48:00 So the final question is, I feel like over the last five, six years, Python has been super, super

48:07 popular.

48:07 Have you seen the Stack Overflow article?

48:10 Yeah.

48:10 The incredible growth of Python?

48:12 Yeah.

48:12 It's in one of our slide deck versions.

48:15 So there it has this huge, huge growth and it's really positive.

48:19 It's so fun to be part of that community.

48:21 Oh, yeah.

48:21 Absolutely.

48:22 Super exciting.

48:23 It's super cool.

48:23 I feel like a lot of those, the new folks to that graph, they're not everyone, but a lot

48:30 of them are coming from the data science side of things.

48:33 What do you think about this?

48:34 Yeah, absolutely.

48:34 I've seen an article actually on Stack Overflow last year, analyzing why Python is growing

48:41 so fast.

48:42 I think one of the conclusions that author made was because of AI and machine learning becoming

48:48 more and more popular in just like any businesses.

48:52 Because it used to be a very tech-driven thing that only high-tech companies were doing.

48:57 But now we want to infuse AI everywhere in all different businesses.

49:00 And then we see that start to come up.

49:04 A lot of developers want to do something with AI.

49:07 And then Python being a natural choice for doing that job.

49:11 That's why we've seen a lot of Python growth coming definitely from that space.

49:16 And actually, one of the data points he pointed to was how fast pandas, as a data package,

49:24 has grown in the past few years,

49:26 as one indication of how much the data science workload has been growing in the

49:31 Python world.

49:32 So it's super exciting.

49:33 So like Python is now number four on the most popular language list.

49:38 Went from like, was it six or seven last year?

49:42 Yeah, it's definitely growing.

49:43 And now it's number four.

49:45 Yeah, we're definitely kind of working in a very hot area where there's Python and

49:52 AI and the intersection in between.

49:54 Oh yeah, you put those two together.

49:55 Thanks for keeping us employed.

49:56 Yes.

49:57 But yeah, like Rong was saying, I also think it's cool because I think you can see

50:02 the power of Python really quickly in a data science scenario, especially with the community

50:07 around it and all of the different packages you can use.

50:10 I think that really shines in data science and ML cases.

50:13 I mean, even just with like visualization libraries, there's so many and there's so many that do

50:17 such like cool, powerful things that I feel like when people are first getting exposed to

50:22 Python through data science, it really shines and shows the power of it.

50:26 Yeah, it's true because you can really quickly generate a graph or a model or something like

50:31 that.

50:31 Yeah, like a 3D plot.

50:32 It's awesome.

50:33 Yeah.

50:34 As opposed to like, let's build a little game, right?

50:36 Like it still takes a while to build tic-tac-toe.

50:38 Right, right, right.

50:39 And then it just looks like a terminal thing.

50:41 You're like, well, it's not so impressive.

50:42 Exactly, yeah.

50:43 Yeah, that's a pretty good point.

50:44 All right, cool.

50:45 Well, it's been a super interesting compare and contrast, the .py versus the Notebooks way

50:52 of working.

50:53 But thanks for both of you for sharing the stories.

50:55 Let me ask you a quick question before we get out of here.

50:57 Although, especially Rong, I'm sure I'm going to be able to guess your answer here.

51:01 So if you're going to write some Python code,

51:05 what editor do you use?

51:06 Visual Studio Code.

51:07 You guess right.

51:08 Right on.

51:10 I'd say, if I'm writing some Python code, usually notebooks, but I used to be a Vimmer,

51:15 and now I use a Vim extension for VS Code when I'm in an editor environment.

51:20 That's cool.

51:21 You kind of brought them together.

51:22 Yeah, I brought them together.

51:23 It's just been great.

51:24 Yeah, it's nice.

51:25 I need my key mapping.

51:27 Yeah, it's all in there.

51:28 So when I can get that with the power of VS Code, I'm happy.

51:32 Then you're happy.

51:33 Awesome.

51:33 And then, you know, there's so many packages out there that people might know about.

51:38 So have you come across one that's like, oh, wow, this is really cool.

51:42 People should check it out.

51:43 Notable PyPI package?

51:45 I haven't done this deeply myself, but I looked at Plotly, which does 3D plots.

51:52 It's one of the first things I tried when I first joined the team.

51:56 And we actually can render that inside VS Code too.

51:59 And you can interact with that plot.

52:00 And I thought that was super cool.

52:02 Oh, yeah.

52:02 That's pretty cool.

52:03 Yeah.

52:03 Plotly is great for graphics.

52:05 Yeah, I'm a sucker for a good visualization library too.

52:09 So yeah, I like Bokeh.

52:11 I like Plotly.

52:13 Yeah, I think those are probably my favorites.

52:15 I'm trying to think if there's any.

52:16 Yeah, those are really cool.

52:17 I'll throw one out that's notebook related that's not graphical: Papermill.

52:22 Have you tried Papermill?

52:23 Oh, yeah.

52:23 Yes, Papermill is very cool.

52:24 Yeah, kind of turn your notebook into almost like a function you can call or something like that.

52:29 Yeah.

52:29 It's pretty wild.
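To make the "notebook as a function" idea concrete, here is a minimal sketch that was not part of the conversation. It assumes a template notebook that has a cell tagged "parameters"; the file names, the region and year parameters, and both helper functions are hypothetical. The only Papermill call used is execute_notebook, which runs the input notebook with the given parameters and saves the executed copy to the output path.

```python
def build_runs(regions, year):
    """Pair each output notebook path with the parameters to inject into it."""
    return [
        (f"report_{region}_{year}.ipynb", {"region": region, "year": year})
        for region in regions
    ]


def run_reports(template, regions, year):
    """Execute the template notebook once per region, like calling a function."""
    # Imported lazily so the planning helper above works without papermill installed.
    import papermill as pm  # third-party: pip install papermill

    for output_path, params in build_runs(regions, year):
        # execute_notebook runs the template top to bottom, injecting `params`
        # after the cell tagged "parameters", and saves the executed copy.
        pm.execute_notebook(template, output_path, parameters=params)
```

Calling run_reports("report_template.ipynb", ["emea", "apac"], 2019) would then produce one executed notebook per region, assuming that template file exists.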

52:30 All right.

52:30 Well, thank you both for being on the show.

52:32 It's been a lot of fun to talk about it.

52:34 I'm happy to see the work you all are doing in VS Code around sort of editorifying notebooks.

52:40 And also, yeah, good work on the cloud stuff.

52:42 Yeah.

52:42 Thank you.

52:43 Awesome.

52:43 Super fun.

52:44 Thanks for having us.

52:45 Yeah.

52:45 Thanks for being here.

52:46 Bye.

52:46 This has been another episode of Talk Python To Me.

52:49 Our guests on this episode have been Rong Lu and Katherine Kampf.

52:53 And it's been brought to you by Linode and Backlog.

52:56 Linode is your go-to hosting for whatever you're building with Python.

53:00 Get four months free at talkpython.fm/Linode.

53:03 That's L-I-N-O-D-E.

53:05 With Backlog, you can create tasks, track bugs, make changes, give feedback, and have team conversations

53:12 right next to your code.

53:13 Try Backlog for your team for free for 30 days using the special URL talkpython.fm/backlog.

53:20 Want to level up your Python?

53:23 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

53:28 Or if you're looking for something more advanced, check out our new async course that digs into

53:33 all the different types of async programming you can do in Python.

53:36 And of course, if you're interested in more than one of these, be sure to check out our

53:40 Everything Bundle.

53:40 It's like a subscription that never expires.

53:42 Be sure to subscribe to the show.

53:44 Open your favorite podcatcher and search for Python.

53:47 We should be right at the top.

53:48 You can also find the iTunes feed at /itunes, the Google Play feed at /play,

53:53 and the direct RSS feed at /rss on talkpython.fm.

53:57 This is your host, Michael Kennedy.

53:59 Thanks so much for listening.

54:01 I really appreciate it.

54:02 Now get out there and write some Python code.
