
#330: Apache Airflow Open-Source Workflow with Python Transcript

Recorded on Thursday, Aug 5, 2021.

00:00 If you're working with data pipelines, you definitely need to give Apache Airflow a look. This pure Python workflow framework is one of the most popular and capable out there. You create your workflows by writing Python code using clever language operators, and then you can monitor them and even debug them visually once you get them started. So stop writing manual code or cron-job-based code to create data pipelines and check out Airflow. And to do that, we have three great guests from the Airflow community: Jarek Potiuk, Kaxil Naik, and Leah Cole. This is Talk Python To Me, episode 330, recorded August 5th, 2021.

00:46 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy and keep up with the show and listen to past episodes at 'talkpython.fm' and follow the show on Twitter via '@talkpython'.

01:04 This episode is brought to you by Us over at Talk Python Training, and the transcripts are brought to you by 'AssemblyAI'.

01:10 Welcome to Talk Python to Me. It's good to have you all here.

01:16 Thanks for having us. Thank you.

01:18 Yeah, it's really fun to be talking about Airflow. These are the types of tools that I think don't get that much awareness, but they're the kind of thing that can be the real backbone of a lot of teams, a lot of organizations, and so on. So I think that'll be super fun to dive into, and we'll all learn a lot. And I suspect a lot of people listening will realize, oh, here's a whole class of tools I didn't even realize I should have considered to solve my problem. Before we get down to that, let's start with your stories. Leah, you go first.

01:46 How did you get into programming and Python? Python was the first language that I learned, and I do have a bachelor's in computer science. At the school I went to, that is the language that intro to CS is taught in.

02:00 I am so jealous.

02:01 My intro to CS class was in Scheme, which is a derivative of Lisp, which didn't seem that practical. And then I was told I had to learn FORTRAN, that it would be the most useful language I'd ever learn. Neither of which turned out to be true. I wish I'd learned Python.

02:14 So thanks, Carleton College, Northfield, Minnesota, for giving me Python early.

02:20 And yeah so, I loved Python from the beginning. You asked how I got into programming. So I actually do have a parent in tech. It is my dad and he tried to get me into programming a lot earlier. And like a true teen, I said absolutely not because it was suggested by my dad.

02:37 It really wasn't until I got to school and I heard people say that intro to CS was a fun elective (for those only listening, that's totally in quotes as I'm saying it) that I decided to take it, and it turned out I really liked it. And I decided to pivot from being a math major, which wasn't going very well, to being a computer science major.

02:57 That's fantastic. I was also a math major, and I find the programming side uses a lot of the same skill set, like thinking through problem solving. You have these constraints or axioms in math and you work from them.

03:11 But in math you just come up with the next idea, then the next problem, then the next idea. And in computers, you build stuff that people use, and it's such a difference.

03:20 I find programming is puzzles, and that was always the part of math that I liked. I never liked writing proofs or the theoretical side of things. I just wanted to solve puzzles with logic and rules.

03:33 Yeah, fantastic. Well, it sounds like you've landed in the right spot. That's awesome.

03:36 I'm doing okay.

03:38 Jarek, how about you?

03:39 Yes. You talked about your first language. So my first language in computer science during my studies was, I think, Delphi or Pascal, I can't even remember. But actually, the first language I started programming in for real work was, listen to this, COBOL. So I tend to joke that when I'm retiring, I will be a very well paid five-hours-a-week COBOL programmer, because nobody else will know it, right?

04:08 No, you're going to keep the trucks delivering and the warehouses open. Exactly.

04:14 You'll be on retainer.

04:16 Five hours a week. Yeah, that's a super cool job, I think. But Python is actually quite new in my portfolio of languages, let's say. I learned it maybe six years ago, and with my years of working in CS, that's relatively late, but I loved it from the first glance. I used to work in C, Java, C++, and a lot of other languages, and Python was just super easy from the start, and super nice and friendly too. After years of programming in Java, it was like, oh, in one line you can do what I would do in five pages of COBOL, and you can still understand it.

05:00 Yes.

05:01 So, yeah, I fell in love immediately, and it's my absolute favorite language right now.

05:08 Same here.

05:09 Kaxil. Yeah. For me, I did my degree in Electrical Engineering, so I didn't do any Python over there. But when I came to the UK to do my Masters, we were taught R and Java. One fine day, when we were finishing the course in just a month or two, there was a presentation from someone at the University who was telling us how data science is used in industry, and they said, you should know Python. And we were like, oh, but we were not taught Python, and we are just one or two months away from our internships, and we don't know Python. So that's when I started looking into Python. I got an internship, and then I actually started learning more Python. This was 2016 I'm talking about. And yeah, since then, it has been a wild, wild ride. I have written a lot of Java, a lot of R, but Python seems to be very easy to write, easy to understand, plus the community behind it and the packages behind it are so vast that you can use it for anything.

06:09 I saw a funny T-shirt once that said, "I learned Python. It was a great weekend," which I think is really funny.

06:16 Right.

06:17 Because on one hand, yeah, sure, you can go through. And actually, the language is simple, especially if you know something else that's like Java or C++. Oh, this is the breath of fresh air.

06:26 Right.

06:26 But on the other hand, I've been doing this for a long time, all day, every day, and I'm still learning Python every day. Right. So it's a really interesting juxtaposition: you can learn the language really easily, but then there's the standard library, and then there's 300,000 PyPI packages, and Airflow is just one of them. And that's our whole topic today.

06:43 Right.

06:44 So it's kind of both. Right.

06:45 Yeah. And the language keeps growing, so you've got to keep track of the cool new things that are released, and things that were true in Python 2.7 are definitely not true today.

06:55 With Python 3.10, it's grown a lot, definitely has.

06:59 And I saw that Airflow stops supporting the older versions of Python as they get deprecated. So, yeah, good for that. Right.

07:06 Yeah. We actually have very strong rules following the Python release rules. So we learn from the Python release schedule, and we follow it very closely for how long we support each Python version and when we stop supporting it.

07:22 That makes a lot of sense.

07:23 Yeah.

07:24 It's difficult to maintain compatibility between Python 2 and 3. That's a lot of overhead.

Yes.

07:30 I have nightmares about cherry-picking all the stuff from the main branch to the old release branch and adding Python 2 support.

07:37 But Kaxil cannot complain. I mean, we both did that a lot, Kaxil and I.

07:42 And thanks to that, we've been top committers in the Apache organization several times.

07:48 There is a "most commits made this week" list, and that was us doing cherry-picks between the Python 3 version and the 2.7 version. And at one point we were both, at the same time, top committers across Apache.

08:01 Fantastic.

08:02 I am mad at GitHub. GitHub does not count commits on any branch except main or master. Not fair, not fair.

08:10 Well, we'll have to do our own visualizations.

08:13 Exactly. Exactly. All right. Someone out on the live stream says Python is awesome, like all of us. Yes, definitely, thanks for being here. All right. Well, let's start this with a slightly higher level conversation than just Airflow. So Airflow is one of these workflow management frameworks. Whoever wants to take this: why do I need that? When do I need that? What are these tools? As I hinted at the beginning.

08:38 I want to walk through some of the history, though. Like in 2014, 2015, data engineering was not mainstream and everyone was just using cron for scheduling their tasks. And then there came tools like Luigi and others, where people were writing their DAG workflows in XML and those sorts of formats to make sure the tasks run on schedule.

09:03 So DAGs, directed acyclic graphs.

09:06 Yes. Let's be very clear, you cannot have a circle of tasks. The dependencies cannot be circular. People have tried that; it takes a long time to finish those. Yes, a very long time.

09:21 Some of those are still running.

09:22 It takes a long time for them to start sometimes too.

09:27 But yeah, to complete the history part, people just got bored writing the XML syntax, and it's difficult to understand. Similar to what we were talking about with Java and Python, Python is much easier to read, easier to understand. Then came Airflow. Maxime wrote Airflow in his time at Airbnb and open sourced it to the Apache Software Foundation. And that was a sigh of relief for people working with Luigi and others as well, because then you could write your workflows in an easy to understand language that you're already very familiar with. You don't need to write that XML.

10:04 And who loves writing XML? First of all.

10:08 And so it's easy to understand, it's just configuration as code. And there was also, I think, a general move towards everything as code, infrastructure as code with Terraform, Ansible, and whatnot. Airflow was just the perfect tool for workflow as code, or DAGs as code. And since, I think, 2016 to 2018, Airflow's popularity has skyrocketed with the advent of a separate, specialized data engineering field. Previously, software engineers used to do everything, but then people and companies realized that it's a separate field. It's a lot of work.

10:47 You can't just hire a machine learning engineer and let them do everything. It's a separate data engineering job to write a pipeline, to know how to handle the data and everything from start to finish: retries and the thousands of things which your cron expressions, or cron alone, cannot handle, like task dependencies, SLAs, and whatnot.

11:09 So I think that's when, with the advent of data engineering and people realizing the importance of data, Airflow's popularity grew massively, around 2018, which also, by the way, coincided with when Airflow became a top level project in the Apache Software Foundation. Until then, Airflow was just an incubating project in the ASF, and then it became a top level project. And that was a big milestone for Airflow and the community.

11:34 I think data engineering is really interesting, because when a lot of people think of the divisions of what you do with programming, especially in Python, well, we've got web programming, to some degree UI programming, and then we've got data science, sort of web and data science as the two. But there's this middle ground where I feel like people kind of don't want to go. And that's the data: you want to make sure, if you get a bunch of data and you feed it to your model, your model is only as good as the data you get. If you're trying to automate some ingest of data or warehousing or reporting, it's only as good as the reliability and accuracy of the data coming in. We've got things like Great Expectations and so on for testing, actually testing against the data, not the code that works with the data.

12:22 Let me just add, because Airflow is really an orchestrator. I used to sing in a choir for many years, and for me, this is really like the parallel between the conductor and the musicians playing. Airflow doesn't do the stuff itself. It just tells others what to do, and they do the data processing stuff.

12:42 So we don't know how.

12:44 Basically, we are actually data software engineers writing things for data engineers. So when we think about this cross between software engineers and data engineers: we don't know how to actually make a machine learning model, and we don't know how to do MapReduce if you want to process a lot of data. But we know what to do with the data when it comes.

13:11 Who should do what next, and how to pass it somewhere else.

13:15 And we can make it super complex to define or complex in terms of composed of many, many different steps in different formations. But Airflow makes it super easy to manage the whole thing so that it runs smoothly and you can operate it and you can deal with any problems that arise on the go so at this point.

13:35 I want to expand on that real quick. There's a very human aspect to workflow orchestration that I think both Kaxil and Jarek have touched on, which is that having a workflow orchestrator really enables you to move from having the data scientist in their silo working on a pipeline alone to having a whole team of data scientists and data engineers working together, because you have really specialized folks who can work on building those models, and that might not be the same group of people that's figuring out how to get the data from A to B and making sure that it's healthy and is what the model and the data scientists are expecting. So I think it just enables a lot more collaboration and helps you have more specialists working together.

14:22 It becomes that well known, well tested way to flow data down into the specialties that people need. Right?

14:30 Exactly.

14:32 Talk Python to Me is partially supported by our training courses at Talk Python. We run a bunch of web apps and Web APIs. These power the training courses as well as the mobile apps on iOS and Android. If I had to build these from scratch again today, there's no doubt which framework I would use it's 'FastAPI' to me, FastAPI is the embodiment of modern Python and modern APIs. You have beautiful usage of type annotations. You have model binding and validation with Pydantic, and you have first class Async and Await support.

15:02 If you're building or rebuilding a web app, you owe it to yourself to check out our course. Modern APIs with FastAPI over at Talk Python Training and it'll take you from curious to production with FastAPI. To learn more and get started today, just visit 'talkpython.fm/fastapi' or email us at 'sales@talkpython.fm'.

15:22 One of the things you do in these types of frameworks is you build these tasks. Give us an idea of what some of the tasks look like. You actually have a whole bunch of, would that be the integrations, in there?

15:35 Or is that something different, the providers that we are using in Airflow 2?

15:40 Yes.

15:40 So we have more than 70 of those right now, 70 services. We talk to external services or databases or whatnot.

15:51 70 entities. But within that, we have several hundred of these so-called operators or sensors or transfer operators which perform the tasks, and they're actually super easy. It's just one method, execute, and that's it, right?

16:07 That's pretty much it.

16:08 Yeah. There are the three things: the sensors, the operators, the transfers. An example of a sensor might be waiting to see if an object is in S3 or in Google Cloud Storage. A transfer is moving something from A to B. And an operator, those are the ones we probably have the most of, right, Jarek and Kaxil? And it can be anything in a service.

16:32 So I work on Google Cloud, so the operators I'm most familiar with are the Google ones. So it's spinning up a Dataproc cluster and then running a Spark job on it, or running something on a Kubernetes pod.

16:46 If you can dream it, either there is an operator for it or you can write an operator for it.

16:51 Yeah.

16:52 When I started with Airflow, like in 2017, we used Airflow for the same reason Airflow was designed: being a classic ETL tool, or being an enabler of sorts. A lot of companies were migrating from on premises to cloud. We were doing a project in partnership with Google to move our customers' data to the cloud, and we were using NiFi to get the data onto GCS. But from there, everything was orchestrated by Airflow. So once the data lands in Google Cloud Storage, there's classic ETL, extract transform load: from GCS it goes to BigQuery, in BigQuery there are some manipulations, and then the data goes to a Data Studio dashboard that shows a rich dashboard on top of it. And this is all managed by Airflow. And it was so easy, because we separated this using tasks, and we were using all the hooks and operators that Leah and Jarek were talking about, like the GCS to GCS operator: move the data from the landing area to staging, so your landing area remains untouched and you can verify with your vendor that the data is as they sent it, even in the future. And then there would be a BigQuery operator to run a SQL query. And then there are other operators for different GCP services.

18:11 So I think with Google, there was already a good amount of integrations three or four years back. Similarly for Spark and other operators.
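
To make that kind of pipeline concrete, here is a minimal sketch of a DAG that waits for a file in GCS and then loads it into BigQuery, using a sensor and a transfer operator from the Google provider package. The bucket, object, and table names are placeholders, and the import paths assume a reasonably recent apache-airflow-providers-google release:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="gcs_to_bigquery_example",
        start_date=datetime(2021, 8, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Sensor: wait until the vendor's daily file lands in the landing bucket.
        wait_for_file = GCSObjectExistenceSensor(
            task_id="wait_for_file",
            bucket="my-landing-bucket",
            object="daily/{{ ds }}/data.csv",
        )

        # Transfer: load the file from GCS into a BigQuery staging table.
        load_to_bq = GCSToBigQueryOperator(
            task_id="load_to_bq",
            bucket="my-landing-bucket",
            source_objects=["daily/{{ ds }}/data.csv"],
            destination_project_dataset_table="my_project.staging.daily_data",
            write_disposition="WRITE_TRUNCATE",
        )

        wait_for_file >> load_to_bq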

18:20 One of the things that stands out to me that might be really useful here is what happens if something goes wrong. You talked about the contrast being cron jobs or something like that, and if something goes wrong with that, or you need to scale out across different machines or whatever, how do you get those timings right, or handle other weird things? So what's the mechanism for dealing with: I'm going to get some data, it drops in the cloud, I'm going to pull it over, but then maybe it's invalid data or something. What does that look like?

18:47 So at least for Airflow, the idea behind all the operators that were written is that a single operator or a single task should be idempotent. So even if you run them multiple times, it should produce the same result. So if a task fails for whatever reason, you can add more retries to it. There's a retries parameter that the base class takes, and you can say retries is four or retries is five, and Airflow will handle that for you. So if a task fails, it will rerun it that many times.

19:16 It could fail because the database server is down, or it could fail because it's never going to work, right? It could be either one.

19:21 Exactly. And you want to be notified as well. So then we have all those on-failure callbacks and on-success callbacks. Those emails get sent out saying the data didn't arrive at all, or whatever the reason.
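
Those knobs are just keyword arguments on the task. A small sketch of what retries plus a failure callback might look like; the DAG, the command, and the callback body here are made up for illustration:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    def notify_failure(context):
        # Hypothetical callback: in practice this might post to Slack or send an email.
        print(f"Task {context['task_instance'].task_id} failed")

    with DAG("retry_example", start_date=datetime(2021, 8, 1),
             schedule_interval="@daily", catchup=False) as dag:
        fetch_data = BashOperator(
            task_id="fetch_data",
            bash_command="curl -f https://example.com/data.csv -o /tmp/data.csv",
            retries=4,                           # rerun up to four times before failing the task
            retry_delay=timedelta(minutes=5),    # wait between attempts
            on_failure_callback=notify_failure,  # fires once the retries are exhausted
        )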

19:33 There is even more to that, because we also have the mechanism of backfilling the data. So even in this case, it's not a server failure, but your data has improved because you got new metadata, and you want to reprocess the data you've already processed for last week, or only process part of the data because it takes a lot of time and you know that the data up to a certain point is good. Then you have to reprocess just a part of your workflow, a part of your DAG, for the last week. You can do that with Airflow. You can just run a command: reprocess that data for this period of time.

20:12 Starting from this task, because this is where we know we have to reprocess the data because the data has been cleaned up, for example.
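
From the command line, that kind of reprocessing looks roughly like the following. The DAG and task IDs are placeholders, and flag names may differ slightly between Airflow versions:

    # Re-run the whole DAG over last week's data
    airflow dags backfill etl_daily --start-date 2021-07-26 --end-date 2021-08-01

    # Or clear one task and everything downstream of it, so the scheduler re-runs just that part
    airflow tasks clear etl_daily --task-regex load_to_bq --downstream \
        --start-date 2021-07-26 --end-date 2021-08-01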

20:18 Right. You don't have to detect it. You don't have to copy it down. You've changed that locally and you wanted it to get fixed. I see. Okay.

20:24 And the super cool thing here is that this can be done by one person who doesn't know what those tasks are doing at all. They just know the language the tasks are written in. The specification is written in a way that anyone can do that, and then this person operating it can very safely just rerun parts of it and be sure that what comes out at the end is just what they are expecting.

20:46 And with hundreds and thousands of DAGs written by tens or even hundreds of people, just one person can sit down and operate the whole thing without understanding a single thing about how it works inside, but knowing and seeing what happened. This is such a powerful part of Airflow.

21:04 It lets you focus on just the steps and not how they all fit together.

21:08 Right?

21:08 Yeah. Let's focus on a couple of things on the website here that I think are maybe worth calling out.

21:14 One of the things here is that the project has four principles that are really nice. Maybe you want to highlight those for people.

21:23 Yeah.

21:23 Okay. So the four principles are that Airflow is dynamic, extensible, elegant and scalable. And I am going to go ahead and pick my favorite one right here, and it's one that we've kind of touched upon without spelling out clearly, which is that Airflow is extensible.

21:40 Jarek talked about how we have these 70-plus providers, these various integrations with all kinds of services, from the big cloud providers to things like Slack and Snowflake, which are also kind of big, to much smaller ones.

21:56 And if a provider doesn't exist, or if an operator doesn't exist for a task that you need to perform, you can write it, and you can either write it and be running it in your instance of Airflow, or if you're being a good steward of open source, you can write it and contribute back to the community. So other people who need to do that task can also benefit from what you've already figured out.

22:18 Yeah, that's really neat. A lot of these would be things like down here, where only one person has to write "how do I connect to Hadoop." If you go to airflow.apache.org, you can go to the bottom, and there are all these different operators, or what are these, are they the integrations?

22:36 Those are the integrations with the different services you have. Google, for example, is a big provider, but it consists of integrations like Google Cloud KMS, Datastore, machine learning. So sometimes you even have a number of integrations per provider.

22:52 Okay.

22:52 Cool.

22:53 And Leah, if I was going to create one of these, if I was going to be a good citizen and say, oh, I want to create one for AWS Lambda or something like that, right, does that get contributed back to Airflow? So when I pip install Airflow, does that come with it, or is there some external way to bring it in?

23:09 Yes, I do actually have to double check with Jarek and Kaxil, because I know we've been messing around with how we do the installs lately.

23:16 So it used to be that Airflow operators were packaged along with Airflow, and when you did pip install Airflow, you would get everything. And I think that you do still get a certain number of base operators that are kind of provider agnostic that come with Airflow. But the way we have things now is that all of these provider-based operators and sensors, all these provider things, are packaged separately, and you add them just like you would any other kind of Python package. So, for example, if you want to install the Google Cloud operators, you install that separately. And the advantage of that is that they're released on a separate release schedule and follow versioning that ensures they're compatible with versions of Airflow, and they're very clear about that. And it's a lot easier for Airflow users to upgrade just the providers package than it is to upgrade the entirety of Airflow, which for folks running in production is not always feasible or practical.
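
In practice that means the providers you need are separate pip installs on their own version lines, something like:

    pip install apache-airflow                     # the core: scheduler, webserver, base operators
    pip install apache-airflow-providers-google   # Google Cloud hooks, operators, sensors
    pip install apache-airflow-providers-amazon   # AWS integrations, released on their own cadence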

24:20 You can actually click on the documentation link on this page, Michael, and then you will see all of those providers. So you see the provider packages, and you can see the documentation for each of them, with their different versions.

24:33 We release them very frequently, like every month we have a bunch of providers released which are adding new functionality, and they are done completely separately, as Leah said, not on the same release schedule as Airflow, so you can start using them faster.

24:49 This is actually super cool that you can actually always find something there.

24:54 But if you don't find one, we don't actually force you to go through the community. Those are all providers which are developed and maintained by the community of Apache Airflow under the Apache Software Foundation rules, which is called the Apache Way, the way Apache releases software. But if you want, you can actually build your own custom provider. You can build your own custom operators, and you can release them separately, and somebody can install that. And we even have integration points so that people writing custom providers can use exactly the same features as the community-driven ones. And you can install them as a package, as another Python package completely independent from Airflow, and it just plugs into the UI, plugs into the whole framework, and you can start using it. So it's both community and custom, which is fantastic.
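
A custom operator really is as small as the guests suggest: subclass BaseOperator and implement execute. A hypothetical sketch, with a made-up class name:

    from airflow.models.baseoperator import BaseOperator

    class HelloServiceOperator(BaseOperator):
        """Hypothetical custom operator; all the real work happens in execute()."""

        def __init__(self, name: str, **kwargs):
            super().__init__(**kwargs)
            self.name = name

        def execute(self, context):
            # The return value is pushed to XCom so downstream tasks can use it.
            self.log.info("Saying hello to %s", self.name)
            return f"hello {self.name}"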

25:42 Yeah. You can go either path, right? That's neat. I think, Leah, what you're saying about the cadence, the release frequency, and maybe even the degree of seriousness you have to apply to these: you might want the main Airflow to be treated differently than some edge package or integration.

26:01 Right?

26:02 Yes, definitely.

26:03 There was a proposal for Requests, the very popular HTTP library, to be integrated into Python to replace Python's HTTP layer.

26:14 And the decision of the core devs, I believe, was: we don't want to do that to Requests, because it will actually make Requests move much slower and only get released once a year, rather than shipping changes as quickly as it needs to. Same thing for you. All right.

26:29 That was one of the biggest reasons for us to separate the providers, because when we were releasing 1.10.2, 1.10.3, 1.10.4, it meant that all the development was happening in the main or master branch, and we were not releasing from the master branch because we were just releasing the minor or patch versions, and the core has to be tested thoroughly. Even if there's a small bug in one of the providers, let's say the Google GCS bucket operator or something, it has to wait until the entire code has been tested and released, so the cycle can be long. Whereas what everyone was thinking, at least the committers and PMC members, was that providers can be released more frequently. If we find a bug right now, we should fix it and go with the normal ASF releasing way, which is like three days of voting, and release it. So it's a quicker release, rather than waiting for the next month to club it into the core Airflow release. Plus, that way it's also easier to check the changes that happen, because imagine checking the changelog for 70-odd providers, plus Airflow core, on a single page. It would be a nightmare.

27:37 I do think about the coordination of, well, there are some people working on the Discord integration and someone's working on the Samba integration, and we're going to do a new release, so you've got to kind of feature freeze all that stuff. So it makes a ton of sense.

27:51 And it's a completely separate thing, actually. This is super cool.

27:55 I'm the release manager for providers so far, so I've done, I don't know, maybe six or seven releases over the last year, and I can actually do it myself in two or three hours. I'm able to bring in all the changes and put together the release notes for all the 70 providers; it's fully automated. And we can manage releases without worrying that it will break something, because if one of those provider releases goes wrong, we can yank it, yank the release. This is a fantastic feature of PyPI: we can yank the release. And this actually happened yesterday. We discovered that the Postgres provider we released, the 2.1.0 version, has an incompatibility bug with a previous version of Airflow. We hadn't discovered that during our testing; we test a lot of things.

28:39 This one slipped through. But what we've done is just yank this release, and anyone can use the previous one. When they install Airflow and the Postgres operators, they will install the latest non-yanked version. And in the meantime, we can just fix the PostgreSQL provider and release a new version. And that's super cool actually, for maintenance, releases, and the usability and stability of your installation.

29:01 Yeah, that's really good that you can do that. All right, so first of all, let's talk about installing. How do I get Airflow onto my computer?

29:11 It depends on whether you want a hosted, managed version, like Cloud Composer, which I work on, or the one from Amazon, MWAA, and there's also Astronomer, which is where Kaxil works, or if you want to do it yourself.

29:24 Yeah.

29:25 In general, though, we at least say: use the constraints file. Every time we release an Airflow version, we also tag a constraints file for that release in GitHub. The constraints file contains the set of known dependencies that we have tested Airflow with on CI, because Airflow has a lot of dependencies. Before we started using constraints, there were a lot of instances where we just released Airflow, and then one of the dependencies released a breaking change in a minor or a patch version, which meant users couldn't install Airflow. To get over it, we came up with this idea of using a constraints file, because Airflow is a library as well as an application. As a library, users would want the latest versions, whereas for an application, you want stable versions of everything.

30:16 So we came up with this balance of using the constraints file. So if you install, say, Airflow version 2.1.2, you take your Python version, and then you take that constraints file from GitHub and pass it as the constraints file, because that way we can guarantee that it is reproducible and it will work for sure.
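
The install command follows roughly this pattern from the docs, pinning to the constraints file that matches your Airflow and Python versions:

    AIRFLOW_VERSION=2.1.2
    PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
    CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

    pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"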

30:35 Yeah, very cool. So if I go to the documentation, there's a couple of options. I can run it locally. I can run it in Docker, run in Astronomer, but looking through the script to set things up here, it looks like there's a couple of steps. So there's a database that does something.

30:52 There's some users who execute the task or you don't want to run as root.

30:58 Most likely, I suspect, that's something you all discourage. Then there's a web server and there's a scheduler. Maybe tell us about that, whoever wants to. I'll take it.

31:09 So Airflow is pretty complex to set up, because it has multiple components. Depending on the setup, it can talk to a Kubernetes cluster and execute the work over there, or you can have a Celery queue system processing your tasks and executing them on distributed workers, because of the scalability part, which was one of those principles of Airflow. So you can have multiple workers, multiple machines, even several hundreds of them if you want, and Airflow can be installed using all that capacity. So we have Celery workers, we have Kubernetes workers, we have the scheduler, we have a web server. And putting it together is not as simple as you would think.

31:49 Or actually, you might think that it's complex, and it is. However, we've recently put in a lot of effort to make a very simple way of installing it, where you just install it and it works.

32:03 And also, if you want to scale it to a very complex one, you can also turn on all the knobs, put as many components you want in a way that fits you best.

32:12 So coming back a little bit to the installation methods: we have a Docker image. That's the one I also worked on for quite some time, together with Kaxil and the other maintainers. We iterated on it and perfected it. So we have a very nice Docker image that can be used to run Airflow as it is.

32:29 Or you can build your own custom image, which contains all the extra dependencies you want, or all the special packages that you want to install which are needed for you. And then on top of that, we have Docker Compose, which is kind of a quick start.

32:42 So this is the "Running Airflow in Docker" page.

32:45 When you run it in Docker, does that mean the web server runs in one container and the scheduler in another, or something like that? Is that exactly what this Compose file orchestrates?

32:55 Yeah.

32:55 Okay.

32:55 Yes.

32:56 But it's super easy. It's really a quick start. You just download the Docker Compose file and you just run two commands. If you go a little bit down, there are a few commands to run, and then off you go, you have all these components talking to each other and processing the DAGs, and you can start playing with that. It's not production ready, the Docker Compose one. But then there is the next step. So you have local installation, then Docker Compose, and then I will hand over to Kaxil, because he was working mostly on that.
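
The quick start being described boils down to a few commands; the docker-compose.yaml URL follows the pattern used in the docs, so substitute the Airflow version you actually want:

    # Fetch the reference compose file for your Airflow version
    curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.1.2/docker-compose.yaml'

    # Initialize the metadata database and create the default user
    docker-compose up airflow-init

    # Bring up the webserver, scheduler, workers, and friends
    docker-compose up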

33:26 So we also have the Helm chart; we released the first version of the Helm chart in March of this year.

33:33 That's what we recommend for production use, and it uses the official Docker image. So we release a lot of artifacts for Airflow. And again, for the Helm documentation, if you click on documentation at the top and scroll all the way down, you will see separate documentation for the Helm chart.

33:50 Right.

33:51 So go to Helm Chart. Okay. Got it.

33:53 We version all of this documentation separately because they are different artifacts, and all of them have different release cadences and are released separately.

34:02 And the Helm chart is what we recommend for users, because it comes with all the configurations that we have tested in production environments.
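
Getting the official chart installed is a short sequence, something like the following, assuming you already have a Kubernetes cluster and Helm configured:

    helm repo add apache-airflow https://airflow.apache.org
    helm repo update
    helm install airflow apache-airflow/airflow --namespace airflow --create-namespace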

34:09 Astronomer donated the Helm chart last year, and we iterated on it a lot before we released it. Jarek and I also gave a presentation at the recently concluded Airflow Summit, so if users are interested in it, we can probably drop a link at the end of this session.

34:26 I guess you all just had the Airflow Summit, right?

34:28 Yes.

34:29 Yeah. I have a lot to say about this.

34:31 All right.

34:32 Well, you tell us. Community is definitely where the majority of my contributions to Airflow come in. We just had our second ever Airflow Summit.

34:42 So far, it's been an annual thing, but I'm always nervous to say annual because I don't want to make promises, but it's looking good, like we will have it again. So we had our first summit in 2020. We had originally planned to have it be a 500-person in-person event in Mountain View. That's how I got involved, because we were looking to host it at the Computer History Museum, and I said, oh, that's really close to where I work, I can be your liaison to the location.

35:09 And then, you know, there's a whole pandemic and everything. And we ended up pivoting to a totally virtual event, and it was a great success. We did it in partnership with software guru. They helped us run the summit last year, and we felt that it was such a good success that we did it again this year, and it just finished up in July. We had, I think, more than 10,000 at this point registered attendees from all over the world.

35:37 That's really good for an online conference.

35:39 And the second edition, too. We're pretty proud. And we had it live streamed in a bunch of different time zones. So sometimes it was more Americas friendly, sometimes it was more Europe friendly, sometimes it was more APAC friendly. And we had all variations of talks. We had ones that were customer use cases, so people who are running Airflow in production, or running one of the hosted managed versions of Airflow, and what they're using it for. We had people who are contributors talking about their first time contribution experience and why you shouldn't be scared to contribute to Airflow, because we're really nice, I promise we are, or at least we try to be. And we had more experienced contributors like Jarek and Kaxil talk about some of the more complex things that they've been working on over the past year, and everything in between.

36:27 And there are so many talks, and you have the summit page up right now. Actually, all of the recordings, and slides for those presentations that have slides available, are up there for you to watch. If you go to airflowsummit.org, there are many, many, many hours of content. I highly encourage you to watch whatever sounds interesting to you.

36:50 Yeah, I think this is great. Like I said, congratulations on having 10,000 registered, too. That's pretty amazing. I think there's obviously a big group of people who know that this is the right kind of tool. Well, I think there are also a lot of people who don't necessarily know it for sure. Like, for example, on the Airflow GitHub page, it's 23,000 stars. That's big time.

37:12 Yeah.

37:12 Django and Flask 50K. So that's a lot of people using this and interested in this and so on.

37:19 I think that the best part about Airflow is the community, and that's why we have the stars, but also why we have the Summit. And Kaxil, you were going to say something.

37:29 So I was just going to say that if you go by the PyPI stats, we have 3 million downloads a month or something like that, which is insane. I know a good number of those come from CI and automated processes, but hey, all the other packages also have the same thing, so you can at least compare them between packages. Yeah.

37:47 I think it's a fair comparison, at least.

37:49 Right.

37:49 Exactly.

37:51 And like I mentioned, the biggest part about, or the greatest thing about Airflow is its community. If you check the contributors, I think we are at more than 1,600 contributors to the Airflow project, which is great. And every day we get at least a few new contributors trying to contribute to the project with whatever they can. Again, through your medium, I would encourage people to go to the Airflow website. If you find anything, contribute, fix it. If you have some ideas about hooks, operators, anything, contribute, and we are there to help you. Not only the three of us, there are more than 30 or 40 committers and PMC members, and there are users helping users in the Airflow Slack. We have more than 16,000 members in the Airflow Slack workspace as well.

38:41 So I actually want to give a quick plug for an Airflow Summit talk I gave this year that was authored by me and a colleague. It's called You Don't Have to Wait for Someone to Fix It for you. And it is about the kinds of contributions that you can make to Airflow, because there's all those things that Caxel mentioned. But my personal opinion is that one of the best and easiest ways to contribute to Air Flow or any open source project really is to find something that is driving you nuts and to fix that, or at least to articulate really well what's driving you nuts and what needs to change, because a really good issue can be just as good of a contribution as a PR, because you may have just made the foundation for someone else to write a fabulous PR with a really detailed issue.

39:28 And let me add to that as well, because the community is definitely the thing that I love the most. The people are fantastic here, and all of us, all the committers, are so much into inviting people to come and join us, to give back for whatever they got from Airflow. It's free software, anyone can use it for free, and giving back is just super nice. But we don't stop at just talking about it, because if you scroll down a little bit, you would see that we also run a workshop during the Airflow Summit, and this workshop is about contributing to Apache Airflow. This year we had like 20 attendees coming and learning, in three hours, how to make your first PR, how to communicate, how to be present in the community, how to make the most of it, how to be super helpful to others as well. Part of it was about coding, but all the rest was about communication, about speaking to people, about being able to express yourself, and all the stuff we just need.

40:38 Who should I ask about this, and things like knowing exactly who you should ask.

40:44 So actually, one of my favorite stories from this year's Airflow Summit: we had a speaker, I forgot her last name, her first name is Tatiana, and she's a principal data engineer at the BBC. She went to the workshop last year, and this year she was a speaker at the summit. And her talk about how to basically debug when crazy stuff is going wrong in Airflow was fabulous.

41:07 Super. Okay. Yeah. And people can stream those sessions. That's really cool.

41:16 So that is an example of that workshop working.

41:19 Yeah.

41:19 Yeah.

41:20 Very cool. Yeah.

41:20 I was just saying that Airflow Summit is also a one of a kind conference. It's not like the normal conferences, mainly because we had the local meetup groups hosting each day of the event. So we had the London Meetup Group, we had the Bangalore Meetup Group, Melbourne, and the Warsaw Meetup Group. And we were bringing the community together. So let's say the first day was hosted by the London Meetup Group, which was me, Ash, and the folks there; we were hosting the event just for the Monday slot. And then on Tuesday, there were other PMC members and community members, some from Japan hosting, some from Melbourne hosting. Similarly, those were the slots.

41:59 And some days we even had some overlap, because we were trying to cover the Pacific time zone and the Asian time zones, which was incredible, because now you have tons of content for Airflow users to watch.

42:13 Also, we had two community days. We started from Thursday. So on Thursday we had all the talks about community, how you can make contributions and stuff like that. On Friday we had the workshop.

42:25 And then from Monday to Friday, there were more about Airflow use cases. And why Airflow 2.0 was a big milestone for the project and what we are planning ahead for Airflow and stuff like that.

42:37 There's a ton of stuff here. I think people could watch for the rest of the year. And study this, and you get a lot out of it, too.

42:43 I do think so. We actually even had networking events on Friday, and that was a blast. Actually, the networking this year was like people learned how to do it online, and it was maybe not as good as physical conferences, so I'm looking forward to next year, which hopefully is going to be at least partially a physical event, but it was good enough. And I think it was really cool to talk to those people about all the different things, not only Airflow. So we are not only about Airflow, and not only Python, and not only programming, but also people.

43:19 Yeah. I feel like this is a project that would be easy to contribute to, in the sense that if I'm going to, say, contribute as a newcomer to Django, that's going to be hard, because that's a highly polished, single piece of software, and if you're going to make a change, it affects millions of people, and it's not easy. Whereas here, if you want to add some kind of integration that didn't exist before, you're not going to break anybody's code, you don't have to work with a bunch of legacy code. There are a bunch of sort of broad but shallow places people could jump in and participate.

43:49 Well, and if a newcomer does want to come in and really jump into the deep end, we do have this concept called an AIP, which stands for Airflow Improvement Proposal, and it sets you up to not run into heartbreak where you open what you think is an amazing PR and we're like, oh, no, hold on, we're not ready for that. It's almost like writing the outline before you write your essay. I know it sounds kind of dry, but what it really is, is an opportunity to fully flesh out this amazing idea you have and share it with the community.

44:21 And the community will give you feedback, and they will be productive about it, because if they're not, they're not abiding by the community code of conduct.

44:30 Yeah. I find it very, very unfortunate, and I feel really bad, if people come in with a PR to some project that I have. Granted, these are all very small open source projects, but they come in, they actually do the work, and the first I know about it is, boom, here's a PR.

44:44 Yeah.

44:44 That's just not in the same Zen of what I'm trying to accomplish with this. And it's going to break the thing that makes it special so I have to reject it.

44:53 Right.

44:53 But you don't want to. It'd be much better to say: I have this idea, if I built this, would you want it? Do you want the puppy? Not: here's a puppy for Christmas.

45:01 Exactly.

45:02 This is precisely what we are teaching people at those workshops, because it's not obvious if you come from outside. We are not only teaching people about contributing the code, but also how to find your place there: how to be empathetic, how to put yourself in our shoes, and on the other hand, how to say what you want to say in a way that we will understand. It's sometimes a really different world, with different people, different backgrounds, different expectations and assumptions.

45:31 So much of it is communication. I'm a software engineer, I love to do software engineering, but like 30, 40, 50% of my time is communication, not actually coding, and this is cool.

45:44 I really do. So I actually want to call out a really important Apache value that I think Airflow embodies, which is the concept of the importance of community over code.

45:55 And I really feel that the airflow project lives that value.

45:59 But folks in the community really are trying to foster a positive community because they understand that if the airflow community is not healthy, then the airflow code will not live on.

46:12 It doesn't matter.

46:13 It doesn't matter. And if folks have questions about that, I do want to acknowledge that I am the one woman in the room. I am often the one woman in the room when it comes to Airflow, and I would love to see that change and have more gender diverse folks come join. And so if you are someone who identifies with that and wants to hear Leah's unfiltered views on the community, feel free to reach out to me in the Airflow Slack or on my Twitter.

46:42 And like I said, I do think this is a project that if you want to get into open source, it's one that has relatively low barriers, technically speaking.

46:49 Yes.

46:50 Oh yeah. The keynote talk I gave at the summit on Thursday, the first talk, so if you go to the summit page, it's the first talk, is where I talk about my journey as well, because I was very afraid of contributing to open source. It feels intimidating at first: everything will be public, and who knows, if I screw something up, what would people say?

47:12 So it goes on my permanent record.

47:16 And I didn't know Python, or didn't know it sufficiently. So I talk about my journey of how I did it, I talk for about ten minutes about that, and then about how a new user can start contributing to the project. Because Airflow is still a relatively large code base, and there are a lot of areas that people can target, because if you try to learn everything at once, it is going to be very difficult.

47:38 We have Helm charts, we have Docker images, we have the scheduler, which is core to Airflow, we have executors, we have the CLI, the REST API, and a lot of things like that. So there is a lot of room for people to get expertise in a certain area. And then if you start including all the integrations, that's a whole other piece. You can just add your own integration, be an expert at that, and become a contributor, committer, or PMC member just with those contributions.

48:04 Well, and in the interest of empathy, I would like to share that I do not know all of these parts. I think the one that I'm most familiar with is the Google provider, and I have never touched the Helm chart, and it scares me because I haven't taken the time to learn what it's all about. But the good news is that other community members know, and I know that I can look to them for help when I do need to mess around with it.

48:26 Yeah, that's fantastic.

48:27 That's the beauty of the project, right? If everyone knew everything, then why are we all here? Each one of us knows their part, and that's the community. Otherwise it's not a community project.

48:37 Yeah, we're getting short on time. I do want to touch on a couple of things that we haven't got a chance to touch on that are really important. Let's talk about the user interface, because one of the ways this is positioned is you don't want to do this all with just cron jobs and sort of little scripts that are put together and run on weird random triggers. And one of the real big benefits is you have this really beautiful UI for all sorts of visualization of running workflows and all kinds of stuff. Right. Do you want to tell us about that?

49:08 I'll do the simple version, because I think that Kaxil and Jarek know more about it than me. But I'll tell you the two things I'm most excited about. One of them is that it just got a huge makeover with Airflow 2. So if you're an Airflow user and you haven't upgraded to Airflow 2, if you need one reason alone, it is that the UI is so much prettier and it is much more responsive. And as a former cron user, I'll say that the best, easiest benefit you get from this is you can just see what's failing. You don't have to dig around and try to figure out what's missing; you know that something went wrong.

49:41 All right.

49:41 Yeah.

49:42 Over to you, Kaxil. I'm off my soapbox now.

49:43 Yeah. Basically you have all the information you need, all the historical views in front of you. If you want to see which tasks failed historically, you can just check the tree view. And then there's the graph view, where you can see how your tasks are proceeding. Plus, we now have auto-refresh, like Leah mentioned, from 2.0, so you don't need to press the refresh button, which was a bit annoying in the Airflow 1.10.x versions. Your tasks will continue, and you can see the progress as Airflow is running them. If you click on a task, it will show you the logs of that task. So everything is very intuitive and easy.

50:17 For people who are listening and not watching the live stream: you can go, for example, into the graph view, and it'll show you all your tasks, like I download this file, or run this bash script, or whatever, and then it actually shows you how they're working together, and they're colored. As you progress through this DAG of tasks, you can visually see: was this one skipped, was this one successful, which one failed, how far along are you. Visually, as a graph, which I think is awesome.

50:44 Yeah. And one of the interesting things over there is understanding the dependencies, which was very interesting when I initially started with Airflow: for a user or for a company to understand all the tasks they're working on in a single flow, and how the dependency graph works. If you're depending on the data from a single source, how does that get to your dashboard, that end-to-end view? It's an actual pipeline of sorts that you can see.

51:09 And just to add on to that. The visualization of the data flow is super important, because at a glance you can see what's going on, and you can go to any part of it and focus on that and understand what's going on. However, I will come back to kind of the roots, because Airflow doesn't have a way, by default, to define those flows visually. You can see them visually, but they are all defined as Python code. And this is the beauty of it, and that was a very, very deliberate choice. And this is the reason why we are on a Python talk show today, because Airflow is all about Python. So the visualizations that you see here are really a reflection of the code that you wrote as the author. And it means also that the common language between people using Airflow and its different parts is Python. This is the common language that we are using, and this makes it so powerful. And the visual part is pretty much an addition; it's necessary, but it's more a result of the Python code.

52:11 Which is being written by you. A lot of workflow systems try to go in reverse, right? They're like, here's your draggy-droppy set of tasks and options, you drag it all together, then you press go.

52:20 Yes.

52:20 This lets you live at the code level.

52:22 This all breaks at the very moment when you want to have some custom work, because if you are used to drag and drop, you will not do coding; if it needs some customization, you will ask someone else to do that. In Airflow, it's quite the reverse. I mean, everything is Python: the dependencies are Python, the code itself is Python, the building blocks are Python, but you can also write your own. In the same place where you define your DAG, you can write your own custom operator without having to use a black-box operator of sorts, and you don't have to leave the world of Python while doing that. And this is so powerful. I think this is why it is so popular among data engineers all over the world. I think this is one of the most popular workflow orchestration engines in the world right now. I don't have hard data on it, it's just a feeling, but I think that's the case.

53:17 I mean, we did have 10,000 people at the summit.

53:22 Yeah, for sure.

53:23 And while it is written in Python, you can use the Bash operator to run your Java code, for example, or Scala, whatever. So while everything is in Python, you can use it to run any other language too.

53:36 You can run a Docker image on Kubernetes.

53:38 A lot of those workflows are also, okay, we have Kubernetes.

53:42 So we run everything in Kubernetes, we run them as Docker containers, and that's the only way we do it. Airflow can do that as well.

53:49 No problem, whatever there is. There's the Kubernetes pod operator; you can spin off a new Kubernetes pod and run your task. But you can also have plain Python code, which is very easy to put together, play with, and run locally, without all the overhead of building Docker images and making them available to run as a task.

54:07 It's so much more extensive and powerful.

54:10 Yeah, that's a very good point. There are a lot of escape hatches to bring in other technologies. That's cool. Let me give people just a super quick sense of what it's like to write code for this. In Python code, you would say with DAG, with a directed acyclic graph, and you give it some details, and then you create these various tasks, like a task might be a bash operator or something like that, or like you said, a Kubernetes pod or whatever, and then you just run them. One thing I did want to ask you all about, what is this: t1, double arrows, into a list of t2, t3 for the tasks?

54:43 Good question. So you have those tasks assigned to variables called t1, t2 and t3, and this is how that visualization is defined, using the bit shift operators in Python. So this one says that t2 and t3 run after t1, and they run in parallel.

55:01 And there are different ways of setting dependencies. If you scroll down, or just search for setting up dependencies, on the right side, the setting up dependencies section. Yeah, there you go. There are different ways you can set those dependencies between tasks.

55:19 You can right shift, you can left shift, you can chain the shifts as a transitive type thing too.

And set_upstream and set_downstream.
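
Put together, a tiny DAG using those dependency operators might look like this; the task commands are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG("dependency_example", start_date=datetime(2021, 8, 1),
             schedule_interval="@daily", catchup=False) as dag:
        t1 = BashOperator(task_id="download", bash_command="echo downloading")
        t2 = BashOperator(task_id="transform", bash_command="echo transforming")
        t3 = BashOperator(task_id="report", bash_command="echo reporting")

        # t2 and t3 both run after t1, in parallel
        t1 >> [t2, t3]

        # Equivalent spellings:
        # [t2, t3] << t1
        # t1.set_downstream([t2, t3])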

55:26 And the beauty of that, again, is that this is all Python code. Those are just overridden operators, the left shift and the right shift.

55:34 They are just standard Python operators that we override, and you can override them in your own classes too.

55:38 Just like pathlib overrides / to combine parts of the path.

55:43 I probably wouldn't recommend that if you don't know that much, but the better thing there is that you can actually programmatically build the tasks and build the relationships. So this is not something that is predefined in one file in a declarative way, like an XML file or JSON. This is Python code, so you can pretty much dynamically build the DAG. We saw very complex DAGs, ones with thousands and thousands of nodes, built with, like, 200 lines of code, because you could build those tasks and decide which relationships you want to build, and in what way.

56:19 Like a for loop. It's very hard to have a conditional in a JSON file, or...

56:24 No. Yeah, that's the thing, or a loop, actually. A loop in a JSON file? It's like, no, I mean, there is no way to do that.

56:32 I mean, we do have XSLT, I think. Come on.

56:35 Yeah, please.

56:37 No.
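
To make the programmatic-DAG point concrete, here's a hedged sketch: the table names and commands are invented, but the pattern of generating tasks and wiring relationships in an ordinary Python loop is the point.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Invented list; in a real pipeline this might come from config or a lookup.
TABLES = ["users", "orders", "payments"]

with DAG(
    dag_id="dynamic_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    publish = BashOperator(task_id="publish_report", bash_command="echo publish")

    # Ordinary Python: loops and conditionals generate tasks and dependencies.
    for table in TABLES:
        extract = BashOperator(
            task_id=f"extract_{table}",
            bash_command=f"echo extract {table}",
        )
        load = BashOperator(
            task_id=f"load_{table}",
            bash_command=f"echo load {table}",
        )
        extract >> load >> publish
```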

56:37 And so from Airflow 2.0 onwards, this is an explicit way of setting dependencies. But from 2.0 onwards there is also an implicit way of having dependencies, which is: if you say that your operator takes an input from another task, then Airflow sets the dependency between them implicitly, because you are depending on an output of another task. So it knows.
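
That implicit style is the TaskFlow API introduced with Airflow 2.0. A minimal sketch, with made-up task names and payload, might look like this; passing one task's return value into another is what creates the dependency.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2021, 1, 1), schedule_interval="@daily", catchup=False)
def taskflow_example():
    @task
    def extract():
        # Made-up payload for illustration.
        return {"rows": 42}

    @task
    def report(payload):
        print(f"loaded {payload['rows']} rows")

    # No explicit >> needed: using extract()'s output creates the dependency.
    report(extract())


example_dag = taskflow_example()
```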

57:00 Yeah.

57:00 That makes a lot of sense. All right. So I think just two really quick things before we wrap it up, we are short on time here.

57:09 One is, we talked about the web UI for the stuff we're looking at, but there's also what you all describe as a rich command line utility to perform complex surgeries on DAGs. Okay. Why would you perform surgery on one of these things? And what is this all about? Who wants to take that one?

57:26 I don't know that I've done surgery with the CLI, but I have used the CLI to give me information about my environment to figure out when things are misbehaving. Yeah.

57:36 Okay. It's like for diagnosis and stuff like that.

57:39 Yeah.

57:39 Because we have this one command, list dags, and it also shows you how long the DAGs are taking to load. So you can kind of see if one of them is your problem DAG. If it's taking way longer to load than the rest, that usually means that I made a mistake.

57:54 Yeah.

57:55 That command also gives you the parsing time and everything like that. So it can tell you that it took 5 seconds to parse your DAG file, which means something is wrong in your DAG file. You are probably importing a lot of things or doing some database calls at the top of your file, not inside the tasks, so you can find those sorts of issues. Also, you could use the airflow backfill command to run all the backfilling of data, if you create the DAG today and you want to run it for the last year or so.
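
To illustrate the kind of mistake being described, here is a hedged before-and-after sketch. The slow query is a stand-in: the point is that top-level code runs every time the scheduler parses the file, while code inside a task only runs when the task executes.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def expensive_database_query():
    # Stand-in for a slow call you would not want at module import time.
    return ["row"] * 100


# Anti-pattern (don't do this): a top-level call such as
#   rows = expensive_database_query()
# would run on every single parse of this file.


def do_the_work():
    # Better: the expensive call happens only when the task actually runs.
    rows = expensive_database_query()
    print(f"processed {len(rows)} rows")


with DAG(
    dag_id="parse_friendly",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="do_the_work", python_callable=do_the_work)
```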

58:22 But also, what is not in the document is this. Well, it is mentioned in the documentation. We also have a very powerful and rich and very well written API.

58:34 So we have a stable API. In Airflow 2 that was one of the improvements implemented. So if you go to the Apache Airflow docs and scroll down on the left, yes.

58:42 This stable one. And even a bit below, there it is, like, stable REST API.

58:48 Yeah. There you go.

58:49 Got you.

58:50 Okay.

58:51 This API is written to the OpenAPI standard, which means that all the tools you can imagine for managing access, for trying things out, for testing the API calls, and all the beautiful documentation that you see here with examples, this is all automatically generated from our API spec. This is super cool, and this is surprising, because you said that the UI is fantastic. And yeah, it is. But there are some companies who have their own UI, their own ways of looking at the processing pipelines.

59:22 And many, many, we've learned during the Airflow Summit, many of those companies actually build their own UI. They don't use the Airflow UI at all. They just use the engine to execute things. Right.

59:31 They maybe want to integrate it into some larger thing they already have or something.

59:35 Yeah, exactly. And this API makes it possible. So you can just query which DAGs you have, what their relationships are, how this all works, which runs were successful and which were not. And then you can build a beautiful UI, or even an ugly UI if you want, but a UI that is something you're used to, without even looking at the Airflow UI. And this is also super powerful.

59:55 And this is a straight up REST API.

59:59 So while Python is awesome, if you're not a Python person but you still want to adopt this, here's a way to integrate with it. Right.

01:00:05 And we already started creating clients in different languages. We have a Java client for Airflow built on this API spec, and users can create their own clients for a specific language, because the whole thing uses OpenAPI, so you can auto-generate clients for different languages.
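
As a rough sketch of what talking to the stable REST API looks like from Python, assuming an Airflow 2 instance on localhost with the basic-auth backend enabled; the host, credentials, and DAG id are placeholders.

```python
import requests

# Placeholders: adjust the host, credentials, and DAG id for your deployment.
BASE_URL = "http://localhost:8080/api/v1"
AUTH = ("admin", "admin")

# List the DAGs the instance knows about.
dags = requests.get(f"{BASE_URL}/dags", auth=AUTH).json()
for d in dags.get("dags", []):
    print(d["dag_id"], "paused" if d["is_paused"] else "active")

# Trigger a run of one DAG.
resp = requests.post(
    f"{BASE_URL}/dags/example_pipeline/dagRuns",
    auth=AUTH,
    json={"conf": {}},
)
print(resp.status_code, resp.json())
```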

01:00:23 Yeah. Fantastic. All right. I think that is about time for us. I did want to point out Astronomer and AWS, but Astronomer, where you work, Kaxil, is a sponsor.

01:00:34 So if you want to run sort of Airflow as a service, that's kind of your job.

01:00:39 Right.

01:00:40 100%. And also, Astronomer has their own registry. So if you go to registry.astronomer.io, it makes it very easy to search for the providers that are built inside Airflow, or, if users create and maintain their own providers, it is very easy to search those as well. I just posted a link if you want to check it out. Anyone want to comment on that?

01:01:02 Because we also have Google Cloud Composer. So we have Astronomer, AWS, Cloud Composer. These are a big embrace of Airflow as a service. You can choose: either you run it on your own, or you run it using Astronomer, which has great expertise in everything, because a lot of people from Astronomer are committers. Then there are Amazon people, then there are Google people, with the Amazon offering and the Google offering. And you are free to choose whatever you want, like how you want to run Airflow, and you can move.

01:01:32 Probably if you decide you need to move.

01:01:34 Yeah.

01:01:35 Just need to do the infrastructure.

01:01:37 The DAGs will be the same no matter where you take them. You might have to do a few changes when it comes to auth and making sure your keys are up to date, that kind of thing.

01:01:45 Yeah.

01:01:46 Cool.

01:01:46 All right.

01:01:47 Let's wrap this up with a little bit of future looking.

01:01:50 Just whoever has the right visibility in our group here: where do things go in the future? People are excited about Airflow. What can they look forward to?

01:01:59 There's a really good talk from the Airflow Summit that's called Looking Ahead: Beyond Airflow 2.0. It's Ash from Astronomer and Jamal from Google. And I think what Ash said over and over again is, well, there is no roadmap, but we do always have things going on. Over to you now.

01:02:18 No promises, but there is lots of that. So, yeah, we pretty much know the direction we are heading in. We want Airflow to be the orchestrator you want to use for whatever workflows you want to run. That's it. And there are lots of things that have to happen in order to get there, because we are quite specialized on one hand, and on the other we are opening up. But we are on the road to really make it easy to accommodate more use cases, make it easier, make it faster, and serve those cases which currently cannot be served for some reasons, mainly historical reasons. This is definitely the direction we are heading: open up even more cases without losing the single focus. We want to be great at scheduling tasks and orchestration. That's it. We don't want to do processing. We don't want to go in that direction. That doesn't make sense for us. We want others to do processing.

01:03:12 And we will do orchestration the best way possible. On that note, there are two immediate things that we're already working on, and we are almost close to merging them on the main branch. One is making the Airflow scheduler more powerful. That is, users will have more power than just expressing the schedule in cron. Users will also be able to say, run it on the third trading day of the month, or something like that. That level of powerful timetable is what we want to provide to users. We call them timetables. We will have a cron timetable, we'll have a time delta timetable of sorts. We are still figuring that out, but we'll have that. Plus, there is something called deferred operators. I mentioned the sensors, which currently poke the API calls until they succeed. We are going to have a new component called the triggerer that will use Python's async library to use resources in a more optimized manner instead of polling.

01:04:11 You just wait for it to happen and then boom off it goes. Yeah. Okay, that sounds cool.

01:04:15 Just one comment on this scheduling, because there's a great example of one of the cases we want to serve. There is a real astronomer, not the company, a real astronomer using Airflow, and he wants to start DAGs at sunset and sunrise. And when you are an astronomer, figuring out sunset and sunrise is a little bit complex. So the whole new scheduling is going to be there to implement this astronomer's request. Yeah.
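
None of this had shipped at the time of recording, but as a rough sense of the shape the timetable idea later took (the class name, import path, and arguments here are assumptions relative to this conversation), a DAG would take a timetable object instead of a cron string, and a custom "sunrise and sunset" timetable could plug in the same way.

```python
from datetime import datetime

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.timetables.interval import CronDataIntervalTimetable

# Assumed post-2.2 API: scheduling becomes a pluggable timetable object.
with DAG(
    dag_id="timetable_example",
    start_date=datetime(2021, 1, 1),
    timetable=CronDataIntervalTimetable(
        "0 6 * * *", timezone=pendulum.timezone("UTC")
    ),
    catchup=False,
) as dag:
    BashOperator(task_id="observe", bash_command="echo good morning")
```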

01:04:39 Fantastic. It sounds really useful. All right, well, I think that's it for covering Airflow, but let's quickly wrap up with, I guess, just one of the questions I usually ask at the end, since we're a little bit over time. I'll ask you about your editor. Jarek, if you're going to work on Airflow and other stuff, what editor do you use for Python on a daily basis?

01:04:56 I use IntelliJ Ultimate. That's my favorite editor. However, very, very frequently my favorite editor is vi. I mean, I'm an old-school guy, and there is always something to do that week somewhere where I don't have the editor, so I start vi. And I have it in my fingers. I know how to quit vi. It's easy.

01:05:18 I can learn you.

01:05:20 I can teach you. No problem.

01:05:21 Fantastic. Yeah. I love that joke about random strings.

01:05:25 Kaxil? For me, PyCharm. It's the debugging, it's going to the source code, and the intelligent help it gives.

01:05:34 Just a big fan of PyCharm, right?

01:05:36 Leah? I use a combination of VS Code, and I also have a soft spot for Vim. And then, if it's going to be fast, VS Code.

01:05:44 If it's not going to be for a while, let's get down to it. Yeah, right on. Well, thank you all for being here. It's been really great. Final call to action. People want to get started either using Airflow or contributing to Airflow. What do you tell them?

01:05:58 I tell them to go to the community page on the Airflow website, and I tell them to sign up for the dev list and to join the Airflow Slack.

01:06:07 Fantastic.

01:06:08 All right.

01:06:08 Well, thanks again. Thanks for being here.

01:06:10 Thank you so much, it was a great time.

01:06:13 Thank you.

01:06:14 Thanks.

01:06:14 Bye bye.

01:06:15 This has been another episode of Talk Python to me.

01:06:19 Our guests in this episode were Jarek Potiuk, Kaxil Naik, and Leah Cole. And it's been brought to you by us over at Talk Python Training, and the transcripts have been brought to you by 'AssemblyAI'. Do you need a great automatic speech-to-text API?

01:06:33 Get human level accuracy in just a few lines of code.

01:06:36 Visit 'talkpython.fm/assemblyAI'. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async.

01:06:50 And best of all.

01:06:51 There's not a subscription in sight. Check it out for yourself at 'training.talkpython.fm'.

01:06:55 Be sure to subscribe to the show, open your favorite podcast app, and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at 'talkpython.fm/youtube'.

01:07:22 This is your host, Michael Kennedy. Thanks so much for listening.

01:07:25 I really appreciate it.

01:07:26 Now get out there and write some Python code.
