
#144: Machine Learning at the Large Hadron Collider Transcript

Recorded on Thursday, Dec 14, 2017.

00:00 Michael Kennedy: We all know Python is becoming increasingly important in both Science and Machine Learning. This week we journey to the very forefront of physics. You'll meet Michela Paganini, Michael Kagan and Matthew Feickert. They all work at the Large Hadron Collider and are using Python and Machine Learning to help make the next major discovery in physics. Join us this week on Talk Python to Me, Episode 144, recorded December 14th, 2017. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host Michael Kennedy, follow me on Twitter where I'm @mkennedy. Keep up with the show, listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and Talk Python Training. Be sure to check out what the offers are for both of these segments. It really helps support the show. Hey everyone, before we get to the interview, I want to share a quick update about our Python courses with you. Do you work on a software team that needs training and could really use the chance to level up their Python? Maybe your entire company is looking to become more proficient. We have special offers that make our courses here at Talk Python the best option for everyone you work with. Our courses don't require an ongoing subscription like so many corporate training options do and they're roughly priced about the same as a book. We're here to help you succeed. Send us a note at sales@talkPython.fm to start a conversation. Now let's get to the interview. Michael, Michela, Matthew, welcome to Talk Python.

01:42 Michael Kagan: Hi everyone, thanks for having me.

01:45 Matthew Feickert: It's an honor to be on the show. Thanks so much for inviting us.

01:48 Michael Kennedy: Yeah you bet, the honor is mine to be honest. You guys are doing amazing stuff. You're pushing the boundaries of science and I am just really excited that you are here to share what you are working on and how that intersects with the Python space with my listeners. We're going to talk about the Large Hadron Collider, particle physics and machine learning, and how those three amazing things go together. But before we get into them, let's just start real quickly with how you guys got into programming and Python. Michela, you want to go first?

02:17 Michela Paganini: Yeah, actually when I started programming, it certainly wasn't Python for me at first. It was IDL and MATLAB in some undergraduate physics and astronomy labs, but then when I got to CERN and I started my career as a graduate student in ATLAS, that's when I first encountered Python and C++, and so just like everybody else, I learned on the job and I'm really thankful that I had great mentors who really taught me what good code and bad code looked like, so that I could start writing good code instead.

02:47 Michael Kennedy: I feel so much of what people do in programming really is learning on the job, even for people who have degrees in computer science. You study one thing but then you go and you actually build something different, I think it's probably not that different actually than what most people went through.

03:03 Michela Paganini: Yeah absolutely, I think that's something that a lot of people could relate to.

03:07 Michael Kennedy: Nice and IDL, that's the astrophysics type language, is that right?

03:11 Michela Paganini: That's correct, I started as an astronomer as an undergraduate and so that was my first encounter, I would say, with real coding.

03:19 Michael Kennedy: I see so you started out with really tremendously large things, you're like "No, let's have a look at the really small stuff instead."

03:24 Michela Paganini: Yes, it was quite a transition.

03:27 Michael Kennedy: Michael Kagan, how about you?

03:29 Michael Kagan: My story's pretty similar to Michela's. I took a few classes as an undergraduate, and actually the first language I really used on the job was Fortran, because a lot of the old physics simulation code for doing calculations is built in Fortran, so those were my first real projects. And then once I got to graduate school, everything on the modern high-energy physics experiments was C++, and that's where I learned on the job. Python is used there a bit, so I was beginning to learn Python through some of the glue scripting that's done there, and then once I got into more machine learning stuff and realized how many great tools there were, I dove a lot more into using Python.

04:12 Michael Kennedy: That's cool, at CERN a lot of the actual data processing is done in C++ but then the consumption of that is done in Python if I understand it right?

04:22 Michael Kagan: Yeah, that's right. Basically the enormous base of code that crunches on the data is all in C++, and then there's a wide range of different Python scripts, the set of code that helps crunch on data from one type of experimental apparatus and from another piece of experimental apparatus, and then puts that information together.

04:44 Michael Kennedy: Nice. Matthew, how about you?

04:47 Matthew Feickert: Similarly, the first time I really did any programming was in a CS101 course at university where I learned a little bit of C and some MATLAB, and nothing was really clicking then, but then as an undergraduate, I joined the research group at the University of Illinois and my advisor at the time was like "Hey, do you want to do this stuff?" And it seemed really exciting. He was telling me about his work at the Large Hadron Collider so I said sure, and he was like great, and he sat me down in front of a Linux terminal and that's when I first got introduced to Bash and C++ and things like that. Then I actually started to see how I could use programming to actually solve problems in analysis. That's when I had this aha moment and it started to click. But I really didn't start using Python much until I got to grad school, and as Michael said, once I started to get a bit more introduced to how the physics community is using machine learning, then Python, with its really great ecosystem of machine learning tools, became a really obvious choice for me to start to hone my Python skills. But for me, pretty much all of my programming was just learned on the job.

06:01 Michael Kennedy: That's really cool. I think one of the things that really draws people once they get started with Python is all the extra libraries that you can use. You can just pip install tensorflow, you can just do that, and it's like, wait a minute, it's all right here.

06:16 Michael Kagan: That's been fantastic, especially with this very quickly growing set of libraries and tools; even in the machine learning domain, there's a new package pretty frequently that you want to try out. pip and Conda and these package managers make this so quick, it's really much faster development time than a lot of the stuff we're doing in C++, where when a new package comes out, given the code base is enormous, it can take a relatively long amount of time just to compile any changes you want to make.

06:45 Michael Kennedy: Yeah, that's for sure. I think of it as one of Python's superpowers, it's really cool. Michela, one of the things I wanted to ask you about is, it feels like this open-source aspect of Python fits really well with open science; this research is for public consumption. A lot of the stuff you can do here seems like it's much better done with an open-source set of software than, say, MATLAB and proprietary, paid add-ons.

07:15 Michael Kagan: Is that for Michael or Michela?

07:16 Michael Kennedy: Michela.

07:18 Michela Paganini: Yes, I totally agree with you, in the sense that it's a lot easier for us, being a large collaboration, to share tools and collaborate across so many different domains, even within physics, using these libraries that one can very simply, as you both said, install in your environment, as opposed to perhaps using something closed source which then needs to be distributed correctly. So I totally agree that it fits very well with the design that at least I have in mind for what our workflows should look like, and I think a lot of the other people on this podcast would agree with me.

07:55 Michael Kennedy: Yeah very cool.

07:57 Michael Kagan: I would just add, and this is something the experiments take quite seriously: even within any language, even if there is proprietary software that could be quite useful, we are very, very hesitant about engaging with it because of our inability to look inside the box and our inability to really know that it's doing what we want it to do. That's why, even within the experiment, we end up writing a lot of our own code even when there is a proprietary solution, just because it's not really adequate for us to be able to do the research we need to do.

08:24 Michael Kennedy: That makes a lot of sense. Is that new? Is that something where, 15 years ago, people would have said the same thing?

08:31 Michael Kagan: Yeah, absolutely, because a lot of the decisions I'm even thinking about are about packages from 15 years ago, at the beginning of the experiment; there's even a nice neural network package from 10 years ago that it was decided not to really use because it was proprietary at the time.

08:45 Michael Kennedy: Very interesting.

08:46 Michela Paganini: I would say that those decisions were even more important back then than they are right now. I think these days we see more adoption of standard tools, still open-source tools, but a lot of the standard tools from industry, whereas back in the day, maybe decades ago at CERN, we would tend to really customize every single piece of code and write it ourselves.

09:09 Michael Kennedy: I guess you're probably right, that's an interesting point, because today we're swimming in open-source software and these ideas, and it seems more accepted, so it was even more important to have that early on, when it wasn't so obvious. Nice, so let's talk about what you guys each do day to day. You all are doing such amazing stuff. You're all involved in ATLAS to some degree. And we'll talk about the Large Hadron Collider a little bit, but maybe you could just touch on what that is. Maybe Matthew, let's start with you.

09:36 Matthew Feickert: Yeah sure. I'm a graduate student at Southern Methodist University in the United States, but I'm stationed over at CERN, and so, like you said, I work on ATLAS. As a graduate student, you could say one of my main responsibilities is to make plots. What I mean by that is that I'm both learning how to do analyses and I am actually one of the people who is going in and writing the code. I work on a specific analysis that's trying to measure specific properties of the Higgs boson, given a specific decay channel that it might have or that it does have, and the way I'm doing this right now is by using Jupyter Notebooks, so I can actually go in and use some of the great Python tools like Keras to interact with the data, actually write some neural networks, and do some exploratory data analysis. In addition to that, I also do some operations work on ATLAS, so I work on something that's called our trigger system. We can talk a bit more about that if there's time.
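For readers following along at home, here is a minimal sketch of the kind of Keras model you might spin up in a Jupyter Notebook for this sort of exploratory classification. The data, the feature count, and the labels are invented placeholders, not ATLAS data:

```python
import numpy as np
from tensorflow.keras import layers, models

# Hypothetical feature matrix: one row per collision event,
# columns are physics features (momenta, angles, masses, ...)
X = np.random.rand(10000, 8).astype("float32")
y = np.random.randint(0, 2, size=10000)  # 1 = signal-like, 0 = background

model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(8,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability the event is signal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=128, validation_split=0.2)
```

In a notebook, each of these steps typically lives in its own cell, which is what makes the exploratory loop of tweak, retrain, and replot so quick.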

10:47 Michael Kennedy: The trigger system is really amazing and critical as well.

10:51 Matthew Feickert: Yeah.

10:53 Michael Kennedy: We'll definitely talk about that.

10:54 Matthew Feickert: There I'm just writing a combination of C++ and Python to do optimization studies and provide support for the trigger system.

11:03 Michael Kennedy: Cool, how about you, Michela? What do you do day-to-day, what are you doing on all these experiments?

11:09 Michela Paganini: Of course. These days, I always joke that what I spend most of my time on is preparing talks and posters for conferences and workshops and interviews and all of that, but I guess my average day as a PhD student in ATLAS comes down to primarily three things, I would say. First of all, training neural networks; then, while those are training, I contribute to the experiment's or my analysis code base; and I read lots of papers on the arXiv. I would say certainly roughly 90% of my time is spent on coding and documenting the code. I'm also involved in various reconstruction and analysis groups, and my goal personally is to bring better algorithmic thinking to the table to help identify bottom quarks, or pairs of Higgs bosons in my case, or any other particle that we might be interested in at the Large Hadron Collider. I can go into more detail if you want about some of the work that I'm doing with neural networks. Specifically, I'm working on speeding up a part of our simulation that is very computationally intensive, and my idea is to use generative adversarial networks to have a higher-accuracy but at the same time faster simulator that is powered by deep learning.

12:19 Michael Kennedy: That is really awesome and I do want to ask you more about that, but one question that came to mind while you were describing that is, you say you spend 90% of your time writing code. When you got into physics and thought about what you were going to do as a physicist once you were done with the books, is that what you actually saw yourself doing?

12:38 Michela Paganini: Absolutely not, I was not ready for that at first. It came as a surprise, a pleasant surprise I have to say. It turned out that I love coding a lot more, perhaps, than I like the more traditional ideas of physics, but it was a transition, let's say, because not much of our coursework prepares us for what the reality of the work of a graduate student in an experiment like ATLAS is really like, and as I said, the majority of it is coding.

13:04 Michael Kennedy: Yeah for sure. I didn't take any graduate classes in physics, but I took a number of high-level ones and I don't remember doing much coding for them. It was all pen, paper, prove this, prove that. Lots of equations, not much actual software.

13:19 Michela Paganini: I would certainly advocate that some of the curricula should probably be updated to reflect what the real life of a graduate student in experimental high-energy physics looks like.

13:30 Michael Kennedy: Yeah, that makes a lot of sense. Alright Michael, how about you?

13:32 Michael Kagan: I'm a research scientist at SLAC, without the K, so that's the Stanford Linear Accelerator.

13:38 Michael Kennedy: Not the chat thing, but the really fast thing at Stanford.

13:41 Michael Kagan: Exactly, it's a Linear Accelerator Center. It's one of the DOE, Department of Energy, National Laboratories that's run by Stanford. This used to be primarily a high-energy physics lab, and that's been turned into a big X-ray laser, a free-electron laser, but anyways, I don't really work on that stuff. I'm part of the team that works on ATLAS, and so as research scientists, we work in ATLAS, which is this 3000-person collaboration, and each of us works for our own institution. One aspect of that is, with 3000 people trying to improve and run a large piece of equipment and then do all this data analysis, we have to organize. So I find myself now in a phase of my career where I'm more and more part of that organization. I'm helping to run one of the groups, which looks at a certain kind of particle we might find in our detectors, called a bottom quark. We develop algorithms to find those particles in the detector, and once we find them, we can then give those algorithms out, supply them to any other analyzer on the experiment who wants to look at some data and find out, in a given collision, how many bottom quarks there were in that collision. I spend a lot of my time running that group, which is maybe 50 or 60 people organized together, working to get that moving forward, and then with the free time that I have left, working with postdocs and grad students on data analysis. I've worked both with Michela and Matthew on various data analysis projects, and also on exploring how we might take some new ideas in machine learning, or even develop some when needed, to solve some of our specific tasks.

15:31 Michael Kennedy: That sounds really interesting and you're using some machine learning and those types of algorithms to create these techniques for discovering these bottom quarks.

15:41 Michael Kagan: Yeah, absolutely. Machine learning has been around for a while in our field, and there's still a very common and powerful paradigm where we look at our data and, based on our domain knowledge, develop all sorts of features and use all sorts of algorithms to say, okay, this looks like the properties of a bottom quark. We can compute all these features and then train machine learning algorithms to, for instance, classify whether this set of data really was a bottom quark or not. That's been around for a while, and one of the things we're also working on is saying, okay, if we take a step back and look at this data and see if we can think about it in different ways, sometimes that maps onto problems like problems in vision or even natural language processing, where we can then start to use some of the super-modern techniques coming from things like deep learning, so that we can improve our classification, and in some cases we're doing regression or even generative-type problems, so there's a --
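As a rough illustration of that feature-based paradigm (with invented feature columns and random stand-in data, not the actual ATLAS b-tagging code), a boosted decision tree in scikit-learn looks something like this:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: each row is a jet, each column a hand-crafted feature
# (think displaced-vertex mass, track impact parameters, and so on);
# the label says whether the jet came from a bottom quark in simulation.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 6))
y = rng.integers(0, 2, size=5000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boosted decision trees have long been a workhorse classifier in
# high-energy physics for exactly this kind of task
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
clf.fit(X_train, y_train)

# Per-jet probability of being a b-jet
print(clf.predict_proba(X_test)[:5, 1])
```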

16:49 Michael Kennedy: You are working with a huge camera basically right?

16:51 Michael Kagan: Exactly.

16:52 Michael Kennedy: With ATLAS.

16:55 Michael Kagan: I keep thinking of it in meters, but it's a 75-foot-tall and 120-foot-long detector that sits 100 meters underground, and it's built of many different kinds of cameras. These different kinds of cameras detect different kinds of particles, and we take all that information together to build a picture of what happened every time we collide.

17:17 Michael Kennedy: Maybe that's a great place to segue into just a really quick summary of this particle physics stuff. Matthew, maybe we'll let you start this off. I was told in fifth grade that atoms are the smallest stuff that everything is made of. But not so much, right? Tell us about it.

17:40 Matthew Feickert: I think everyone has this idea going through schooling that you have the periodic table, where you have your atoms, they are made of protons and neutrons and electrons, and that's it. It turns out that's not really the whole story. Electrons, as it turns out, do seem to be fundamental particles, but protons and neutrons are actually composite particles made of even more fundamental particles that we call quarks, and there are also gluons in there, which are other subatomic particles. So there's this whole particle zoo, if you will, but the amazing thing is that when we go and explore the world and actually look, it turns out there are 12 matter-type particles: six quark particles that make up the protons and neutrons (things like protons and neutrons we call more generally hadrons), and then things like the electron that we call leptons. There are electrically charged leptons like the electron, and there are also their ghostly neutral cousins, the neutrinos, which hardly interact with matter at all. So I think maybe Michela and Michael can talk about the fundamental forces and other things.

19:00 Michael Kennedy: Yeah sure Michela take it away, tell us about it.

19:02 Michela Paganini: Yeah, of course. On top of all of these matter particles that Matthew just described, we also have what we call the force carriers, or gauge bosons, and these are particles that can be thought of as being exchanged among other fundamental particles to mediate some of the forces and attractions that connect those more fundamental particles. So we can think of the photon, for example, as being the force carrier for electromagnetism. Then, to complete the puzzle, we have the most recently discovered particle of the Standard Model of particle physics, which is the Higgs boson. And by the way, the Standard Model of particle physics is just this great theory that we've come up with over the years that puts all of our fundamental particles into this periodic table that Matthew and I have just tried to describe to you, so there is a little bit of an analogy to chemistry, but at an even more fundamental level than the atom itself.

19:59 Michael Kennedy: And it's really amazing that this was created somewhat theoretically, and then the machine to go find things like the Higgs boson was built, and then it really was there.

20:08 Michela Paganini: That's true. It's very fascinating. I think at a certain point in history, experiment was ahead of theory and then theory surpassed experiment once again, it's a very fascinating field to be in and I think at different historical moments, things are very different from what they look like today.

20:27 Michael Kennedy: How much do you think that comes from people just getting better at theory versus computational techniques assisting theory?

20:36 Michela Paganini: I think it's probably partly what you said, but also the energy regimes that we're trying to probe. I think most particle physicists would agree that, in terms of the energies we are able to probe right now, we have a good understanding of all of the particles that could exist there, but what we're really searching for now is something at even higher energy, and that's the main issue. At that point, the complication is not so much whether the theory is there or not, but being able to produce the machines, the hardware technology, even to go search for any of these particles. I think certainly software will help us get the most out of the hardware that we currently have and the next hardware that we will build in future generations, but it's both hardware and software, in my opinion.

21:28 Michael Kennedy: Okay, really cool. Maybe the last one on this physics intro stuff. Michael, what was the big deal about finding the Higg's boson, what did people learn from it?

21:38 Michael Kagan: Absolutely, the Higgs boson is probably the hardest of the particles to describe in some ways. The job of the Higgs boson is many things, but probably the easiest way to explain it is that it gives mass to all the other particles. The way you can think about that is, as particles move around, they bump into Higgs bosons, and the Higgs bosons slow them down and effectively give them mass. If you're trying to walk through water, it takes a lot more force, a lot more effort, to move your body, and that's the same thing that's happening with the particles. This is an imperfect analogy, so if any theoretical physicists are listening, I apologize. That's roughly the idea, but it turns out the Higgs boson really played a fundamental role in making this theory that Michela and Matthew explained, which is incredibly predictive, maybe the most predictive theory ever, make sense. Without the Higgs boson, the theory effectively predicts things that have probabilities larger than one, which we knew didn't make any sense, and that's a little bit like what you were saying: the knowledge of the theory breaking down really helped drive what became a 40 or 50 year long experimental search.

22:50 Michael Kennedy: It's amazing how people get excited when they're wrong. The theory might be wrong, it'll be so exciting.

22:58 Michael Kagan: Absolutely. Every time something doesn't make sense, the theoretical physicists get incredibly excited that they're going to have to come up with a completely new theory.

23:07 Michael Kennedy: I don't think we're done with that at all. If people are out there listening and they really want to get a sense for what's going on at the LHC, in particle physics, I definitely want to recommend the documentary Particle Fever, which I think is available on Netflix, but I'll link to at least the trailer. And there's a book called Present at the Creation: Discovering the Higgs Boson, which is great, and then there's another one, We Have No Idea: A Guide to the Unknown Universe. Who recommended that one?

23:33 Matthew Feickert: So I recommended that. That's actually co-written by our ATLAS colleague Daniel Whiteson and also the famous PHD Comics cartoonist Jorge Cham. I really like that book because I think it's both a celebration of how much we still don't know about the universe and of how now is really a great time to get into science, because we're truly in an age of discovery. But it also talks about just how much we do know as well. It's just a great celebration of science, and given that it's co-written by a physicist, it really does convey the ideas well.

24:09 Michael Kennedy: Excellent, people can check all three of those things out. It's all really good background information. This portion of Talk Python to Me is brought to you by Linode. Are you looking for bulletproof hosting that's fast, simple and incredibly affordable? Look past that bookstore and check out Linode at talkpython.fm/linode L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe so no matter where you are, there's a data center near you. Whether you want to run your Python web app, host your private Git server or a file server, you get native SSDs on all machines, a newly upgraded 200 gigabit network and 24-7 friendly support even on holidays and a 7-day money back guarantee. Want a dedicated server for free for the next four months? Use the coupon code Python17 at talkpython.fm/linode. Let's catch up on the LHC just a little bit. Michela, could you just give people a sense of the scale? Michael said there are 3000 people working on ATLAS, and ATLAS is just one of the experiments. Give us a sense of what this place is like.

25:15 Michela Paganini: Yeah absolutely, I really love CERN. It's a group of fantastic people, some of the brightest minds in the world, and it brings together researchers and engineers and computer scientists from all around the world, probably hundreds of different nationalities, and one of my favorite things about it is that I can absolutely say that every single time I'm there, I'm never hanging out with more than one or two people from the same nationality, so to me, that's the best thing about it. Again, there are probably tens of thousands of people across the various different experiments. We've been talking a lot about the LHC, the Large Hadron Collider; that hosts four experiments, so not only ATLAS but also LHCb, ALICE and CMS. But then again, the Large Hadron Collider is only one small part of the entirety of CERN. There are a lot of other experiments going on, for example some anti-matter experiments at the anti-proton decelerator, as well as even astronomy experiments, as far as I know. It's a large laboratory that spans two different countries at the border between France and Switzerland, so it's a fantastic place to work.

26:24 Michael Kennedy: It sounds really really cool and people can go tour it? They can set up a tour?

26:28 Michela Paganini: Absolutely, anybody can just show up and do the quick tour of, for example, Point 1, which is where ATLAS is located, so you can visit our control room and learn more about our experiment. And if you are lucky enough to be able to visit during a shutdown period (oftentimes in the winter, that's when we have quick shutdowns), you might even be able to go underground and visit the actual experiment, which is absolutely breathtaking.

26:55 Michael Kennedy: Yeah, I'm sure that it is. Just the scale, from the pictures, it looks like it's just incredible.

27:00 Matthew Feickert: I just want to jump in and just make an additional comment about what Michela said in the sense that CERN really is an open laboratory and it's a big part of CERN's mission that the scientific discoveries that are made there are made for all of humanity so CERN really welcomes the public getting involved and being curious and coming and asking questions. If you're ever traveling through nearby Geneva, Switzerland sign up for a tour, come visit.

27:28 Michael Kennedy: Yeah, that sounds great. Michael, the data that flows through these experiments out to the collectors and into the trigger that Matthew mentioned and then on to the larger computing structures that are there, it's pretty insane. Do you want to give us an overview of the scale of data?

27:46 Michael Kagan: Yeah absolutely. I'll give you a little bit of the physics background about why we have to design the systems this way. A lot of the things we're searching for are very rare, and the physics that we deal with is probabilistic, so we might be looking for something interesting that only happens in one out of a trillion collisions, or maybe even less frequently. And so we have to collide protons as many times as possible, so we collide protons 40 million times a second, and those collisions fly out into the massive detectors that Michela was describing, or that we've been discussing. The thing is, we can't record all that data; we can only record a fraction of it, because it would simply be too much. So we have a set of systems called the trigger, which allows us to go from 40 million collisions a second, of which many are not super-interesting, down to about a kilohertz, so about 1000 a second, and for each of those collisions, the data that comes out of the detector is about a megabyte. So that means we're taking about a gigabyte of data per second.

28:49 Michael Kennedy: That's the data that made it through the trigger that was not discarded by the hardware?

28:54 Michael Kagan: Exactly, that's just the data that made it through the detector, and the processing there is a combination of custom-built hardware and FPGAs, which are fast enough to deal with looking at the data really quickly, at 40 million times a second, and helping us pipeline it down through various hardware and software systems to this kilohertz rate. We don't run the detector all the time; we run it a lot, but there are shutdown periods and times when you have to refill the beam, and I think we accumulate something like three or four petabytes of data a year.
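Those rates are worth a quick back-of-the-envelope check. A few lines of Python reproduce the numbers quoted above; the live-time figure is a rough assumption, included only to show the petabyte scale is plausible:

```python
# Numbers quoted in the conversation
collision_rate_hz = 40e6  # 40 million collisions per second
trigger_rate_hz = 1e3     # ~1000 collisions recorded per second
event_size_bytes = 1e6    # ~1 megabyte per recorded collision

print(collision_rate_hz / trigger_rate_hz)  # the trigger keeps ~1 in 40,000

data_rate = trigger_rate_hz * event_size_bytes  # bytes per second to storage
print(data_rate / 1e9)                          # ~1 GB/s, as stated

# Rough assumption: order 1e7 seconds of live data-taking per year;
# this lands in the same ballpark as the few petabytes quoted above
print(data_rate * 1e7 / 1e15)  # ~10 PB/year of raw trigger output
```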

29:29 Michael Kennedy: Yeah that's just crazy and then it's not just there in Geneva, it's also broadcast out right?

29:37 Michael Kagan: I think there are something like 170 institutions around the world that make up the worldwide computing grid, the Worldwide LHC Computing Grid, and something like 300,000, or maybe more at this point, computing cores. We distribute the data around the world, and then we often need to process and re-process and analyze that data, and that's done on this enormous computing grid. Once it's stored and distributed, the way we go and analyze it is by basically sending jobs to this grid, which you can think of as a precursor to what the cloud is now.

30:10 Michael Kennedy: There's so much data that I suspect putting it on your laptop doesn't make a lot of sense, right? You need to send your computation to the data rather than bring the data to you.

30:19 Michela Paganini: Exactly, that's exactly how it works. You certainly wouldn't want to be downloading petabytes of data onto your computer, so thankfully we have this grid of Tier 0, Tier 1, Tier 2, Tier 3 locations spread all around the world, where you can send your scripts, they'll be run, and then you'll get the results back.

30:42 Michael Kennedy: Give me a sense of what that's like. You have a question you want to ask about the data; you could write some C++, some Python, maybe even some Fortran. What is the mechanism for going from, I have this Python here, to making it run there and analyze the data? What do the steps look like?

31:00 Michela Paganini: From the user's perspective, which is the one that I get, it's very simple, because of the work of hundreds of people who have made it simple for us. We simply have an interface to our computing grid where we can specify particular locations if we want to, the length of the job, the number of cores we're requiring, the number of nodes, etc., and then we can simply submit our script, as long as it's in a format our systems are able to handle. Then you specify what data to operate on, whether it's real data that has been collected from the LHC or simulated data; you can operate on that too. And then you're able to monitor all of your jobs and eventually get the results back and download the histograms, or whatever format your results come in.
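The real submission tools are ATLAS-specific, but conceptually a grid job boils down to something like the following purely illustrative sketch. Every name here, from the field names to the submit function, is made up for illustration and is not the actual ATLAS interface:

```python
# Purely illustrative: a made-up stand-in for what a grid job
# specification conceptually contains.
job = {
    "script": "analyze_events.py",              # the user's analysis code
    "input_dataset": "data.SOME.DATASET.NAME",  # real or simulated data
    "sites": ["ANY"],                           # or pin to specific grid sites
    "cores_per_node": 8,
    "walltime_hours": 12,
    "output": "histograms.root",                # what gets shipped back to you
}

def submit(job):
    """Hypothetical submit call: in reality the experiment's workload
    management system handles this, and the computation is sent to
    wherever the data already lives."""
    print(f"submitting {job['script']} against {job['input_dataset']}")

submit(job)
```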

31:52 Michael Kennedy: It sounds really cool, and Michael, you said 300,000 computing cores?

31:57 Michael Kagan: I think it might be even larger at this point. As of 2010, it was more than 200,000, and it's ever growing with the amount of data we have and the amount of computing we need. Just as Michela was saying, we send our jobs out, and it's built on top of a virtual machine file system. All these sites work coherently, with the same distribution of the ATLAS software located at all of them, so you can send your job with a known version of the software and it's already available locally to you.

32:29 Michael Kennedy: Wow, that sounds really, really fun. One thing that I was wondering, looking at the LHC: we have ATLAS, we have LHCb and ALICE and CMS. What is the purpose of ATLAS? What is ATLAS trying to do relative to the larger goal of the LHC? Maybe Matthew can take that.

32:48 Matthew Feickert: So ATLAS and CMS, these are two of our general-purpose detectors, and the idea there is that these detectors were explicitly designed and built to be sensitive to a wide range of interesting physics. Backtracking a bit, ATLAS and CMS are sometimes referred to as cylindrical onions, in the sense that their architecture is: you have your beam pipe, and then you have successive layers of very detailed detectors going out around it.

33:21 Michael Kennedy: Is that so you can basically take a 3D picture?

33:23 Matthew Feickert: Yeah exactly, because when we have these really hard collisions, in some sense the result of the collision is that you have sprays of particles coming out in all directions, so you want to have as much coverage as possible. The idea is that if you have both calorimetry systems and tracking systems, then you're able to get a much more detailed picture of what actually happened, because we're trying to reconstruct essentially point-like interactions that are happening at the subatomic level, but we're doing that by seeing what comes splattering through a detector. So it's like trying to reconstruct a car crash when the only way you can investigate what happened is by looking at the walls of a tunnel or something nearby. So ATLAS and CMS are general-purpose detectors, and then ALICE and LHCb have somewhat different geometries, so they are more specialized detectors that are looking at specific types of physics.

34:30 Michael Kennedy: Very cool. So one of the things I wanted to dig into with each one of you is what you're doing day-to-day and how Python and machine learning fit into that. Michela, let's start with you; you already mentioned your generative adversarial networks and some really amazing stuff. You said that you were able to speed up some of these simulations 100,000 times using a pretty cool technique.

34:52 Michela Paganini: That's correct.

34:54 Michael Kennedy: That's incredible, can you talk about how you're doing that?

34:56 Michela Paganini: Yes, these are obviously preliminary results, and there's a lot more R&D that is just now being started within our collaboration, but the point is that we built this great prototype. We called it CaloGAN; calo stands for calorimeter. That is one of the detector layers inside of this big onion-like structure we just described, and the calorimeter measures the energy deposited by certain particles as they travel through, as Matthew was just describing. The issue is that because some of the physical processes the particles undergo are so complicated, simulating these traversals of the particles through the calorimeter is really, really computationally intensive; it's actually taking more than half of the computing grid power that we were just describing, so it's billions of CPU hours per year. What I'm working on is this new technique to speed up that part of the simulation, which currently occupies the majority of our computing resources worldwide, and what I'm using is generative adversarial networks. I think some of your audience will maybe recognize these words. We use GANs to provide a function approximator to our simulator, while hopefully retaining the majority of the accuracy that the slower, physics-driven simulator possesses. As I said, multiple preliminary results have been put out so far, and we are achieving speed-ups of over 100,000 times, but now the complicated part will really be to learn how to calibrate all of this machinery and port it into the real simulation within the experiment. But speaking of Python, I think the cool thing for everybody to know is that this is very easily built using Keras and TensorFlow, so very standard machine learning tools from Python, as well as other standard tools from the Python ecosystem such as NumPy, scikit-learn, h5py and matplotlib, which all make it into my projects.
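For the curious, the adversarial setup she describes can be sketched in a few dozen lines of Keras. This is a generic, minimal GAN with made-up shapes standing in for calorimeter images; it is not the CaloGAN code, just the shape of the idea:

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers

latent_dim = 32
n_pixels = 25 * 25  # a made-up flattened calorimeter image size

# Generator: random noise in, synthetic energy deposits out
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_dim=latent_dim),
    layers.Dense(n_pixels, activation="relu"),  # energies are non-negative
])

# Discriminator: guesses whether a shower is real or generated
discriminator = models.Sequential([
    layers.Dense(128, activation="relu", input_dim=n_pixels),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

# Stacked model used only to train the generator; freezing the
# discriminator here keeps generator updates from moving its weights
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

def train_step(real_showers, batch_size=64):
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_showers = generator.predict(noise, verbose=0)
    # 1) teach the discriminator to separate real from fake
    discriminator.train_on_batch(real_showers, np.ones(batch_size))
    discriminator.train_on_batch(fake_showers, np.zeros(batch_size))
    # 2) teach the generator to fool the (frozen) discriminator
    gan.train_on_batch(noise, np.ones(batch_size))

# Toy usage; in practice the real showers come from the slow simulator
train_step(np.random.rand(64, n_pixels).astype("float32"))
```

The speed-up comes afterwards: once trained, generating a shower is a single forward pass through the generator instead of a full physics simulation.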

36:59 Michael Kennedy: That's really cool. I think people are probably familiar with most of those but h5py, what is that?

37:03 Michela Paganini: It's just the interface to HDF5 in Python, HDF5 being a very standard data format that can be ported across various different languages, C++, Python, and so on. h5py certainly saved my life in terms of being able to open these files.
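As a quick illustration (the file and dataset names here are invented), writing and then lazily reading a batch of shower images with h5py looks like this:

```python
import h5py
import numpy as np

# Write: store a stack of hypothetical calorimeter images
showers = np.random.rand(1000, 25, 25).astype("float32")
with h5py.File("showers.h5", "w") as f:
    f.create_dataset("layer0_energy", data=showers, compression="gzip")

# Read: h5py slices straight from disk, so you never need the
# whole dataset in memory at once
with h5py.File("showers.h5", "r") as f:
    first_batch = f["layer0_energy"][:64]  # just the first 64 images
    print(first_batch.shape)               # (64, 25, 25)
```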

37:21 Michael Kennedy: Of course, that's really cool. Michael, do you want to talk a bit about how you're using machine learning for what you're up to?

37:27 Michael Kagan: Yeah, absolutely. In the past, I've been working a lot on these ideas of taking detector measurements and turning them into classifications of whether the data was from a given particle or not, so I've been working on connecting the data that we have with ideas from machine learning. We were talking about quarks: when you produce quarks, it turns out they produce collimated streams of particles that smash into these calorimeters and leave a bunch of energy distributed in space. It turns out we can connect those distributions of energy in space with basically imaging-type approaches, and then we've been running a lot of computer-vision-type techniques to study those jets, those quarks. That's really jumped into connecting with things like convolutional neural networks and modern computer vision. I've been working on that, just like Michela, with tools like Keras and TensorFlow, built on top of core packages like SciPy and NumPy.
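A minimal sketch of that jet-image idea treats the calorimeter energy grid as a one-channel picture. The 25x25 size, the labels, and the random data are stand-ins, not ATLAS specifics:

```python
import numpy as np
from tensorflow.keras import layers, models

# Stand-in jet images: a 25x25 grid of calorimeter energy deposits per jet
X = np.random.rand(2000, 25, 25, 1).astype("float32")
y = np.random.randint(0, 2, size=2000)  # e.g. 1 = signal jet, 0 = background

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(25, 25, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2)
```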

38:27 Michael Kennedy: One of the things I was going to ask you is, it seems like the things you are actually looking for are quite rare. There were only a few collisions that produced the Higgs boson, for example, back in 2013-14. I haven't done a lot of machine learning, but my sense is that you have to give a lot of examples to the machine learning, and then it can find more of those, even if they're subsequently rare. But how would you bootstrap this? How do you get it started when there isn't enough? How do you teach the algorithms, is what I'm asking, so that they can then go and find these things, especially when those occurrences are rare?

39:01 Michael Kagan: In some sense, it is a bootstrap, but it's based on this idea that those super-rare particles like Higgs bosons, we don't observe them directly. They decay into other things that we know about, like electrons.

39:11 Michael Kennedy: I see, the decay result is easy to find, and you can teach it to find those, and when you find them in certain configurations, you're like, this may have originated from what we're looking for.

39:22 Michael Kagan: Exactly. I work a lot on making sure we can find, not just electrons, but other types of particles that are produced copiously. Then we can say: in this case, I know how to find electrons; I can train an algorithm using our simulation, which is very precise; I can go look for those in our real data to make sure that the simulation makes sense in a different configuration; and then I can go hunting with my well-calibrated and well-tuned algorithm for finding electrons. I can go hunt for configurations with four or five electrons, and that might be a really rare thing you want to look for.

39:55 Michael Kennedy: Very interesting, I see how that works, because I was thinking about this: how do you get started trying to find very rare things? But I can see that now. I'll throw this question out to everyone. Are any of you using special hardware, or are you just leveraging the many, many cores out there in the computing environment? Are you using the tensor-processing-unit-type things or GPUs or those sorts of things?

40:19 Michela Paganini: I think the majority of us are using GPUs to train some of our machine learning models these days on top of obviously the worldwide grid that we just described but that's more reserved for simulating samples or doing more standard types of analysis. I think most of us rely on GPUs these days.

40:40 Michael Kagan: I would add, I think one of the interesting potential future directions, on top of all the work with GPUs and CPUs, is if we ever want some of these algorithms to work in our trigger, in that super-fast system that needs to operate at 40 megahertz. There are some people in the community already beginning to look at putting neural networks onto FPGAs so we can actually run them at super-high speeds, so that may be a future direction this field moves in.

41:04 Michael Kennedy: That would be really amazing. Maybe, Matthew, it's a good time to talk about that trigger thing. You have to take 40 million observations a second across this thing and get it down to 1000, and you've got to do that extremely fast, so how do you do that?

41:23 Matthew Feickert: Michael's already given a very nice intro summary there, but the trigger is an immensely complex system; I definitely don't understand how all of it works. I actually just work on a sub-system of the trigger system for ATLAS. If you want to actually do an analysis, let's say you're looking for a Higgs decay that goes to two b quarks, then you want to have some confidence that there ever could have been a recording of an event in the ATLAS detector that had two b quarks in it, so you want to make sure that some of these interesting collisions aren't getting thrown out. That's one of the reasons we have the trigger systems: we have what we call a trigger menu, which is a list of basically logical sequences that we are looking for in the different sub-systems of the detector, to say that at the lower level, the hardware level, this event looks like it might have had two sprays of particles that came from b quarks. I work on a sub-system of this called the b-jet trigger, and there we already have some really good logical chains in place, but we want to make sure that as we go up to higher energies and even more collisions, basically what we call luminosity, as the number of collisions that we have per crossing increases, our trigger system can still deal with it. In a single crossing of the beam, we don't just get one collision point; right now we might get somewhere between 30 and 50, but as we go to higher energies and higher luminosities we are looking at something like 200 collisions happening every beam crossing. So if you're trying to say, oh, I have an electron, or I have a b-jet that looks interesting over here, I wonder if it might also have a partner that can tell me what kind of collision I had, you have a really difficult problem, because you're now trying to pick out what these other energy deposits or tracks might be from 200 other collisions that are happening at the same time.

43:43 Michael Kennedy: Yeah, it's one of these combinatorial-type things. How many different relationships are there between 30 versus 200, right? That turns out to be astronomically bigger.

43:52 Matthew Feickert: Yeah it gets pretty crazy.
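The growth is easy to quantify. Counting just the distinct pairs of collision vertices in a single beam crossing, going from 30 to around 200 pileup collisions multiplies the number of pairings by more than forty:

```python
from math import comb

pairs_now = comb(30, 2)      # pairs of vertices at ~today's pileup
pairs_future = comb(200, 2)  # pairs at high-luminosity pileup

print(pairs_now)                 # 435
print(pairs_future)              # 19900
print(pairs_future / pairs_now)  # ~45.7x more pairings to sort through
```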

43:55 Michael Kennedy: I'm sure it does. So that's mostly embedded C++, is that right? What other stuff are you doing there? Is there any Python at that layer?

44:06 Matthew Feickert: In the trigger system itself, no, but as a student, what I'm doing a lot more of is performance studies of how our trigger is doing, and there I do use a hybrid of Python and C++. We have colleagues at the University of Chicago who have written a really nice analysis framework that a lot of people in the collaboration use. It's all implemented in C++, but there's also a way for us to interact with it from Python, and that's really nice. For example, I was able to write some command line interface tools using argparse and things like that, so if I want to do a quick performance study, I can spin up a small analysis without ever really having to write a line of C++, to say, how is the trigger performing under these scenarios, and things like that.
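For readers who haven't used it, argparse ships with the Python standard library. A tiny command-line tool in the spirit he describes might look like this; all the option names are invented for illustration:

```python
import argparse

def main():
    # Hypothetical options for a quick trigger performance study
    parser = argparse.ArgumentParser(
        description="Toy example: summarize trigger performance for one run")
    parser.add_argument("input_file", help="path to the events file")
    parser.add_argument("--trigger-chain", default="b-jet",
                        help="which trigger chain to study")
    parser.add_argument("--max-events", type=int, default=10000,
                        help="stop after this many events")
    args = parser.parse_args()

    print(f"Studying '{args.trigger_chain}' on {args.input_file} "
          f"(up to {args.max_events} events)")
    # ... the real work would call into the C++ framework from here ...

if __name__ == "__main__":
    main()
```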

45:03 Michael Kennedy: And you're using things like Jupyter Notebooks and stuff to do that exploration?

45:07 Matthew Feickert: For the trigger, I don't use Jupyter Notebooks so much. I use Jupyter more for the exploratory data analysis that I'm doing for my actual PhD research, but the operations work that I do in ATLAS is more just going into your favorite editor, writing some command line tools in Python, and then making sure that interfaces well with these C++ frameworks that we've developed.

45:34 Michael Kennedy: Okay, very interesting. So Michela, maybe we could talk a little bit about where you guys are going in the future. The LHC has been a really long-running project. It started producing major results in 2013-14, but it was started way before then. Where is it going now, and what are you up to?

45:54 Michela Paganini: Of course. There are still so many open questions around the Standard Model. As Michael said, the Standard Model has been validated over and over again, and it's probably one of the most predictive theories in the history of all theories, but there are some missing pieces. For example, we don't have a complete quantum theory of gravity, so there is this hypothetical particle called the graviton that some are searching for. I think there are very many open questions that could potentially be answered through some of the theories that have already been proposed, whether it's supersymmetry or others, and so I think right now the goal is to continue looking for those while we also continue making very high precision measurements of all of the particles that we know and love. Some ideas about what else could be out there don't necessarily come from directly searching for these new particles; they could come from measuring properties of particles that are already part of the Standard Model and trying to see if there are any, even tiny, deviations from the Standard Model prediction, to see if there could potentially be some other mechanism out there that causes a small deviation in a property that we knew, or thought, had a specific value. So I think in terms of the physics, that's where we're going now.

47:23 Michael Kennedy: It could look entirely right but there could be these subtle, subtle deviations. Just look at Newton and gravity and then Einstein and gravity. Newton looked right.

47:34 Michela Paganini: Yes, of course, to first order it was, but obviously in certain regimes there are deviations from the simpler theory, so we think that perhaps there could be other corrections to what we're measuring that could come from more complicated theories that we have yet to validate.

47:53 Michael Kennedy: Michael, one thing I wanted to ask you is, how do you feel machine learning, and doing it with Python and TensorFlow, all these things, have changed physics and physics exploration? Would it just have been more work, or is stuff actually being discovered that wouldn't have been discovered otherwise, what do you think?

48:12 Michael Kagan: Yeah, I think a lot of algorithms might not have been discovered, or might not have been applied or implemented on a reasonable time scale. Since we're not professional machine learning engineers or machine learning researchers, a lot of the time the tools we're using we either build from scratch or use what's available, and so this availability of things like TensorFlow makes it really easy for us to implement and just try things out on our data, and then try to make these connections with the machine learning world. Things like these generative models: we might not even have access to these kinds of potentially fast speed-ups, or new ways to look at the data with vision or natural language processing approaches; these algorithms might not even be implemented anywhere. Nowadays, you can just download the model if you want and use that as a place to start. I think one of the things that's been really helpful is that the core C++ libraries that we use at CERN are built with internal dictionaries, the idea of reflection, so that they can really easily be bound to Python, and we can take our data that's in one format, really quickly switch into Python, and just start pounding through all these different kinds of algorithms and see what comes out.

49:30 Michael Kennedy: It's really cool, so it's a little bit like what NumPy did for numerical analysis? You have something similar where it's got that native layer really close to Python?

49:40 Michael Kagan: Exactly and in fact, we can also interface directly with NumPy and pull our data right into formats that are standardized in that machine learning data science community.

49:52 Michael Kennedy: Michela, maybe I could ask you the same question, because your work has made the computation so much faster, and it seems like machine learning played an important role there as well. If you can really make these simulations 100,000 times faster, that's almost like going from regular computers to quantum computers. That's such a jump.

50:11 Michela Paganini: Yes, of course. Obviously we would like to empower more physicists to do the analyses that they want to do at the precision levels that they require, and right now that simulation bottleneck means that some analyses cannot necessarily be performed with the statistical uncertainties that we would like to have, or some of them cannot be performed at all with the required accuracy, so hopefully these types of projects will enable more physics to be done and more analyses in the future of the LHC.

50:46 Michael Kennedy: Very cool. Matthew, do you think there's room for normal developers, people who just want to go contribute, since it's open source, maybe some kind of open science? They want to play around, and they're really good at programming, but they're not physicists; is there a place for them to jump in and be a part of this?

51:02 Matthew Feickert: Yeah, I think so. As Michael mentioned earlier, it's really important that our code is open source, so if people actually want to go take a look at it, it's out there. I think there are some efforts to get something like this started on ATLAS, but at LHCb, one of the other experiments, they have what's called the LHCb Starter Kit. That was started by two PhD students, Kevin Dungs and Tim Head, who have now left the field to go work at Google and be a data science consultant, but they thought, hey, a lot of our students are coming in, and we're all physicists, but we're not all necessarily software experts. So they created this thing called the LHCb Starter Kit, and they've done some partnering with softwarecarpentry.org to hold training seminars where people from Software Carpentry come in and help them learn how to actually get started doing data analysis and programming in physics. Software Carpentry and Data Carpentry are really great organizations, so if people want to get involved there, they definitely can as well.

52:12 Michael Kennedy: That's really cool, and I had the Software Carpentry guys on the show a couple of months back.

52:16 Matthew Feickert: Yeah, that was a good episode.

52:17 Michael Kennedy: Thanks, it was really nice. I think it's great; more and more of these tools are becoming accessible to everyone. I suspect uploading to the whole computing system and running stuff there is probably restricted to you guys, but still, people who work on these algorithms are, in a sense, contributing. People who make TensorFlow and Keras better are, in a sense, making it better for you guys as well, right?

52:41 Matthew Feickert: Yes.

52:41 Michela Paganini: Absolutely.

52:43 Matthew Feickert: There's a big push right now, and I think Michela is a great example of this, to go out into the communities that are actually building these great tools and interact with them directly and contribute. We want to use really powerful tools to do the best analysis we can, so the people who are making the Python tools we use better are really directly impacting science, and we appreciate it.

53:11 Michael Kennedy: Yeah, that's amazing. I have so many more questions to ask you all, but we're pretty much out of time so I want to be respectful of that as well. Before we go, I'm going to ask you all the two questions. I'll start with Michael. Favorite editor?

53:25 Michael Kagan: Emacs.

53:26 Michael Kennedy: Michela?

53:26 Michela Paganini: Sublime.

53:27 Michael Kennedy: Matthew?

53:27 Matthew Feickert: Atom with Vim key bindings.

53:30 Michael Kennedy: Alright, right on. And Michael, notable PyPI package?

53:33 Michael Kagan: PyTorch and scikit-optimize.

53:36 Michael Kennedy: I don't think I've talked about PyTorch on the show before, maybe just really briefly talk about it, what that is?

53:41 Michael Kagan: Yeah, PyTorch is a deep learning library that's built on a different way of building the graph representation of your deep neural network for all the downstream computations, but I find the API very smooth; it's easy to really quickly spin up a neural network and have it running.
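To give a flavor of that API, here is a generic minimal sketch with random stand-in data; nothing here is specific to his physics work:

```python
import torch
from torch import nn

# A tiny feed-forward classifier, defined eagerly (define-by-run)
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

X = torch.rand(256, 8)                     # random stand-in features
y = torch.randint(0, 2, (256, 1)).float()  # random stand-in labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # the forward pass builds the graph on the fly
    loss.backward()              # autograd walks that graph backward
    optimizer.step()
```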

53:59 Michael Kennedy: Excellent. Michela, how about you?

54:01 Michela Paganini: I'm a huge fan of Keras, I've been using it since it was first started as a project and it's been enabling me to go from idea to experimentation very quickly and then huge shoutout to matplotlib. It allowed me to make all of the plots that I've ever made during my PhD. So those two for sure.

54:20 Michael Kennedy: Excellent, Matthew?

54:21 Matthew Feickert: Definitely Keras and like Michela said, that's a bread and butter thing for us. And then also scikit-hep which is--

54:29 Michael Kennedy: HEP is high energy physics?

54:30 Matthew Feickert: Yeah, so similar to scikit-learn, which was supposed to be a toolbox for machine learning, Scikit-HEP is a collection of Python tools that have been developed inside the high-energy physics community, meant to help us interface with things like NumPy and Pandas data frames and make our lives a lot easier when we're actually trying to go from our ROOT data file format to things like NumPy--
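One Scikit-HEP package that does exactly this hand-off is uproot, which reads ROOT files into NumPy arrays without needing ROOT itself. A sketch using the modern uproot API, with invented file, tree, and branch names:

```python
import uproot  # pip install uproot; part of the Scikit-HEP ecosystem

# File, tree, and branch names here are invented for illustration
with uproot.open("events.root") as f:
    tree = f["analysis_tree"]
    # Pull selected branches straight into NumPy arrays
    arrays = tree.arrays(["jet_pt", "jet_eta"], library="np")

print(arrays["jet_pt"][:10])
```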

54:55 Michael Kennedy: Yeah, very cool. I'll give you guys a chance for a final call to action. People are excited about what they heard, they want to get involved. Michael go first?

55:03 Michael Kagan: Sure I think there's a lot of ways in which you can get involved, especially through the CERN Open Data Portal and CERN Open Science. You can download some of our data, you can play around with it and also just help spread knowledge about science and scientific reasoning and how that can benefit society, both from advancing science and advancing the way we think.

55:25 Michael Kennedy: Excellent, Michela?

55:26 Michela Paganini: If you're interested in working at the intersection of science and machine learning or data science, CERN, I think, is quickly becoming one of the best places in the world to do that, because the scale and the fascinating problems that we're working on are completely unparalleled, and we're always looking for new skilled software engineers who are curious about the mysteries of the universe. As I said before, the community at CERN is truly fantastic, so we can always use more Python experts, and you absolutely don't have to be a physicist to make a huge impact at CERN.

55:57 Michael Kennedy: Excellent, it sounds really fun. I'd love to work there. Matthew, you've got the final word.

56:00 Matthew Feickert: Yes as physicists we love to collaborate and we like to think at least that we have really cool and really hard problems so we're always looking for--

56:09 Michael Kennedy: And you consider those to be the same thing, right? Cool equals hard.

56:12 Matthew Feickert: Yeah, probably. We have started in the last couple of years to have some really fruitful collaborations with our CS colleagues so if you know a friendly neighborhood particle physicist and you want to talk with them, please do. We're happy to talk and we really want to try and have our field grow with other fields and try and do the best science we can.

56:38 Michael Kennedy: Thank you all for being on the show. It's been really interesting and I've learned a lot. Thanks for sharing what you're doing and keep up the good work.

56:45 Michela Paganini: Thank you.

56:46 Matthew Feickert: Thanks for having us on, Michael.

56:48 Michael Kagan: Thanks and thanks for hosting this great show.

56:50 Michael Kennedy: You're welcome. Bye everyone.

56:51 Michael Kagan: Bye.

56:53 Michael Kennedy: This has been another episode of Talk Python to Me. This week's guests have been Michela Paganini, Michael Kagan and Matthew Feickert, and this episode's been brought to you by Linode and Talk Python Training. Linode is bullet-proof hosting for whatever you're building with Python. Get your four months free at talkpython.fm/linode. Just use the code Python17. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python, and if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python; we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Now get out there and write some Python code.
