Jean Luca Bez

SLAM Talk Title: "Where’s the Bottleneck?"

How did you originally get interested in science?

I have always liked exploring, understanding, creating, and building things. I took a chance on computer science after my first choice of undergraduate program was not available, and I could not be more grateful for that choice! It has allowed me to engage with science in novel and unexpected ways: collaborating across different scientific domains, exploring new ideas, and creating new software, tools, and models.

What is your favorite place at the Lab?

Because I joined the Lab mid-pandemic, I haven't had a chance to explore the whole complex yet. Of the places I've been, though, I love the bay view from the patio between Buildings 50A and 50B.

Most memorable moment at the Lab?

Definitely the day I got my badge! It came quite some time after I joined the Lab, since we were all fully remote then, and it was nice to receive an in-person welcome. It was also the day I had unexpected company at the bus stop: a living, breathing turkey! Not sure which bus it was waiting for, though.

What are your hobbies or interests outside the Lab?

Reading, Playing Video Games, Cooking, Traveling

JEAN LUCA's Script - "Where’s the Bottleneck?"

Do you remember that children’s puzzle called Where’s Waldo, or Where’s Wally? It’s the one where you have to find a character wearing a red-and-white striped shirt and blue pants. You know how hard it can be to find him, right? That is because there is so much going on in the picture!

As in a Waldo puzzle, a lot is happening in a supercomputer where multiple scientific applications are running at once. Scientists use these machines to run experiments that help us understand the world we live in, from atoms to stars. Those experiments often take hours, days, or even weeks to complete and generate massive amounts of data, most of the time not in an efficient way.

For instance, a single regional-scale earthquake simulation can produce 75 TB of data. That is roughly the amount of data you would stream watching Netflix in 4K resolution nonstop for more than a year and a half. With that much data, moving it efficiently is essential! But because today's supercomputers are so complex, pinpointing data movement performance problems is harder than finding Waldo!

Here at Berkeley Lab, in the Scientific Data Division, we are working toward enabling faster science, so scientists can get the data they need as quickly as possible without having to worry about performance issues. We do that by collecting metrics that cover every aspect of a supercomputer's complexity and applying techniques from statistical analysis, data mining, and machine learning to identify those issues and map them to a subset of viable solutions, so applications can run faster. It’s like devising an efficient way to find Waldo instead of scanning the picture at random.

Because we work at the system level, we have a unique opportunity to collaborate with distinct science domains whose applications can benefit from our solutions. For instance, in our initial results, an astrophysics application ran 4 times faster and a linear algebra application ran almost 200 times faster. We achieved this by intelligently detecting data movement problems and mapping them to solutions based on the unique characteristics of each application and domain!

We strongly believe our research can help find and fix data movement performance issues automatically, so scientists can focus on what matters most: their science. You can think of it as a very smart, fast, and automatic strategy for finding Waldo in the puzzle.