Biology and big data are now inseparable.
Most modern biology produces datasets too large to manage by conventional means, and the challenge will only grow as the science becomes more sophisticated.
The Center for High-Throughput Computing (CHTC), a partnership between the UW-Madison School of Computer, Data & Information Sciences and the Morgridge Institute for Research, sees this data onslaught and says: bring it on.
“We have set ourselves the goal of never letting the amount of data limit the experimental approach of scientists,” says Miron Livny, the founder of high-throughput computing (HTC). Livny has been an HTC advocate for more than three decades as a computer scientist at UW-Madison, and most recently as a senior research computing scientist at the Morgridge Institute.
HTCondor is task-scheduling software that breaks a large computational workload into many smaller, independent jobs, allowing researchers to push far more data through the available computing resources (hence the term “high throughput”). The CHTC team now supports 250 to 300 projects a year, twice as many as five years ago, and delivers hundreds of millions of hours of computing time.
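To make the idea concrete, here is a minimal sketch of an HTCondor submit description that fans a single analysis out into 100 independent jobs. The executable name analyze.sh, the logs/ directory, and the resource requests are hypothetical placeholders; a real workload would set its own.

    # Minimal HTCondor submit description (hypothetical names).
    # Run the same analysis on 100 independent chunks of data,
    # one job per chunk, numbered by $(Process) from 0 to 99.
    executable     = analyze.sh
    arguments      = $(Process)
    output         = logs/job_$(Process).out
    error          = logs/job_$(Process).err
    log            = analysis.log
    request_cpus   = 1
    request_memory = 2GB
    queue 100

Submitted with condor_submit, each job runs independently wherever a machine is free, so throughput scales with the number of available machines rather than the speed of any single one.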
And that’s just at UW-Madison. Globally, the Open Science Grid provides HTC resources to researchers around the world, serving as the backbone for Nobel Prize-winning projects such as the detection of gravitational waves and the discovery of new subatomic particles. This year it again made waves for its contribution to the imaging of the massive black hole at the center of our galaxy.
The service is gaining adherents on campus because scientists are learning that it’s about more than someone asking, “What technology do you need?” Research computing is a collaboration, and the people the CHTC brings to the equation matter more than the technology.
Livny says the CHTC’s facilitation team is a great example. That focus on facilitators was ahead of its time, almost unheard of in computing circles. These are the translators who work between the technology and the bench experiments, finding the best ways for scientists to get the most out of their data.
Livny uses a hospital metaphor. Like an emergency room, the CHTC is not dedicated to one disease or family of conditions. It takes all comers, whether in particle physics, brain science or COVID-19. Facilitators help decide the right computational “medicine” for each case.
The CHTC’s UW-Madison and Morgridge sides work together seamlessly; by design, you can’t tell where one begins and the other ends. But Morgridge provides one unique ingredient. Livny says the institute’s hiring flexibility allows the group to recruit unconventional talent, people who may not be suited to tenure-track positions but who are perfect for advancing HTC as a core service.
Brian Bockelman came on board in 2019 as a research computing scientist at Morgridge, bringing decades of HTC experience on major physical science projects such as the Large Hadron Collider at CERN in Switzerland and the IceCube observatory at the South Pole. He now applies that experience to the massive computational needs of today’s biological research.
For example, he led development of the data management platform for the campus’s new cryo-electron microscopy (cryo-EM) center. A technology that poses both large-scale data and processing challenges, cryo-EM will keep the research computing team busy for years to come. “The real success of research computing is when researchers change the way they do science because of the questions we ask, as well as the computing we provide, opening their eyes to things they didn’t know were possible,” says Livny. “Ultimately, established scientists are able to think differently about science itself, rather than just solving one isolated problem.”