22 November 2017
As science works to answer the big questions facing modern society, researchers, organisations and companies are creating more and more data, and the challenge of how best to access it is becoming increasingly complex.
Matt Pritchard is the Operations Manager of JASMIN, a globally unique platform built to address this challenge. JASMIN is revolutionising the way scientists collaborate on environmental science data, turning the analysis of 30 years of data into a task that takes just days instead of months or years.
We spoke to Matt about why finding a new way to organise this important data efficiently is a ‘dream come true’. With 6000 computer cores spread across 1000 virtual machines, a processing cluster and a high-performance network at its heart, it’s easy to see how JASMIN is making that dream happen.
This is the first in a series of profiles of our computing experts – check back in the next couple of weeks to hear from Erica Yang and Tony Hey.
“Plans for JASMIN started as early as 2011, six years ago. I had some involvement in the initial design, in that I was part of the Centre for Environmental Data Analysis (CEDA) team that ran the data archive for the Natural Environment Research Council (NERC). We knew we’d outgrown the IT infrastructure we had at the time and just storing the data wasn’t enough, so a new solution was needed. Fortunately, a funding opportunity came along at just the right time, and JASMIN has grown several times since.”
“After years of struggling to find space for all the scientific data we look after for NERC, it was like a dream come true to be able to organise that data efficiently and to enable users to access that data in-place without needing to download it. There is even space for scientists to store and share their own analysis work alongside it. But the data keeps on growing: we now talk in petabytes instead of terabytes, and the things that users want to be able to do have grown in scale and complexity as well!”
“On a practical level, JASMIN is helping UK researchers to better support government policy in building resilience to natural hazards, such as flooding, and in developing a more detailed understanding of how climate change will affect the UK and its economy.”
JASMIN has had a big impact on the UK, both in the science sector and the technology and computing sector, on matters that affect everyone. As Matt explains:
“It’s all about maintaining the UK’s scientific and competitive edge. Environmental science has some big challenges ahead and the organisations involved in JASMIN are already taking a leading role for the UK in many international efforts to address these challenges.”
“Physically, it’s a combination of computing power, massive storage and a very fast internal network that enables data analysis at scales that just weren’t possible before,” says Matt.
It’s this combination that makes JASMIN a data-intensive computing facility – not quite a supercomputer and much more than just a data store – and it’s what makes JASMIN so special.
“Many datasets are now too big to move around, so you have to bring the processing to the data or, ideally, develop both in the same place. Often researchers want to take the output of a climate model that’s been produced on a supercomputer, compare it against all the other models in the long-term archive and against observational datasets like entire satellite missions, then share those results with their collaborators. There’s only one place to do that at the moment and that’s JASMIN.”
While CEDA runs the services that enable users to access JASMIN and provides the public interface and help-desk for the user community, the physical parts of JASMIN – the computers, storage and network – are run by STFC’s Scientific Computing Department.
“It makes economic sense for the resources to be put in one place where they can be shared by the whole community, but that means we have to do a good job of buying the right kit, operating it well, and making sure that the scientists can get the most out of those expensive resources, including the data itself.”
“Another unique feature of JASMIN is its private cloud. We can provide science projects with their own space in the JASMIN cloud, where they can be in charge of their own computing and storage resources and run the services they need, showcasing their work in ways they’ve tailored to their own communities of users.”
“JASMIN is already pushing the boundaries of what’s possible with scientific cloud computing and large-scale storage: we’re currently the largest single deployment worldwide of the storage technology we use,” says Matt.
As JASMIN’s user base continues to grow, Matt and his team are looking for new ways to expand its capabilities.
“One part of JASMIN is a computing cluster which can schedule and run users’ processing tasks efficiently across thousands of computing cores. But the ability for users to ‘spin up’ their own virtual cluster in the cloud, with the specific amount of resources and time they need, is one very exciting development which we’re currently working on.”
“It’s even possible that this could work with the commercial cloud (such as Amazon) to scale up users’ processing even further. This would open up lots of possibilities for very large processing tasks and therefore lots of future applications.”
For Matt, it is working with JASMIN’s user community that is really satisfying: “It’s great to be working in a really cutting-edge environment, surrounded by people who are doing things that no-one else is doing, at least at this scale and all at the same time.”
This means that Matt not only has to keep JASMIN running – he has to keep it at the cutting edge.
“It’s a constant challenge to balance the day-to-day running of JASMIN with the exploration of what the next big trends in scientific workflows will be – then making sure that JASMIN will be able to support those. We have a user conference every year and it’s fascinating to see the range of problems which are being addressed with JASMIN, and the innovation which is going on – I feel proud to be a part of all that.”