Transforming academic and industrial research with big data – with Tony Hey

07 December 2017

STFC’s Tony Hey on driving industrial and scientific breakthroughs with data intensive science.

From discovering the Higgs boson, to designing a new model of car, innovations in big data and high performance computing are helping transform virtually every aspect of academic and industrial research.

Researchers in both the business and academic worlds are depending more and more on big data management to analyse, visualise and make sense of the ever-increasing amounts of complex data available to them, whether it’s for making complex business decisions, risk management, fighting crime, predicting natural disasters or developing a new shampoo.

STFC’s Chief Data Scientist, Tony Hey, has been leading the development of STFC’s E-infrastructure Strategy for both data-intensive science and high performance computing. We caught up with Tony to find out how he became involved in scientific computing, and what it’s like to be responsible for such a high profile strategy for STFC.

From physics student to data specialist – what was the inspiration behind your career in scientific computing?

My interest in science started with Fred Hoyle’s book The Nature of the Universe, in which I learned how the heavy elements that made up the Earth had been made in supernovae. (Although now we know that the heaviest elements are very probably made in the neutron star collisions detected recently by the LIGO detector!) Nevertheless, it was thanks to Hoyle that I chose to focus on physics, chemistry and maths. I didn’t have a particular career in mind at this point; I just knew I wanted to be a scientist of some sort – I was fascinated by it all, and there was so much to learn.

I began my career with a PhD in particle physics from the University of Oxford in the UK, closely followed by research positions at the California Institute of Technology and CERN, and a professorship at the University of Southampton in England. There I became more and more interested in parallel computing and made the transition into computer science. My fascination with the idea of Big Data, and its significance to society, led me to Microsoft and then to the University of Washington, before coming home to the UK, and to STFC just over two years ago.

As STFC’s Chief Data Scientist, what’s a typical project that you get involved with?

It seems that there’s no typical project when you’re working on ‘Big Scientific Data’ and computing at STFC – my role is wide-ranging and it’s been an amazing learning curve finding out about the myriad activities undertaken at RAL and Daresbury Lab. Computing is an increasingly core necessity for all the major experimental facilities in the UK – such as ISIS Neutron and Muon Source and the Diamond Light Source – and major international research facilities – such as the LHC at CERN, data from which is managed and stored at RAL. Similarly, STFC’s Hartree Centre at Daresbury Laboratory is advancing the development of supercomputing software that will be able to handle the huge amounts of data created by future large-scale research initiatives, such as the Square Kilometre Array (SKA), which will be the largest radio telescope ever constructed. The data collected by the SKA in a single day would take nearly two million years to play back on an iPod! Large-scale projects like these are at the cutting edge of information technology, and I get to be involved with all of them in some way or another.

I particularly like having the freedom to look at all ‘Big Data’ opportunities and issues across STFC as a whole. As part of this I have helped develop STFC’s ‘E-infrastructure Strategy’, which looks at all aspects of computing resources, software, data, networks and the essential people skills that are needed for STFC to deliver on this strategy. An important component of the strategy is the research network needed for high bandwidth transfers of large amounts of ‘Experimental and Observational Data’ (EOD) both within the STFC sites and also to universities across the UK. I’m also looking at how we support STFC’s open access strategy for open science. This will enable a new generation of researchers to share and communicate big scientific data and discoveries in a way that has never been done before. I have also recently been working with the US Department of Energy on both their Exascale Computing Project and on how the DOE Labs implement the US Government’s directive to make the results of their research more accessible to the public.

You have recently celebrated your second anniversary at STFC – what are the most exciting challenges you have worked on so far? What does it feel like to work on such ground-breaking projects?

I am very excited about the machine learning (ML), AI and cognitive computing revolution. With the major experimental facilities on the RAL site I am working to create a set of Big Scientific Data ML ‘benchmarks’. These will not only serve as a valuable training set for university researchers applying ML techniques to the analysis of their data but also enable them to solve some of the remaining data analysis challenges they face in order to extract new science from their data.

Another of my key current challenges is working with STFC’s Hartree Centre and its cognitive computing and big data collaboration with IBM. Cognitive computing refers to systems that can be trained to learn, reason and interact. It is an exciting area to be involved with right now, and will remain a very dynamic and evolving area of research, for both academia and industry, for the foreseeable future. The vision for the Hartree Centre is for UK industry to fully embrace and integrate the latest digital and cognitive technologies into business, to grow the economy and keep the UK at the forefront of industrial innovation.

The opportunities for cognitive computing to make a major difference to society are limitless. The Hartree Centre’s work with IBM and Alder Hey Children’s Hospital to create the UK’s first cognitive hospital is a great example of this; the mission of this project being to make a child’s stay at hospital, which is often an anxious time, more comfortable and less stressful.

I’m also particularly excited about the work of the JASMIN ‘SuperData cluster’ facility with the Centre for Environmental Data Analysis (CEDA). Hosted and operated by the Scientific Computing Department at RAL, and funded by NERC and the UK Space Agency, JASMIN handles environmental data from research that spans space, climate and the world’s oceans, providing data storage and analysis to academics and industry researchers, particularly around climate change modelling. This innovative ‘High Performance Data Analytics’ system is revolutionising the way scientists collaborate on environmental science data, turning the process of analysing 30 years of data into a task that takes just days instead of months or years.

What are your aspirations for the future?

Big data and high performance computing are making their mark on all our lives, faster than ever. It took 13 years to sequence the first human genome, but we can now sequence a human genome in a matter of hours. And it is in the worlds of science and research – environment, climate, medicine, health, manufacturing and working with industry – that we are experiencing the significant social and economic impacts of big data, in all aspects of life.

These are exhilarating times and we have to make sure our future generations are equipped to rise to the challenges. I have a passionate interest in communicating the excitement of science to young people and would like to do more of this in the future. In fact, I think that if I was not now working with STFC, I would most probably have gone back to being a university professor!
