Scientific data archive reaches 10 petabytes

Thursday 9 May 2012

Growth in STFC scientific data archive storage

In April 2012, usage of the online tape robot data archive in the STFC scientific data centre at the Rutherford Appleton Laboratory broke the ten petabyte barrier for the first time. Ten petabytes is enough to store 133 years of continuous high-definition digital television, or 100 billion pictures on Facebook.

Since 1997, the growth in usage shown in the graph has looked exponential. In the last 18 months the storage used has doubled from 5PB to 10PB, just as it doubled from 4PB to 8PB in the 18 months from Feb 2010 to Jun 2011.
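To make the doubling claim concrete, the short sketch below models usage as a simple exponential with an 18-month doubling period. The function name and the assumption of a constant doubling period are illustrative only; the real curve in the graph will not follow this exactly.

```python
def projected_usage_pb(start_pb, months_elapsed, doubling_months=18):
    """Usage in PB after months_elapsed, assuming a constant doubling period.

    Hypothetical helper: the 18-month default comes from the article's
    observation, not from a fitted model.
    """
    return start_pb * 2 ** (months_elapsed / doubling_months)


# 5PB doubling once over 18 months gives the 10PB reported for April 2012.
usage_after_18_months = projected_usage_pb(5, 18)

# Growth factor over 12 months implied by an 18-month doubling period.
annual_growth_factor = projected_usage_pb(1.0, 12)

print(f"Usage after 18 months: {usage_after_18_months:.1f} PB")
print(f"Implied annual growth: {(annual_growth_factor - 1) * 100:.0f}%")
```

Under this assumption the archive grows by a little under 60% per year.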

The graph shows the four categories of data stored in the tape robot archive:
  • CERN high energy physics data from the ATLAS, LHCb, CMS and ALICE experiments on the Large Hadron Collider (LHC), as well as other experiments including NA62 and MICE. These data are stored using the CERN-designed Castor infrastructure. RAL Tier-1 computers processed their first data from CERN LHC detectors in Nov 2009. Since then, LHC data in Castor has been the major source of data growth in the archive.
  • Castor is also used to store data from STFC's UK facilities (FaC).
  • ADS is the older archive, which holds many small backups.
  • DMF includes the preserved documents from the CEDA Repository at the British Atmospheric Data Centre, and the Tessella Safety Deposit Box (SDB) services used to preserve data collected at the ISIS, Diamond and CLF facilities.

The scientific data centre holds another 10 petabytes of actively used data on disk, but the growth in storage demand is best illustrated by the evolution of the archive, which has two robots with 10,000 tape slots each. Currently, the slots hold drives and tapes bought over the last few years as the technology has developed. Growth can be managed with a fixed number of tape slots if data is continuously migrated to higher capacity tapes. Many current tapes store only 200GB, 500GB or 1TB each, while the most recent tapes have the largest current capacity of 5TB. If every slot held a 5TB tape, the potential storage capacity of the archive would be 100PB, which only a few years ago sounded like enough to last for decades. Given the current rate of growth in scientific data volume, however, this will last only five more years, until 2017. By then, tapes will be able to store 10TB and maybe even 20TB each.
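The 100PB and 2017 figures can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1PB = 1000TB), that every one of the 20,000 slots holds a 5TB tape, and that usage keeps doubling every 18 months from 10PB; the variable names are illustrative.

```python
import math

ROBOTS = 2
SLOTS_PER_ROBOT = 10_000
TAPE_CAPACITY_TB = 5       # largest tape capacity quoted in the article
TB_PER_PB = 1000           # assumption: decimal units

# Capacity if every slot holds a 5TB tape.
capacity_pb = ROBOTS * SLOTS_PER_ROBOT * TAPE_CAPACITY_TB / TB_PER_PB

CURRENT_USAGE_PB = 10      # usage reported for April 2012
DOUBLING_MONTHS = 18       # assumption: the observed doubling period continues

# Number of doublings needed to grow from current usage to full capacity,
# converted to years.
doublings_needed = math.log2(capacity_pb / CURRENT_USAGE_PB)
years_to_full = doublings_needed * DOUBLING_MONTHS / 12

print(f"Potential capacity: {capacity_pb:.0f} PB")
print(f"Years until full:   {years_to_full:.1f}")
```

With these inputs the archive would fill in roughly five years, which matches the 2017 estimate above.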

© 2013 Science and Technology Facilities Council - All Rights Reserved.