New Data Management Capabilities at the APS

 

Data are essential to the scientific discoveries enabled by the experimental techniques performed at the APS. At present, the APS collects an estimated 2 PB (2×10¹⁵ bytes) of raw experimental data per year. However, data volumes and rates are increasing rapidly because of beamline advances such as improved detectors, high-throughput instrumentation, and multi-modal instruments that can acquire several measurements in a single experiment. This trend is expected to continue and will be boosted even further by the improved source and instruments planned as part of the APS Upgrade (APS-U) project. Successful management of big data is therefore of particular importance to the current and future scientific productivity of the APS.

Historically, the task of managing and distributing data at the APS has been left to individual user groups and beamline staff. This process usually consists of manually copying large amounts of data to removable hard drives, which users then carry or ship to their home institutions, or which beamline staff store in their offices. Data are rarely cataloged; when they are, paper logbooks are the most common method. This process is tedious and error-prone, and it cannot scale to the anticipated data volumes. Two recent advances will help the APS manage big data more effectively.

At the beginning of the 2017-1 run, the APS brought the Extrepid data storage system into production use, making 1.5 PB of storage available for APS experiments. Extrepid, which is managed by the APS and built on surplus hardware, is housed in a Computing, Environment, and Life Sciences (CELS) data center in Building 369. Extrepid is connected to the APS via two dedicated 10 Gbps network links, whose capacity can be increased if needed. See Fig. 1.
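To give a rough sense of scale, the storage, link, and annual-volume figures quoted in this article can be combined in a back-of-envelope calculation. The sketch below rests on assumptions that do not come from the article: decimal units (1 PB = 10¹⁵ bytes), a 365-day year, and both links running at their full nominal rate with no protocol or storage overhead. An annual average also says nothing about peak rates during fast acquisitions.

```python
# Back-of-envelope capacity check using figures quoted in the article.
# Assumptions (not from the article): decimal units (1 PB = 1e15 bytes),
# a 365-day year, and both links at full nominal rate with no overhead.

PB = 1e15                          # bytes per petabyte (decimal)
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_raw_data = 2 * PB           # ~2 PB of raw data per year
extrepid_capacity = 1.5 * PB       # Extrepid storage for APS experiments
link_bits_per_s = 2 * 10e9         # two dedicated 10 Gbps links

link_bytes_per_s = link_bits_per_s / 8

# Average rate needed to move one year's raw data, in Gbps
avg_rate_gbps = annual_raw_data * 8 / SECONDS_PER_YEAR / 1e9

# Fraction of the aggregate link capacity that this average rate uses
utilization = avg_rate_gbps * 1e9 / link_bits_per_s

# Days needed to fill Extrepid if both links ran flat out
days_to_fill = extrepid_capacity / link_bytes_per_s / 86400

print(f"average rate: {avg_rate_gbps:.2f} Gbps")
print(f"link utilization: {utilization:.1%}")
print(f"days to fill Extrepid: {days_to_fill:.1f}")
```

Under these assumptions the yearly average works out to about 0.51 Gbps, roughly 2.5% of the 20 Gbps aggregate, and the links could in principle fill Extrepid in about 6.9 days. Real throughput will typically be lower than the nominal link rate.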

To make the best use of this new storage capability, the APS X-ray Science Division (XSD) has developed data management software that integrates with beamline data workflows to transfer and catalog data, including tracking which experimenters ran each measurement. Additional metadata can be added to a metadata catalog. The system also sets access permissions so that researchers can download their data at their home institutions using the Globus Online data transfer tool.
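The workflow described above can be pictured as a small catalog that links each transferred file to an experiment, its experimenters, and optional metadata, and that derives download permissions from that record. The following Python sketch is purely illustrative: every class and method name (`Experiment`, `CatalogEntry`, `DataCatalog`, `register_file`, `can_download`, `find`) is hypothetical, it does not reflect the actual XSD software, and it omits the real file transfer and the Globus Online integration entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# All names below are hypothetical; this is an in-memory illustration only.

@dataclass
class Experiment:
    """An experiment and the users who ran the measurement."""
    name: str
    experimenters: set[str]


@dataclass
class CatalogEntry:
    """One data file recorded in the catalog, with optional metadata."""
    experiment: str
    path: str
    metadata: dict[str, str] = field(default_factory=dict)


class DataCatalog:
    """Stand-in for the storage catalog; no data are actually moved."""

    def __init__(self) -> None:
        self.experiments: dict[str, Experiment] = {}
        self.entries: list[CatalogEntry] = []

    def add_experiment(self, exp: Experiment) -> None:
        self.experiments[exp.name] = exp

    def register_file(self, experiment: str, path: str, **metadata: str) -> CatalogEntry:
        # In a real workflow the file would be transferred to storage first.
        if experiment not in self.experiments:
            raise KeyError(f"unknown experiment: {experiment}")
        entry = CatalogEntry(experiment, path, dict(metadata))
        self.entries.append(entry)
        return entry

    def can_download(self, user: str, entry: CatalogEntry) -> bool:
        # Read access is granted only to the experimenters tracked for the experiment.
        return user in self.experiments[entry.experiment].experimenters

    def find(self, **criteria: str) -> list[CatalogEntry]:
        # Return entries whose metadata matches every key/value pair given.
        return [e for e in self.entries
                if all(e.metadata.get(k) == v for k, v in criteria.items())]


# Example usage with made-up experiment, user, and file names.
catalog = DataCatalog()
catalog.add_experiment(Experiment("exp-001", {"alice", "bob"}))
entry = catalog.register_file("exp-001", "/data/exp-001/scan_0001.h5", sample="Ni")
print(catalog.can_download("alice", entry))   # True
print(catalog.can_download("carol", entry))   # False
```

Deriving permissions from the experiment record, rather than setting them file by file, means that newly registered files are automatically readable by the same experimenters.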

These resources and tools are now deployed at many XSD beamlines and can be made available to non-XSD beamlines by special arrangement. With these advances, the APS is better equipped to carry out the data management tasks critically needed to handle the deluge of data it produces today, as well as the much larger volumes to come.

Nicholas Schwarz, Sinisa Veseli, Roger Sersted, and Brian Toby

Acknowledgements: Many people helped make this work possible. APS/XSD: Jon Almer, Francesco De Carlo, Barbara Frosik, Arthur Glowacki, Doga Gursoy, Peter Kenesei, Faisal Khan, Wenjun Liu, Suresh Narayanan, Jun-Sang Park, Jon Tischler, Stefan Vogt, and Ruqing Xu; APS/AES: Brian Pruitt, Brian Robinson, Giampiero Sciutto, Ken Sidorowicz, and Dave Wallis; CELS/ALCF: William Allcock and Mike Papka; Globus Services: Rachana Ananthakrishnan and Ian Foster. Work at the APS is supported and maintained by the XSD Scientific Software Engineering & Data


The Advanced Photon Source is a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.
