Taming the Big Data Beast With Machine Learning

An illustration showing several red circles with one blue circle, signifying a phase transition as recorded by the X-TEC method.

Sometimes it really is possible to have too much of a good thing. As X-ray facilities such as the Advanced Photon Source (APS), a U.S. Department of Energy (DOE) Office of Science user facility at DOE’s Argonne National Laboratory, steadily improve in capabilities such as source brightness and detector technology, researchers find themselves swamped by an ever-growing tsunami of data.

Sifting out the scientific jewels that may be hidden in all that information, especially concerning new or obscure phenomena previously undetectable by earlier-generation facilities, is a daunting challenge. A group of physicists and computer scientists from Cornell University, joined by colleagues from several other institutions and the National Institute of Standards and Technology, has responded to that challenge by developing a machine learning strategy that can extract charge density wave (CDW) – an ordered modulation of electrons – and intra-unit-cell (IUC) parameters from high volumes of X-ray diffraction data at multiple temperatures. Their work was published in the Proceedings of the National Academy of Sciences.

The team's approach, called X-TEC (X-ray diffraction temperature clustering), exploits the fundamental role that temperature plays in both long-range and short-range structural correlations. The resulting effects can be extremely subtle and virtually impossible to detect amidst the huge amount of Bragg peak information that is the subject of conventional crystallographic analysis. With X-TEC, order parameters and structural fluctuations related to CDW and IUC order can be discerned across many thousands of Brillouin zones and within many gigabytes or even terabytes of data in a few minutes, a task far beyond manual analysis techniques.
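The core idea of grouping diffraction pixels by how their intensity changes with temperature can be sketched in a few lines. The following is a minimal illustration only, not the published X-TEC code: it substitutes a plain k-means for whatever clustering the authors use, and the synthetic data, the transition temperature T_C, and the function names (rescale, kmeans, ordered_pixel, background_pixel) are all assumptions made up for this example.

```python
import math
import random


def rescale(trajectory):
    """Z-score one intensity-vs-temperature trajectory so only its shape matters."""
    mean = sum(trajectory) / len(trajectory)
    std = math.sqrt(sum((v - mean) ** 2 for v in trajectory) / len(trajectory))
    return [(v - mean) / (std or 1.0) for v in trajectory]


def dist2(a, b):
    """Squared Euclidean distance between two trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assign each trajectory to its nearest cluster centre.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical data: 10 temperatures and 6 detector pixels.
temperatures = list(range(10, 101, 10))
T_C = 60.0  # assumed transition temperature for the synthetic data
rng = random.Random(0)


def ordered_pixel(amplitude):
    """Intensity that vanishes above T_C, like an order parameter."""
    return [amplitude * max(0.0, 1.0 - t / T_C) + 1.0 + rng.gauss(0, 0.2)
            for t in temperatures]


def background_pixel(level):
    """Intensity that drifts slowly upward with temperature."""
    return [level * (1.0 + 0.002 * t) + rng.gauss(0, 0.2) for t in temperatures]


raw = ([ordered_pixel(a) for a in (20, 50, 80)]
       + [background_pixel(b) for b in (30, 100, 300)])

labels, centers = kmeans([rescale(r) for r in raw], k=2)
print("cluster labels:", labels)
```

Rescaling each trajectory before clustering is a design choice of this sketch: it lets pixels with very different absolute intensities fall into the same group when their temperature dependence has the same shape.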

Working at APS beamline 6-ID-D, the researchers first benchmarked the X-TEC algorithm using a compound in the quasi-skutterudite family with a proposed CDW quantum critical point. They were able to uncover the CDW order parameter and its dependence on calcium concentration, and thus provide its phase diagram. Comparison of X-TEC results with manual methods showed excellent agreement.

X-TEC allows the user to choose between two data clustering modes: smoothed (X-TEC-s), which is best suited for detecting order parameters, and detailed (X-TEC-d), intended to pick up finer scattering details that reveal the nature of fluctuations. The team used both modes to study IUC order and order parameter fluctuations in a pyrochlore metal whose structural phase transitions have been the subject of much debate.
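The article does not spell out how the two modes differ internally. As a loose illustration only, assume that a "detailed" result keeps each pixel's own cluster label, while a "smoothed" result lets neighbouring pixels vote on a shared label. The function smooth_labels, the window radius, and the example labels below are hypothetical and are not taken from the X-TEC software.

```python
from collections import Counter


def smooth_labels(labels, radius=1):
    """Replace each label by the most common label among its neighbours."""
    smoothed = []
    for i in range(len(labels)):
        # Window of neighbouring pixels, clipped at the edges of the row.
        window = labels[max(0, i - radius): i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


# Hypothetical per-pixel labels along one line of reciprocal space.
detailed = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
smoothed = smooth_labels(detailed)

print("detailed:", detailed)
print("smoothed:", smoothed)
```

In this toy case the isolated labels at positions 2 and 8 are absorbed by their neighbours, which is the trade-off the two modes suggest: a smoothed view gives cleaner regions, while a detailed view preserves pixel-level variation.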

Using both modes of X-TEC, the investigators were able to determine an effective selection rule from X-ray diffraction measurements, comprising 8 terabytes of data and thousands of Brillouin zones, that allowed them to extract atomic-scale information tied to temperature. Because the X-TEC system is able to work with the entire amount of available data, rather than fitting a limited set of Bragg peaks as would be required using a manual approach, the procedure can achieve far greater precision and accuracy, ferreting out new discoveries that would otherwise be lost in an ocean of data.

X-TEC can easily be integrated into the experimental workflow at the beamline to allow real-time guidance and refinement of experiments. Researchers can control data analysis even as information is being collected, focusing on particularly interesting aspects as they are uncovered. The strategy is also not limited to handling X-ray diffraction data but can be readily adapted to other disciplines that need to handle prodigious amounts of information.

The X-TEC approach is an example of physicists and computer scientists working together, using the tools of one field to solve problems in another. Instead of an overwhelming burden, the vast amounts of information gathered by the ever-growing capabilities of the latest state-of-the-art X-ray facilities can be used to their fullest potential. – Mark Wolverton

 


 

See: J. Venderley1, K. Mallayya1, M. Matty1, M. Krogstad2, J. Ruff1, G. Pleiss1, V. Kishore1, D. Mandrus3, D. Phelan2, L. Poudel4, 5, A. G. Wilson6, K. Weinberger1, P. Upreti2, 7, M. Norman2, S. Rosenkranz2, R. Osborn2, E-A Kim4, “Harnessing Interpretable and Unsupervised Machine Learning to Address Big Data from Modern X-ray Diffraction,” Proceedings of the National Academy of Sciences 119 (24) e2109665119 (June 2022)

Author affiliations: 1Cornell University; 2Argonne National Laboratory; 3University of Tennessee; 4University of Maryland; 5National Institute of Standards and Technology; 6New York University; 7Northern Illinois University

The experiments (M.K., S.R., R.O., P.U., and D.P.), and the subsequent machine learning analysis and theoretical interpretations of the results (E.A.K., V.K., J.V., M.N., and K.M.), were supported by the US Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences, Division of Material Sciences and Engineering. Initial development of X-TEC (E.A.K., A.G.W., K.W., and G.P.) was supported by NSF HDR-DIRSE (Harnessing Data Revolution - Data Intensive Research in Science and Education) award OAC-1934714, and testing on TiSe2 data was supported by US DOE, Office of Basic Energy Sciences, Division of Materials Science and Engineering, under Award DE-SC0018946 (J.V.). M.M. acknowledges support by the NSF (Platform for the Accelerated Realization, Analysis, and Discovery of Interface Materials) under cooperative agreement DMR-1539918 and the Cornell Center for Materials Research with funding from the NSF MRSEC (Materials Research Science and Engineering Centers) program (grant DMR-1719875). This research used resources of the Advanced Photon Source, a US DOE Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under contract DE-AC02-06CH11357. Research conducted at CHESS (Cornell High Energy Synchrotron Source) is supported by the NSF via awards DMR-1332208 and DMR-1829070.

The U.S. Department of Energy's APS at Argonne National Laboratory is one of the world’s most productive x-ray light source facilities. Each year, the APS provides high-brightness x-ray beams to a diverse community of more than 5,000 researchers in materials science, chemistry, condensed matter physics, the life and environmental sciences, and applied research. Researchers using the APS produce over 2,000 publications each year detailing impactful discoveries and solve more vital biological protein structures than users of any other x-ray light source research facility. APS x-rays are ideally suited for explorations of materials and biological structures; elemental distribution; chemical, magnetic, electronic states; and a wide range of technologically important engineering systems from batteries to fuel injector sprays, all of which are the foundations of our nation’s economic, technological, and physical well-being.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC, for the U.S. DOE Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.
