NMC Scientists Attend SC23 Conference

New Mexico Consortium scientists Qing Zheng, John Bent, and Alex Lovell-Troy recently attended the Supercomputing (SC) Conference, held in November 2023 in Denver, Colorado. SC is the International Conference for High Performance Computing, Networking, Storage, and Analysis. Its goal is to bring together the HPC research and user communities to discuss the latest trends and applications in HPC.

While at the conference, Qing Zheng assisted at a LANL/SK hynix co-demonstration at SK hynix’s exhibition booth and presented a short vision paper at the 8th International Parallel Data Systems Workshop (PDSW). This workshop, held in conjunction with SC23, is a cooperation of the IEEE Computer Society and the Association for Computing Machinery. Zheng’s presentation was titled “Toward Standardized, Open Object-Based Computational Storage For Large-Scale Scientific Data Analytics”.

Open standards facilitate interoperability, community support, and vendor neutrality. Just as NFS sets the protocol for network-attached storage and ANSI T10 defines the Object-based Storage Device (OSD) command set for SCSI devices, this presentation advocated for a similar standardization effort for object-based computational storage. Four trends motivate the effort: the growing interest in object storage as a major data source for analytics, the increasing recognition of data movement as a key latency factor for long queries, the lack of open standards for delegating filters or other data reduction functions to object-based storage to speed up data access, and the advent of NVMe as a modern replacement for the SCSI storage device interface.

Zheng and his colleagues envision a standard high-level computational object storage interface and a low-level Object-Based Computational Storage Device (OBCSD) command set for NVMe-based storage devices. A typical setup would consist of one or more gateway servers implementing the high-level interface and a collection of NVMe devices implementing the low-level interface. In his presentation, he discussed the rationale behind such an open, object-based computational storage stack, its relationship with existing protocols, and its potential in speeding up large-scale data storage, management, retrieval, and analytics.
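The motivation for this setup, moving filters closer to the data so that less of it has to travel, can be illustrated with a small toy model. The sketch below is only an illustration under stated assumptions: the class and method names (ToyDevice, ToyGateway, read_filtered, query) are hypothetical and are not part of the proposed interface or of any NVMe command set. It counts how many records cross the device boundary with and without delegating the filter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = dict
Predicate = Callable[[Record], bool]


@dataclass
class ToyDevice:
    """Hypothetical stand-in for a storage device holding named objects."""
    objects: Dict[str, List[Record]] = field(default_factory=dict)

    def read(self, key: str) -> List[Record]:
        # Plain read: every record of the object leaves the device.
        return list(self.objects[key])

    def read_filtered(self, key: str, predicate: Predicate) -> List[Record]:
        # Computational read: the filter runs "on the device",
        # so only matching records leave it.
        return [r for r in self.objects[key] if predicate(r)]


@dataclass
class ToyGateway:
    """Hypothetical gateway that fans a query out to all devices."""
    devices: List[ToyDevice]

    def query(self, key: str, predicate: Predicate,
              pushdown: bool) -> Tuple[List[Record], int]:
        results: List[Record] = []
        records_moved = 0
        for dev in self.devices:
            if key not in dev.objects:
                continue
            if pushdown:
                rows = dev.read_filtered(key, predicate)
                records_moved += len(rows)
            else:
                rows = dev.read(key)
                records_moved += len(rows)
                # Without pushdown the gateway filters after the transfer.
                rows = [r for r in rows if predicate(r)]
            results.extend(rows)
        return results, records_moved


def demo() -> Tuple[List[Record], int, List[Record], int]:
    # Four devices, each holding 100 made-up records of one object.
    devices = [
        ToyDevice({"particles": [{"id": d * 100 + i, "energy": i}
                                 for i in range(100)]})
        for d in range(4)
    ]
    gateway = ToyGateway(devices)
    is_hot = lambda r: r["energy"] >= 95
    plain, moved_plain = gateway.query("particles", is_hot, pushdown=False)
    pushed, moved_pushed = gateway.query("particles", is_hot, pushdown=True)
    return plain, moved_plain, pushed, moved_pushed
```

In this toy setup both queries return the same records, but the delegated filter moves far fewer of them off the devices. Real savings depend on filter selectivity and on what the devices can actually compute.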

To learn more about PDSW 2023, see: https://www.pdsw.org/index.shtml

John Bent also attended SC23, where his student Meng Wang from the University of Chicago presented a paper titled “Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers”.

Multi-level erasure coding (MLEC) has seen large deployments in the field, but there is no in-depth study of design considerations for MLEC at scale. In this paper, Bent and colleagues provide comprehensive design considerations and analysis of MLEC at scale.

They introduce the design space of MLEC in multiple dimensions, including code parameter selections, chunk placement schemes, and repair methods. They quantify the performance and durability of these options, and show which MLEC schemes and repair methods provide the best tolerance against independent and correlated failures and reduce repair network traffic by orders of magnitude. To achieve this, they use several evaluation strategies, including simulation, splitting, dynamic programming, and mathematical modeling.
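To make the "code parameter" dimension more concrete, here is a minimal arithmetic sketch. It assumes the common two-level layout in which a network-level code with k_net data and p_net parity chunks is applied on top of a local-level code with k_loc data and p_loc parity chunks. The function names and the example parameters are illustrative assumptions, not values taken from the paper, and the sketch ignores chunk placement and repair timing, which the paper analyzes in depth.

```python
from fractions import Fraction


def ec_overhead(k: int, p: int) -> Fraction:
    """Raw-to-usable storage ratio of a single (k + p) erasure code."""
    return Fraction(k + p, k)


def mlec_overhead(k_net: int, p_net: int, k_loc: int, p_loc: int) -> Fraction:
    """Storage ratio when a local code is nested inside a network code."""
    return ec_overhead(k_net, p_net) * ec_overhead(k_loc, p_loc)


def mlec_min_failures_for_loss(p_net: int, p_loc: int) -> int:
    """Fewest chunk failures within one two-level stripe that can lose data.

    A local stripe is lost only after p_loc + 1 of its chunks fail, and the
    network-level code is defeated only after p_net + 1 local stripes are lost.
    """
    return (p_net + 1) * (p_loc + 1)


# Illustrative parameters only: a (10 + 2) network code over (17 + 3) local codes.
overhead = mlec_overhead(10, 2, 17, 3)
min_failures = mlec_min_failures_for_loss(2, 3)
```

With these example parameters the storage ratio is 24/17 (about 1.41x), and at least 12 chunk failures inside one stripe are needed before data can be lost. This counts simultaneous failures only; it is not a durability estimate.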

They also compare the performance and durability of MLEC with other EC schemes, such as SLEC and LRC, and show that MLEC can provide high durability with higher encoding throughput and less repair network traffic than both SLEC and LRC.

To learn more, read the entire technical paper at:

https://sc23.supercomputing.org/proceedings/tech_paper/tech_paper_pages/pap349.html

Last, Alex Lovell-Troy launched OpenCHAMI (Ochami) at SC23. OpenCHAMI, which stands for Open Composable Heterogeneous Adaptable Management Infrastructure, was founded in 2023 as a collaboration between Los Alamos National Laboratory, the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, the Swiss National Supercomputing Centre (CSCS), Hewlett Packard Enterprise (HPE), and the University of Bristol.

Running sophisticated applications at scale in hybrid computing environments is challenging, whether they are complex simulations, data analytics, artificial intelligence, or heterogeneous workflows. Ochami was formed as an open-source community to develop and support a framework for better systems management in response to these challenges.

Ochami will support new and existing applications and workflows and will embrace modern systems-management methods, while leaving sites the flexibility to develop and deploy their preferred tools for plugins, microservices, and multi-tenancy.

To learn more, see the Ochami website at: https://www.openchami.org/