Graduate Student, Carnegie Mellon University
Qing is a 3rd-year Ph.D. student in the Computer Science Department at Carnegie Mellon University. He is a member of the Parallel Data Lab and works with Prof. Garth Gibson on designing large-scale parallel file systems for high-performance computing. Their IndexFS paper won the Best Paper Award at the Supercomputing Conference (SC) 2014. Qing is currently working with Brad Settlemyer and John Bent at USRC on designing DeltaFS, which extends IndexFS, and MarFS V2, systems that are envisioned to replace Lustre.
PhD Student, Computer Science & Engineering, University of North Texas
Ziming Zhang obtained his bachelor's degree in computer science and engineering from Beihang University in China in 2009. He began to pursue his doctorate at New Mexico Tech in 2009 and worked as a research assistant under Dr. Song Fu's supervision. In 2010, he transferred to the University of North Texas to continue his education. His research interests are primarily in power management and system dependability in virtualized cloud computing environments.
At the USRC, Zhang is working on power-aware task scheduling and placement techniques in HPC systems.
PhD Student, Computer Science & Engineering, University of California Riverside
Panruo Wu received his bachelor’s degree in mathematics from the University of Science and Technology of China in 2011. He is currently a PhD candidate at UCR. His research interests include fault tolerance in parallel and distributed systems and numerical algorithms.
At the USRC he works on the F-SEFI fault injector and on developing highly fault-tolerant algorithms that can run correctly and efficiently in the presence of numerous architectural faults.
PhD Student Computer Science, Illinois Institute of Technology
I am now a second-year Ph.D. candidate in the Department of Computer Science (CS) at Illinois Institute of Technology (IIT). I am a member of the Data-Intensive Distributed Systems Laboratory (DataSys) at IIT, working with Professor Ioan Raicu. My research work and interests are in the general area of distributed systems.
At USRC I am working on simulations to build a general framework for distributed system services, such as key-value stores, job launch, job scheduling, and distributed file systems, at up to exascale (millions of nodes). Our simulator supports different architectures: a centralized server, a centralized server with multiple aggregation servers, distributed servers with a fully connected topology, and distributed servers with the Chord protocol. We also implemented different models: a churn model, a replication model, and a consistency model. Our goal is to understand, with all these models and to a certain extent, which architecture makes sense for system services.
Undergraduate Student, University of St. Thomas
Emily Vecchia is an undergraduate student majoring in Mathematics with a minor in Computer and Information Sciences at the University of Saint Thomas in Saint Paul, Minnesota. This summer, she is working with her mentor, Laura Monroe, on a probabilistic computing project testing the resilience of algorithms to faults. In the fall, Emily will return to St. Thomas where she will be a senior.
PhD Student, Computer Science, University of Illinois at Urbana-Champaign
Lewis Tseng is a PhD student in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include fault-tolerant distributed algorithms and systems. In particular, he mainly focuses on issues related to consensus and consistency in theoretical models that capture the characteristics of large-scale systems in both wired and wireless networks.
At USRC, Lewis designed and developed a distributed key-value store to support service registration and lookup for HPC systems. The main goals of the new key-value store are: (i) autonomous system management, (ii) configurable per-key consistency and replication at the data granularity, (iii) flexible and configurable deployment based on different policies for power, performance and resilience.
PhD Student, Computer Science, Florida State University
Zhou is a Ph.D. student in the Department of Computer Science at Florida State University, and he received his B.S. from Millsaps College. His research includes interconnection networks, parallel application performance modeling and data mining. At the USRC, Zhou worked on performance evaluation and communication modeling of MPI applications to provide fast classification and assessment based on an understanding of the characteristics of various parallel applications in production HPC systems.
Dr. Li Tan,
Postdoctoral Researcher, Los Alamos National Laboratory
Li Tan graduated with a Ph.D. degree in Computer Science from the University of California, Riverside (UCR) in 2015. His chief research interest is High Performance Computing (HPC), in particular improving resilience/reliability and energy/power efficiency for high performance scientific algorithms and applications, and software debugging in large-scale HPC environments. At USRC, he works on fine-grained resilience and low-power modeling and provisioning for HPC applications, using fault injection and near-threshold voltage reduction techniques. He has served as a reviewer for prestigious conferences and journals on high performance parallel and distributed computing, such as SC, IPDPS, PACT, CCGrid, IEEE TPDS, IJHPCA, and JSS. He received the Dean's Distinguished Fellowship from UCR in 2010. He is a Member of the IEEE and a Member of the ACM.
Graduate Computer Science Dept, University of New Mexico
Philip is an ex-game developer turned computer science student. In Spring 2005 he was accepted into the computer science department at the University of New Mexico and in Summer 2005, he joined the UNM CS department's Scalable Systems Lab. He graduated with his BS in CS in Spring 2011 and joined the USRC in Summer 2011. His research interests span the realms of virtualization and supercomputing. In his free time he does autocross in his FC RX-7, an older Japanese sports car that is turning out to be a continuous project and a lesson in patience.
Philip worked on fault-tolerant, scalable system services for ultra-scale computing.
Graduate Student, Ohio State University
Ryan Slechta received his bachelor's degree in Mathematics and Computer Science from the University of St. Thomas in May 2016. He works with NMC staff and scientists on problems of algorithmic resilience, and is currently working to improve the reliability of erasure code techniques. In the fall, he will be joining the Topology, Geometry, and Data Analysis group at The Ohio State University.
PhD Student, School of Computing & Information Sciences, Florida International University
Doug is a PhD student from Florida International University's School of Computing and Information Sciences in Miami. It was here that he began his graduate studies after graduating in the Summer of 2011 and receiving the CS program's award for Outstanding Graduate.
While in Miami, Doug works at the lab for Virtualized Infrastructure, Software and Applications at FIU, doing research on caching for large-scale systems and scheduling optimizations for SSDs.
During his time at USRC, Doug will be working to extend a preexisting project, "Transparently Consistent Asynchronous Shared Memory", by exploring applications in checkpointing and in-situ data analysis.
Graduate Student, New Mexico State University
Nicholas received his bachelor's degree in microbiology from Cornell University in 2010. After working in Flow Cytometry and Scanning Electron Microscopy he entered the computer science graduate program at New Mexico State University. Nicholas is researching parallel programming in high performance computing (HPC). At the USRC, Nicholas worked on benchmarking OpenSHMEM against MPI.
Adam P Morrow,
Undergraduate Student, Brigham Young University
Adam is an undergraduate intern at the New Mexico Consortium where he is working on mapping error clusters to originating faults and errors in DRAM units on leadership-class supercomputing machines. He is pursuing a B.S. degree at Brigham Young University in Applied Computational Mathematics and Computer Science.
PhD Student, Florida State University
Atiqul received his Bachelor’s degree from Bangladesh University of Engineering and Technology. Currently he is a PhD student at Florida State University. His research interests include Software Defined Networking (SDN), interconnection networks, data center networks and parallel processing using MPI.
At USRC, he is working on the implementation of dynamic and scalable services in HPC systems using OpenFlow/SDN.
PhD Student Computer Science, University of Houston
Kshitij is a high performance computing (HPC) researcher and programmer. He works on developing parallel I/O interfaces for OpenMP and shared memory machines as part of his PhD. Additionally, he works on performance modeling of parallel I/O operations. His specialties are C, POSIX Threads, OpenMP, Parallel I/O, MPI, and modeling.
At the USRC, Kshitij worked on integrating HDF5 with PLFS, writing a plugin using HDF5's Virtual Object Layer (VOL) that uses LANL's PLFS.
PhD Student, Computer Science, Florida State University
I am a PhD candidate in the Computer Science department at Florida State University (FSU), Tallahassee, Florida, US. My areas of interest include Computer Networks, Interconnection Networks, Parallel Architectures, Storage Area Networks, Data Centers, and High Performance Computing (HPC) clusters. My current area of research is routing and load-balancing in HPC clusters and data center networks.
Santosh worked with the USRC Systems group on interconnection networks, evaluating different topologies across various routing schemes to identify the best networking paradigms for next-generation supercomputers.
Master's Student, Computer Science, Florida State University
Jason Lee is a master's student at Florida State University and received a B.S. from Rensselaer Polytechnic Institute. His interests include cryptography, parallel programming, and networking.
At the USRC he is working on software-defined networking with InfiniBand for high performance computing systems.
Undergraduate Student, Coastal Carolina University
Scott is an undergraduate student from Coastal Carolina University, pursuing a B.S. in Computer Science with a minor in Applied Math. He has worked on algorithm-based fault tolerance (ABFT) and the inherent resilience of integer operations. At the USRC, he is investigating the resilience of currently used supercomputer applications.
Student, Los Alamos National Laboratory
Mitchell Klein is a summer intern at Los Alamos where he is working on a project that tests the resilience of algorithms. He recently earned his B.A. in Applied Mathematics from the University of St. Thomas in St. Paul, Minnesota. In the future, Mitchell plans to pursue graduate studies in mathematics or a related field.
PhD Student Electrical & Computer Engineering, University of Illinois at Urbana-Champaign
Stevenson is a PhD student in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. His research has focused on reducing the power and storage overhead of error-resilient memories.
Stevenson was at USRC to study the reliability of different strengths of chipkill-correct memories under different faulty-DIMM replacement policies.
Ph.D. Student, Department of Computer Science and Engineering, University of North Texas
Song Huang is a Ph.D. student in the Department of Computer Science and Engineering at the University of North Texas. He works in the Dependable Computing Systems Lab directed by Dr. Song Fu. His research interests include power and energy consumption in HPC systems, disk failure modeling and analysis, and resilience and fault tolerance techniques for HPC systems.
Currently, he works at USRC on characterizing power consumption on Haswell machines and on resource allocation and scheduling in HPC systems.
PhD Student in Reconfigurable Computing System Lab, University of North Carolina at Charlotte
Bin is a PhD student in the Reconfigurable Computing System Lab at the University of North Carolina at Charlotte. His research interests include novel computer architectures using FPGA technology and resilient many-core chips.
At USRC, Bin conducted research on a resilient runtime system framework for heterogeneous many-core architectures that combine a few reliable cores with many less reliable cores.
PhD Student Computer Science, Illinois Institute of Technology
I am a Ph.D. student at the Department of Computer Science, Illinois Institute of Technology. I received my B.E. and M.S. from Hunan University, China. Currently, my research interests are parallel file systems, I/O analysis and optimization.
My homepage is http://mypages.iit.edu/~jhe24/.
At USRC, I am developing and evaluating algorithms by which patterns in the PLFS (Parallel Log-structured File System) metadata can be discovered and then used to replace the current metadata, in order to reduce metadata size.
Rusty H Davis,
Graduate Student, Clemson University
Rusty graduated with his B.S. in computer science from the School of Computing at Clemson University in May 2016. He will begin pursuing his master's degree in Computer Science at Clemson University in Fall 2016. Rusty has been working with the USRC since the summer of 2014. His initial work was with Dr. Nathan DeBardeleben and Dr. William Jones concerning Algorithmic-Based Fault Tolerant Matrix Multiplication. His current work is focused on quantifying the resiliency of Algorithmic-Based Fault Tolerant Fast Fourier Transforms and creating an interface for the F-SEFI fault injector. His research interests include High Performance Computing, Operating Systems, and Resilience/Fault Tolerance.
PhD Student, Texas Tech University
I received my B.E. and master's degrees in computer science from Hunan University, China. I have been a PhD student in the Department of Computer Science at Texas Tech University since 2011. My research interests are in data-intensive computing, parallel computing, and storage.
My work at USRC is developing and evaluating a new paradigm for data-intensive applications to reduce the impact of I/O limitations.
PhD Student, Computer Science, University of New Mexico
Zhenjie is a graduate student at The University of New Mexico and joined the Scalable Systems Lab at CS@UNM in 2012. He mainly focuses on scalable systems and fault tolerance. He likes hiking, snowboarding, skating and coding, and ...
At USRC, Zhenjie is working on integrating a Scalable Information Propagation service into LIBI (Lightweight Infrastructure-Bootstrapping Infrastructure) in order to improve the performance of bootstrapping numerous processes, especially the wire-up procedure.
Undergraduate Student, Computer Engineering and Computer Science, University of Kentucky
Michael Carlton is an undergraduate with a dual major in Computer Engineering and Computer Science at the University of Kentucky. This summer at LANL he worked with his mentors, Nathan DeBardeleben and Sean Blanchard, to develop a new method for handling memory hardware errors. This project was a success and will hopefully be utilized in the near future on production machines. Michael recently returned to college to begin his senior year and will be working as both a Teaching Assistant and Resident Advisor during the academic year.
PhD Student Computing & Info Sciences, Florida International University
Daniel is a Venezuelan student in the School of Computing and Information Sciences of Florida International University. He has been there since August 2010 pursuing his PhD degree and has been working on research projects in Operating Systems, specifically in Storage Systems. He also works as a Research Assistant in the Computing Department. His undergraduate degree is in Computer Engineering, which he obtained at the Universidad Simon Bolivar in Caracas, Venezuela, where he also worked as a System Administrator of the entire CS network. Daniel is currently working on the project "SoftPM: Software Persistent Memory", which is motivated by converting in-memory structures into persistent structures living on different types of storage media, all in a manner transparent to the developer.
Daniel's work at USRC was to extend the SoftPM work for parallel computing and to examine the research challenges therein.
PhD Student, University of Notre Dame
Michael Albrecht is a Ph.D. student at the University of Notre Dame and part of the ND Cooperative Computing Lab. His primary research area is in distributed systems, with a focus on distributed storage, data-aware computing, and active storage.
Michael developed and tested the feasibility of a hierarchical data and workflow manager for exascale supercomputing to deal with both data-intensive computing and simulation checkpointing and restart.
PhD Student Computer Science, New Mexico Tech
I received my B.E. in Metallurgical and Materials Engineering from Middle East Technical University, Turkey. I then received an M.S. in Computer Science from New Mexico Tech, where I am currently pursuing a doctoral degree in the same program. My earlier research focused on web technologies and cloud computing, specializing in access control in those environments.
At the USRC, I am investigating OS and system software scalability for exascale computing.