Proposal by engineering’s Yinghui Wu, Alp Sehirlioglu selected for NSF Cyberinfrastructure for Sustained Scientific Innovation program

The National Science Foundation Cyberinfrastructure for Sustained Scientific Innovation program accepted a proposal by Yinghui Wu, assistant professor of computer and data sciences, and Alp Sehirlioglu, associate professor of materials science and engineering.

Their proposal is titled “Crowdsourced Materials Data Engine for Unpublished XRD Results.” The project will begin Aug. 1 and aims to develop a new data engine with a knowledge graph at its core.

Although data-driven analysis has been heralded as a new paradigm in fundamental materials science, including X-ray diffraction (XRD) analysis, high-value materials datasets are often not made public and remain underutilized. This project aims to design and develop CRUX, a crowdsourced data infrastructure with services to curate, discover, share, and recommend unpublished XRD data and analytical results. CRUX will promote underutilized, high-quality materials science data by enabling the sharing and exploration of unpublished data with state-of-the-art crowdsourcing, knowledge harvesting, and machine-learning techniques.

CRUX provides a crowdsourced knowledge base to allow scientists and the general public to share and access unpublished data resources. It also provides:

  • A novel search engine that supports simple keyword search, returns relevant data resources even when no exact keyword match exists, and self-evolves to improve search quality; and
  • A “data feed” service that lets users easily receive and track updates to specific data resources of interest.
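
As a rough illustration of the search behavior described in the list above, the sketch below shows one simple way a keyword search could still return relevant records when no exact keyword match exists. It is not CRUX's implementation: the catalog, record IDs, and `search` function are hypothetical, and approximate string matching from Python's standard `difflib` module stands in for whatever knowledge-graph and learning techniques the project will actually use. The self-evolving aspect is not modeled.

```python
from difflib import get_close_matches

# Hypothetical toy catalog (not real CRUX data): record id -> descriptive keywords.
CATALOG = {
    "xrd-001": {"perovskite", "thin", "film", "xrd"},
    "xrd-002": {"zirconia", "powder", "diffraction"},
    "xrd-003": {"perovskite", "multiphase", "refinement"},
}


def search(query, catalog=CATALOG, cutoff=0.75):
    """Return record ids relevant to the query, best match first.

    Exact keyword hits are used when available; otherwise each query
    word falls back to its closest keywords in the catalog vocabulary.
    """
    vocabulary = sorted({kw for kws in catalog.values() for kw in kws})
    terms = set()
    for word in query.lower().split():
        if word in vocabulary:
            terms.add(word)  # exact keyword match
        else:
            # Assumed fallback: approximate string matching on keywords.
            terms.update(get_close_matches(word, vocabulary, n=3, cutoff=cutoff))
    # Score each record by how many matched terms it contains.
    scores = {rid: len(terms & kws) for rid, kws in catalog.items()}
    hits = [rid for rid, score in scores.items() if score > 0]
    return sorted(hits, key=lambda rid: (-scores[rid], rid))


if __name__ == "__main__":
    print(search("perovskite"))   # exact keyword
    print(search("difraction"))   # misspelled keyword, resolved by the fallback
```

In this sketch a misspelled query such as "difraction" still reaches the record tagged "diffraction," which is the kind of tolerance the first bullet describes, though a production system would likely rely on richer semantic relationships than string similarity.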

The proposed infrastructure and tools provide an open, collaborative and sustainable platform that can facilitate the exchange of unpublished XRD data and discoveries, unlock new research problems (e.g., predictive analysis of materials compositions with multi-phase data), and inspire novel designs of machine-learning pipelines (e.g., deep neural networks) for data-driven materials science.

The project also engages developers to contribute fundamental computational tools that accelerate materials discovery and innovation.