In this work, we applied advanced machine learning and natural language processing techniques to construct a dataset of 35,675 solution-based synthesis "recipes" extracted from the scientific literature. Each recipe contains essential synthesis information, including the precursor and target materials, their quantities, and the synthesis actions with their corresponding attributes. Every recipe is also augmented with its reaction formula. With this work, we are making freely available the first large dataset of solution-based inorganic materials synthesis recipes.
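The recipe contents described above can be pictured as a nested record. The sketch below is a minimal, hypothetical Python model assuming one record per recipe. The class and field names (`Material`, `Action`, `Recipe`, `reaction_formula`, and so on), as well as the example reaction and action labels, are illustrative assumptions. They do not reproduce the released dataset's actual schema, so consult the data files themselves for the real field names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, simplified record layout. Field names are illustrative
# and do NOT reproduce the released dataset's actual JSON schema.


@dataclass
class Material:
    formula: str                   # chemical formula, e.g. "BaSO4"
    amount: Optional[str] = None   # quantity as written in the text, e.g. "0.1 M"


@dataclass
class Action:
    action_type: str               # illustrative label, e.g. "Mixing"
    # Attributes attached to the action, e.g. temperature or duration.
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Recipe:
    targets: List[Material]
    precursors: List[Material]
    actions: List[Action]          # ordered sequence of synthesis steps
    reaction_formula: str

    def precursor_formulas(self) -> List[str]:
        """Return the formulas of all precursors, in order."""
        return [m.formula for m in self.precursors]

    def action_sequence(self) -> List[str]:
        """Return the ordered list of action types."""
        return [a.action_type for a in self.actions]


# Usage: a made-up precipitation recipe, for illustration only.
recipe = Recipe(
    targets=[Material("BaSO4")],
    precursors=[Material("BaCl2", "0.1 M"), Material("Na2SO4", "0.1 M")],
    actions=[
        Action("Mixing", {"environment": "aqueous solution"}),
        Action("Drying", {"temperature": "100 °C", "duration": "12 h"}),
    ],
    reaction_formula="BaCl2 + Na2SO4 -> BaSO4 + 2 NaCl",
)

print(recipe.precursor_formulas())  # ['BaCl2', 'Na2SO4']
print(recipe.action_sequence())     # ['Mixing', 'Drying']
```

Keeping quantities and attribute values as the raw strings found in the text is a simplifying choice in this sketch; a downstream analysis would probably want to parse them into numbers with units.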
This repo contains the code and modules used to create the solution-based synthesis dataset. If you find the code and data useful, please cite our papers:
Dataset:
- Wang, Z., Kononova, O., Cruse, K., He, T., Huo, H., Fei, Y., Zeng, Y., Sun, Y., Cai, Z., Sun, W. and Ceder, G., 2022. Dataset of solution-based inorganic materials synthesis procedures extracted from the scientific literature. Scientific Data, 9, p.231. https://doi.org/10.1038/s41597-022-01317-2
Paragraph classification:
- Huo, H., Rong, Z., Kononova, O., Sun, W., Botari, T., He, T., Tshitoyan, V. and Ceder, G., 2019. Semi-supervised machine-learning classification of materials synthesis procedures. npj Computational Materials, 5(1), p.62. https://doi.org/10.1038/s41524-019-0204-1
Materials Entity Recognition (MER):
- He, T., Sun, W., Huo, H., Kononova, O., Rong, Z., Tshitoyan, V., Botari, T. and Ceder, G., 2020. Similarity of Precursors in Solid-State Synthesis as Text-Mined from Scientific Literature. Chemistry of Materials, 32(18), pp.7861-7873. https://doi.org/10.1021/acs.chemmater.0c02553
Synthesis Action Retrieval:
- Wang, Z., Cruse, K., Fei, Y., Chia, A., Zeng, Y., Huo, H., He, T., Deng, B., Kononova, O. and Ceder, G., 2022. ULSA: unified language of synthesis actions for the representation of inorganic synthesis protocols. Digital Discovery. https://doi.org/10.1039/D1DD00034A