Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 2623–2631
Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA (2023) Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 15(1):73
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794
Chen L, Pilania G, Batra R, Huan TD, Kim C, Kuenneth C, Ramprasad R (2021) Polymer informatics: current status and critical next steps. Mater Sci Eng R Rep 144:100595
Doan Tran H, Kim C, Chen L, Chandrasekaran A, Batra R, Venkatram S, Ramprasad R (2020) Machine-learning predictions of polymer properties with polymer genome. J Appl Phys. https://doi.org/10.1063/5.0023759
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat. https://doi.org/10.1214/aos/1013203451
Liu G, Zhu Y, Chen J, Jiang M (2025) NeurIPS - Open Polymer Prediction 2025. Kaggle. https://kaggle.com/competitions/neurips-open-polymer-prediction-2025
Gartner TE III, Jayaraman A (2019) Modeling and simulations of polymers: a roadmap. Macromolecules 52(3):755–786
Grinsztajn L, Oyallon E, Varoquaux G (2022) Why do tree-based models still outperform deep learning on typical tabular data? Adv Neural Inf Process Syst 35:507–520
Hall LH, Kier LB (1995) Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J Chem Inf Comput Sci 35(6):1039–1045
Hancock JT, Khoshgoftaar TM (2020) CatBoost for big data: an interdisciplinary review. J Big Data 7(1):94
Ishii M, Ito T, Sado H, Kuwajima I (2024) NIMS polymer database polyinfo (I): an overarching view of half a million data points. Sci Technol Adv Mater Methods 4(1):2354649
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, ... Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30
Kim C, Chandrasekaran A, Huan TD, Das D, Ramprasad R (2018) Polymer genome: a data-powered polymer informatics platform for property predictions. J Phys Chem C 122(31):17575–17585
Kuenneth C, Rajan AC, Tran H, Chen L, Kim C, Ramprasad R (2021) Polymer informatics with multi-task learning. Patterns. https://doi.org/10.1016/j.patter.2021.100238
Kuenneth C, Ramprasad R (2023) PolyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat Commun 14(1):4099
Landrum G (2013) RDKit documentation. Release 1(1–79):4
Liang Z, Li Z, Zhou S, Sun Y, Yuan J, Zhang C (2022) Machine-learning exploration of polymer compatibility. Cell Rep Phys Sci. https://doi.org/10.1016/j.xcrp.2022.100931
Lin TS, Coley CW, Mochigase H, Beech HK, Wang W, Wang Z, Woods E, Craig SL, Johnson JA, Kalow JA, Jensen KF, Olsen BD (2019) BigSMILES: a structurally-based line notation for describing macromolecules. ACS Cent Sci 5(9):1523–1531
Martin TB, Audus DJ (2023) Emerging trends in machine learning: a polymer perspective. ACS Polym Au 3(3):239–258
Meaney C, Wang X, Guan J, Stukel TA (2025) Comparison of methods for tuning machine learning model hyper-parameters: with application to predicting high-need high-cost health care users. BMC Med Res Methodol 25(1):134
Park J, Shim Y, Lee F, Rammohan A, Goyal S, Shim M, Jeong C, Kim DS (2022) Prediction and interpretation of polymer properties using the graph convolutional network. ACS Polym Au 2(4):213–222
Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst 31
Rodríguez-Pérez R, Bajorath J (2019) Interpretation of compound activity predictions from complex machine learning models using local approximations and shapley values. J Med Chem 63(16):8761–8777
Rodríguez-Pérez R, Bajorath J (2020) Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J Comput Aided Mol Des 34(10):1013–1026
Shwartz-Ziv R, Armon A (2022) Tabular data: deep learning is not all you need. Inf Fusion 81:84–90
Stuart S, Watchorn J, Gu FX (2023) Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials. NPJ Comput Mater 9(1):102
Tran H, Gurnani R, Kim C, Pilania G, Kwon HK, Lively RP, Ramprasad R (2024) Design of functional and sustainable polymers assisted by artificial intelligence. Nat Rev Mater 9(12):866–886
Xu P, Ji X, Li M, Lu W (2023) Small data machine learning in materials science. NPJ Comput Mater 9(1):42
Zhang X, Duh K (2020) Reproducible and efficient benchmarks for hyperparameter optimization of neural machine translation systems. Trans Assoc Comput Linguist 8:393–408
Zhong X, Gallagher B, Liu S, Kailkhura B, Hiszpanski A, Han TYJ (2022) Explainable machine learning in materials science. NPJ Comput Mater 8(1):204