LiteBoost: a lightweight and explainable boosting model for predicting polymer density from SMILES data

Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 2623–2631

Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA (2023) Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 15(1):73

Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794

Chen L, Pilania G, Batra R, Huan TD, Kim C, Kuenneth C, Ramprasad R (2021) Polymer informatics: current status and critical next steps. Mater Sci Eng R Rep 144:100595

Doan Tran H, Kim C, Chen L, Chandrasekaran A, Batra R, Venkatram S, Ramprasad R (2020) Machine-learning predictions of polymer properties with polymer genome. J Appl Phys. https://doi.org/10.1063/5.0023759

Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat. https://doi.org/10.1214/aos/1013203451

Liu G, Zhu Y, Chen J, Jiang M (2025) NeurIPS - Open Polymer Prediction 2025. Kaggle. https://kaggle.com/competitions/neurips-open-polymer-prediction-2025

Gartner TE III, Jayaraman A (2019) Modeling and simulations of polymers: a roadmap. Macromolecules 52(3):755–786

Grinsztajn L, Oyallon E, Varoquaux G (2022) Why do tree-based models still outperform deep learning on typical tabular data? Adv Neural Inf Process Syst 35:507–520

Hall LH, Kier LB (1995) Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J Chem Inf Comput Sci 35(6):1039–1045

Hancock JT, Khoshgoftaar TM (2020) CatBoost for big data: an interdisciplinary review. J Big Data 7(1):94

Ishii M, Ito T, Sado H, Kuwajima I (2024) NIMS polymer database polyinfo (I): an overarching view of half a million data points. Sci Technol Adv Mater Methods 4(1):2354649

Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, ... Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30

Kim C, Chandrasekaran A, Huan TD, Das D, Ramprasad R (2018) Polymer genome: a data-powered polymer informatics platform for property predictions. J Phys Chem C 122(31):17575–17585

Kuenneth C, Rajan AC, Tran H, Chen L, Kim C, Ramprasad R (2021) Polymer informatics with multi-task learning. Patterns. https://doi.org/10.1016/j.patter.2021.100238

Kuenneth C, Ramprasad R (2023) PolyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat Commun 14(1):4099

Landrum G (2013) RDKit documentation. Release 1(1–79):4

Liang Z, Li Z, Zhou S, Sun Y, Yuan J, Zhang C (2022) Machine-learning exploration of polymer compatibility. Cell Rep Phys Sci. https://doi.org/10.1016/j.xcrp.2022.100931

Lin TS, Coley CW, Mochigase H, Beech HK, Wang W, Wang Z, Woods E, Craig SL, Johnson JA, Kalow JA, Jensen KF, Olsen BD (2019) BigSMILES: a structurally-based line notation for describing macromolecules. ACS Cent Sci 5(9):1523–1531

Martin TB, Audus DJ (2023) Emerging trends in machine learning: a polymer perspective. ACS Polym Au 3(3):239–258

Meaney C, Wang X, Guan J, Stukel TA (2025) Comparison of methods for tuning machine learning model hyper-parameters: with application to predicting high-need high-cost health care users. BMC Med Res Methodol 25(1):134

Park J, Shim Y, Lee F, Rammohan A, Goyal S, Shim M, Jeong C, Kim DS (2022) Prediction and interpretation of polymer properties using the graph convolutional network. ACS Polym Au 2(4):213–222

Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst 31

Rodríguez-Pérez R, Bajorath J (2019) Interpretation of compound activity predictions from complex machine learning models using local approximations and shapley values. J Med Chem 63(16):8761–8777

Rodríguez-Pérez R, Bajorath J (2020) Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J Comput Aided Mol Des 34(10):1013–1026

Shwartz-Ziv R, Armon A (2022) Tabular data: deep learning is not all you need. Inf Fusion 81:84–90

Stuart S, Watchorn J, Gu FX (2023) Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials. NPJ Comput Mater 9(1):102

Tran H, Gurnani R, Kim C, Pilania G, Kwon HK, Lively RP, Ramprasad R (2024) Design of functional and sustainable polymers assisted by artificial intelligence. Nat Rev Mater 9(12):866–886

Xu P, Ji X, Li M, Lu W (2023) Small data machine learning in materials science. NPJ Comput Mater 9(1):42

Zhang X, Duh K (2020) Reproducible and efficient benchmarks for hyperparameter optimization of neural machine translation systems. Trans Assoc Comput Linguist 8:393–408

Zhong X, Gallagher B, Liu S, Kailkhura B, Hiszpanski A, Han TYJ (2022) Explainable machine learning in materials science. NPJ Comput Mater 8(1):204
