Data Mining for Enhanced PEM Electrolysis

Abstract
Despite numerous publications on proton exchange membrane (PEM) electrolyser production, the lack of standardized methodologies and reporting guidelines complicates direct comparison between studies. We conducted a systematic exploratory data analysis and built a database of more than 1,000 samples from 127 publications, covering over 85 parameters that span material selection, membrane electrode assembly (MEA) fabrication, cell assembly, and characterization. Statistical analysis of this database revealed trends and hidden influencing factors. We further used an Extreme Gradient Boosting model to quantify feature importance, which showed that several critical factors are reported less often than their impact warrants. This work provides a foundation for standardizing research data and shows that systematic data management is key to overcoming the comparability challenges and accelerating development.
The main goal of this work was to tackle the reproducibility crisis in PEM electrolysis research. Many studies focus only on the catalyst or another subcomponent of the cell and do not report all the parameters that influence performance. We therefore compiled a database of more than 1,000 samples from 127 publications to gain insight into how the different parameters influence performance.
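The reporting gap described above can be made concrete by computing, for each parameter, the fraction of samples that actually report a value. The following is a minimal sketch under stated assumptions, not the procedure used in this work: the records, the field names (`membrane`, `anode_loading_mg_cm2`, `cell_temperature_C`, `clamping_torque_Nm`), and the helper `reporting_rates` are all hypothetical placeholders, and `None` is assumed to mark an unreported parameter.

```python
from collections import Counter

# Hypothetical toy records; the field names and values are placeholders,
# not entries from the actual database. None marks an unreported parameter.
samples = [
    {"membrane": "A", "anode_loading_mg_cm2": 2.0, "cell_temperature_C": 80, "clamping_torque_Nm": None},
    {"membrane": "A", "anode_loading_mg_cm2": 1.0, "cell_temperature_C": None, "clamping_torque_Nm": None},
    {"membrane": "B", "anode_loading_mg_cm2": None, "cell_temperature_C": 60, "clamping_torque_Nm": 5.0},
    {"membrane": "B", "anode_loading_mg_cm2": 1.5, "cell_temperature_C": 80, "clamping_torque_Nm": None},
]


def reporting_rates(records):
    """Return {parameter: fraction of records that report a value for it}."""
    total = len(records)
    if total == 0:
        return {}
    reported = Counter()
    parameters = set()
    for record in records:
        for name, value in record.items():
            parameters.add(name)
            if value is not None:
                reported[name] += 1
    # Parameters never reported still appear, with a rate of 0.0.
    return {name: reported[name] / total for name in parameters}


rates = reporting_rates(samples)
# List the least-reported parameters first.
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name}: reported in {rate:.0%} of samples")
```

Setting such reporting rates beside a model-derived importance score for each parameter is one way to flag factors that are reported less often than their impact would suggest.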