Data-Driven Modeling of Gas–Liquid Two-Phase Flow Regimes Using Ensemble Machine Learning and High-Resolution Experiments
Abstract
Gas–liquid two-phase flows occur in many engineering systems, where complex interactions between phases generate distinct flow regimes such as bubbly, slug, churn, annular, and stratified flow. These regimes influence momentum, heat, and mass transfer, and they strongly affect pressure drop, phase distribution, and equipment reliability. Traditional flow regime identification relies on empirical maps and mechanistic correlations that were constructed for restricted geometries and operating ranges. As operational envelopes expand and new working fluids are introduced, these classical correlations face limitations in generalization and uncertainty quantification. At the same time, modern experimental facilities provide detailed measurements with high temporal and spatial resolution, including high-speed imaging, tomographic reconstructions, and local probes. These data streams offer an opportunity for data-driven modeling that can complement mechanistic descriptions. In this context, ensemble machine learning methods, which combine multiple predictive models, are suitable tools for exploiting high-dimensional and heterogeneous experimental data. They can approximate nonlinear decision boundaries for regime classification and provide calibrated probabilistic outputs. This paper presents a comprehensive formulation for data-driven modeling of gas–liquid two-phase flow regimes using ensemble machine learning, grounded in high-resolution experimental observables and physically interpretable features. The methodology integrates physical scaling, flow descriptions based on partial differential equations, numerical modeling concepts, and statistical validation. The discussion emphasizes consistency with basic conservation laws, robustness to measurement noise, and interpretability of learned models in terms of dimensionless groups and regime transition mechanisms.
The study also examines limitations related to data coverage, extrapolation, and uncertainty quantification, and it outlines possible ways of combining mechanistic and data-driven models for future development.