Data mining model to predict thermal efficiency in ORC

Abstract: The Organic Rankine Cycle (ORC) is an electricity generation system that uses an organic fluid instead of water in the low temperature range. The ORC using zeotropic working fluids has wide application potential. In this study, data mining (DM) models are used for the performance analysis of an ORC using the zeotropic working fluids R417A and R422D. Various DM models are used, including Linear Regression (LR), Multi-Layer Perceptron (MLP), M5 Rules, M5 Model Tree, Random Committee (RC), and Decision Table (DT). The MLP model emerged as the most effective approach for predicting the thermal efficiency of both R417A and R422D. The MLP's predicted results closely matched the actual results obtained from the thermodynamic model using Genetron software. The Root Mean Square Error (RMSE) for the thermal efficiency was exceptionally low, at 0.0002 for R417A and 0.0003 for R422D. Additionally, the R-squared (R²) values for thermal efficiency were very high, reaching 0.9999 for both R417A and R422D. The findings demonstrate the effectiveness of the DM model for complex tasks such as estimating ORC thermal efficiency. This approach allows engineers to predict thermal efficiency in organic Rankine systems with high accuracy, speed, and ease.


Introduction
Climate change caused by carbon dioxide emissions from burning fossil fuels is a critical global issue. According to the International Energy Agency (IEA), 36.3 gigatons of CO2 were released into the atmosphere in 2021 [1]. The transition to a more sustainable energy system hinges on two crucial elements: developing renewable energy sources to replace fossil fuels and utilizing waste heat to enhance overall energy conversion efficiency. Notably, within renewable and waste heat resources, there exists a vast potential for development, particularly in the area of medium and low-grade thermal energy. As the use of renewable energy grows and waste heat generation increases, the organic Rankine cycle (ORC) emerges as a highly suitable technology for heat conversion because it can efficiently utilize low-grade thermal energy. The ORC operates on a similar principle to the traditional steam Rankine cycle, but employs organic fluids with lower boiling points as the working fluid [2].
Machine learning, a branch of artificial intelligence, has gained significant interest in recent years. This is due to its ability to effectively handle complex data, with multiple dimensions and variations, even in situations with dynamic or uncertain conditions [3]. In the field of ORCs, some promising research has begun to explore how machine learning can be applied. Several studies in the literature have employed machine learning techniques to estimate and optimize ORC power system performance. Table 1 summarizes prior research that investigated ORC systems using various machine learning approaches. In this paper, unlike the studies in the literature, a data mining method is used to estimate the thermal efficiency of an ORC system using R417A and R422D as working fluids. The thermodynamic modeling of the ORC system was performed using the Genetron software (Genetron Properties 1.4.2). The results obtained from the data mining method were compared with the thermodynamic model results (actual results) obtained using the Genetron software. The data mining method helps to predict the thermal efficiency of the ORC system accurately and quickly.

ORC system and thermodynamic modeling
The system flowchart for the ORC system is presented in Figure 1. The basic ORC configuration consists of four essential components [18]: a pump, an evaporator, a turbine, and a condenser.
• Evaporator: This component transfers heat from an external source to the system, vaporizing the high-pressure organic working fluid.
• Turbine: The high-temperature, high-pressure organic working fluid expands in the turbine, generating electricity.
• Condenser: Here, heat is extracted from the low-pressure working fluid exiting the turbine, condensing it back into a liquid state.
The four components of the ORC (pump, evaporator, turbine, and condenser) are steady-flow devices, so the four processes that make up the cycle can be analyzed as steady-flow processes [19,20]. The energy conservation relationship for each component is as follows:
Pump: Power is required to pump the condensed liquid working fluid to the inlet of the evaporator; it follows from the energy balance across the pump.
Evaporator: Heat is added to the liquid working fluid so that it changes phase to vapor; the required heat input follows from the energy balance across the evaporator.
Turbine: The expansion of the vapor from high pressure to the condensing pressure produces the turbine output power.
Condenser: A certain amount of heat is rejected to the ambient air; the heat released follows from the energy balance across the condenser.
The performance of ORC systems is usually expressed by the thermal efficiency, the ratio of net power output to heat input. The thermodynamic modeling of the ORC system was performed using the Genetron software.
Some of the parameters and assumptions used in this study, presented in Table 2, were selected based on the working range of ORC systems that have been used as small-scale power plants.
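The component energy balances described above can be written out as a hedged sketch in the notation of the Abbreviations list. These are the standard steady-flow forms, not necessarily the exact equations of the original; the subscript pairs such as (p,in) and (p,out) are an assumed labeling of each component's inlet and outlet states, and pump power is taken as a positive input.

```latex
\begin{align}
\sum \dot{m}_{in} &= \sum \dot{m}_{out} \\
\dot{Q} - \dot{W} &= \sum \dot{m}_{out}\, h_{out} - \sum \dot{m}_{in}\, h_{in} \\
\dot{W}_{p} &= \dot{m}_{r}\,\left(h_{p,out} - h_{p,in}\right) \\
\dot{Q}_{E} &= \dot{m}_{r}\,\left(h_{E,out} - h_{E,in}\right) \\
\dot{W}_{t} &= \dot{m}_{r}\,\left(h_{t,in} - h_{t,out}\right) \\
\dot{Q}_{C} &= \dot{m}_{r}\,\left(h_{C,in} - h_{C,out}\right) \\
\eta &= \frac{\dot{W}_{t} - \dot{W}_{p}}{\dot{Q}_{E}}
\end{align}
```

With this labeling, the thermal efficiency depends only on the enthalpies at the four state points, which is why it can be tabulated against the evaporator, condenser, subcooling, and superheating temperatures.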

Data mining model and application
Data mining is an interdisciplinary field that integrates elements from databases, statistics, machine learning, signal processing, and high-performance computing. Its primary objective is to uncover meaningful correlations and patterns within existing data that are potentially valuable and understandable. It serves as a potent tool for extracting predictive insights from vast datasets. Data mining tasks can generally be categorized as either predictive or descriptive in nature. Predictive modeling involves the construction of predictive models based on the outcomes of disparate datasets. In contrast, descriptive modeling aims to identify underlying patterns or relationships within the data. Unlike predictive modeling, which focuses on making predictions, descriptive modeling seeks to uncover inherent characteristics of the data being studied rather than predicting new features. Common predictive modeling tasks in data mining include classification, prediction, regression, and time series analysis. Descriptive tasks encompass techniques such as clustering, summarization, association rules, and ranking. Figure 2 illustrates the various tasks and models in data mining [21]. Among predictive models, classification is arguably the most comprehensively understood approach in data mining. Three key characteristics of classification tasks are [22]: Ability to assign new data to distinct predefined classes.
In contrast to classification, Prediction modeling aims to forecast future outcomes rather than describing current behavior.Its output can be either categorical or numerical.
Another type of forecasting model, known as Statistical Regression, is a supervised learning technique that involves analyzing the relationship between attributes within the same dataset and building a model capable of predicting attribute values for new instances.Forecasting scenarios involving one or more time-dependent attributes are commonly referred to as time series problems.
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a process model for the management and implementation of data mining projects. It is the most widely used analytics model. This methodology consists of six phases [23]: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment, as shown in Figure 3.
Business Understanding: In this stage, the purpose of the data mining project and its business objectives and requirements are determined.
Data Understanding: In the data understanding phase, the existing data set is analyzed to obtain important information about the data, including data quality, missing data, and data relationships. The influence of evaporator, condenser, subcooling, and superheating temperatures on the thermal efficiency of an ORC system is well known. Consequently, these temperatures were selected as the input data in the study. Thermal efficiency is the output data of the data mining model.
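As a hedged illustration of the four inputs and one output named above, the sketch below writes them in WEKA's ARFF layout. The attribute names and the relation name are hypothetical (the study does not list its column labels), and the two data rows are placeholders that only show the layout; they are not results from the study.

```python
# Hypothetical column labels for the four inputs and the single output.
ATTRIBUTES = [
    "T_evaporator_C",
    "T_condenser_C",
    "dT_subcooling_C",
    "dT_superheating_C",
    "thermal_efficiency",
]


def to_arff(relation, attributes, rows):
    """Return ARFF text with every attribute declared as numeric."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attributes]
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length must match the number of attributes")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Placeholder rows: made-up numbers to show the layout, not data from the study.
placeholder_rows = [
    (90.0, 30.0, 5.0, 5.0, 0.10),
    (100.0, 30.0, 5.0, 5.0, 0.11),
]

arff_text = to_arff("orc_r417a", ATTRIBUTES, placeholder_rows)
print(arff_text)
```

The last attribute is kept as the output so that it can be selected as the target when the file is loaded for regression.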
Data Preparation: In the data preparation phase, the data is cleaned, transformed, and brought into a format suitable for data mining operations, so that the data set is ready to be used in the modelling phase. In this study, the thermodynamic modeling of the ORC system was performed using the Genetron software. The data set used to train the models was obtained from the results of the thermodynamic modeling.
Modelling: Data mining models are created and tested. At this stage, learning from the data set is performed using different algorithms and techniques. In this study, the Linear Regression (LR), Multi-Layer Perceptron (MLP), Decision Table (DT), M5 Rules (M5R), M5P model tree (M5P), and Random Committee (RC) models are applied to estimate the thermal efficiency of the ORC system working with R417A and R422D. Information on these models is given below.

• Linear Regression: Regression analysis is a statistical method utilized to investigate the numerical relationship between two or more variables. Its primary objective is to elucidate the functional relationship between variables and to articulate this relationship through a model. Within regression analysis, when there is one dependent variable and one independent variable, it is termed Simple Linear Regression; if there are multiple independent variables, it is referred to as Multiple Linear Regression. Multivariate Regression analysis is a generalized form of Multiple Linear Regression analysis in which multiple dependent variables are involved [25].

• Multilayer Perceptron: A multilayer perceptron (MLP) serves as a classifier employing backpropagation for sample classification and learning. These networks, known as feed-forward neural networks, are trained utilizing the standard backpropagation algorithm. Being supervised networks, MLPs necessitate a desired response for training purposes. They learn to transform input data into the desired response, which makes them extensively employed for pattern classification tasks. Equipped with one or two hidden layers, MLPs can approximate nearly any input-output mapping. Moreover, they have demonstrated the ability to approach the performance levels of optimal statistical classifiers even in challenging conditions [26,27].

• M5 Rules: The M5 Rules algorithm employs the divide-and-conquer approach to construct decision lists for regression tasks. It constructs a model tree, generates a rule from the optimal leaf, and subsequently processes the remaining instances in the dataset based on the generated rule. In contrast to PART (Partial Decision Trees), which employs a similar strategy for categorical prediction, M5 Rules constructs complete trees rather than partially explored trees. The generation of partial trees offers enhanced computational efficiency without compromising the size and accuracy of the resultant rules [26]. [29,30].
Evaluation: The effectiveness and performance of the models created are evaluated by looking at how well they meet the set objectives. Different statistical criteria can be used to determine a model's performance, such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the coefficient of determination (R²), as given below [22].
In these equations, $y_{e,i}$ is the predicted value, $t_{a,i}$ is the true value, $\bar{t}_{a,m}$ is the mean of the true values, and $n$ is the number of data points.
Deployment: Successful models are integrated into the business. How the models will be used and maintained under real conditions is planned at this stage.
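Using the symbols defined above, the three criteria are commonly written as follows. This is a sketch of the standard definitions, assumed here; the formulation in [22] may differ in notation.

```latex
\begin{align}
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \left| t_{a,i} - y_{e,i} \right| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( t_{a,i} - y_{e,i} \right)^{2}} \\
R^{2} &= 1 - \frac{\sum_{i=1}^{n} \left( t_{a,i} - y_{e,i} \right)^{2}}{\sum_{i=1}^{n} \left( t_{a,i} - \bar{t}_{a,m} \right)^{2}}
\end{align}
```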
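The backpropagation training described for the MLP can be illustrated with a minimal pure-Python sketch. It is not the WEKA implementation used in the study: the network size, tanh activation, learning rate, number of epochs, and the toy target function are all assumptions chosen for illustration, and the two generic inputs stand in for scaled input temperatures.

```python
import math
import random


def init_mlp(n_inputs, n_hidden, seed=0):
    """Create a one-hidden-layer network with small random weights."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return hidden activations (tanh) and the linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def train_step(net, x, target, lr):
    """One stochastic gradient step on the loss 0.5 * (output - target)^2."""
    hidden, output = forward(net, x)
    err = output - target
    for j, h in enumerate(hidden):
        # Error propagated back through the tanh unit, using the old w2[j].
        delta = err * net["w2"][j] * (1.0 - h * h)
        net["w2"][j] -= lr * err * h
        net["b1"][j] -= lr * delta
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * delta * xi
    net["b2"] -= lr * err


def mse(net, data):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy data: two inputs scaled to [0, 1] and a made-up smooth target.
grid = [i / 4 for i in range(5)]
data = [((a, b), 0.10 + 0.08 * a - 0.05 * b) for a in grid for b in grid]

net = init_mlp(n_inputs=2, n_hidden=4)
mse_before = mse(net, data)
for _ in range(200):
    for x, t in data:
        train_step(net, x, t, lr=0.05)
mse_after = mse(net, data)
print(mse_before, mse_after)
```

Scaling the inputs to a common range, as done for the toy data here, is a common choice that keeps the tanh units away from saturation.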

Result and discussion
Different data mining models (LR, MLP, DT, M5 Rules, M5 model tree, and Random Committee) were used to determine the thermal efficiency of the ORC system operating with the R417A and R422D working fluids. Data mining analyses were conducted using WEKA 3.9 software (Waikato Environment for Knowledge Analysis). The performance of the models was evaluated using R-squared (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). The formulas for these metrics are provided earlier in the text. The ideal estimating model is highly accurate, meaning the MAE and RMSE approach zero while the R² value gets as close to 1 as possible. Accordingly, Tables 3 and 4 show that the best results for the thermal efficiency of the ORC system with R417A and R422D working fluids are obtained using the MLP model. For the test data set, the comparison of actual and predicted thermal efficiency of R417A and R422D is shown in Figure 4.
Tables 5 and 6 compare the actual thermal efficiency of the ORC system using R417A and R422D working fluids with the estimates from the MLP model. The comparison considers the evaporator temperature, condenser temperature, subcooling temperature, and superheating temperature. The maximum percentage errors in thermal efficiency were 2.51% for R417A and 2.37% for R422D.
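The evaluation metrics used in this section can be computed as in the minimal sketch below. The standard definitions of MAE, RMSE, and R² are assumed, and the percentage error is assumed to be taken relative to the actual value. The sample numbers are made up for easy hand-checking and are not values from Tables 3 to 6.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def max_percent_error(actual, predicted):
    """Largest absolute error as a percentage of the actual value."""
    return max(abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted))


# Made-up values for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R2  :", r_squared(actual, predicted))
print("Max % error:", max_percent_error(actual, predicted))
```

Because R² summarizes fit over the whole test set while the maximum percentage error reports the single worst point, a model can score very close to 1 on R² and still show a worst-case error of a few percent.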

Conclusion
This study combined thermodynamic and data mining methods to predict the thermal efficiency of an Organic Rankine Cycle (ORC) system. We evaluated various data mining models, including Linear Regression (LR), Multi-Layer Perceptron (MLP), M5 Rules, M5 Model Tree, Random Committee (RC), and Decision Table (DT) models. Analyses were conducted for the R417A and R422D zeotropic working fluids.
The MLP model emerged as the most effective approach for predicting the thermal efficiency of both R417A and R422D. The MLP's predicted results closely matched the actual results obtained from the thermodynamic model using Genetron software. The Root Mean Square Error (RMSE) for the thermal efficiency was exceptionally low, at 0.0002 for R417A and 0.0003 for R422D. Additionally, the R-squared (R²) values for thermal efficiency were very high, reaching 0.9999 for both R417A and R422D. While the model exhibited high accuracy, the maximum percentage errors in thermal efficiency were 2.51% for R417A and 2.37% for R422D.
This study demonstrates the successful use of data mining for complex engineering tasks such as modeling ORC systems. The newly created model allows ORC system performance to be predicted in a simpler, faster, and more accurate way compared to traditional models. The results presented in this research indicate that the data mining method can be used to solve many engineering problems, and the authors believe this machine learning approach has the potential to be applied to a wide variety of engineering problems in future studies.
Author contributions: Conceptualization, ED and AŞŞ; methodology, ED; software, AŞŞ; validation, ED and AŞŞ; formal analysis, ED; investigation, AŞŞ; data curation, AŞŞ; writing-original draft preparation, ED; writing-review and editing, AŞŞ. All authors have read and agreed to the published version of the manuscript.

Conflict of interest:
The authors declare no conflict of interest.

• Pump: The pump increases the pressure of the liquid working fluid to match the evaporator pressure, allowing the cycle to repeat.

Figure 2. Data mining tasks and models.

For the comparison in Figure 4, the correlation coefficient values for R417A and R422D are 0.9961 and 0.9936, respectively.

Figure 4. Actual and predicted thermal efficiency for R417A and R422D.

Figure 5 compares the actual thermal efficiency of the ORC system with the efficiencies estimated by the MLP model, for both R417A and R422D working fluids. As the figure shows, thermal efficiency increases with higher evaporator temperatures. The MLP model's estimates closely match the actual efficiency values.
Abbreviations
$h$ Enthalpy (kJ/kg)
$\dot{m}$ Mass flow rate (kg/s)
$\dot{Q}$ Heat transfer (kW)
$T$ Temperature (°C)
$\dot{W}$ Power (kW)
$\eta$ Efficiency
$\Delta$ Difference
C Condenser
E Evaporator
in Inlet
out Outlet
p Pump
r Working fluid
SC Subcooling
SH Superheating
t Turbine

Table 1. Summary of related studies on optimization and performance prediction of ORC systems.

Table 2. Research parameters and assumptions.
• M5 Model Tree: Model trees resemble piecewise linear functions and can therefore exhibit non-linear behavior. Model trees offer enhanced learning efficiency and are adept at handling tasks involving high dimensionality, even up to hundreds of attributes [28]. Noteworthy advantages of model trees over regression trees include their comparatively smaller size, transparent decision-making processes, and the tendency for regression functions to involve a manageable number of variables [26].
• Random Committee: The Random Committee classifier assembles an ensemble of randomized base classifiers. Each base classifier is built using a different random number seed, albeit on the same dataset. The final prediction is a simple average of the predictions generated by the individual base classifiers [28].
• Decision Table: The Decision Table classifier constructs a majority classifier using a straightforward decision table, which is generated by applying an induction algorithm to a labeled training set. Two variants of decision table classifiers have been described. The first variant, known as DTMaj (Decision Table Majority), returns the majority class of the training set if the cell of the decision table that corresponds to the new example is empty, meaning that it contains no training examples. The second variant, termed DTLoc (Decision Table Local), instead searches for a decision table entry with fewer matching attributes (larger cells) when the matching cell contains no examples. Consequently, this variant returns a response from the local vicinity, where minor alterations in a pertinent attribute do not induce changes in the label value.
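The averaging idea behind the Random Committee can be sketched as follows. This is an illustration under stated assumptions, not WEKA's implementation: a hypothetical one-split "random stump" stands in for the base learner, and the tiny data set is made up. Every member sees the same data and differs only in its random seed.

```python
import random
from statistics import mean


def fit_random_stump(X, y, seed):
    """Fit a one-split regressor on a randomly chosen feature and threshold."""
    rng = random.Random(seed)
    feature = rng.randrange(len(X[0]))
    values = [row[feature] for row in X]
    threshold = rng.uniform(min(values), max(values))
    left = [t for row, t in zip(X, y) if row[feature] <= threshold]
    right = [t for row, t in zip(X, y) if row[feature] > threshold]
    overall = mean(y)
    left_mean = mean(left) if left else overall
    right_mean = mean(right) if right else overall

    def predict(row):
        return left_mean if row[feature] <= threshold else right_mean

    return predict


def fit_committee(X, y, n_members=10, base_seed=1):
    """Build members on the same data, each with its own random seed."""
    return [fit_random_stump(X, y, base_seed + k) for k in range(n_members)]


def committee_predict(members, row):
    """Committee output: simple average of the member predictions."""
    return mean(member(row) for member in members)


# Made-up data for illustration only: two inputs, one numeric target.
X = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0), (1.0, 1.0)]
y = [0.10, 0.12, 0.15, 0.13]

members = fit_committee(X, y, n_members=5)
prediction = committee_predict(members, (0.8, 0.2))
print(prediction)
```

Averaging over differently seeded members smooths out the arbitrary choices made by any single randomized learner, which is the motivation for building a committee rather than relying on one member.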

Table 5. Comparison of actual and model results for R417A.

Table 6. Comparison of actual and model results for R422D.