This study proposes a preprocessing approach that improves training accuracy and computational efficiency on unlabeled SCADA data by combining machine learning algorithms with statistical techniques. The method, referred to as RD, substantially reduces the training dataset by partitioning the data into intervals containing equal numbers of points and selecting representative samples based on quantiles. With this strategy, only 0.2% of the original dataset is required for model training. Abnormal data points are then identified using a power curve model with quantile thresholds. The RD method is evaluated against DBSCAN and a KNN-based model. Experimental results on real-world wind farm data indicate that RD combined with KNN outperforms DBSCAN: both MAE and RMSE are reduced by approximately 15%, reflecting improved predictive accuracy. Computationally, RD executes in about 0.15 seconds, compared with 0.99 seconds for DBSCAN, a runtime reduction exceeding 50%. Moreover, DBSCAN requires precise parameter tuning or additional constraints tailored to the power curve structure when dealing with dense or linear outliers, whereas the proposed approach automatically eliminates outliers from the wind turbine power curve without predefined filters or explicit boundary definitions prior to cleaning.
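The binning and quantile steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of bins, and the quantile levels are all illustrative assumptions, and the abstract does not state which values or selection rule the study actually uses. The sketch assumes wind speed and power are given as NumPy arrays of equal length.

```python
import numpy as np


def equal_count_bins(wind, n_bins):
    """Split sample indices, ordered by wind speed, into bins of (nearly) equal size."""
    order = np.argsort(wind, kind="stable")
    return np.array_split(order, n_bins)


def select_representatives(wind, power, n_bins=10, quantiles=(0.25, 0.5, 0.75)):
    """Per bin, keep the sample whose power is closest to each requested quantile.

    The quantile levels are hypothetical defaults, not values from the paper.
    """
    keep = []
    for idx in equal_count_bins(wind, n_bins):
        if idx.size == 0:
            continue
        for q in quantiles:
            target = np.quantile(power[idx], q)
            keep.append(idx[np.argmin(np.abs(power[idx] - target))])
    return np.unique(keep)


def flag_abnormal(wind, power, n_bins=10, lower_q=0.05, upper_q=0.95):
    """Mark points whose power lies outside the per-bin quantile band.

    The 5%/95% thresholds are assumed for illustration only.
    """
    abnormal = np.zeros(power.shape, dtype=bool)
    for idx in equal_count_bins(wind, n_bins):
        if idx.size == 0:
            continue
        lo, hi = np.quantile(power[idx], [lower_q, upper_q])
        abnormal[idx] = (power[idx] < lo) | (power[idx] > hi)
    return abnormal
```

Calling `select_representatives(wind, power)` returns the indices of the reduced training set, and `flag_abnormal(wind, power)` returns a boolean mask of candidate outliers. Equal-count bins are used here, rather than fixed-width wind-speed bins, so that every bin's quantiles are estimated from a similar number of samples.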
Aghajani Mobarakeh, A. and Poshtan, J. (2025). A data-driven framework for wind turbine power curve cleaning and abnormal data detection based on binning and quantiles. Amirkabir Journal of Mechanical Engineering, 57(10), -. doi: 10.22060/mej.2026.24688.7894