Extracting a reliable signal from heterogeneous data
DOI:
https://doi.org/10.47813/2782-2818-2024-4-1-0122-0132
Keywords:
heterogeneous data, soft maximin estimation, common reliable signal, large-scale systems, data heterogeneity.
Abstract
The article studies the extraction of a common reliable signal from data divided into heterogeneous groups. The soft maximin estimator is proposed as a computationally attractive alternative that strikes a balance between the combined (pooled) estimator and the (hard) maximin estimator. Since heterogeneity is prevalent in large-scale systems, the goal is a computationally efficient estimator with good statistical properties across varying degrees of data heterogeneity. For heterogeneous data, this estimator can be more reliable than the combined estimator, which does not take the grouping into account. In large-scale data processing systems, where data heterogeneity is common, the computational cost of estimation is crucial. In support of this thesis, the article analyzes the effectiveness of soft maximin estimation in trials on large-scale data processing systems and confirms the effectiveness of the method. In summary, soft maximin estimation is expected to be practically useful in a number of different contexts as a way of aggregating explained variances across groups.
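The balance described in the abstract can be illustrated with a minimal sketch. It assumes the log-sum-exp form of the soft maximin loss, (1/ζ) log Σ_g exp(−ζ V_g(β)), where V_g(β) is the empirical explained variance in group g. This form follows the soft maximin literature cited on this page and is not stated in the abstract itself. The function names, the simulated groups, and the values of ζ are hypothetical and chosen only for illustration.

```python
import numpy as np

def explained_variance(beta, X, y):
    """Empirical explained variance of beta in one group: (2 b'X'y - b'X'Xb) / n."""
    fitted = X @ beta
    return (2.0 * fitted @ y - fitted @ fitted) / len(y)

def soft_maximin_loss(beta, groups, zeta):
    """Assumed soft maximin loss: (1/zeta) * log(sum_g exp(-zeta * V_g(beta)))."""
    v = np.array([explained_variance(beta, X, y) for X, y in groups])
    z = -zeta * v
    m = z.max()  # shift by the maximum for numerical stability
    return (m + np.log(np.exp(z - m).sum())) / zeta

# Three simulated groups sharing the first coefficient; the second one varies by group.
rng = np.random.default_rng(0)
groups = []
for shift in (0.0, 0.5, -0.5):
    X = rng.normal(size=(50, 2))
    y = X @ np.array([1.0, shift]) + 0.1 * rng.normal(size=50)
    groups.append((X, y))

beta = np.array([1.0, 0.0])  # candidate common signal
v = np.array([explained_variance(beta, X, y) for X, y in groups])

# Large zeta: the loss approaches the hard maximin criterion, max_g(-V_g).
hard_like = soft_maximin_loss(beta, groups, zeta=1e3)
# Small zeta: up to the constant log(G)/zeta, the loss approaches -mean_g(V_g),
# i.e. the criterion of an estimate that averages over the groups.
pooled_like = soft_maximin_loss(beta, groups, zeta=1e-3) - np.log(len(groups)) / 1e-3

print(hard_like, -v.min())
print(pooled_like, -v.mean())
```

Under these assumptions, the single parameter ζ moves the criterion between the two extremes: a large ζ emphasizes the worst-explained group, while a small ζ weights all groups equally.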
License
Copyright (c) 2024 Д. И. Атласов, О. Я. Кравец
This work is licensed under a Creative Commons Attribution 4.0 International License.
The journal MIST - "Modern Innovations, Systems and Technologies" publishes materials under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license, the text of which is hosted on the official website of the non-profit corporation Creative Commons.
This means that users may copy and distribute the materials in any medium and format, adapt and transform the texts, and use the content for any purpose, including commercial ones. The terms of use must be observed: credit the author of the original work and the source by citing the article's publication details, providing a link to the source, and indicating what changes have been made.