Extracting a reliable signal from heterogeneous data

Authors

DOI:

https://doi.org/10.47813/2782-2818-2024-4-1-0122-0132

Keywords:

heterogeneous data, soft maximin estimation, common reliable signal, large-scale systems, data heterogeneity.

Abstract

The article studies the problem of extracting a common reliable signal from data divided into heterogeneous groups. A soft maximin estimator is proposed as a computationally attractive alternative that strikes a balance between the pooled estimator and the (hard) maximin estimator. Since heterogeneity is prevalent in large-scale systems, the goal is a computationally efficient estimator with good statistical properties under varying degrees of data heterogeneity. For heterogeneous data, this estimator can be more reliable than an estimator that ignores the grouping, that is, the pooled estimator. In large-scale data processing systems, where data heterogeneity is common, the computational cost of estimation is crucial. In support of this claim, the article analyzes the performance of soft maximin estimation when applied to large-scale data processing systems, confirming the effectiveness of the method. In summary, soft maximin estimation should be practically useful in a number of different contexts as a way of aggregating explained variances across groups.
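The balance described in the abstract can be illustrated with a minimal sketch. It assumes the soft maximin loss takes the log-sum-exp form given by Lund et al. (2022) in the reference list, namely log(sum_g exp(-zeta * V_g(beta))) / zeta, where V_g(beta) is the empirical explained variance in group g and zeta > 0. The function names, the synthetic data, and the parameter values are illustrative assumptions; they are neither the SMME package API nor the authors' implementation.

```python
import numpy as np

def explained_variance(beta, X, y):
    """Empirical explained variance of one group: (2 b'X'y - b'X'Xb) / n."""
    n = X.shape[0]
    fitted = X @ beta
    return (2.0 * fitted @ y - fitted @ fitted) / n

def soft_maximin_loss(beta, groups, zeta):
    """Soft maximin loss (assumed form): log(sum_g exp(-zeta * V_g)) / zeta, zeta > 0."""
    v = np.array([explained_variance(beta, X, y) for X, y in groups])
    a = -zeta * v
    m = a.max()  # subtract the maximum so the exponentials cannot overflow
    return (m + np.log(np.exp(a - m).sum())) / zeta

# Synthetic example (assumption): three groups sharing a common signal,
# each perturbed by a small group-specific deviation.
rng = np.random.default_rng(0)
beta = np.array([1.0, -1.0, 0.5])
groups = []
for _ in range(3):
    X = rng.normal(size=(200, 3))
    delta = rng.normal(scale=0.3, size=3)
    y = X @ (beta + delta) + rng.normal(scale=0.5, size=200)
    groups.append((X, y))

for zeta in (1e-4, 1.0, 1e4):
    print(zeta, soft_maximin_loss(beta, groups, zeta))
```

For large zeta the loss approaches the negated smallest explained variance across groups, which is the hard maximin criterion. For small zeta it approaches, up to the additive constant log(G) / zeta, the negated average explained variance, which corresponds to a pooled-type criterion when the groups have equal size.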

Author Biographies

D. I. Atlasov

Denis Atlasov, Postgraduate, Automation and Computing Systems, Voronezh State Technical University, Voronezh, Russian Federation

O. Ja. Kravets

Oleg Ja. Kravets, Doctor of Engineering, Professor, Automation and Computing Systems, Voronezh State Technical University, Voronezh, Russian Federation

References

Meinshausen N., Bühlmann P. Maximin effects in inhomogeneous large-scale data. The Annals of Statistics. 2015; 43(4): 17-22. https://doi.org/10.1214/15-AOS1325

Fanaee-T H., Gama J. Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence. 2013; 2(2): 113-127. https://doi.org/10.1007/s13748-013-0040-3

Tseng P., Yun S. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming. 2009; 117(1-2): 387-423. https://doi.org/10.1007/s10107-007-0170-0

Lund A. SMME: Soft maximin estimation for large scale heterogeneous data. R package version 1.0.1; 2021.

Lund A., Mogensen W.S., Hansen R.N. Soft maximin estimation for heterogeneous data. Scandinavian Journal of Statistics. 2022; 49(4): 1761-1790. https://doi.org/10.1111/sjos.12580

Rothenhäusler D., Meinshausen N., Bühlmann P., Peters J. Anchor regression: Heterogeneous data meet causality. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2021; 83(2): 215-246. https://doi.org/10.1111/rssb.12398

Atlasov D.I., Kravets O.Ja. To the formulation of the problem of extracting a common signal from heterogeneous data of heterogeneous information systems. Modern informatization problems in simulation and social technologies (MIP-2023’SCT). Proc. of the XXVIII-th Int. Open Science Conf. January 2023; Yelm, WA, USA: Science Book Publishing House; 2023: 8-13.

Published

2024-03-11

How to Cite

Atlasov, D. I., & Kravets, O. J. (2024). Extracting a reliable signal from heterogeneous data. Modern Innovations, Systems and Technologies, 4(1), 0122–0132. https://doi.org/10.47813/2782-2818-2024-4-1-0122-0132

Conference Proceedings Volume

Section

IT and informatics