Discrete and Continuous Models and Applied Computational Science

ISSN 2658-4670, 2658-7149

Peoples' Friendship University of Russia named after Patrice Lumumba (RUDN University)

17894

10.22363/2312-9735-2018-26-1-58-73

Modeling and Simulation

Research Article

On a Method of Multivariate Density Estimate Based on Nearest Neighbours Graphs

Gleb Beliakov

Gleb Beliakov, Candidate of Physical and Mathematical Sciences, Professor at the School of Information Technology, Deakin University, Australia

gleb@deakin.edu.au

Deakin University

15.12.2018

VOL 26, NO1 (2018)

Pp. 58–73; 28.02.2018

2018

Beliakov G.

https://creativecommons.org/licenses/by-nc/4.0

https://journals.rudn.ru/miph/article/view/17894

A method of multivariate density estimation based on reweighted nearest neighbours, which mimics the natural neighbours technique, is presented. Estimation of multivariate density is important in machine learning, astronomy, biology, physics and econometrics. A 2-additive fuzzy measure is constructed from proxies for the pairwise interaction indices. Neighbours of a point that lie in nearly the same direction are treated as redundant, and the contribution of the farther neighbour is transferred to the nearer one. The local density estimate at a point is computed with the discrete Choquet integral, so that the contributions of neighbours lying all around that point are accounted for, while the contribution of neighbours lying on the same side is reduced by a suitable choice of the fuzzy measure. In this way an approximation to Sibson's natural neighbours is computed. The method relieves the computational burden of the natural neighbours approach based on the Delaunay tessellation, whose complexity is exponential in the dimension of the data. The method is suitable for density estimation of structured data (possibly lying on lower-dimensional manifolds), since in this case the nearest neighbours may differ significantly from the natural neighbours.
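The aggregation idea in the abstract (redundant neighbours in the same direction, combined through a Choquet integral with respect to a 2-additive fuzzy measure) can be illustrated with a minimal Python sketch. It is not the article's construction. The following choices are hypothetical assumptions made only for illustration:

* Uniform Shapley values 1/k.
* A negative interaction index for every pair of neighbours whose directions from the query point have cosine above a threshold.
* Inverse distance as each neighbour's contribution.

The names `k_nearest`, `redundancy_measure` and `choquet_2additive` are illustrative. The integral itself uses the standard expression of the Choquet integral for a 2-additive measure in terms of Shapley values and interaction indices.

```python
import math
from itertools import combinations


def k_nearest(data, query, k):
    """Return the k points of `data` closest to `query` (brute force)."""
    return sorted(data, key=lambda p: math.dist(p, query))[:k]


def redundancy_measure(query, neighbours, cos_threshold=0.9):
    """Hypothetical 2-additive measure: uniform Shapley values and a negative
    interaction I_ij for neighbours lying in nearly the same direction from
    `query` (cosine of the angle between them above `cos_threshold`)."""
    k = len(neighbours)
    dirs = []
    for p in neighbours:
        r = math.dist(p, query)
        if r == 0:
            raise ValueError("neighbour coincides with the query point")
        dirs.append([(a - b) / r for a, b in zip(p, query)])
    phi = [1.0 / k] * k
    pairs = [(i, j) for i, j in combinations(range(k), 2)
             if sum(a * b for a, b in zip(dirs[i], dirs[j])) > cos_threshold]
    degree = [0] * k
    for i, j in pairs:
        degree[i] += 1
        degree[j] += 1
    # Scaling keeps phi_i - 0.5 * sum_j |I_ij| >= 0 for every i, which is a
    # sufficient (not necessary) condition for the measure to be monotone.
    interaction = {(i, j): -2.0 / (k * max(degree[i], degree[j]))
                   for i, j in pairs}
    return phi, interaction


def choquet_2additive(values, phi, interaction):
    """Discrete Choquet integral w.r.t. a 2-additive fuzzy measure given by
    Shapley values `phi` and interaction indices `interaction[(i, j)]`:
    C(x) = sum_{I_ij>0} I_ij min(x_i, x_j) + sum_{I_ij<0} |I_ij| max(x_i, x_j)
           + sum_i x_i (phi_i - 0.5 sum_{j != i} |I_ij|)."""
    total = 0.0
    abs_sum = [0.0] * len(values)
    for (i, j), inter in interaction.items():
        abs_sum[i] += abs(inter)
        abs_sum[j] += abs(inter)
        if inter > 0:
            total += inter * min(values[i], values[j])
        elif inter < 0:
            total += -inter * max(values[i], values[j])
    for i, x in enumerate(values):
        coeff = phi[i] - 0.5 * abs_sum[i]
        if coeff < -1e-12:
            raise ValueError("this sketch needs phi_i >= 0.5 * sum_j |I_ij|")
        total += coeff * x
    return total


if __name__ == "__main__":
    data = [(1.0, 0.0), (2.0, 0.1), (0.0, 1.0), (-1.0, 0.0), (3.0, 3.0)]
    x = (0.0, 0.0)
    nbrs = k_nearest(data, x, k=4)
    # Hypothetical per-neighbour contribution: inverse distance to x.
    vals = [1.0 / math.dist(p, x) for p in nbrs]
    phi, inter = redundancy_measure(x, nbrs)
    print(choquet_2additive(vals, phi, inter))  # Choquet aggregate
    print(sum(vals) / len(vals))                # plain average, for contrast
```

In this example, the neighbour at (2, 0.1) lies almost directly behind (1, 0) as seen from the origin. The negative interaction therefore makes the pair count through the maximum of their values, and the weight of the farther point moves to the nearer one. As a result, the Choquet aggregate is 1.0, while the plain average of the four values is about 0.875. The cap on the interactions is only a simple sufficient condition for monotonicity. It is not the general one.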

Keywords: density estimate; nearest neighbours; Choquet integral; fuzzy measure; natural neighbour