# On a Method of Multivariate Density Estimate Basedon Nearest Neighbours Graphs

**Authors:**Beliakov G.^{1}-
**Affiliations:**- Deakin University

**Issue:**Vol 26, No 1 (2018)**Pages:**58-73**Section:**Modeling and Simulation**URL:**https://journals.rudn.ru/miph/article/view/17894**DOI:**https://doi.org/10.22363/2312-9735-2018-26-1-58-73

Cite item

## Full Text

## Abstract

A method of multivariate density estimation based on the reweighted nearest neighbours,mimicking the natural neighbours techniques, is presented. Estimation of multivariate densityis important for machine learning, astronomy, biology, physics and econometrics. A 2-additivefuzzy measure is constructed based on proxies for pairwise interaction indices. The neighboursof a point lying in nearly the same direction are treated as redundant, and the contributionof the farthest neighbour is transferred to the nearer neighbour. The calculation of the localpoint density estimate is performed by the discrete Choquet integral, so that the contributionsof the neighbours all around that point are accounted for. This way an approximation to theSibson’s natural neighbours is computed. The method relieves the computational burden of theDelaunay tessellation-based natural neighbours approach in higher dimensions, whose complexityis exponential in the dimension of the data. This method is suitable for density estimates ofstructured data (possibly lying on lower dimensional manifolds), as the nearest neighbours diﬀersigniﬁcantly from the natural neighbours in this case.

## Full Text

1. Introduction and Problem Formulation Multivariate density estimates from ﬁnite samples play an important role in data analysis and clustering [1]. Among other applications, density estimates provide a way to construct density based metrics [2], density based averages [3, 4], perform density based clustering, and also compute robustly the mode(s) of a distribution [5, 6]. Practical applications include data analysis and machine learning, anomaly detection, econometrics, high energy physics, astronomy, ﬂow cytometry, image analysis and computer vision to name a few. For example, spatial distribution of cosmic matter at megaparsec scale was analysed by using nonparametric density estimates in [7]. Density based metrics are often used in unsupervised data analysis, e.g., in the DBSCAN algorithm [8]. Histograms are traditionally used as density estimates of single variable distributions. Their use in the multivariate setting is problematic because of the rapidly growing number of histogram bins, the majority of which remain empty. Kernel-based density estimates due to the works by Parzen and Rosenblatt [1, 5], often called Parzen-Rosenblatt windows, is a popular multivariate approach, in which a point density estimate is constructed by averaging the values of a kernel function of the distances between a ﬁxed point and the data. One problem with kernel density estimates is the bandwidth selection, which is the smoothing parameter in this process. The values of the bandwidth parameter which are too small result in spiky estimates, values that are too large result in oversmoothing. There are approaches for automatic bandwidth selection based on cross-validation [9] but they are computationally expensive. Another family of density estimates is based on the notion of the Voronoi diagram [10]. A Voronoi cell is a set of points which are closer to one point from the sample than to any other point in that sample. Intuitively, the volumes of Voronoi cells can serve as proxies for density estimation: small Voronoi cells imply high density. One can view Voronoi cells as (polyhedral) bins in a histogram that contain a single datum. From the technical viewpoint, Voronoi cells are not very convenient, as a) there are Voronoi cells of inﬁnite volume, and b) multiple calculation of Voronoi cell volumes is computationally expensive. Instead the dual of the Voronoi diagram, the Delaunay tessellation, is used [11]. The Delaunay tessellation is a partition of the convex hull of the data (and hence Delaunay cells are ﬁnite), and since these cells are simplices, their volume is computed easily in the multivariate setting. The method in [7] averages the reciprocals of the volumes of the neighbouring Delaunay simplices to provide point density estimates at every point in the sample. One important feature of Delaunay tessellation is that the neighbouring simplices involve data located all around the point at which the density estimate is computed. This feature led to the development of the method of “natural neighbours” in scattered data interpolation [12]. The issue with Delaunay tessellation is its complexity: the number of Delaunay cells grows exponentially with the dimension of the space, more precisely as where is the sample size. This is a manifestation of the course of dimensionality. Another approach to density estimation is based on the nearest neighbour type graphs, including the nearest neighbour graph (kNN), minimum spanning tree (MST) and Gabriel graph [13, 14]. The distance from a point to its nearest neighbour can give an estimate of the density, as it provides the volume of an empty sphere near that point. Compared to the Delaunay tessellation, there is no combinatorial explosion of complexity with the increasing dimension, as no space partitioning is required (only 2 pairwise distances are needed to construct the MST or kNN graph). The MST and Gabriel graphs are subgraphs of the Delaunay graph, which prompted their use as proxies for the Delaunay tessellation. But on the other hand, the nearest neighbours are not always located all around a query point, and the nearest neighbour relation is not reciprocal. The kNN graphs may not be connected, which makes them not fully suitable for proximity calculations [14]. Selecting a larger value of also leads to oversmoothing. In this paper we explore one method of density estimation based on the nearest neighbours graph. In this method we take a suﬃciently large value of in the kNN density estimate, but ensure that only the neighbours located all around a query point are counted. That is, we attempt to marry the kNN with the natural neighbours approach, but without performing expensive Delaunay tessellation. To this end we use the notion of the discrete Choquet integral with respect to a specially constructed fuzzy measure. It allows one to account for correlations between the inputs, and explicitly model the notions of redundancy and positive reinforcement. In particular we account for contributions of the neighbours situated in the same direction from a query point and downweight the contribution of the furthest. This way only the contributions to the density estimate from the neighbours all around a query point will count. The problem is formulated as follows. Given a sample of (independent, identically distributed) data of size and dimension ﬁnd a density estimate approximating the probability density of the distribution the data were drawn from. The paper is structured as follows. Section 2 presents the background material needed for the rest of the paper. Section 3 presents the proposed kNN reweighting method, including the construction of a 2-additive fuzzy measure from the interaction indices and computation of the threshold for the size of the cone of directions in the multivariate setting, so that the proportion of data located in such a code remains constant in diﬀerent dimensions. Section 4 provides a numerical illustration and Section 5 concludes. 2. Preliminaries 2.1. Point Density Estimation Problem Let the data set be generated by sampling from a distribution with probability density . The goal of density estimation is to recover an approximation to , denoted ˆ . Non-parametric methods do not assume any speciﬁc form of and hence build ˆ based only on the data. Building a histogram is the traditional approach which usually works in one or two dimensions, but is not suitable in the multivariate setting because of the rapidly growing number of histogram bins where the data are allocated. There are several approaches to density estimation mentioned in the Introduction. In particular, kernel density estimates provide density at a point using that point as a centre of a neighbourhood of selected radius, while Voronoi diagrams provide point density estimates using the nearest neighbours of the point x located all around it. By selecting a kernel function with bandwidth parameter we have The bandwidth aﬀects the roughness or smoothness of the estimate, and kernel based methods are sensitive to the choice of . Voronoi diagram based methods like [7] use the data all around x and the neighbourhood around x is thus obtained automatically. 2.2. Nearest Neighbours and Natural Neighbours Estimators The nearest neighbours is a popular method in machine learning, see e.g. [15]. It is based on calculating the distances between the reference data (it is often called training data, although no actual training in the kNN method takes place) and the query point x , at which either the value of a function or a class label is required. Calculate the pairwise distances (in some norm), and sort the data set in the order of increasing There are many works dedicated to the choice of such a norm, see, e.g. [15, 16], which is a very hard and context dependent problem. In this study we assume it is the Euclidean norm. Then approximate where the weights are determined usually by some non-increasing function see [16, 17]. It was also proposed [18] to use the Induced Ordered Weighted Averaging functions (Induced OWA) instead of the weighted mean to aggregate the values and to learn w from the data. The Choquet integral was used for the same purpose in [19]. Unlike in function approximation, in the case of density estimation the values are not given but need to be estimated from the data set itself. One measure of density applicable to the kNN approach is the reciprocal of the pairwise distances, which we present in Section 3. Another popular method of multivariate approximation is the natural neighbour scheme by Sibson [12, 20, 21]. The idea of this method is to build an interpolant whose value at x would depend on a few data points close to x at the same time distributed all around x , see Figure 1. It favorably contrasts with the nearest neighbour methods in which only the distance from x matters. Figure 1. The nearest neighbours of a query point (left) versus natural neighbours (right) In the natural neighbour scheme, the interpolant is a weighted average of the neighbouring data values where the weight is proportional to the volume of the part of Voronoi cell which is cut by the Voronoi cell when x is added to the Voronoi diagram as one of the sites. Since Voronoi cell borders only a few neighbouring Voronoi cells, only a few neighbouring data points around x participate in calculation of (so called natural neighbours). More recently variations of Sibson’s method were developed, based on other rules for calculating weights Sibson’s interpolant possesses many useful properties, but it is computationally expensive, as each x requires computation of a new Voronoi diagram having x as one of the sites. There are methods that allow an update of the Voronoi diagram when x is added to the list of sites in 2- and 3-variate cases, so that the whole Voronoi diagram needs not be built for every x . Such methods are very competitive, but we are unaware of any extension for more than three variables. 2.3. Fuzzy Measures and Discrete Choquet Integral Aggregation of inputs into a representative output is the subject of aggregation functions [23, 24]. The weighted arithmetic mean (WAM) and the median are the two most commonly employed aggregation functions, and the WAM is used in the traditional kNN when averaging contributions of the nearest neighbours. These functions are not suitable for our purpose as we want to account for input redundancies. The Choquet integral is a tool for explicitly modelling such interactions. While the weights of the inputs in the WAM are associated with relative importances of each input, a discrete fuzzy measure allows one to assign importances to all possible groups of inputs, and thus oﬀers a much greater ﬂexibility for modeling aggregation. Deﬁnition 1. Let . A discrete fuzzy measure is a set function which is monotonic and satisﬁes and In Deﬁnition 1, a subset can be considered as a coalition, so that gives us an idea about the importance or the weight of this coalition. The monotonicity condition implies that adding new elements to a coalition does not decrease its weight. Deﬁnition 2. The discrete Choquet integral with respect to a fuzzy measure is given by (1) where is a non-decreasing permutation of the input x , and by convention. Deﬁnition 3. Let be a fuzzy measure. The M¨obius transformation of is a function deﬁned for every The WAM and ordered weighted averaging (OWA) functions are special cases of Choquet integrals with respect to additive and symmetric fuzzy measures respectively. In this contribution we are speciﬁcally interested in additive fuzzy measures [25, 26]. Deﬁnition 4. A fuzzy measure is called -additive if its M¨obius transformation veriﬁes for any subset with more than elements, and there exists a subset ℬ with elements such that In this work we are interested in 2-additive fuzzy measures, therefore we assume all When dealing with multiple inputs, it is often the case that these are not independent, and there is some interaction (positive or negative) among the inputs. To measure such concepts as the importance of an input and interaction among the inputs we will use the concepts of Shapley value, which measures the importance of an input in all possible coalitions, and the interaction index, which measures the interaction of a pair of inputs , in all possible coalitions [25, 26]. Deﬁnition 5. Let be a fuzzy measure. The Shapley index for every is The Shapley value is the vector It satisﬁes Deﬁnition 6. Let be a fuzzy measure. The interaction index for every pair The interaction indices verify as soon as are positively correlated (negative synergy). Similarly for negatively correlated inputs (positive synergy). for any pair A fundamental property of -additive fuzzy measures, which justiﬁes their use in simplifying interactions between the criteria in multiple criteria decision making is the following [26]. Proposition 1. Let be a -additive fuzzy measure, Then for every such that for every such that Thus -additive measures acquire an interesting interpretation. These are fuzzy measures that limit interaction among the criteria to groups of size at most For instance, for 2-additive fuzzy measures, there are pairwise interactions among the criteria but no interactions in groups of 3 or more. The Choquet integral can also be expressed in terms of interaction indices. For 2-additive fuzzy measures we have [27]: 3. Nearest Neighbour Reweighted Graph As we mentioned in the introduction, this density estimate is based on the kNN graph. Let us ﬁx a value of ( suﬃciently large to include the natural neighbours, of the order of tens to hundreds). Let us also ﬁx a datum, x at which the density estimate will be computed. Calculate the pairwise distances from x to all the other pints in the sample and select the nearest neighbours. Let the density estimate be given as a weighted sum of the values which are (up to a constant factor) the reciprocals of the volumes of spheres whose diameters are the segments between and If we were to use a kNN estimate without reweighting, a large value of would result in oversmoothing. Our goal is to select the weights in such a way that contributions of the neighbours on the same side relative to are not double counted. This way only the natural neighbours all around x will contribute to the sum, and that would be equivalent to using a Delaunay based estimate but without its high complexity when is large. The question is how to perform such weights redistribution. Our main tool will be the discrete Choquet integral with respect to a fuzzy measure. 3.1. Construction of the Fuzzy Measure We now construct such a fuzzy measure based on the proxies for interaction indices, which we call the redundancy values. In our setting the contributions of two neighbours, and , toward point density estimate are redundant if these neighbours lie on the same side from the query point x . The degree of redundancy can be expressed as a function of the cosine of the angle In other metric spaces can be computed without recurring to the scalar product, as a function of distances only. Now, take the redundancy values where is some monotone function chosen as described below. Of course, the redundancy values cannot be taken as the (negative) interaction indices directly, because the interaction indices need to satisfy a number of constraints [25, p. 429], namely, for all where and are the Shapley indices. The constraints are satisﬁed if and only if is a 2-additive fuzzy measure. In addition, the Shapley values are also unknown. While it is possible to set up an optimization problem to select the interaction indices close to the redundancy values, but subject to the constraints (3) , it would be extremely ineﬃcient to solve such a problem for every datum x . Instead we proceed as follows. Let be a triangular conorm [23], a monotone increasing symmetric associative function with neutral element 0. These functions are often used to aggregate inputs so that the total contribution does not exceed 1. The Einstein sum and the maximum function are prototypical examples of triangular conorms. Let the initial contribution of all the nearest neighbours of x be the same Suppose that the neighbour is located further than the inputs and in roughly the same direction, so that are smaller than some threshold, like see Figure 2. We want to redistribute the contribution from the inputproportionally to the redundancy values. Figure 2. The contribution of inputs inside the cone is downweighted We take the new weight Note that 0 and becomes 0 only in case of at least one of the redundancy values The weights of the inputs are incremented by the value The weight is updated By applying these formulas to every neighbour from the furthest to the nearest, we downweight the contribution of the furthest and reallocate their weights to the nearer neighbours as long as those lie in the same direction (in the same spherical cone centered at of angle of We can now state that the resulting reweighted sum corresponds to the Choquet integral with respect to a 2-additive fuzzy measure whose interaction indices are negative and correspond to the redundancy values. Theorem 1. Let the redundancy values and let the weights be computed as where denotes the value of the triangular conorm applied to the arguments that satisfy the condition expressed in its subindex, analogously to the notation. Then the weighted sum is equal to the Choquet integral of with respect to some 2-additive fuzzy measure whose interaction indices are negative only when Proof. Since the values of are inversely proportional to the distances from they are sorted in the order opposite to the order of So we assume the neighbours are sorted in the order of decreasing distance to and hence are sorted in increasing order. Consider the sequential process of calculating the weights . Before the process starts all which are positive and add to one. Take any iteration of this reweighting process, and assume that at its start all 0 and they add to one. We show that after that iteration is completed, the updated weights still add to one and are non-negative. We perform the following two steps and since the value of the triangular conorm is no greater than one, remains non-negative. Hence after the above iteration all are still non-negative and add to one. By applying mathematical induction, these properties are maintained till the end of the iterative reweighting process. The formula (4) expresses the end result of the described reweighting process. The weighted sum can be expressed as the Choquet integral with respect to some fuzzy measure [23]. There are of course many such possible fuzzy measures, including additive and 2-additive measures, because we have only speciﬁed out of 2 fuzzy measure coeﬃcients (in the form of ). In particular for the two-additive measure we have expression (2) [27]. In our case we discard the ﬁrst sum as we only have to account for redundancies (all 6 0) and hence our measure is submodular. We can therefore determine the values of and by matching the coeﬃcients in (2) , (5) with and setting whenever For this we obtain an underdetermined linear system of equations which always has at least one positive (in terms of the values solution. Furthermore we can set up a linear programming problem to maximize the values (in terms of their sum or their minimum) subject to matching (2) , (5) with and the selected which always has a feasible solution (one of which is and all So for the purposes of averaging local density values over the natural neighbours of triangular conorm and a way of calculating the redundancy coeﬃcients (from the cosines of the angles , and then apply the iterative reweighting process expressed in (4) to calculate the density estimateas the Choquet integral with respect to some submodular 2-additive fuzzy measure. In our experiments we used for the threshold and a modiﬁed version of this formula for other thresholds as described in the next section. Three features of the reweighting method can be highlighted. Firstly, this method is equivariant to data translation, rotation and scaling (this property is expected from reliable estimators of density, mode and location). The reason is that the pairwise distances and angles used in calculations are not aﬀected under these linear transformations. Secondly, the computational complexity of the presented algorithm is based on the number of distance and angle calculations. Hence it will have performance gains over other natural neighbours schemata for larger dimensions, notably for 8. Thirdly, this method is fully parallelisable and also suitable for SIMD architectures like Graphics Processing Units (GPUs). Therefore quadratic complexity in seems not to be much of an issue for 6 10 6 . 3.2. Selection of the Threshold We now discuss a method for choosing an appropriate value of the threshold consistent across diﬀerent dimensions . If we choose then the cone in which the data is assumed to be redundant becomes half-space in any dimension, so half of the nearest neighbours of the point x are expected to be located in that half-space (assuming a locally uniform distribution). That may look too broad a choice, and one may select the redundant neighbours in a narrower cone, for example, choosing see Figure 2. In the case of two-dimensional data such a cone will contain roughly a quarter of the nearest neighbours. The diﬃculty is that when the dimension increases, the probability that a near neighbour of x falls into such a cone of angle decreases. This is due to the fact that in higher dimensions the volume of a spherical cone of angle decreases compared to the volume of the ball. Therefore, in order for a spherical cone to contain approximately the same proportion of the near neighbours of a point across diﬀerent dimensions we need to select the threshold as a function of the dimension of the space. Let us consider the ratio of the volume of the intersection of a spherical cone with the ball of radius to the volume of the ball Vol the ratio we want to keep constant. With no loss of generality we can set It is known that where the constant depends only on the dimension . Γ is the standard gamma-function. The spherical cone, which is the intersection of a cone with the ball centered at the vertex of the cone can be represented as the union of two parts, the spherical cap (a non-empty intersection of a ball with a half-plane) and the intersection of the cone with the complement of the mentioned half-space, which we call the base cone see Figure 3. Figure 3. Three-dimensional spherical cone It is also known that the volume of the spherical cap is given by [28] Vol whereis the regularized incomplete beta function, and is the height of the cap. Further, the volume of the base cone of height and base radius is given by Now, let us ﬁx the desired fraction of the volume of the ball for example Then we solve for the equation From we ﬁnd the desired threshold The graph of is presented on Figure 4. As expected, the ﬁrst two values are which correspond to and respectively, but no closed form expression for the other values was found, although some simpliﬁcations using the relations between the gamma and beta functions can be made. Interestingly, the computed values are very well approximated by the function tan and this formula can be used for selecting a suitable value for the cosine of the threshold. The coeﬃcients in the formula for were obtained by the standard least squares regression with the approximation error RMSE= 0 . 004. Figure 4. The graph of the values of the threshold as a function of the dimension and its approximating function Numerical Illustration In order to show the advantages of the proposed method we will use highly structured data in R coming from a lower dimensional manifold. The reason is that if the data are sampled from some standard test distribution, like a mixture of multivariate normals with nearly equal the nearest neighbours of a point are distributed all around that point, and thus will overlap signiﬁcantly with the set of natural neighbours we aim at identifying. In this case the proposed method shows quite similar results as the standard kNN and kernel estimates, provided that the value of or the bandwidth are chosen appropriately to avoid oversmoothing. It is for structured data that we expect signiﬁcant beneﬁts, i.e., when the nearest neighbours signiﬁcantly diﬀer from the natural neighbours. Furthermore, it turns out that this method is not sensitive to the choice of as contributions from the neighbours which are located beyond closer neighbours in the same direction are automatically downweighted. Compared to Delaunay tessellation based methods, we expect to obtain computational advantages for higher dimensions. But for the purposes of illustration we limit ourselves to two-dimensional pictures. A detailed computational benchmarking is a subject for a followup paper. Figure 5 presents a sample generated from a mixture (in equal proportions) of three products of normal distributions with parameters taken as (0 . 13 , 0 . 4 , 0 . 08 , 0 . 001), (0 . 3 , 0 . 3 , 0 . 002 , 0 . 8) and (0 . 25 , 0 . 4 , 0 . 025 , 0 . 025). Notice that the sample from the ﬁrst distribution is practically located on a horizontal line, and because of the second component of the mixture spread in the other direction, standardization of the data does not alleviate this. The simulated mixture has three local modes at the centres of the above normal distributions, with the highest mode at (0 . 13 , 0 . 4) (notice small values of for this component). The colour intensity of the data points in Figure 5 re- ﬂects the computed density at that point. The main mode of this mixture is at (0 . 13 , 0 . 4) and is signiﬁcantly more pronounced than the two other modes. The sample size is 1000. Figure 5. For this structured data sample the reweighted kNN shows advantages over other estimates which fail to identify the mode correctly The reweighted kNN estimate (with = 150, which is quite large) correctly identiﬁes the regions of high density and points correctly to the mode. In contrast, the standard kNN with that value of oversmoothens the estimate and incorrectly identiﬁes the mode as that of the second component of the mixture. Other (smaller) values of in the standard kNN incorrectly position the mode around (0 . 25 , 0 . 4). In fact, a careful manual adjustment to the value of between 10 and 15 yields a better estimate of the mode at (0 . 2 , 0 . 4), but makes the estimate more “spiky” and overestimates the density at other places. 5. Conclusion We have presented a multivariate density estimation method which calculates the local data density from the averaged distance to the natural neighbours of a point x , the nearest neighbours distributed all around x . The natural neighbours oﬀer advantages over the standard kNN and kernel density estimates for structured data, for which the nearest neighbours could be distributed from one side of x , thus introducing a bias into the estimate. However, the methods based on Voronoi and Delaunay tessellations, which compute the natural neighbours, suﬀer from high computational cost even for moderate dimension 5. To alleviate prohibitive computational cost for higher dimensions we proposed a reweighting scheme, in which the contributions from a larger number of the nearest neighbours are reweighted based on their redundancy values, measured through the cosines of the angles these neighbours are visible from the point x . These redundancy values serve as proxies for the interaction indices of a 2-additive fuzzy measure, with respect to which the pairwise distances are averaged by using the discrete Choquet integral. This way the contribution of the neighbours in the same direction as some of the nearer neighbours are discounted, and eventually only the contributions from the neighbours which all lie in distinct directions are accounted for. It is shown how redundancy values should be computed from the cosines of the angles in the multivariate setting according to the dimension . The computational complexity of the proposed method is quadratic in the number of data (same as the complexity of the kNN and kernel density estimates), and the method is fully parallelisable. Besides the kNN, the reweighting scheme can be used in conjunction with the kernel density estimates, which will be studied in the future. We foresee applications of the proposed technique in density based clustering, mode estimation, image segmentation, anomaly detection and other areas of data analytics.

## About the authors

### Gleb Beliakov

Deakin University
**Author for correspondence.**

Email: gleb@deakin.edu.au

Beliakov Gleb - professor, Candidate of Physical and Mathematical Sciences, professor of School of Information Technology of Deakin University, Australia

221 Burwood Hwy, Burwood 3125, Australia## References

- D. W. Scott, Multivariate Density Estimation, John Wiley and Sons, New York, 2015.
- G. Beliakov, M. King, Density Based Fuzzy C-Means Clustering of Non-Convex Patterns, Europ. J. Oper. Res. 173 (2006) 717–728.
- P. Angelov, R. R. Yager, Density-Based Averaging — a New Operator for Data Fusion, Information Sciences 222 (2013) 163–174.
- G. Beliakov, T. Wilkin, On Some Properties of Weighted Averaging with Variable Weights, Information Sciences 281 (2014) 1–7.
- E. Parzen, On the Estimation of a Probability Density Function and the Mode, Annals of Math. Stats. 33 (1962) 1065–1076.
- C. Abraham, G. Biau, B. Cadre, Simple Estimation of the Mode of a Multivariate Density, The Canadian Journal of Statistics 31 (2003) 23–34.
- W. E. Schaap, R. van de Weygaert, Continuous Fields and Discrete Samples: Reconstruction Through Delaunay Tessellations, Astronomy and Astrophysics 363 (2000) L29–L32.
- E. Schubert, J. Sander, M. Ester, H. P. Kriegel, X. Xu, DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN, ACM Trans. Database Syst. 42 (2017) 19:1–19:21. doi: 10.1145/3068335.
- N.-B. Heidenreich, A. Schindler, S. Sperlich, Bandwidth Selection for Kernel Density Estimation: a Review of Fully Automatic Selectors, AStA Adv. Stat. 97 (2013) 403–433.
- G. Voronoi, Nouvelles applications des parametres continus a la theorie des formes quadratiques, Journal fur die Reine und Angewandte Mathematik 133 (1908) 97–178.
- B. Delaunay, Sur la sph`ere vide, Bulletin de l’Academie des Sciences de l’URSS, Classe des sciences mathematiques et naturelles 6 (1934) 793–800.
- R. Sibson, Brief Description of Natural Neighbor Interpolation, in: V. Barnett (Ed.), Interpreting Multivariate Data, John Wiley and Sons, New York, 1981, pp. 21–36.
- W. Stuetzle, Estimating the Cluster Tree of a Density by Analyzing the Minimal Spanning Tree of a Sample, Journal of Classiﬁcation 20 (2003) 25–47.
- H. Samet, Foundations of Multidimensional and Metric Data Structures, Elsevier, Boston, 2006.
- T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer- Verlag, New York, Berlin, Heidelberg, 2001.
- B. Dasarathy, Nearest Neighbor Norms: NN Pattern Classiﬁcation Techniques, IEEE Computer Society Press, Los Alamitos, CA, 1991.
- S. Cost, S. Salzberg, A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features, Machine Learning 10 (1993) 57–78.
- R. Yager, Using Fuzzy Methods to Model Nearest Neighbor Rules, IEEE Trans. on Syst., Man, and Cybernetics 32 (2002) 512–525.
- E. H¨ullermeier, The Choquet-Integral as an Aggregation Operator in Case-Based Learning, in: B. Reusch (Ed.), Computational Intelligence, Theory and Applications, Springer, Berlin, Heidelberg, 2006, pp. 615–627.
- D. Watson, Contouring: A Guide to the Analysis and Display of Spatial Data, Pergamon Press, Oxford, 1992.
- J.-D. Boissonnat, F. Cazals, Smooth Surface Reconstruction Via Natural Neighbour Interpolation of Distance Functions, Proc. of the 16th Annual Symposium on Computational Geometry (2000) 223–232.
- V. V. Belikov, V. D. Ivanov, V. K. Kontorovich, S. A. Korytnik, A. Y. Semenov, The Non-Sibsonian Interpolation: a New Method of Interpolation of the Values of a Function on an Arbitrary Set of Points, Computational Mathematics and Mathematical Physics 37 (1997) 9–15.
- G. Beliakov, A. Pradera, T. Calvo, Aggregation Functions: A Guide for Practitioners, Springer, Heidelberg, 2007.
- M. Grabisch, J.-L. Marichal, R. Mesiar, E. Pap, Aggregation Functions, Cambridge University press, Cambridge, 2009.
- M. Grabisch, T. Murofushi, M. Sugeno (Eds.), Fuzzy Measures and Integrals. Theory and Applications, Physica-Verlag, Heidelberg, 2000.
- M. Grabisch, k-Order Additive Discrete Fuzzy Measures and Their Representation, Fuzzy Sets and Systems 92 (1997) 167–189.
- B. Mayag, M. Grabisch, C. Labreuche, A Characterization of the 2-additive Choquet Integral, in: Proc. of IPMU, Malaga, Spain, 2008, pp. 1512–1518.
- J. W. Harris, H. Stocker, Spherical Segment (Spherical Cap), in: Handbook of Mathematics and Computational Science, Springer, New York, 1998.