# Group averaging and the Gini deviation

## Abstract

It is known that partitioning a society into groups with subsequent averaging within each group decreases the Gini coefficient. The resulting Lorenz function is piecewise linear. This study deals with a natural question: by how much can the Gini coefficient decrease when passing to a piecewise linear Lorenz function? We obtain upper bound estimates for the maximum possible change in the Gini coefficient under a restriction on the group shares, or on the difference between the averaged values of the attribute for consecutive groups. These estimates are quite illustrative, since they are expressed in terms of the geometric parameters of the polygonal Lorenz curve, such as the lengths of its segments and the angles between successive segments. It is shown that there exist Lorenz curves with the Gini coefficient arbitrarily close to one and, at the same time, with the Gini coefficient of the averaged society arbitrarily close to zero.

## Full Text

### Introduction: the Gini coefficient of a piecewise linear Lorenz function

The Gini coefficient is one of the most important measures of (income) inequality. Originally due to Gini (1912), see also (Ceriani, Verme, 2012), it is defined as twice the area of the region between the Lorenz curve and the diagonal (Figure 1). The Lorenz curve is the graph of the Lorenz function, that is, the function $L(x)$ defined on $[0, 1]$ such that $L(x)$ is equal to the fraction of the total income or wealth owned by the poorest fraction $x$ of the population. The Lorenz function is necessarily non-decreasing and concave up, i.e. $\frac{L(x) - L(t)}{x - t}$ is a non-decreasing function of $x$ for every $t$.

*Figure 1. The Lorenz curve (horizontal axis: population; vertical axis: income)*

Throughout the paper, the Gini coefficient corresponding to the Lorenz function $L$ is denoted by $G(L)$ or simply by $G$. The Gini coefficient is widely used in economics and other fields of knowledge (see more details in: Arnold, 2007; Farris, 2010; Astashenko, Malykhin, 2012; Pavlov, Pavlova, 2016, 2018), but most often for measuring income inequality. For example, the Gini coefficient for the total monetary income of the population of Russia in 2019 was 0,411.[92] In the European Union in 2019, the Gini coefficient for equivalised disposable income ranged from 0,228 for Slovakia to 0,408 for Bulgaria, according to Eurostat.[93]

If the Lorenz curve is a polygonal chain (which corresponds to the division of society into groups with a uniform distribution within each group), then the Gini coefficient of the society is (Kämpke, Radermacher, 2015)

$$G = 1 - \sum_{i=1}^{n} (x_i - x_{i-1})(y_i + y_{i-1}), \tag{1}$$

where $A_i = (x_i, y_i)$, $i = 0, 1, \dots, n$, are the vertices of this chain, $A_0 = (0, 0)$ and $A_n = (1, 1)$.

Consider a typical oval-shaped Lorenz curve $L$ (with no segment of the population having a uniform income distribution). Let $X = \{x_0, x_1, \dots, x_n\}$, where $0 = x_0 < x_1 < \dots < x_n = 1$, be a finite increasing sequence on the $x$-axis. If we average out the income in each group $[x_0, x_1]$, $[x_1, x_2]$, …, $[x_{n-1}, x_n]$, then the resulting Lorenz curve $L_X$ will be piecewise linear and inscribed into the original Lorenz curve $L$ (Figure 2). Since $L$ is concave up and does not contain a non-trivial line segment, the only common points of $L$ and $L_X$ are $A_0, A_1, \dots, A_n$, where $A_i = (x_i, L(x_i))$.
Hence $L_X$ lies above $L$ everywhere except at these points, so the area of the region between $L_X$ and the diagonal of the square is strictly less than the area between $L$ and this diagonal. Therefore $G(L_X) < G(L)$.

*Figure 2. The original Lorenz curve and its piecewise linearization*

The procedure of partitioning the population into 5 or 10 equal groups (i.e. into quintiles or deciles) is a standard one and is used “for the convenience of computations and to increase the analyticity of the data.”[94] Such a data modification can lead to computational errors. How much does the Gini coefficient decrease when such a procedure is used? For the total monetary income in Russia in 2019 mentioned above, the combined incomes of the five 20 percent groups are, respectively, 5,3; 10,1; 15,1; 22,6 and 46,9 percent of the total monetary income. The cumulative shares are therefore 0,053; 0,154; 0,305; 0,531 and 1, and it follows from (1) that this partition (with subsequent averaging in each group) corresponds to a Gini coefficient of $1 - 0{,}2\,(0{,}053 + 0{,}207 + 0{,}459 + 0{,}836 + 1{,}531) = 1 - 0{,}2 \cdot 3{,}086 \approx 0{,}383$, which is noticeably less than 0,411.

### Literature review

Estimation of the Gini coefficient from incomplete or grouped data has been studied extensively. Since grouped data lead to a Lorenz curve which is a polygonal chain, and the trapezoidal rule computes the definite integral over such a chain exactly, any error estimate for this rule (e.g., see: Zorich, 2019) produces an estimate for the difference $G(L) - G(L_X)$. However, there are better alternatives. For example, Gastwirth (1972) presented several methods for calculating lower and upper bounds using a probability density function. Golden (2008) proposed an easy-to-compute estimate for data presented in cumulative income quintiles, using a single point of the Lorenz curve. Farris (2010) gave interval estimates using aggregate parameters (such as the number of individuals or families in a percentile and their overall wealth). Fellman (2012) surveyed further publications and concluded that “no method is uniformly optimal, but the trapezium rule is almost always inferior, and Simpson’s rule is superior.
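Golden’s method is usually of medium quality.”

In this note we obtain upper bound estimates for the maximum possible change in the Gini coefficient under a restriction on the group shares, or on the difference between the averaged values of the attribute for consecutive groups. These estimates are as illustrative as those in (Golden, 2008), because they are expressed in terms of the geometric parameters of the polygonal Lorenz curve (such as the lengths of its segments and the angles between successive segments), but some of them are more accurate.

Let $L$ be an arbitrary Lorenz function/curve and let $X = \{x_0, x_1, \dots, x_n\}$, $0 = x_0 < x_1 < \dots < x_n = 1$, be some finite increasing sequence on the $x$-axis. Then $L_X$ will denote the Lorenz function/curve which is obtained from $L$ by linearization on the segments whose endpoints are consecutive elements of $X$. We will call $L_X$ the piecewise linearization of $L$ corresponding to $X$, and call the difference $G(L) - G(L_X)$ the deviation of the Gini coefficient, or simply the Gini deviation. We show that the Gini deviation can be arbitrarily close to $1$ (i.e., simultaneously $G(L)$ can be arbitrarily close to $1$ and $G(L_X)$ arbitrarily close to $0$).

### A Lorenz curve with a large Gini deviation

**Example 1.** Consider a Lorenz curve $L$ which is a polygonal chain $OBD$, where $O = (0, 0)$, $D = (1, 1)$, and the vertex $B$ is defined below with the help of the points $A = (\varepsilon, \varepsilon^2)$ and $C = (1 - \varepsilon^2, 1 - \varepsilon)$, $\varepsilon > 0$ being small (so that $A$ is much closer to the $x$-axis than to the $y$-axis, and $C$ is much closer to the vertical line $x = 1$ than to the horizontal line $y = 1$, see Figure 3). The lines $OA$ and $CD$ have equations $y = \varepsilon x$ and $y = 1 + (x - 1)/\varepsilon$, respectively; we denote their intersection by $B$. By solving the system of these equations, we get $B = \left(\frac{1}{1+\varepsilon}, \frac{\varepsilon}{1+\varepsilon}\right)$. As $B$ is close to the vertex $(1, 0)$, the area of the region between $L$ and the diagonal is close to the area of the triangle with the vertices $(0, 0)$, $(1, 0)$ and $(1, 1)$, and the latter area equals $1/2$. This means that $G(L)$ is close to $1$. $G(L)$ could be calculated exactly using (1), but it is easier to do so by doubling the difference between $1/2$ and the area of the region under $L$. This region is a union of a rectangle (actually, a square) and two triangles. Its area is equal to

$$\frac{1}{2}\cdot\frac{1}{1+\varepsilon}\cdot\frac{\varepsilon}{1+\varepsilon} + \left(\frac{\varepsilon}{1+\varepsilon}\right)^2 + \frac{1}{2}\cdot\frac{\varepsilon}{1+\varepsilon}\cdot\frac{1}{1+\varepsilon} = \frac{\varepsilon}{1+\varepsilon}.$$

Thus, $G(L) = 1 - \frac{2\varepsilon}{1+\varepsilon} = \frac{1-\varepsilon}{1+\varepsilon}$ is indeed close to $1$ for small $\varepsilon$.

*Figure 3. Lorenz curve $L$ of Example 1 (points $A$, $B$, $C$, $D$; axis marks $\varepsilon^2$, $1 - \varepsilon^2$, $\varepsilon$, $1 - \varepsilon$)*

Let $X = \{0, \varepsilon, 1 - \varepsilon^2, 1\}$; then $L_X$ is the polygonal chain $OACD$. Since the points $A$ and $C$ are close to the diagonal, $G(L_X)$ is close to $0$. We calculate it using (1). The differences $x_i - x_{i-1}$ equal $\varepsilon$, $1 - \varepsilon - \varepsilon^2$, $\varepsilon^2$ for $i = 1, 2, 3$, respectively; the sums $y_i + y_{i-1}$ equal $\varepsilon^2$, $1 - \varepsilon + \varepsilon^2$, $2 - \varepsilon$, respectively. Hence

$$G(L_X) = 1 - \left[\varepsilon\cdot\varepsilon^2 + (1 - \varepsilon - \varepsilon^2)(1 - \varepsilon + \varepsilon^2) + \varepsilon^2(2 - \varepsilon)\right] = 2\varepsilon - 3\varepsilon^2 + \varepsilon^4.$$

Therefore, both $1 - G(L) = \frac{2\varepsilon}{1+\varepsilon}$ and $G(L_X)$ are of order $\varepsilon$ for the considered Lorenz curve $L$. This means that $G(L_X)$ is close to $0$, and $G(L) - G(L_X)$ is close to $1$ for small $\varepsilon$.

Note the following features of the considered $L_X$: the share of one of the groups (the second one) is close to $1$, and the angle between some segments of $L_X$ (namely, between $OA$ and $AC$, and also between $AC$ and $CD$) is close to $\pi/4$. These features are not accidental. We show below that either the smallness of the share of each group or the smallness of the angle between every two consecutive segments of $L_X$ (or between every two segments standing next but one) implies the smallness of the Gini deviation.

### The Gini deviation and the shares of groups

**Theorem 1.** Let $\delta$ be the largest share of groups in a society (i.e. the maximum of the values $x_i - x_{i-1}$ for $i = 1, \dots, n$). Then $G(L) - G(L_X) \le \delta$.

*Proof.* Let $L_X$ be a piecewise linearization of the Lorenz curve $L$ corresponding to a sequence $X = \{x_0, x_1, \dots, x_n\}$. For every $i$, $y_i$ will denote the common value of $L(x_i)$ and $L_X(x_i)$. If $n = 1$, then the society is “divided” into a single group whose share is equal to $1$. The theorem is valid in this case, as the Gini deviation never exceeds $1$. So we can assume that $n \ge 2$. Consider two consecutive vertices $A_{i-1} = (x_{i-1}, y_{i-1})$ and $A_i = (x_i, y_i)$ of $L_X$. Since $L$ is the graph of a non-decreasing, concave up function, its entire piece lying between $A_{i-1}$ and $A_i$ is contained in the right triangle $A_{i-1}B_iA_i$ with $B_i = (x_i, y_{i-1})$, which is bounded by the hypotenuse $A_{i-1}A_i$ from above and by the horizontal leg $A_{i-1}B_i$ and the vertical leg $B_iA_i$ from below (Figure 4).

*Figure 4. The Gini deviation and the shares of groups*

Consequently, the part of the region between $L$ and $L_X$ which lies between $A_{i-1}$ and $A_i$ is also included in this triangle. Therefore, the whole region between $L$ and $L_X$ is included in the union of such right triangles. Hence it is enough to show that the combined area of these triangles does not exceed $\delta/2$. In the triangle $A_{i-1}B_iA_i$, the leg $A_{i-1}B_i$ is equal to $x_i - x_{i-1}$ and the leg $B_iA_i$ is equal to $y_i - y_{i-1}$, so its area is equal to $\frac{1}{2}(x_i - x_{i-1})(y_i - y_{i-1})$.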
The deviation $G(L) - G(L_X)$ is equal to twice the area of the region between $L$ and $L_X$, which, according to the previous calculations, does not exceed $\sum_{i=1}^{n} (x_i - x_{i-1})(y_i - y_{i-1})$. But

$$\sum_{i=1}^{n} (x_i - x_{i-1})(y_i - y_{i-1}) \le \delta \sum_{i=1}^{n} (y_i - y_{i-1}) = \delta.$$

Theorem 1 is proved.

If a society is divided into $n$ equal groups, then the share of each of them is equal to $1/n$, so, according to Theorem 1, the Gini deviation does not exceed $1/n$. Therefore, the maximum possible Gini deviation for the total monetary income distribution in Russia in 2019 mentioned above is $1/5 = 0{,}2$. The real deviation for this distribution is the difference between 0,411 and the Gini coefficient of the averaged quintile distribution, which we obtained using (1) and the data from the Federal State Statistics Service.[95]

The Gini deviation estimate given by Theorem 1 is sharp for every $\delta$. For example, in the case of perfect inequality in a society (all the income or wealth belongs to a single person), $G(L)$ is (almost) equal to $1$. The corresponding Lorenz curve is a two-segment polygonal chain $OBD$, where $B$ almost coincides with $(1, 0)$. Divide such a society into several groups so that the share of the richest of them is equal to an arbitrary $\delta$. Then $L_X$ will consist of a horizontal segment $OE$, where $E = (1 - \delta, 0)$ (this segment may correspond to several groups of the society), and an inclined line segment $ED$. The doubled area between $L_X$ and the diagonal is equal to $2\left(\frac{1}{2} - \frac{\delta}{2}\right) = 1 - \delta$, which is $G(L_X)$. Thus, $G(L) - G(L_X) \approx \delta$, as required.

### The Gini deviation and the angles between segments

Every two consecutive segments of a polygonal Lorenz curve form an angle between $0$ and $\pi/2$. The more segments there are, the smaller these angles can be made uniformly for a fixed Gini coefficient. It turns out that a constraint on the magnitude of such an angle also limits the Gini deviation. It is also possible to consider a constraint on the angles between segments standing next but one. Since the slope of the segment[96] with the vertices $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ is the average income in the group $[x_{i-1}, x_i]$, the proximity of the slopes for two consecutive groups is equivalent to the closeness of the average incomes in these groups. We denote the points […] and […] by $A_{-1}$ and $A_{n+1}$, respectively, and will consider $A_{-1}A_0$ and $A_nA_{n+1}$ as new segments of $L_X$.

**Theorem 2.**
Let $2\varphi$ be the largest of the angles between the segments of $L_X$ standing next but one. Then

a) if $l_1, l_2, \dots, l_n$ denote the lengths of the segments $A_0A_1, A_1A_2, \dots, A_{n-1}A_n$, then $G(L) - G(L_X) \le \frac{\tan\varphi}{2}\sum_{i=1}^{n} l_i^2$;

b) if $d_1, d_2, \dots, d_n$ denote the lengths of the projections of the segments $A_0A_1, A_1A_2, \dots, A_{n-1}A_n$ on the diagonal $OD$, then […];

c) if $\delta$ is the largest fraction of the projections of the segments of $L_X$ on the diagonal $OD$, then […];

d) […].

In each of items a-d, $\varphi$ can be replaced by the largest of the angles between any two consecutive segments of $L_X$.

*Proof.* a) We use the following well-known fact from geometry (e.g. see: Boltyanskij et al., 1974): if the length of a side of a triangle is equal to $l$, and the other two sides make angles with that side which in total do not exceed $2\varphi$, then the area of such a triangle does not exceed $\frac{l^2}{4}\tan\varphi$. Consider an arbitrary segment $A_{i-1}A_i$ of $L_X$, as well as the previous segment $A_{i-2}A_{i-1}$ and the subsequent segment $A_iA_{i+1}$ (Figure 5).

*Figure 5. The Gini deviation and the angles between segments*

Since $L$ is concave up, every point lying between the arc of $L$ (the arc is not shown in Figure 5) and the segment $A_{i-1}A_i$ must be not lower than both lines $A_{i-2}A_{i-1}$ and $A_iA_{i+1}$, i.e. it belongs to the triangle $A_{i-1}PA_i$, where $P$ is the intersection of the lines $A_{i-2}A_{i-1}$ and $A_iA_{i+1}$. By the condition, the angle between the lines $A_{i-2}A_{i-1}$ and $A_iA_{i+1}$ is at most $2\varphi$. Then, by the fact mentioned above, the area of the triangle $A_{i-1}PA_i$ does not exceed $\frac{l_i^2}{4}\tan\varphi$. If the lines $A_{i-2}A_{i-1}$ and $A_iA_{i+1}$ coincide, then the triangle $A_{i-1}PA_i$ degenerates into the segment $A_{i-1}A_i$, which has zero area. Therefore

$$G(L) - G(L_X) \le 2\sum_{i=1}^{n} \frac{l_i^2}{4}\tan\varphi = \frac{\tan\varphi}{2}\sum_{i=1}^{n} l_i^2.$$

Item a is proved.

b) Let $A'_{i-1}$ and $A'_i$ denote, respectively, the projections of the points $A_{i-1}$ and $A_i$ on the diagonal $OD$. Then the length of the segment $A_{i-1}A_i$ does not exceed the length of the segment $A'_{i-1}A'_i$ multiplied by $\sqrt{2}$, since every segment of $L_X$ makes an angle of at most $\pi/4$ with the diagonal. Thus $l_i \le \sqrt{2}\, d_i$ and further […]. Item b is proved.

c) It follows directly from the condition of item c that […], whence […].

d) Follows from c, as […] since […].

If $\psi$ is the largest of the angles between any two consecutive segments of $L_X$, then the largest of the angles between the segments of $L_X$ standing next but one does not exceed $2\psi$, so $\varphi$ can be replaced by $\psi$ in each of items a-d. Theorem 2 is proved.
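
The formulas and bounds above lend themselves to a quick numerical check. The following Python sketch is our own illustration, not code from the authors, and all function names in it are hypothetical. `gini_polygon` implements formula (1), `example1` evaluates $G(L)$ and $G(L_X)$ for Example 1, and `theorem1_bound` and `theorem2a_bound` evaluate the right-hand sides of Theorem 1 and of item a of Theorem 2, the latter in the form $\frac{\tan\varphi}{2}\sum_i l_i^2$ with $2\varphi$ the largest angle between segments standing next but one. As an assumption of the sketch, the auxiliary segments $A_{-1}A_0$ and $A_nA_{n+1}$ are taken to be horizontal and vertical; since $L \ge 0$ and $x \le 1$, the triangle argument of the proof still covers the first and the last segment with this choice.

```python
import math


def gini_polygon(points):
    """Formula (1): G = 1 - sum_i (x_i - x_{i-1}) (y_i + y_{i-1})
    for a polygonal Lorenz curve with vertices A_0 = (0, 0), ..., A_n = (1, 1)."""
    return 1.0 - sum((x1 - x0) * (y1 + y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))


def linearize(lorenz, xs):
    """Vertices A_i = (x_i, L(x_i)) of the piecewise linearization L_X."""
    return [(x, lorenz(x)) for x in xs]


def example1(eps):
    """G(L) for L = OBD and G(L_X) for L_X = OACD, as in Example 1."""
    b = (1 / (1 + eps), eps / (1 + eps))  # intersection of y = eps*x and y = 1 + (x - 1)/eps
    g_l = gini_polygon([(0.0, 0.0), b, (1.0, 1.0)])
    g_lx = gini_polygon([(0.0, 0.0), (eps, eps ** 2), (1 - eps ** 2, 1 - eps), (1.0, 1.0)])
    return g_l, g_lx


def theorem1_bound(xs):
    """Right-hand side of Theorem 1: the largest group share."""
    return max(x1 - x0 for x0, x1 in zip(xs, xs[1:]))


def theorem2a_bound(points):
    """Right-hand side of item a of Theorem 2: tan(phi) / 2 * sum of l_i^2,
    where 2*phi is the largest angle between segments standing next but one.

    ASSUMPTION (not from the source): the auxiliary segments A_{-1}A_0 and
    A_nA_{n+1} are horizontal and vertical, i.e. have angles 0 and pi/2.
    """
    deltas = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    angles = [0.0] + [math.atan2(dy, dx) for dx, dy in deltas] + [math.pi / 2]
    # angles[i - 1] and angles[i + 1] are the neighbours of segment i (1 <= i <= n)
    phi = max((angles[i + 1] - angles[i - 1]) / 2 for i in range(1, len(angles) - 1))
    return math.tan(phi) / 2 * sum(dx * dx + dy * dy for dx, dy in deltas)


if __name__ == "__main__":
    print(example1(0.01))  # G(L) close to 1, G(L_X) close to 0

    def square(x):
        return x * x  # L(x) = x^2, for which G(L) = 1/3

    xs = [i / 5 for i in range(6)]
    deviation = 1 / 3 - gini_polygon(linearize(square, xs))
    print(deviation, theorem1_bound(xs), theorem2a_bound(linearize(square, xs)))
```

For $L(x) = x^p$ the exact value $G(L) = (p-1)/(p+1)$ is known, so the actual deviation can be compared with both bounds for equal or unequal groups; for five equal groups and $p = 2$ the deviation is $1/3 - 0{,}32 \approx 0{,}013$, far below the bound $0{,}2$ of Theorem 1.
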

### A comparison with Golden’s approach

Golden (2008) proposed a simple way to compute an upper bound estimate of the Gini coefficient for data presented in cumulative income quintiles. We compare the Gini deviation corresponding to this estimate, obtained for the particular Lorenz function $L(x) = x^2$, with the estimates from our Theorem 2, and find that the estimates given in items a, b, c of Theorem 2 are more accurate.

Golden called his method for approximating the upper bound of the Lorenz curve’s Gini coefficient “the Z-gradient rule”. The estimate given by this rule is the Gini coefficient of a three-segment polygonal chain $OBCD$ (which we further denote by $Z$), where $O = (0, 0)$, $D = (1, 1)$, and the points $B$ and $C$ lie on the $x$-axis and on the vertical line $x = 1$, respectively, so that the line $BC$ is tangent to $L$ at a point of $L$ farthest from the diagonal $OD$. It is known that at this point $L'(x) = 1$, see Hoover (1936) and Kakwani (1980). Since $L'(x) = 2x$, this point has $x = 1/2$, so that $L(1/2) = 1/4$ and the point is $(1/2, 1/4)$. The tangent line to the Lorenz curve at the point $(1/2, 1/4)$ has the equation $y - 1/4 = x - 1/2$, or $y = x - 1/4$; thus $B = (1/4, 0)$ and $C = (1, 3/4)$. Therefore, by (1),

$$G(Z) = 1 - \left(\tfrac{1}{4}\cdot 0 + \tfrac{3}{4}\cdot\tfrac{3}{4} + 0\cdot\tfrac{7}{4}\right) = \tfrac{7}{16} = 0{,}4375.$$

Let $x_i = 0{,}2\,i$, $i = 0, 1, 2, 3, 4, 5$, be the first coordinates of the vertices of $L_X$ (which is the linearization of $L$ corresponding to the division of the segment $[0, 1]$ into five equal subsegments). Then by (1)

$$G(L_X) = 1 - 0{,}2\,(0{,}04 + 0{,}2 + 0{,}52 + 1 + 1{,}64) = 0{,}32,$$

therefore

$$G(Z) - G(L_X) = 0{,}4375 - 0{,}32 = 0{,}1175,$$

which is the error of Golden’s approximation.

We organize the calculations of the estimates from Theorem 2 in the Table. Here $k_i$ is the slope of the $i$-th segment of the linearized Lorenz curve, $\alpha_i = \arctan k_i$ is its angle with the positive direction of the $x$-axis (measured in radians), and $l_i^2$ and $d_i^2$ are the squared lengths of the $i$-th segment and of its projection on the diagonal. The largest half-angle between the segments of $L_X$ standing next but one is $0{,}294$ rad (half the angle between the first and the third segments); its tangent is equal to $0{,}303$. The sum of the squares of the lengths of the segments of $L_X$ is equal to $0{,}464$, and the sum of the squares of the lengths of their projections on the diagonal is $0{,}432$.
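
The computations of this section can be reproduced with another short Python sketch (again our own, with hypothetical helper names). It assumes $L(x) = x^2$ and five equal groups, as in the Table below. `golden_z_gini` builds Golden's chain from the point where $L'(x) = 1$, and `table_quantities` computes the angles $\alpha_i = \arctan k_i$, the largest half-angle between segments standing next but one (over the actual segments of $L_X$ only, as in the Table), the largest angle between consecutive segments, and the sums of squared segment lengths and squared projections on the diagonal. Item a of Theorem 2 is evaluated as $\frac{\tan\varphi}{2}\sum_i l_i^2$, which is our reading of that item; items b and c are not computed.

```python
import math


def gini_polygon(points):
    """Formula (1) for a polygonal Lorenz curve with vertices from (0, 0) to (1, 1)."""
    return 1.0 - sum((x1 - x0) * (y1 + y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))


def golden_z_gini(x_star, y_star):
    """Gini coefficient of the chain O B C D, where BC is the slope-1 tangent to L
    at its point (x_star, y_star) farthest from the diagonal."""
    b = (x_star - y_star, 0.0)        # the tangent y = x - x_star + y_star meets y = 0
    c = (1.0, 1.0 - x_star + y_star)  # ... and the vertical line x = 1
    return gini_polygon([(0.0, 0.0), b, c, (1.0, 1.0)])


def table_quantities(lorenz, n):
    """Table quantities for the linearization of L on n >= 3 equal groups."""
    pts = [(i / n, lorenz(i / n)) for i in range(n + 1)]
    deltas = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    alphas = [math.atan2(dy, dx) for dx, dy in deltas]  # angles with the x-axis
    # Half-angles between segments standing next but one; as in the Table,
    # only actual segments of L_X are used (no auxiliary segments).
    phi = max((alphas[i] - alphas[i - 2]) / 2 for i in range(2, n))
    psi = max(alphas[i] - alphas[i - 1] for i in range(1, n))  # consecutive segments
    sum_l2 = sum(dx * dx + dy * dy for dx, dy in deltas)       # squared lengths
    sum_d2 = sum((dx + dy) ** 2 / 2 for dx, dy in deltas)      # squared projections on y = x
    return {
        "gini": gini_polygon(pts),
        "phi": phi,
        "psi": psi,
        "sum_l2": sum_l2,
        "sum_d2": sum_d2,
        "estimate_a": math.tan(phi) / 2 * sum_l2,  # our reading of item a of Theorem 2
    }


if __name__ == "__main__":
    def square(x):
        return x * x  # L(x) = x^2; L'(x) = 1 at x = 1/2, where L = 1/4

    q = table_quantities(square, 5)
    g_z = golden_z_gini(0.5, 0.25)
    print(g_z, q["gini"], g_z - q["gini"])  # Golden's estimate, G(L_X), their difference
    print(q)
```

With these inputs the sketch gives $G(Z) = 7/16 = 0{,}4375$, $G(L_X) = 0{,}32$, their difference $0{,}1175$, a largest half-angle of about $0{,}294$ rad (tangent about $0{,}303$) and an item a estimate of about $0{,}070$.
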

*Table. Calculations of the estimates from Theorem 2*

| $x_i$ | $y_i = L(x_i)$ | $x_i - x_{i-1}$ | $y_i - y_{i-1}$ | $k_i$ | $\alpha_i$ | $(\alpha_i - \alpha_{i-2})/2$ | $\alpha_i - \alpha_{i-1}$ | $l_i^2$ | $d_i^2$ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | | | | | | | | |
| 0,2 | 0,04 | 0,2 | 0,04 | 0,2 | 0,197 | | | 0,042 | 0,029 |
| 0,4 | 0,16 | 0,2 | 0,12 | 0,6 | 0,54 | | 0,343 | 0,054 | 0,051 |
| 0,6 | 0,36 | 0,2 | 0,2 | 1 | 0,785 | 0,294 | 0,245 | 0,08 | 0,08 |
| 0,8 | 0,64 | 0,2 | 0,28 | 1,4 | 0,951 | 0,205 | 0,165 | 0,118 | 0,115 |
| 1 | 1 | 0,2 | 0,36 | 1,8 | 1,064 | 0,139 | 0,113 | 0,17 | 0,157 |
| S | | | | | | | | 0,464 | 0,432 |
| max | | | | | | 0,294 | 0,343 | | |

Consequently, the estimates obtained in items a, b, c of Theorem 2 are respectively […] (rounded to the nearest thousandth), which are considerably more accurate than the value $0{,}1175$ obtained by applying Golden’s Z-gradient rule. The largest angle between any two consecutive segments of $L_X$ is $0{,}343$ rad (the angle between the first and the second segments). Its tangent is equal to $0{,}357$, which is 18% greater than $0{,}303$. Hence using this angle instead of the half-angle in the latter three displayed formulas increases the estimates by 18%, which still leaves them less than $0{,}1175$.

### Conclusion

We obtained upper bound estimates for the maximum possible change in the Gini coefficient under a restriction on the group shares, or on the difference between the averaged values of the attribute for consecutive groups (geometrically, in terms of the parameters of the polygonal Lorenz curve, such as the lengths of its segments and the angles between successive segments). For the particular Lorenz curve considered, these estimates are much more accurate than those derived from Golden (2008). Further research may include deriving still better (ultimately, sharp) estimates, e.g. such that for a given […] the Gini deviation could not be multiplied by […]. The case of equal groups is of particular interest.

×

### Oleg I. Pavlov

Peoples’ Friendship University of Russia (RUDN University)

Author for correspondence.
Email: pavlov-oi@rudn.ru

PhD, Associate Professor of Economic and Mathematic Modelling Department, Economic Faculty

6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation

### Olga Yu. Pavlova

All-Russian Correspondence Multidisciplinary School

Email: lolgau@yandex.ru

PhD, Associate Professor at the Department of Higher Mathematics

B-234 Vorob'evy Gory, Moscow, 119234, Russian Federation

## References

1. Arnold, B.C. (2007). The Lorenz curve: Evergreen after 100 years. In S. Betti, A. Lemmi (Eds.), Advances in Income Inequality Concentration Measures (pp. 12-24). New York: Routledge.
2. Astashenko, A.N., & Malykhin, V.I. (2012). Income inequality measures. LAP Lambert Academic Publishing.
3. Boltyanskij, V.G., Sidorov, Yu.V., & Shabunin, M.I. (1974). Lectures and problems on elementary mathematics. Moscow: Nauka Publ. (In Russ.)
4. Ceriani, L., & Verme, P. (2012). The origins of the Gini index: Extracts from Variabilità e Mutabilità (1912) by Corrado Gini. The Journal of Economic Inequality, 10, 412-443.
5. Farris, F.A. (2010). The Gini index and measures of inequality. American Mathematical Monthly, 117(10), 851-864.
6. Fellman, J. (2012). Estimation of Gini coefficients using Lorenz curves. Journal of Statistical and Econometric Methods, 1(2), 31-38.
7. Gastwirth, J. (1972). The estimation of the Lorenz curve and Gini index. The Review of Economics and Statistics, 54, 306-316.
8. Gini, C. (1912). Variabilità e mutabilità: Contributo allo studio delle distribuzioni e delle relazioni statistiche. Bologna: C. Cuppini.
9. Golden, J. (2008). A simple geometric approach to approximating the Gini coefficient. The Journal of Economic Education, 39(1), 68-77.
10. Hoover, E. (1936). The measurement of industrial localization. The Review of Economics and Statistics, 18, 162-171.
11. Kakwani, N. (1980). Income inequality and poverty: Methods of estimation and policy applications. Oxford University Press.
12. Kämpke, T., & Radermacher, F.J. (2015). Income modeling and balancing. A rigorous treatment of distribution patterns. Lecture Notes in Economics and Mathematical Systems, 679, 44-53.
13. Pavlov, O.I., & Pavlova, O.Yu. (2016). The Lorenz curve and a mathematical definition of the middle class. Management of Economic Systems, (12). Retrieved March 15, 2021, from http://uecs.ru/uecs-94-942016/item/4239-2016-12-24-07-45-16
14. Pavlov, O.I., & Pavlova, O.Yu. (2018). Differential deviation and the Gini coefficient. Russian Economics Online-Journal, (4). Retrieved March 15, 2021, from http://www.e-rej.ru/publications/176/%D0%9F/
15. Zorich, V.A. (2019). Mathematical analysis (part 1). Moscow: MCCME Publ. (In Russ.)

Copyright (c) 2021 Pavlov O.I., Pavlova O.Y.