## Analysis of the File Distribution Time in Peer-to-Peer Network

**Authors:**Bobrikova EV^{1}, Gaidamaka Y.V^{1}**Affiliations:**- Peoples’ Friendship University of Russia (RUDN University)

**Issue:**Vol 26, No 1 (2018)**Pages:**84-92**Section:**Computer Science**URL:**http://journals.rudn.ru/miph/article/view/17896**DOI:**https://doi.org/10.22363/2312-9735-2018-26-1-84-92- Cite item

# Abstract

Peer-to-peer (P2P) ﬁle sharing systems are responsible for a signiﬁcant part of the Internettraﬃc today. File sharing is perhaps the most popular application among P2P applications. Incomparison with traditional Client/Server ﬁle distribution, P2P ﬁle sharing has some advantages,namely, scalability, bandwidth and others. In this paper we study the minimum distribution timefor getting the entire ﬁle by all of the users in the system, who need this ﬁle. This parameter isclosely associated with the mentioned bandwidth. The expression for the minimum distributiontime uses ﬂuid-ﬂow arguments and includes such terms as the ﬁle size, the upload rates of theseeds and the upload and download rates of the leechers. Using numerical examples and theexpression for the minimum distribution time, we show the eﬃciency of P2P ﬁle sharing. Weconsider the system behaviour, when there are two types of leechers in the system. These typesdiﬀer from each other by their upload bandwidths.

# Full Text

1. Introduction The classical methods of resource distribution are based on the paradigm Client/Server. Here a set of servers distributes a ﬁle to receiving users. This ﬁle could be a software, a content such as a movie or a TV show, etc. The servers and the servers’ bandwidth can be bottlenecks in the process of the ﬁle distribution, when the ﬁle size and the number of receiving nodes become large. Peer-to-peer (P2P) ﬁle sharing is an alternative to the classical Client/Server ﬁle distribution and allows to amplify the uploading capacity of the receiving users to aid in the process of the ﬁle distribution [1]. We should notice, that a node, participating in P2P ﬁle sharing, is usually called a peer. In particular, as soon as a peer has got any portion of the ﬁle, it can redistribute that portion to any of the other receiving peers. There are lots of examples of P2P networks today, for instance: Napster [2], Gnutella [3], Freenet [4], etc. P2P ﬁle sharing systems, transferring ﬁles via the BitTorrent protocol [5], for instance, Vuse [6], etc., are widespread. The intrinsic scalability of these protocols enables to distribute the ﬁles of large sizes to several thousand participants. Here the usage of high bandwidth at the distribution servers is not demanded. A user with an ordinary PC with a regular connection can apply such a way to distribute large ﬁles to an audience, which size is signiﬁcantly higher than what is possible with the classical Client/Server approach. Obviously, P2P ﬁle sharing systems have become well known in the Internet today. But there are a lot of diﬀerent questions, which demand answers. It is necessary to explore: how good quantitatively P2P is in the ﬁle sharing process. Can P2P be essentially better than Client/Server distribution? Can P2P scale well when the number of receiving peers increases and becomes very large? How does the cooperation of the server upload bandwidth, the upload bandwidth of a receiving peer, and the download bandwidth of a receiving peer inﬂuence the total distribution time? Various considerable papers have been dedicated to mathematical models, measurements, and simulation results for P2P networks, [7 - 12]. In this paper, we consider fundamental questions of P2P ﬁle sharing, which are in a basis of P2P ﬁle sharing. We propose an expression for the minimum achievable ﬁle distribution time. In the expression we use ﬂuid arguments. In the expression we also use terms of the basic parameters of a P2P ﬁle sharing system, precisely, the ﬁle size, the number of servers, the number of receiving peers, and the upload and download bandwidths of all the peers, who participate. The expression takes place for arbitrary and heterogeneous upload and download bandwidths. The expression has a closed and simple form. The problem statement was given in [13]. The mentioned result allows to address many of the questions posed above. The rest of the paper is organized as follows. In Section 2 we describe the main problem and demonstrate the main result of the paper. We discuss the applicability of the result. In Section 3 we present numerical results for the case of the receiving peers with diﬀerent upload bandwidths. In Section 4 there is a conclusion. 2. Main Problem Description The fundamental problem in P2P ﬁle sharing is how to distribute a ﬁle to peers in P2P network. In P2P a ﬁle is divided into parts or portions to distribute it. We consider two sets of peers: seeds and leechers. Any seed has a whole copy of the ﬁle and stays in the system to allow other peers to download from itself. Any leecher needs a copy of the ﬁle. At ﬁrst leechers have no portions of the ﬁle and have to download portions from the seeds. But as soon as a leecher gets a portion, it begins to upload the portion to other leechers. So leechers can get portions of the ﬁle from any of the seeds and from other leechers that have portions. A leecher is allowed to leave the system after getting the whole ﬁle. The problem is to minimize the distribution time, i.e. the time, which is needed to obtain the ﬁle by all of the leechers. We consider the following parameters: a set of all peers in P2P network, - a number of all peers; - a set of seeds, - a number of seeds; ℒ - a set of leechers, - a number of leechers, so we have and - the size of a ﬁle; - a download bandwidth of a leecher - an upload bandwidth of a peer . A peer can transfer bits at a maximum rate of and can download bits at a maximum rate of . We take into account the condition that is usual in the Internet today, but we can also consider arbitrary upload and download rates. Furthermore, - the rate, at which a leecher downloads “new” content from seeds and other leechers combined at time - a rate proﬁle; - a distribution time for a rate proﬁle - minimum distribution time achievable over all possible rate proﬁles. The rate proﬁle can achieve min for arbitrary values of and . So we aim to determine the corresponding minimum distribution time min . In fact the model, which is considered in the paper, is a ﬂuid model [14]. Particularly, we imply that a leecher can replicate and forward a bit as soon as it receives the bit. This key assumption allows us to derive remarkably explicit expressions for the minimum distribution time for general, heterogeneous models. This assumption is core and important. Due to it clear expressions for the minimum distribution time for general, heterogeneous models are obtained. As we mentioned, BitTorrent is a very popular protocol for real P2P ﬁle sharing systems. The idea of BitTorrent is to divide the ﬁle to be distributed into parts, named chunks. The size of the chunks is typically 256 KB [8]. In BitTorrent a peer can only forward a chunk as soon as it has fully got the chunk. So chunk-based models are more realistic than ﬂuid-based models. For the chunk-based model, closed form expressions for the minimum distribution time are only available for very simple cases of peers, who have homogeneous bandwidths and inﬁnite download bandwidths, and for the case of heterogeneous systems it is diﬃcult to obtain closed-form expressions for the minimum distribution time. The explicit expressions for the minimum distribution time min , demonstrated in the paper and obtained for the ﬂuid-based model, are lower bounds for more realistic chunk-based models. We can compare the minimum distribution time for a ﬂuid-based model and the minimum distribution time for a chunk-based model. It is possible to show that for homogeneous systems, the error is about where is the number of chunks in the ﬁle. It turns out this error is negligibly small for cases of practical interest, even when ﬁle sizes are medium. Thus although the chunk-based model is more realistic than our ﬂuid-based model, the last one leads to a clear expression for the minimum distribution time for heterogeneous systems. This expression is a good approximation of the minimum distribution time for the chunk-based model for homogeneous systems. We imply, that the bandwidth bottlenecks are in upload and download rates and not in the basis of Internet. This postulate dominate in Internet today. Further, we do not take into account the impacts of network congestion and TCP congestion control. Thus, our expressions for the minimum distribution time min are really approximations for real ﬁle distribution times. However, the expressions can be used to make useful, relevant calculations. Further that can be a base for a ﬁle-distribution protocol for arbitrary upload and download rates. We also make an assumption that each peer in the system takes participation in the ﬁle distribution up until the peer has gained the whole ﬁle. This enables to understand the main aspect: how diﬀerent system parameters aﬀect the P2P ﬁle distribution process. Further, the following parameters are also considered: 3. Minimum File Distribution Time In this section we demonstrate the result for the minimum distribution time min of a ﬁle [13]. We consider all the leechers are equal, but their download and upload bandwidth can diﬀer. Thus, P2P system is heterogeneous. Theorem 1. The minimum distribution time for the general heterogeneous P2P ﬁle sharing system is The expression (2) has a rather simple and explicit form. The theorem gives some distribution scheme for any upload and download parameters and can be used as a reference point for the distribution time for any P2P ﬁle distribution protocol. We can notice, that we actually choose min from three values. Each value has its own sense. We have because the leecher with the lowest download rate cannot receive the ﬁle faster than We also have because the set of seeds cannot distribute fresh bits at a rate faster than a leecher cannot obtain the ﬁle at a rate faster than Finally, we have because the total upload bandwidth of the system is and because leechers need to obtain a total of bits. So we have the lower bound for P2P ﬁle distribution: The theorem ﬁnds that the right-hand side of this inequality is not only a lower bound, but also the exact value of the minimum distribution time min . As we can see, four cases are possible here (the proof of the theorem is based on these cases): Let be the rate at which the seeds send bits to leecher at time . In each case, a rate proﬁle has the same general structure, based on the following assumptions. As soon as each leecher begins to obtain its bits from the seeds, it replicates the obtained bits to each of the other - 1 leechers at some rate less than or equal as shown in Figure 1. Figure 1. The scheme of P2P ﬁle distribution Thus, for each case, the distribution scheme consists of application-level multicast trees. Each tree has a root in the seed, passes through one of the leechers, and ends at each of the - 1 other leechers. Now we consider three examples to show the signiﬁcance and the usefulness of the presented theorem. In each example, it is necessary to distribute a ﬁle of size 25 GB and to calculate min according to the expression (2). There are one seed and ten leechers in P2P network. The seed’s upload bandwidth is usually higher than leecher’s upload bandwidth. We distinguish two types of leechers: ordinary leechers and super leechers. The number of ordinary leechers is denoted as ord and the number of super leechers is denoted as . A super leecher has the same upload bandwidth as a seed, and an ordinary leecher’s upload bandwidth ord is lower. Example 1. Each leecher has download bandwidth = 2000 Kbps. Each ordinary leecher has upload bandwidth ord = 200 Kbps. If we change the number of the ordinary leechers ord from 1 to 10, the value of the minimum distribution time min increases. In Figure 2 we present the dynamics of min at three values of a seed’s upload bandwidth: = 1000 Kbps, = 1500 Kbps, = 2000 Kbps. We can see, the higher the less time is necessary to distribute the ﬁle. Example 2. Here again the download bandwidth of an each leecher is = 2000 Kbps. A seed’s upload bandwidth is a constant = 1500 Kbps. Now we present in Figure 3 the same function, but now we change leecher’s upload bandwidth, we have ord = 200 Kbps, ord = 600 Kbps, ord = 1000 Kbps. Here we can observe similar eﬀect: the higher ord the less time is necessary to distribute the ﬁle. Figure 2. Figure 3. , Example 3. In this example each ordinary leecher has constant upload bandwidth ord = 600 Kbps, and a seed has constant upload bandwidth = 2000 Kbps. Again we draw the plot for function min , depending on a number of ordinary leechers ord . The parameter of leecher’s download bandwidth is changed: = 600 Kbps, = 1600 Kbps, = 2600 Kbps. The plot is presented in Figure 4. Here min = 4 . 63 hours is a constant at low download bandwidth = 600 Kbps. In other cases of ﬁrst the values of min vary, then from the number of ordinary leechers 5 they coincide and increase. The presented examples allow to conclude that our calculations for ﬂuid-based model is rather good for a description of a real P2P ﬁle distribution process in the Internet. The study of the implications of theorem are made in [13]. In (1) we present the fractional error between a chunk-based model and a ﬂuid-based model. This error is obtained for the minimum distribution time min for a homogeneous system with peers having inﬁnite download capacity. It turns out, that for ﬁle size of 350 MB or more, the percentage error between min ( ) and min ( ) is less than 1%, even when there are 10,000 leechers in the network. We can conclude from (1) that, for homogeneous systems, the error can be surely neglected if This condition is also easily satisﬁed for typical ﬁle sizes of the order of several GB. We suppose this is true for heterogeneous systems as well. Figure 4. = const The ﬂuid-based model is very accurate, even though the ﬂuid-based model is not as realistic as the chunk-based model. The ﬂuid-based model has the advantage of providing a simple, explicit expression for the minimum distribution time for general, heterogeneous systems. 4. Conclusion Today P2P ﬁle sharing is an important application in the Internet. The determination of a minimum achievable time for distribution of a ﬁle to all leechers is a fundamental problem in P2P ﬁle sharing network. Distribution of a ﬁle to all leechers means, that all equal small pieces of the ﬁle, named chunks, are obtained by all leechers. The mentioned fundamental problem becomes a complex optimal scheduling problem, when we consider a model in which discrete chunks are kept and forwarded at the peers. In this paper we consider a version of a ﬂuid-based model for the problem of the determination of the minimum achievable distribution time. We get an explicit expression for this fundamental problem, using this ﬂuid-based model. The result is simple, useful. On the basis of the expression we construct three numerical examples, demonstrating the behaviour of the minimum distribution time depending on the number of super leechers with the high upload bandwidth and ordinary leechers with the ordinary upload bandwidth. As it was expected, with the growing number of ordinary leechers, the total bandwidth available on the network for distribution is reduced, which leads to the increase of the value of the minimum distribution time min . As we can see from the plot in Figure 2, this time depends on the upload bandwidth , and for super leechers with = 2 Mbps the minimum distribution time min is minimal. From the plot in Figure 3 we can see the minimum distribution time min also depends on the upload bandwidth ord , and for ordinary leechers with ord = 1 Mbps the minimum distribution time min is minimal. Finally, from the plot in Figure 4 we see the minimum distribution time min depends on the download bandwidth , and for leechers with = 2 . 6 Mbps the minimum distribution time min is minimal. Note that, the values of min are the same for = 1 . 6 Mbps and = 2 . 6 Mbps starting with the value ord = 5. The obtained expression is rather close to the similar result for a more realistic chunkbased model in the case of homogeneous system. The result of this paper has a very high accuracy. The given fractional error demonstrates this fact. It turns out, that the discussed ﬂuid-based model is a rather good approximation to a real network. The results of the paper can be developed in diﬀerent directions. One direction is to compare the minimum distribution time in the ﬂuid-based model with the minimum distribution time in the chunk-based model for heterogeneous systems. Another direction is a determination of the minimum distribution time for networks which limit the number of simultaneous connections between participating peers.

# About the authors

### E V Bobrikova

Peoples’ Friendship University of Russia (RUDN University)
**Author for correspondence.**

Email: bobrikova_ev@rudn.university

6 Miklukho-Maklaya St., Moscow, 117198, Russian Federation

Bobrikova E. V. - Candidate of Physical and Mathematical Sciences, senior lecturer of Department of Applied Probability and Informatics of Peoples’ Friendship University of Russia (RUDN University)

### Yu V Gaidamaka

Peoples’ Friendship University of Russia (RUDN University)
Email: gaydamaka_yuv@rudn.university

6 Miklukho-Maklaya St., Moscow, 117198, Russian Federation

Gaidamaka Yu. V. - Candidate of Physical and Mathematical Sciences, associate professor of Department of Applied Probability and Informatics of Peoples’ Friendship University of Russia (RUDN University)

# References

- Yu. V. Gaidamaka, A. K. Samuilov, Analysis of Playback Continuity for Video Streaming in Peer-to-Peer Networks with Data Transfer Delays, T-Comm: Telecommunications and Transport (11) (2013) 77–81, in Russian.
- Napster Company Info. URL http://us.napster.com/availability
- The Gnutella Protocol Speciﬁcation v0.4. URL https://gnunet.org/node/147
- What is Freenet? Freenet Company Info. URL https://freenetproject.org
- The BitTorrent Protocol Speciﬁcation. URL http://www.bittorrent.org/beps/bep_0003
- Vuse BitTorrent Client. URL http://www.vuze.com
- X. Yang, G. de Veciana, Service Capacity of Peer to Peer Networks, in: Proceedings of IEEE INFOCOM, Vol. 4, 2004, pp. 2242–2252.
- D. Qiu, R. Srikant, Modeling and Performance Analysis of BitTorrent-Like Peer-to- Peer Networks, in: Proceedings of ACM SIGCOMM, Vol. 34, 2004, pp. 367–378.
- Z. Mordji, M. Amad, D. Aissani, A Derived Queueing Network Model for Structured P2P Architectures, in: VECoS 2014, 2014, pp. 76–84.
- A. Ferragut, F. Paganini, Fluid Models of Population and Download Progress in P2P Networks, IEEE Trans. on Control of Network Systems 3 (1) (2016) 34–45.
- Yu. V. Gaidamaka, E. V. Bobrikova, E. G. Medvedeva, The Application of Fluid Models to the Analysis of Peer-to-Peer Network, Bulletin of PFUR. Series: Mathematics. Informatics. Physics (4) (2016) 15–25, in Russian.
- Yu. V. Gaidamaka, , E. G. Medvedeva, S. I. Salpagarov, E. V. Bobrikova, Analysis of Model for Multichannel Peer-to-peer TV Network with View-upload Decoupling Scheme, RUDN Journal of Mathematics, Information Sciences and Physics 25 (2) (2017) 123–132, in Russian. doi: 10.22363/2312-9735-2017-25-2-123-132.
- R. Kumar, K. W. Ross, Optimal Peer-Assisted File Distribution: Single and Multi- Class Problems, 2006.
- K. E. Samouylov, E. V. Bobrikova, A Simple Fluid Model of P2P File Sharing Network, T-Comm: Telecommunications and Transport (7) (2012) 180–184, in Russian.