Analysis of the File Distribution Time in Peer-to-Peer Network

Cover Page

Abstract


Peer-to-peer (P2P) file sharing systems are responsible for a significant part of the Internettraffic today. File sharing is perhaps the most popular application among P2P applications. Incomparison with traditional Client/Server file distribution, P2P file sharing has some advantages,namely, scalability, bandwidth and others. In this paper we study the minimum distribution timefor getting the entire file by all of the users in the system, who need this file. This parameter isclosely associated with the mentioned bandwidth. The expression for the minimum distributiontime uses fluid-flow arguments and includes such terms as the file size, the upload rates of theseeds and the upload and download rates of the leechers. Using numerical examples and theexpression for the minimum distribution time, we show the efficiency of P2P file sharing. Weconsider the system behaviour, when there are two types of leechers in the system. These typesdiffer from each other by their upload bandwidths.


1. Introduction The classical methods of resource distribution are based on the paradigm Client/Server. Here a set of servers distributes a file to receiving users. This file could be a software, a content such as a movie or a TV show, etc. The servers and the servers’ bandwidth can be bottlenecks in the process of the file distribution, when the file size and the number of receiving nodes become large. Peer-to-peer (P2P) file sharing is an alternative to the classical Client/Server file distribution and allows to amplify the uploading capacity of the receiving users to aid in the process of the file distribution [1]. We should notice, that a node, participating in P2P file sharing, is usually called a peer. In particular, as soon as a peer has got any portion of the file, it can redistribute that portion to any of the other receiving peers. There are lots of examples of P2P networks today, for instance: Napster [2], Gnutella [3], Freenet [4], etc. P2P file sharing systems, transferring files via the BitTorrent protocol [5], for instance, Vuse [6], etc., are widespread. The intrinsic scalability of these protocols enables to distribute the files of large sizes to several thousand participants. Here the usage of high bandwidth at the distribution servers is not demanded. A user with an ordinary PC with a regular connection can apply such a way to distribute large files to an audience, which size is significantly higher than what is possible with the classical Client/Server approach. Obviously, P2P file sharing systems have become well known in the Internet today. But there are a lot of different questions, which demand answers. It is necessary to explore: how good quantitatively P2P is in the file sharing process. Can P2P be essentially better than Client/Server distribution? Can P2P scale well when the number of receiving peers increases and becomes very large? How does the cooperation of the server upload bandwidth, the upload bandwidth of a receiving peer, and the download bandwidth of a receiving peer influence the total distribution time? Various considerable papers have been dedicated to mathematical models, measurements, and simulation results for P2P networks, [7 - 12]. In this paper, we consider fundamental questions of P2P file sharing, which are in a basis of P2P file sharing. We propose an expression for the minimum achievable file distribution time. In the expression we use fluid arguments. In the expression we also use terms of the basic parameters of a P2P file sharing system, precisely, the file size, the number of servers, the number of receiving peers, and the upload and download bandwidths of all the peers, who participate. The expression takes place for arbitrary and heterogeneous upload and download bandwidths. The expression has a closed and simple form. The problem statement was given in [13]. The mentioned result allows to address many of the questions posed above. The rest of the paper is organized as follows. In Section 2 we describe the main problem and demonstrate the main result of the paper. We discuss the applicability of the result. In Section 3 we present numerical results for the case of the receiving peers with different upload bandwidths. In Section 4 there is a conclusion. 2. Main Problem Description The fundamental problem in P2P file sharing is how to distribute a file to peers in P2P network. In P2P a file is divided into parts or portions to distribute it. We consider two sets of peers: seeds and leechers. Any seed has a whole copy of the file and stays in the system to allow other peers to download from itself. Any leecher needs a copy of the file. At first leechers have no portions of the file and have to download portions from the seeds. But as soon as a leecher gets a portion, it begins to upload the portion to other leechers. So leechers can get portions of the file from any of the seeds and from other leechers that have portions. A leecher is allowed to leave the system after getting the whole file. The problem is to minimize the distribution time, i.e. the time, which is needed to obtain the file by all of the leechers. We consider the following parameters: a set of all peers in P2P network, - a number of all peers; - a set of seeds, - a number of seeds; ℒ - a set of leechers, - a number of leechers, so we have and - the size of a file; - a download bandwidth of a leecher - an upload bandwidth of a peer . A peer can transfer bits at a maximum rate of and can download bits at a maximum rate of . We take into account the condition that is usual in the Internet today, but we can also consider arbitrary upload and download rates. Furthermore, - the rate, at which a leecher downloads “new” content from seeds and other leechers combined at time - a rate profile; - a distribution time for a rate profile - minimum distribution time achievable over all possible rate profiles. The rate profile can achieve min for arbitrary values of and . So we aim to determine the corresponding minimum distribution time min . In fact the model, which is considered in the paper, is a fluid model [14]. Particularly, we imply that a leecher can replicate and forward a bit as soon as it receives the bit. This key assumption allows us to derive remarkably explicit expressions for the minimum distribution time for general, heterogeneous models. This assumption is core and important. Due to it clear expressions for the minimum distribution time for general, heterogeneous models are obtained. As we mentioned, BitTorrent is a very popular protocol for real P2P file sharing systems. The idea of BitTorrent is to divide the file to be distributed into parts, named chunks. The size of the chunks is typically 256 KB [8]. In BitTorrent a peer can only forward a chunk as soon as it has fully got the chunk. So chunk-based models are more realistic than fluid-based models. For the chunk-based model, closed form expressions for the minimum distribution time are only available for very simple cases of peers, who have homogeneous bandwidths and infinite download bandwidths, and for the case of heterogeneous systems it is difficult to obtain closed-form expressions for the minimum distribution time. The explicit expressions for the minimum distribution time min , demonstrated in the paper and obtained for the fluid-based model, are lower bounds for more realistic chunk-based models. We can compare the minimum distribution time for a fluid-based model and the minimum distribution time for a chunk-based model. It is possible to show that for homogeneous systems, the error is about where is the number of chunks in the file. It turns out this error is negligibly small for cases of practical interest, even when file sizes are medium. Thus although the chunk-based model is more realistic than our fluid-based model, the last one leads to a clear expression for the minimum distribution time for heterogeneous systems. This expression is a good approximation of the minimum distribution time for the chunk-based model for homogeneous systems. We imply, that the bandwidth bottlenecks are in upload and download rates and not in the basis of Internet. This postulate dominate in Internet today. Further, we do not take into account the impacts of network congestion and TCP congestion control. Thus, our expressions for the minimum distribution time min are really approximations for real file distribution times. However, the expressions can be used to make useful, relevant calculations. Further that can be a base for a file-distribution protocol for arbitrary upload and download rates. We also make an assumption that each peer in the system takes participation in the file distribution up until the peer has gained the whole file. This enables to understand the main aspect: how different system parameters affect the P2P file distribution process. Further, the following parameters are also considered: 3. Minimum File Distribution Time In this section we demonstrate the result for the minimum distribution time min of a file [13]. We consider all the leechers are equal, but their download and upload bandwidth can differ. Thus, P2P system is heterogeneous. Theorem 1. The minimum distribution time for the general heterogeneous P2P file sharing system is The expression (2) has a rather simple and explicit form. The theorem gives some distribution scheme for any upload and download parameters and can be used as a reference point for the distribution time for any P2P file distribution protocol. We can notice, that we actually choose min from three values. Each value has its own sense. We have because the leecher with the lowest download rate cannot receive the file faster than We also have because the set of seeds cannot distribute fresh bits at a rate faster than a leecher cannot obtain the file at a rate faster than Finally, we have because the total upload bandwidth of the system is and because leechers need to obtain a total of bits. So we have the lower bound for P2P file distribution: The theorem finds that the right-hand side of this inequality is not only a lower bound, but also the exact value of the minimum distribution time min . As we can see, four cases are possible here (the proof of the theorem is based on these cases): Let be the rate at which the seeds send bits to leecher at time . In each case, a rate profile has the same general structure, based on the following assumptions. As soon as each leecher begins to obtain its bits from the seeds, it replicates the obtained bits to each of the other - 1 leechers at some rate less than or equal as shown in Figure 1. Figure 1. The scheme of P2P file distribution Thus, for each case, the distribution scheme consists of application-level multicast trees. Each tree has a root in the seed, passes through one of the leechers, and ends at each of the - 1 other leechers. Now we consider three examples to show the significance and the usefulness of the presented theorem. In each example, it is necessary to distribute a file of size 25 GB and to calculate min according to the expression (2). There are one seed and ten leechers in P2P network. The seed’s upload bandwidth is usually higher than leecher’s upload bandwidth. We distinguish two types of leechers: ordinary leechers and super leechers. The number of ordinary leechers is denoted as ord and the number of super leechers is denoted as . A super leecher has the same upload bandwidth as a seed, and an ordinary leecher’s upload bandwidth ord is lower. Example 1. Each leecher has download bandwidth = 2000 Kbps. Each ordinary leecher has upload bandwidth ord = 200 Kbps. If we change the number of the ordinary leechers ord from 1 to 10, the value of the minimum distribution time min increases. In Figure 2 we present the dynamics of min at three values of a seed’s upload bandwidth: = 1000 Kbps, = 1500 Kbps, = 2000 Kbps. We can see, the higher the less time is necessary to distribute the file. Example 2. Here again the download bandwidth of an each leecher is = 2000 Kbps. A seed’s upload bandwidth is a constant = 1500 Kbps. Now we present in Figure 3 the same function, but now we change leecher’s upload bandwidth, we have ord = 200 Kbps, ord = 600 Kbps, ord = 1000 Kbps. Here we can observe similar effect: the higher ord the less time is necessary to distribute the file. Figure 2. Figure 3. , Example 3. In this example each ordinary leecher has constant upload bandwidth ord = 600 Kbps, and a seed has constant upload bandwidth = 2000 Kbps. Again we draw the plot for function min , depending on a number of ordinary leechers ord . The parameter of leecher’s download bandwidth is changed: = 600 Kbps, = 1600 Kbps, = 2600 Kbps. The plot is presented in Figure 4. Here min = 4 . 63 hours is a constant at low download bandwidth = 600 Kbps. In other cases of first the values of min vary, then from the number of ordinary leechers 5 they coincide and increase. The presented examples allow to conclude that our calculations for fluid-based model is rather good for a description of a real P2P file distribution process in the Internet. The study of the implications of theorem are made in [13]. In (1) we present the fractional error between a chunk-based model and a fluid-based model. This error is obtained for the minimum distribution time min for a homogeneous system with peers having infinite download capacity. It turns out, that for file size of 350 MB or more, the percentage error between min ( ) and min ( ) is less than 1%, even when there are 10,000 leechers in the network. We can conclude from (1) that, for homogeneous systems, the error can be surely neglected if This condition is also easily satisfied for typical file sizes of the order of several GB. We suppose this is true for heterogeneous systems as well. Figure 4. = const The fluid-based model is very accurate, even though the fluid-based model is not as realistic as the chunk-based model. The fluid-based model has the advantage of providing a simple, explicit expression for the minimum distribution time for general, heterogeneous systems. 4. Conclusion Today P2P file sharing is an important application in the Internet. The determination of a minimum achievable time for distribution of a file to all leechers is a fundamental problem in P2P file sharing network. Distribution of a file to all leechers means, that all equal small pieces of the file, named chunks, are obtained by all leechers. The mentioned fundamental problem becomes a complex optimal scheduling problem, when we consider a model in which discrete chunks are kept and forwarded at the peers. In this paper we consider a version of a fluid-based model for the problem of the determination of the minimum achievable distribution time. We get an explicit expression for this fundamental problem, using this fluid-based model. The result is simple, useful. On the basis of the expression we construct three numerical examples, demonstrating the behaviour of the minimum distribution time depending on the number of super leechers with the high upload bandwidth and ordinary leechers with the ordinary upload bandwidth. As it was expected, with the growing number of ordinary leechers, the total bandwidth available on the network for distribution is reduced, which leads to the increase of the value of the minimum distribution time min . As we can see from the plot in Figure 2, this time depends on the upload bandwidth , and for super leechers with = 2 Mbps the minimum distribution time min is minimal. From the plot in Figure 3 we can see the minimum distribution time min also depends on the upload bandwidth ord , and for ordinary leechers with ord = 1 Mbps the minimum distribution time min is minimal. Finally, from the plot in Figure 4 we see the minimum distribution time min depends on the download bandwidth , and for leechers with = 2 . 6 Mbps the minimum distribution time min is minimal. Note that, the values of min are the same for = 1 . 6 Mbps and = 2 . 6 Mbps starting with the value ord = 5. The obtained expression is rather close to the similar result for a more realistic chunkbased model in the case of homogeneous system. The result of this paper has a very high accuracy. The given fractional error demonstrates this fact. It turns out, that the discussed fluid-based model is a rather good approximation to a real network. The results of the paper can be developed in different directions. One direction is to compare the minimum distribution time in the fluid-based model with the minimum distribution time in the chunk-based model for heterogeneous systems. Another direction is a determination of the minimum distribution time for networks which limit the number of simultaneous connections between participating peers.

E V Bobrikova

Peoples’ Friendship University of Russia (RUDN University)

Author for correspondence.
Email: bobrikova_ev@rudn.university
6 Miklukho-Maklaya St., Moscow, 117198, Russian Federation

Bobrikova E. V. - Candidate of Physical and Mathematical Sciences, senior lecturer of Department of Applied Probability and Informatics of Peoples’ Friendship University of Russia (RUDN University)

Yu V Gaidamaka

Peoples’ Friendship University of Russia (RUDN University)

Email: gaydamaka_yuv@rudn.university
6 Miklukho-Maklaya St., Moscow, 117198, Russian Federation

Gaidamaka Yu. V. - Candidate of Physical and Mathematical Sciences, associate professor of Department of Applied Probability and Informatics of Peoples’ Friendship University of Russia (RUDN University)

  • Yu. V. Gaidamaka, A. K. Samuilov, Analysis of Playback Continuity for Video Streaming in Peer-to-Peer Networks with Data Transfer Delays, T-Comm: Telecommunications and Transport (11) (2013) 77–81, in Russian.
  • Napster Company Info. URL http://us.napster.com/availability
  • The Gnutella Protocol Specification v0.4. URL https://gnunet.org/node/147
  • What is Freenet? Freenet Company Info. URL https://freenetproject.org
  • The BitTorrent Protocol Specification. URL http://www.bittorrent.org/beps/bep_0003
  • Vuse BitTorrent Client. URL http://www.vuze.com
  • X. Yang, G. de Veciana, Service Capacity of Peer to Peer Networks, in: Proceedings of IEEE INFOCOM, Vol. 4, 2004, pp. 2242–2252.
  • D. Qiu, R. Srikant, Modeling and Performance Analysis of BitTorrent-Like Peer-to- Peer Networks, in: Proceedings of ACM SIGCOMM, Vol. 34, 2004, pp. 367–378.
  • Z. Mordji, M. Amad, D. Aissani, A Derived Queueing Network Model for Structured P2P Architectures, in: VECoS 2014, 2014, pp. 76–84.
  • A. Ferragut, F. Paganini, Fluid Models of Population and Download Progress in P2P Networks, IEEE Trans. on Control of Network Systems 3 (1) (2016) 34–45.
  • Yu. V. Gaidamaka, E. V. Bobrikova, E. G. Medvedeva, The Application of Fluid Models to the Analysis of Peer-to-Peer Network, Bulletin of PFUR. Series: Mathematics. Informatics. Physics (4) (2016) 15–25, in Russian.
  • Yu. V. Gaidamaka, , E. G. Medvedeva, S. I. Salpagarov, E. V. Bobrikova, Analysis of Model for Multichannel Peer-to-peer TV Network with View-upload Decoupling Scheme, RUDN Journal of Mathematics, Information Sciences and Physics 25 (2) (2017) 123–132, in Russian. doi: 10.22363/2312-9735-2017-25-2-123-132.
  • R. Kumar, K. W. Ross, Optimal Peer-Assisted File Distribution: Single and Multi- Class Problems, 2006.
  • K. E. Samouylov, E. V. Bobrikova, A Simple Fluid Model of P2P File Sharing Network, T-Comm: Telecommunications and Transport (7) (2012) 180–184, in Russian.

Views

Abstract - 99

PDF (English) - 42


Copyright (c) 2018 Bobrikova E.V., Gaidamaka Y.V.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.