Analysis of the File Distribution Time in Peer-to-Peer Network

Peer-to-peer (P2P) file sharing systems are responsible for a significant part of the Internet traffic today. File sharing is perhaps the most popular application among P2P applications. In comparison with traditional Client/Server file distribution, P2P file sharing has some advantages, namely, scalability, bandwidth and others. In this paper we study the minimum distribution time for getting the entire file by all of the users in the system, who need this file. This parameter is closely associated with the mentioned bandwidth. The expression for the minimum distribution time uses fluid-flow arguments and includes such terms as the file size, the upload rates of the seeds and the upload and download rates of the leechers. Using numerical examples and the expression for the minimum distribution time, we show the efficiency of P2P file sharing. We consider the system behaviour, when there are two types of leechers in the system. These types differ from each other by their upload bandwidths.


Introduction
The classical methods of resource distribution are based on the paradigm Client/Server.Here a set of servers distributes a file to receiving users.This file could be a software, a content such as a movie or a TV show, etc.The servers and the servers' bandwidth can be bottlenecks in the process of the file distribution, when the file size and the number of receiving nodes become large.
Peer-to-peer (P2P) file sharing is an alternative to the classical Client/Server file distribution and allows to amplify the uploading capacity of the receiving users to aid in the process of the file distribution [1].We should notice, that a node, participating in P2P file sharing, is usually called a peer.In particular, as soon as a peer has got any portion of the file, it can redistribute that portion to any of the other receiving peers.There are lots of examples of P2P networks today, for instance: Napster [2], Gnutella [3], Freenet [4], etc. P2P file sharing systems, transferring files via the BitTorrent protocol [5], for instance, Vuse [6], etc., are widespread.The intrinsic scalability of these protocols enables to distribute the files of large sizes to several thousand participants.Here the usage of high bandwidth at the distribution servers is not demanded.A user with an ordinary PC with a regular connection can apply such a way to distribute large files to an audience, which size is significantly higher than what is possible with the classical Client/Server approach.
Obviously, P2P file sharing systems have become well known in the Internet today.But there are a lot of different questions, which demand answers.It is necessary to explore: how good quantitatively P2P is in the file sharing process.Can P2P be essentially better than Client/Server distribution?Can P2P scale well when the number of receiving peers increases and becomes very large?How does the cooperation of the server upload bandwidth, the upload bandwidth of a receiving peer, and the download bandwidth of a receiving peer influence the total distribution time?
In this paper, we consider fundamental questions of P2P file sharing, which are in a basis of P2P file sharing.We propose an expression for the minimum achievable file distribution time.In the expression we use fluid arguments.In the expression we also use terms of the basic parameters of a P2P file sharing system, precisely, the file size, the number of servers, the number of receiving peers, and the upload and download bandwidths of all the peers, who participate.The expression takes place for arbitrary and heterogeneous upload and download bandwidths.The expression has a closed and simple form.The problem statement was given in [13].The mentioned result allows to address many of the questions posed above.
The rest of the paper is organized as follows.In Section 2 we describe the main problem and demonstrate the main result of the paper.We discuss the applicability of the result.In Section 3 we present numerical results for the case of the receiving peers with different upload bandwidths.In Section 4 there is a conclusion.

Main Problem Description
The fundamental problem in P2P file sharing is how to distribute a file to peers in P2P network.In P2P a file is divided into parts or portions to distribute it.We consider two sets of peers: seeds and leechers.Any seed has a whole copy of the file and stays in the system to allow other peers to download from itself.Any leecher needs a copy of the file.At first leechers have no portions of the file and have to download portions from the seeds.But as soon as a leecher gets a portion, it begins to upload the portion to other leechers.So leechers can get portions of the file from any of the seeds and from other leechers that have portions.A leecher is allowed to leave the system after getting the whole file.The problem is to minimize the distribution time, i.e. the time, which is needed to obtain the file by all of the leechers.
We consider the following parameters:  -a set of all peers in P2P network,  = ||a number of all peers;  -a set of seeds,  = || -a number of seeds; ℒ -a set of leechers,  = |ℒ| -a number of leechers, so we have  =  ∪ ℒ and  =  + ;  -the size of a file;   -a download bandwidth of a leecher ;   -an upload bandwidth of a peer .
A peer  can transfer bits at a maximum rate of   and can download bits at a maximum rate of   .We take into account the condition     , that is usual in the Internet today, but we can also consider arbitrary upload and download rates.Furthermore,   () -the rate, at which a leecher  downloads "new" content from seeds and other leechers combined at time , so  = {  (),  0,  ∈ ℒ} -a rate profile;  -a distribution time for a rate profile ;  min = min  -minimum distribution time achievable over all possible rate profiles.
The rate profile {  (),  0,  ∈ ℒ} can achieve  min for arbitrary values of  ,   ,  ∈ , and   ,  ∈ ℒ.So we aim to determine the corresponding minimum distribution time  min .
In fact the model, which is considered in the paper, is a fluid model [14].Particularly, we imply that a leecher can replicate and forward a bit as soon as it receives the bit.This key assumption allows us to derive remarkably explicit expressions for the minimum distribution time for general, heterogeneous models.This assumption is core and important.Due to it clear expressions for the minimum distribution time for general, heterogeneous models are obtained.
As we mentioned, BitTorrent is a very popular protocol for real P2P file sharing systems.The idea of BitTorrent is to divide the file to be distributed into parts, named chunks.The size of the chunks is typically 256 KB [8].In BitTorrent a peer can only forward a chunk as soon as it has fully got the chunk.So chunk-based models are more realistic than fluid-based models.For the chunk-based model, closed form expressions for the minimum distribution time are only available for very simple cases of peers, who have homogeneous bandwidths and infinite download bandwidths, and for the case of heterogeneous systems it is difficult to obtain closed-form expressions for the minimum distribution time.The explicit expressions for the minimum distribution time  min , demonstrated in the paper and obtained for the fluid-based model, are lower bounds for more realistic chunk-based models.We can compare the minimum distribution time  min ( ) for a fluid-based model and the minimum distribution time  min () for a chunk-based model.It is possible to show that for homogeneous systems, the error is about where  is the number of chunks in the file.It turns out this error is negligibly small for cases of practical interest, even when file sizes are medium.Thus although the chunk-based model is more realistic than our fluid-based model, the last one leads to a clear expression for the minimum distribution time for heterogeneous systems.This expression is a good approximation of the minimum distribution time for the chunk-based model for homogeneous systems.We imply, that the bandwidth bottlenecks are in upload and download rates and not in the basis of Internet.This postulate dominate in Internet today.Further, we do not take into account the impacts of network congestion and TCP congestion control.Thus, our expressions for the minimum distribution time  min are really approximations for real file distribution times.However, the expressions can be used to make useful, relevant calculations.Further that can be a base for a file-distribution protocol for arbitrary upload and download rates.
We also make an assumption that each peer in the system takes participation in the file distribution up until the peer has gained the whole file.This enables to understand the main aspect: how different system parameters affect the P2P file distribution process.
Further, the following parameters are also considered: for any subset  ⊆ ,  min =  min (ℒ).

Minimum File Distribution Time
In this section we demonstrate the result for the minimum distribution time  min of a file [13].We consider all the leechers are equal, but their download and upload bandwidth can differ.Thus, P2P system is heterogeneous.
Theorem 1.The minimum distribution time for the general heterogeneous P2P file sharing system is The expression (2) has a rather simple and explicit form.The theorem gives some distribution scheme for any upload and download parameters and can be used as a reference point for the distribution time for any P2P file distribution protocol.
We can notice, that we actually choose  min from three values.Each value has its own sense.We have  min / min , because the leecher with the lowest download rate cannot receive the file faster than / min .We also have  min /(), because the set of seeds cannot distribute fresh bits at a rate faster than (), a leecher cannot obtain the file at a rate faster than ().Finally, we have  min  /(), because the total upload bandwidth of the system is () and because leechers need to obtain a total of  bits.So we have the lower bound for P2P file distribution:  min max{/ min , /(), /()}.The theorem finds that the right-hand side of this inequality is not only a lower bound, but also the exact value of the minimum distribution time  min .
As we can see, four cases are possible here (the proof of the theorem is based on these cases): Let   () be the rate at which the seeds send bits to leecher  at time .In each case, a rate profile has the same general structure, based on the following assumptions.As soon as each leecher  begins to obtain its bits from the seeds, it replicates the obtained bits to each of the other  − 1 leechers at some rate less than or equal   () as shown in Figure 1.Thus, for each case, the distribution scheme consists of  application-level multicast trees.Each tree has a root in the seed, passes through one of the leechers, and ends at each of the  − 1 other leechers.Now we consider three examples to show the significance and the usefulness of the presented theorem.In each example, it is necessary to distribute a file of size  = 1.25 GB and to calculate  min according to the expression (2).There are one seed ( = 1) and ten leechers ( = 10) in P2P network.The seed's upload bandwidth   is usually higher than leecher's upload bandwidth.We distinguish two types of leechers: ordinary leechers and super leechers.The number of ordinary leechers is denoted as  ord and the number of super leechers is denoted as   .A super leecher has the same upload bandwidth   as a seed, and an ordinary leecher's upload bandwidth   ord is lower.In Figure 2 we present the dynamics of  min at three values of a seed's upload bandwidth:   = 1000 Kbps,   = 1500 Kbps,   = 2000 Kbps.
We can see, the higher   the less time is necessary to distribute the file.
Example 2.Here again the download bandwidth of an each leecher is  = 2000 Kbps.A seed's upload bandwidth is a constant   = 1500 Kbps.Now we present in Figure 3 the same function, but now we change leecher's upload bandwidth, we have   ord = 200 Kbps,   ord = 600 Kbps,   ord = 1000 Kbps.
Here we can observe similar effect: the higher   ord the less time is necessary to distribute the file.Here  min = 4.63 hours is a constant at low download bandwidth  = 600 Kbps.In other cases of  first the values of  min vary, then from the number of ordinary leechers 5 they coincide and increase.
The presented examples allow to conclude that our calculations for fluid-based model is rather good for a description of a real P2P file distribution process in the Internet.
The study of the implications of theorem are made in [13].In (1) we present the fractional error between a chunk-based model and a fluid-based model.This error is obtained for the minimum distribution time  min for a homogeneous system with peers having infinite download capacity.It turns out, that for file size of 350 MB or more, the percentage error between  min ( ) and  min () is less than 1%, even when there are 10,000 leechers in the network.
We can conclude from (1) that, for homogeneous systems, the error can be surely neglected if  ≫ log 2 .This condition is also easily satisfied for typical file sizes of the order of several GB.We suppose this is true for heterogeneous systems as well.

Conclusion
Today P2P file sharing is an important application in the Internet.The determination of a minimum achievable time for distribution of a file to all leechers is a fundamental problem in P2P file sharing network.Distribution of a file to all leechers means, that all equal small pieces of the file, named chunks, are obtained by all leechers.The mentioned fundamental problem becomes a complex optimal scheduling problem, when we consider a model in which discrete chunks are kept and forwarded at the peers.
In this paper we consider a version of a fluid-based model for the problem of the determination of the minimum achievable distribution time.We get an explicit expression for this fundamental problem, using this fluid-based model.The result is simple, useful.On the basis of the expression we construct three numerical examples, demonstrating the behaviour of the minimum distribution time depending on the number of super leechers with the high upload bandwidth and ordinary leechers with the ordinary upload bandwidth.
As it was expected, with the growing number of ordinary leechers, the total bandwidth available on the network for distribution is reduced, which leads to the increase of the value of the minimum distribution time  min .As we can see from the plot in Figure 2, this time depends on the upload bandwidth   , and for super leechers with   = 2 Mbps the minimum distribution time  min is minimal.
From the plot in Figure 3 we can see the minimum distribution time  min also depends on the upload bandwidth   ord , and for ordinary leechers with   ord = 1 Mbps the minimum distribution time  min is minimal.Finally, from the plot in Figure 4 we see the minimum distribution time  min depends on the download bandwidth , and for leechers with  = 2.6 Mbps the minimum distribution time  min is minimal.Note that, the values of  min are the same for  = 1.6 Mbps and  = 2.6 Mbps starting with the value  ord = 5.
The obtained expression is rather close to the similar result for a more realistic chunkbased model in the case of homogeneous system.The result of this paper has a very high accuracy.The given fractional error demonstrates this fact.It turns out, that the discussed fluid-based model is a rather good approximation to a real network.

Figure 1 .
Figure 1.The scheme of P2P file distribution

Example 1 .
Each leecher has download bandwidth  = 2000 Kbps.Each ordinary leecher has upload bandwidth   ord = 200 Kbps.If we change the number of the ordinary leechers  ord from 1 to 10, the value of the minimum distribution time  min increases.