p2p storage networks - e-learning
Transcript
p2p storage networks - e-learning
Università degli Studi di Pisa Dipartimento di Informatica Lesson 1 P2P Systems: an introduction 24/9/2015 Laura Ricci Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 1 USEFUL INFORMATION The exam can be taken in the following degrees: Lauree Magistrali in Informatica, Business Informatics, Informatica e Networking Laurea triennale (26) (vecchio ordinamento) Prerequisites: Computer Networks Algorithm design Instructor: Laura Ricci (7 credits), Emanuele Carlini, ISTI, CNR (1 credit) Hours per week: 2 + 2 Credits: 6.0 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 2 TENTATIVE PROGRAM (WITH SOME INNOVATIONS) This year we will try to introduce some innovation with respect to the program of the last years: ● ● main focus is still on P2P systems ● algorithms and data structures ● formal methods ● applications P2P systems are complex networks, millions of nodes and connections ● complex network analysis tools required for the analysis of P2P topologies. ● same tools can be applied also in other fields (social networks, big graphs,...) ● distributed algorithms for P2P systems exploits local information only. ● many similarities between P2P algorithms and vertex-centric algorithms for the analysis of big graph Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 3 TENTATIVE PROGRAM (WITH SOME INNOVATIONS) this year we will try to introduce some innovation with respect to the program of the last years: ● greater focus on pratical experience ● not a laboratory course, but some classes devoted to programming ● acquiring experience on a P2P simulator useful for the final project ● developing algorithms for P2P systems and for the analysis of big graphs Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 4 TENTATIVE PROGRAM P2P Systems ● ● ● ● ● ● unstructured overlays: Flooding, Random Walks structured overlays: Distributed Hash Tables (DHT) ● The Chord DHT: a Markov chain model of Chord routing ● Kademlia ● Applications: NoSQL Large Scale Distributed Storage The eMule KAD implementation of Kademlia Probabilistic Data Structures for P2P systems System issues: NAT transversal, heterogenity P2P Applications ● Bittorrent: a Content Distribution Network ● incentives, game based cooperation ● Bitcoin: Digital currency. Anonimicity in P2P networks Simulation Environments for P2P Systems: Peersim Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 5 TENTATIVE PROGRAM Distributed Algorithms for P2P systems and big data: P2P asynchronous model synchronous models and frameworks for Big graph computing vertix centric approaches epidemic algorithms, random sampling classical graph algorithms: k-core, connected components, triangle counting Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 6 TENTATIVE PROGRAM Models for P2P networks and Big Graphs: Random Graphs and Small Worlds Small world navigability: Watts Strogatz and Kleinberg Models Complex networks navigability: case studies the Darknet Freenet, a Small World Network Symphony Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 7 AND IN CASE THERE IS SOME TIME LEFT...... P2P audio/video Streaming streams diffusion overlays: tree, multi-tree, mesh Distributed virtual environment: massively multiplayer online games Distributed Online Social Networks Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 8 DIDACTIC MATERIAL Mandatory lesson slides (download this year slides) tutorial and material published on the course page Some lesson refer to material from these books Sasu Takoma, Overlay Networks, Toward Information Networking, 2010 Maarten Van Steen, Graph Theory and Complex Networks, 2010 Korzun, Dmitry, Gurtov, Andrei, Structured Peer-to-Peer Systems: Fundamentals of Hierarchical Organization, Routing, Scaling, and Security Other books: R. Steinmetz, K.Wehrle, Peer to Peer Systems and Applications, LNCS 3485, Springer Verlag, 2005 Peer to Peer Computing, Applications, Architecture, Protocols and Challanges, Yu-Kwong Ricky Kwok, CRC Press, 2012 Q.Hieu Vu, M.Lupu, B.Chin Ooi, Peer to Peer Computing, Principles and Applications, Springer Verlag, 2010 Buford, Yu, Lua, P2P Networking and Applications, Morgan Kaufmann, 2009 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 9 EXAM RULES ● Mid term + Final term: if the student pass both mid and final term the oral proof, he/she is exempted from the oral examination tentative ideas: mid term: reading a paper and writing a brief relation final term: programming exercise useful for the final project choice: written examination or project project proposals from the students are accepted the project may be the first part of the final thesis oral examination only for those who have not passed the midterm and/or final term Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 10 TOOLS FOR THE PROJECT Peersim a simulator oriented to the P2P simulation highly scalable (until 10000 peers for the implementation of simple protocols) http://peersim.sourceforge.net/ some lesson to introduce this tool, full support from my team PeerfactSim.KOM a new proposal supports the high scale simulation on P2P systems http://peerfact.kom.e-technik.tu-darmstadt.de/ Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 11 PROJECT PROPOSALS Simulation of unstructured P2P networks distributed Hash Tables (DHT) self-organizing systems analysis of the P2P topologies implementation of vertex-centric algorithms for the analysis of big graphs Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 12 PEER TO PEER SYSTEMS Definition 1: A peer to peer system is a set of autonomous entities (peers) able to auto-organize and sharing a set of distributed resources in a computer network. The system exploits such resources to give a service in a complete or partial decentralized way Shared Resources: Read Only Information (Files) Read/Write storage space (Distributed File System) Computing power Bandwidth Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 13 PEER TO PEER SYSTEMS Definition 2: A P2P system is a distributed system defined by a set on nodes interconnected able to auto-organize and to build different topologies with the goal of sharing resources like CPU cycles, memory, bandwidth. The system is able to adapt to a continous churn of the nodes maintaining connectivity and reasonable performances without a centralized entity (like a server) Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 14 RESOURCE SHARING P2P: is relative to give and receive from a a community. Each peer shares a set of resources and obtains, in return, a set of resources the most common scenario: share musics (audio files) and obtain music in return (Napster, Gnutella,…) a peer behaves both like a client and like a server (symmetric functionality = SERVENT) but a peer can decide of offering for free a resource, for instance to participate to a project searching extra-terrestrial life cancer therapies research the shared resources are at “the border” of Internet, they are directly shared by the peers, there are no “special purpose nodes” for their management. Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 15 RESOURCE SHARING the peers' connection is transient: the connections and disconnections to the network are very frequent the resources offered by the peers are dynamically added and removed each peer is paired with a different IP address for each connection to the system a resource cannot be located by a static IP new addressing mechanisms have to be defined, at the application level, not at the IP level Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 16 P2P: FILE SHARING AND BEYOND... P2P file sharing file sharing:light weight/ best effort persistence and security are not the main goal anonymicity is important Examples: Napster Gnutella, KaZaa eMule BitTorrent P2P Content Publishing e Storage Systems persistency and security are main goals Examples: Freenet, initially Wuala then evolved towards cloud based solutions Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 17 P2P: FILE SHARING AND OTHERS... Bitcoin Digital Currency: a peer-to-peer version of the classical electronic payment the payments may be directly done between the users without a centralized financial entity no central entity which guarantees the electronic payment, lower costs one of my phd students is currently analysing the Bitcoin network Voice over P2P P2PSIP working group definition of a P2P protocol considering NATs minimum involvement of the centralized server exploits Distributed Hash Tables RELOAD protocol definition: Resource Location and Discovery VoP2P: Skype IPTV: video streaming applications PPlive, Ppstream, widely spread in Cina Joost, Soapcast Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 18 P2P DIFFUSION: MAIN STAGES 2003: Sandvine Study in Europe (France, Germany, ..) EDonkey/EMule in USA KaZaA/Fastrack 2005: BitTorrent the most popular P2P file sharing network Skype dominates VOIP KaZaA becomes less popular eDonkey replaced by eMule exploiting a similar protocol 2008: Spotify in Sweden Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 19 P2P DIFFUSION: MAIN STAGES 2009: Wuala P2P-based storage service, now cloud-based users can decide to share with the community the free space on the hard disks of the users replication security issues KaZaA is no more widely used PPLive: P2P-based Video Streaming Platform: diffusion in Asia Vuze (an Azureus evolution); P2P-based Video-on demand First version of Bitcoin Some challanges for the future: P2P social networks? integration cloud-P2P? P2P Massively Multiplayer Games? Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 20 FILE SHARING: A 'KILLER APPLICATION' File Sharing: it begins with the rapid success of Napster, at the end of nineties, about 10 years later than Wide Web First generation: Napster introduces a set of servers where the users register the descriptors of the files they are going to share only the content transmission (download/upload) exploits a P2P protocol, content search is centralized the presence of a centralized directory has been the 'Achille's heel' of this application its was convicted because it did not respect the law on the copyright it could have detected the content exchanged between the users through the analysis of the centralized directory Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 21 FILE SHARING: A 'KILLER APPLICATION' File sharing: second generation no centralization point both file search and content transfer are completely distributed Gnutella, FastTrack/Kazaa, BitTorrent Side effects of the diffusion of file sharing applications: change of the way users music is accessed by the from CD to online music iTunes Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 22 FILE SHARING: A 'KILLER APPLICATION' A P2P file sharing application A user U has a P2P client on its notebook the connection of the to Internet is intermittent : each time the user obtains a new IP address for each new connection the user stores the shared files in a directory and pairs each file with a set of keys able to identify it (for a song: title, author, publication date,...) U is interested in finding a song and sends a query to the system the client finds and shows to the user information about the other peers which own the required song U chooses a peer P among these (according some criteria, we will see in the following) the file is copied from the notebook of P to that of U while U is performing the download, other users can upload the parts of the file already downloaded by U and put in the shared directory Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 23 FILE SHARING: A 'KILLER APPLICATION' The P2P client behaves like a servlet and allows: the user to define a directory, in its file system, where the file it is going to share are stored. Each other peer of the P2P network may download files from this directory the user to download files from the shared directory a peer behaves like a web server a peer behaves like a client the user to find the content it is interested in, through queries submitted to the system this functionality is similar to that of Google Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 24 FILE SHARING: SOME PROBLEMS Several P2P clients can be freely downloaded from the network, but they may contain spyware or malware, the client software is free, but it often implies a violation of the privacy. Spyware = a software that collects information related to the online behaviour of a user (visited sites, online purchases, etc. ) without its authorization. sends information to an organization which exploits such information to send focused advertisements to the user Malware = a client that can damage the computer when it is executed. Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 25 FILE SHARING: SOME PROBLEMS Pollution (= Inquinamento) a large amount of corrupted material is injected in the P2P network. the peers cannot distinguish the good material from the corrupted one, so they download the corrupted material and contribute to its diffusion. the number of the corrupted copies may exceed that of those good pollution companies: Overpeer The free riders phenomenon: Peer exploiting the application to download the content shared by other peers, but they do not offer any content Solutions: Incentives mechanisms (eMule credits) Detecting and isolating free riders (Graph based Algorithms in Bittorrent) Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 26 VOICE OVER P2P IMP (Instant Messaging and Presence) applications Microsoft Messanger, Yahoo Messenger, Jabber, based on a server architecture. client Clients VoIP (Voice over IP) starts diffusion in the middle '90 desktop-to-desktop free calls low voice quality limited diffusion Skype: VoP2P application, voice over P2P: 2003 more than 500 milions of accounts and more than 50 milions of active users desktop to desktop calls and desktop to public telephon network calls good call quality integrates voice call with IMP characteristics Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 27 SKYPE: GENERAL CHARACTERISTICS The architecture is derived from that of Kazaa and is based on the presence of SuperPeers Based on a proprietary protocol with encripted traffic no documentation, some analysis based on sniffers The SuperPeers are responsible of: looking for the peers and their localization it is necessary to detect the single peer with whom the call has to be established (difference with file sharing) peer localization is a task of the SuperPeer they behave like rely-router for the peers which are behind a firewall and a NAT (60% of the user is behind a NAT) Supports VOIP sessions with a set of users algorithm for merging of different voice flows Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 28 P2PTV: VIDEO STREAMING ● The client server model has several problems with these applications because of the high computational requirements of these applications Multimedia audio/video files may be downloaded through file sharing clients: the audio/video file is before downloaded (at least some parts of it) and then displayed (like a play back) Video Streaming: real time transfer and display of the video stream Cooperation between the peer for then content distribution (content distribution networks) Bittorrent Buffering a set of video frames Commercial system: SoapCast, PPlive, TVAnts... Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 29 SPOTIFY: P2P LIVE MUSIC STREAMING A music live streaming serving exploiting an hybrid P2P client/server Advertised in 2008, initially in Sweden, then in other European countries In Usa in2011 …..and in Italy on 12 february 2013....(a coincidence??) Currently more than 20 milions of users, 5 milions with fee Similar solutions: french Deezer, Pandora from USA Spotify clients can be freely downlaoded, but the client requires the creation of a Spotify account Automatic client update Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 30 SPOTIFY: P2P LIVE MUSIC STREAMING Combines functionality of P2P and client/server Main goal for the development of a P2P protocol: increase the service scalability, decrease the server load exploit the available network bandwidth Mechanisms similar to those exploited for file sharing: unstructured network: it does not exploit the DHT peer detection: ideas from Gnutella and Bittorrent NAT transversal mechanism Streaming requires mechanisms for buffer management prefetching exploit markovian models for the prediciting the client throughput Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 31 P2P BANDWIDTH SHARING A server pubblishes new content (example: a new game release, a new release of an operating system, a song...) Sender Router Unicast Router Router Router Receiver Receiver Receiver Receiver Receiver client server model: a single centralized server is a bottleneck Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 32 P2P CONTENT DISTRIBUTION P2P approaches To obtain a better balance of the use of the communication bandwidth by exploiting the less used communication channels Sender Peer-to-Peer Content Distribution the initial file request are served by a Router centralized server further requests are sent to the peers which have already received and replicated Receiver/ Sender Receiver/ Sender Router Router the files Receiver/ Sender Receiver/ Sender Receiver/ Sender Receiver/ Sender Receiver/ 12 Sender 10 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 33 P2P CONTENT DISTRIBUTION A combined use of the P2P and of the client-server model enables the optimization of the accesses to the server Segmented approach (example:BitTorrent) Doc Doc Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 34 P2P CONTENT DISTRIBUTION ● ● A combined use of the P2P and of the client-server model enables the optimization of the accesses to the server Segmented approach (example:BitTorrent) Part Part 33 Doc Doc Part Part 3 Part 3 Part 22Part Part 33 Part Part 22 Part Part 11 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 35 P2P CONTENT DISTRIBUTION ● ● A combined use of the P2P and of the client-server model enables the optimization of the accesses to the server Segmented approach (example:BitTorrent) Doc Doc Part Part 11 Part Part 33 Part Part 11 Part Part 22 Part Part 3 Part 3 Part 22Part Part 33 Part Part 11 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 36 P2P CONTENT DISTRIBUTION ● ● A combined use of the P2P and of the client-server model enables the optimization of the accesses to the server Segmented approach (example:BitTorrent) Part Part 11Part Part 22Part Part 33 Doc Doc Part Part 3 Part 3 Part 22Part Part 33 Part Part 1Part 1Part 22 Part Part 11 Part Part 22 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 37 P2P CONTENT DISTRIBUTION ● ● A combined use of the P2P and of the client-server model enables the optimization of the accesses to the server Segmented approach (example:BitTorrent) Part Part 11Part Part 22Part Part 33 Doc Doc Part Part 3 Part 3 Part 22Part Part 33 Part Part 1Part 1Part 2Part 2Part 33 Part Part 11 Part Part 2Part 2Part 33 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 38 P2P CONTENT DISTRIBUTION ● ● A combined use of the P2P and of the client-server model enables the optimization of the accesses to the server Segmented approach (example:BitTorrent) Part Part 11Part Part 22Part Part 33 Doc Doc Doc Doc Part Part 3 Part 3 Part 22Part Part 33 Part Part 1Part 1Part 2Part 2Part 33 Part Part 11 Part Part 2Part 2Part 33 Dipartimento di Informatica Università degli Studi di Pisa Doc Doc Doc Doc Introduction Laura Ricci 39 P2P STORAGE NETWORKS A P2P Storage Network is a computer cluster which are connected in the network and exploits all the memory shared by the peers for the definition of a distributed storage system Examples: DHT, often store read a read only index Real distributed file systems: PAST, Freenet, OceanStore, Wuala Organization: peer-identifiers association through a hash function each peer offers a part of its storege space or pays an amount of money a maximum volume of data are assigned to each peer, according to the contribution of the peer to the storage network. document-identifier assignment computed through a hash function computed on the name of the file or on a part of its content The storage and search of files in the network is guided by the identifiers paired with the hash function with peers and files. Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 40 P2P STORAGE NETWORKS Storage Network Definition ID 4 Hash ID 1 ID ID ID Dipartimento di Informatica Università degli Studi di Pisa 17 ID 25 3 ID Introduction 8 10 Laura Ricci 41 P2P STORAGE NETWORKS Storage Network Definition ID Hash ID 4 Hello ??? 1 ID ID 17 ID 25 8 Hello ??? ID Dipartimento di Informatica Università degli Studi di Pisa 3 ID Introduction 10 Laura Ricci 42 P2P STORAGE NETWORKS Storage Network Definition ID 4 ID Hash Hello ??? ID 1 ID ID 25 ID ID 4 17 ID 25 8 3 Hello ??? ID Dipartimento di Informatica Università degli Studi di Pisa 3 ID Introduction 10 Laura Ricci 43 P2P STORAGE NETWORKS Storage Network Definition neighbors 3 4 ID 4 ID 25 Hash Hello ??? ID 1 ID ID 25 ID ID 4 17 ID 25 8 3 Hello ??? ID Dipartimento di Informatica Università degli Studi di Pisa 3 ID Introduction 10 Laura Ricci 44 P2P STORAGE NETWORKS Document Storage 1 17 25 3 4 25 ID ID 1 4 1 4 10 ID 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 45 P2P STORAGE NETWORKS Document Storage 1 17 25 3 4 25 ID ID 1 1 4 10 Hash ID 4 11 ID 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 46 P2P STORAGE NETWORKS Document Storage 1 17 25 3 4 25 ID ID 1 1 4 10 Hash ID 4 11 ID 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 47 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 1 4 10 Hash ID 4 11 ID 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 48 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 1 4 10 Hash ID 4 11 ID 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 49 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 ID 1 4 10 Hash ID 4 11 ID 11 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 50 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 ID 1 4 10 Hash ID 4 11 ID 11 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 51 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 ID 1 4 10 Hash ID 4 11 ID 11 3 4 8 ID 17 ID 25 Dipartimento di Informatica Università degli Studi di Pisa 8 3 8 25 1 10 17 ID 10 17 3 ID Introduction 10 Laura Ricci 52 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 ID 1 4 10 Hash ID 4 11 ID 11 3 4 8 ID Dipartimento di Informatica Università degli Studi di Pisa 11 10 17 17 ID 25 8 3 8 25 1 10 17 ID ID 3 ID Introduction 10 Laura Ricci 53 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 ID 1 4 10 Hash ID 4 11 ID 11 3 4 8 ID Dipartimento di Informatica Università degli Studi di Pisa 11 10 17 17 ID 25 8 3 8 25 1 10 17 ID ID 3 ID Introduction 10 Laura Ricci 54 P2P STORAGE NETWORKS Document Storage ID 1 17 25 11 3 4 25 ID ID 1 ID 1 4 10 Hash ID 4 11 ID 11 3 4 8 ID Dipartimento di Informatica Università degli Studi di Pisa 11 10 17 17 ID 25 3 8 25 1 10 17 ID ID 3 ID Introduction 8 ID 11 10 Laura Ricci 55 P2P STORAGE NETWORKS Document Storage ID 1 17 25 3 4 25 11 ID ID 1 ID 4 1 4 10 11 ID 3 4 8 ID 17 ID 25 3 8 25 1 10 17 ID Dipartimento di Informatica Università degli Studi di Pisa 10 17 3 ID Introduction 8 requestor: 1 ID 11 10 Laura Ricci 56 P2P STORAGE NETWORKS A P2P Storage System exploits all the storage offered by the peers PAST, Freenet, OceanStore, Freenet: mechanism for anonimisation Substitution of the sender of a message Encrypting Techniques Distributed Hash Tables techniques: Each peer offers a part of its storege or pays an amount of money According to its contribute, a maximum volume of data is assigned to each peer Hash functions to assign identifiers to peers identifiers to documents computed by exploiting the file name or part of its content The storage and search of the files is guided by the identifiers computed by the hash function. Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 57 P2P OVERLAY A P2P protocol defines the set of messages that the peers exchange, their format and their semantics A common characteristics of all the P2P protocols: identification of the peer trough a unique identifier, which is generally computed through a hash function the protocols are defined at application level (stack TCP/IP) and define a routing strategy Overlay Network: logic network connecting peers and defined at application level The overlay defines a set of logic links between the peers, which does not correspond to phisical connections Each logical link corresponds to: a set of physical links transversal of a set of routers Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 58 P2P OVERLAY CLASSIFICATION Unstructured P2P systems (Gnutella, Kazaa,…) A new peer randomly connects to a set of peers which are active in the P2P network The resulting logical network (overlay network) is unstructured Look-up algorithms: centralized servers (Napster). flooding (Gnutella), Look up cost= linear with N, where N is the number of nodes in the network Problem: scalability. Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 59 UNSTRUCTURED OVERLAY Peers directly interacts without the presence of a centralized server Interaction paradigm based on a decentralized cooperation, rather than a centralized coordinator. C C C P P S P P P C C C Client/Server Dipartimento di Informatica Università degli Studi di Pisa P P P P Peer to Peer Introduction Laura Ricci 60 HYBRID UNSTRUCTURED NETWORKS To increase the system performance, hybrid solutions have been defined Dynamic definition oa a set of Superpeers indexing the resources shared by the peers Resources are directly exchanged between the peers. P P P SP P SP P SP P P P Hybrid Peer to Peer Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 61 STRUCTURED P2P OVERLAY The choice of the neighbours is defined according to a given criteria The resulting overlay network is structured Goal: guaranteeing the la scalability the network structure guarantees that the look up of an information has a given complexity (for instance O(log N)). complexity guarantees also for peer joins and peer leave distributed hash tables approaches Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 62 OVERLAY CLASSIFICATION When considering P2P overlay for file sharing, we can distinguish different 'generation' of P2P system, while this characterization is not yet feasible for other classes of applications The different generation refer to the way the overlay is managed and the way data are indexed. Systems with centralization points A centralized server coordinating the peers Unstructured overlay Pure Peer to peer, no server Seti @home, Napster, Bittorrent (first generation) Gnutella, Kazaa, Skype (but authentication is centralized) Structured Overlay DHT- based CAN, Chord, Pastry, Emule (Kademlia),... Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 63 SYSTEM WITH CENTRALIZATION: SETI@home SETI@Home: Search for Extraterrestrial Intelligence A research project (UC California, Berkeley) supported also from some industries. Starts at the end of 90-ies. Analysis of the radio emissions received from the space, collected by the Arecibo telescope, located at PortoRico Goal: to build a ‘supercomputer’ by aggregating the computational power offered by the computers connected to the Internet, by exploiting the inactive cycles of the host The cost is reduced by a factor of 100 with respect to the usage of a supercomputers 200 millions of dollars to buy a computer of 30 TeraFlops Cost of the project: less than 1 million dollars Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 64 VOLUNTEER DISTRIBUTED COMPUTING A centralized Data Server and a set of clients BOINC: Berkeley Open Infrastructure for Network Computing Screen Saver Program developed for different platforms A single data server distributes the data Problem: reduced scalability and reliability During some period, the clients are able to connect to the server only during the night Exploits proxy servers: the proxy connects to the central server during the night and distributes data during the day High ratio computation/communication Each client connects for a while to receive the data, perform the computation (high computational request) and then sends the results No communications between the peers Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 65 BOINC VOLUNTEER DISTRIBUTED COMPUTING 1) your PC gets a set of tasks from the project's scheduling server. The tasks depend on your PC: the server does not give it tasks that requires more RAM than you have. Projects can support several applications, and the server may send you tasks from any of them. 2) your PC downloads executable and input files from the project's data server. If the project releases new versions of its applications, the executable files are downloaded automatically to your PC. 3) your PC runs the application programs, producing output files. 4) your PC uploads the output files to the data server. 5) later (up to several days later, depending on your preferences) your PC reports the completed tasks to the scheduling server, and gets new tasks. Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 66 BOINC: VOLUNTEER DISTRIBUTED COMPUTING Several interesting P2P characteristics, for instance the incentive syystem Credit System: Credit = Cobblestone measures the amount of computed tasks the users competes to collect credits credit check: each task is sent to a set of computers when a peer returns a result, a credit is given to the peer. the server compares a set of results and, if they are comparable, the smallest credit is given to all computers Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 67 VOLUNTEER DISTRIBUTED COMPUTING A long list of projects....http://boinc.berkeley.edu/projects.php Prime number search Cognitive Process simulation Quantum computing simulation Modelling of malaria transmissions ….. Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 68 SYSTEM WITH CENTRALIZATION: BITTORRENT Conceived for the download of large amount of data Currently it is the P2P protocol generating the maximum amount of Internet traffic (according to some statistics >60%) Focus on the parallel content download, rather than of sophisticated strategies for content search. Files are divided in chunks for download Do ut des: definition of di strategies for collaboration incentive Different strategies to choose pieces of files to download The set of peers which cooperate to the file download (swarm) is coordinated by a server (tracker) Tracker = centralization point Last versions (tracker-less) exploit a DHT Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 69 UNSTRUCTURED FLAT OVERLAY: GNUTELLA Pure P2P (or flat P2P): all the nodes have the same role and are interconnected through an unstructured overlay Goal: the file look up is completely decentralized Each node Share some files Receives the query submitted from the user and forward them to the neighbours Forward the query received from the neighbours Replies to the query, by checking if the shared files match the query Periodically explore the network to acquire knowledge of new peers Mechanism for propagating the queries = flooding Main problems: low scalability due to the high number of messages exchanged on the overlay Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 70 UNSTRUCTURED HIERARCHICAL OVERLAY: KAZAA one of the more popular file sharing systems An innovative characteristics: a hierarchical structure based in superpeers Nodes with different characteristics memory CPU connectivity uptime reachability Peer heterogeneity is explooited to assign them different roles Stable nodes characterized by good computational (computational power, connectivity) become super peer resources Simple Peer: 'unstable peer', natted peer, poor connectivity Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 71 STRUCTURED OVERLAY Distributed Hash Tables: HASH functions to assign logical identifiers to peers and data The same identifier space for peer and data Requires the definition of a mapping function data-peer Key based search: Different proposals developed in the academic area Routing algorithm definition: each routing hop on the overlay is guided by the key which identifies the data which is looked for CHORD, CAN, PASTRY,... Some commercial file sharing exploits the DHT Kademlia Emule KAD network Trackerless Bittorrent Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 72 P2P: ADVANTAGES Advantages; to exploit computational resources 'in excess' (idel CPU cycles unused storage space, unused band width,...) to have in return risources/services/social networks participation,.. Advantages for the community self scaling property: the participation of a larger number of users/host naturally increases the system resources and its capacity to serve a larger number of requests Advantages for the seller: decrease of the cost to set up a new application client/server: requires the use of a server farm characterized by high connectivity so that the request of millions of users can be satisfied fault tolerance: the server farm must be replicated in different locations the server must be managed so to offer a service 24*7 Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 73 P2P: THE FUTURE The success of a P2P application greatly depends from the creation of a “critical mass” of users critical mass: a level of participation to the P2P network which enables the application to self sustain In the first P2P systems the novelty of the application has been fundamental for reaching a critical mass File sharing: free content, wide content selection VoP2P: low cost for international calls, good audio quality,... P2PTV: access to a wide set of programs, possibility of accessing the content from any location The success of new P2P applications will greatly depend besides a good engineering of the application from the new application appealing from the definition of new 'business model' Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 74 P2P: THE FUTURE Exploiting classical P2P techniques in different contexts: data-centers with thousand of machines exploit techniques taken from the P2P world. Amazon Dynamo: distributed storage system over hosts belonging to the data center highly scalable simple: key/value put/get guarantees service level agreement exploits Distributed Hash Tables trade-off availabilty/strong consistency Facebook Cassandra Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 75 SCIENTIFIC CHALLANGES The development of P2P applications is a challange and requires the solution of several novel problems The classic metodologies for the development of distributed systems of the “old generation” cannot be exploited: the order of magnitude of a P2P system is different with respect to that of classical systems (milion of nodes with respect to hundreds of nodes) classical algorithms/techniques classiche “do not scale” on network of this size The failure of one of the nodes is not a rare event, rather a normal event Systems with dimension and with this dynamicity level require new tools probabilistic algorithms statistical analysis of complex networks game theory: strategies for the peer cooperation development of new computational models Dipartimento di Informatica Università degli Studi di Pisa Introduction Laura Ricci 76
Documenti analoghi
A Survey and Comparison of Peer-to
to the set of peers into a large space of identifiers. Data objects
are assigned unique identifiers called keys, chosen from the
same identifier space. Keys are mapped by the overlay network
protoc...