Andy Oram delivered a shortened version of this speech, "Research Possibilities in Peer-to-Peer Networking," to the Virtual Internet2 Member Meeting on Thursday, October 4, 2001. Internet2 is a consortium of over 180 universities working, with industry and government support, to develop new infrastructure and applications for the Internet.
Internet2 features the use of high-bandwidth media such as the videoconference through which this meeting was held. Several members organized a P2P session to build interest in the potential efficiency and new applications offered by P2P. Many in the academic community, however, still associate the term P2P with file sharing and the problems universities have had with bandwidth exhaustion and legal copyright challenges.
Peer-to-peer is a venerable and far-reaching concept that has received a new impetus and a striking visibility in the past year. I'm thrilled to present some of my observations before an audience of academic administrators and professionals, because when peer-to-peer first burst upon the attention of the greater public, universities played a crucial role: You tried to stop it.
Yes, when Napster started reaching huge audiences, universities got alarmed at the increase in network bandwidth usage and many blocked Napster from their networks. This says a lot about university administrators, and I'll return to the issue after I explain some network architecture issues.
Delivering a speech on any topic, especially one with social implications, is very hard at this time when the precariousness of modern life and the balance of world power are uppermost on our minds. I have little to offer that will help you set aside or assimilate thoughts of the attacks of September 11. I'll simply make one observation related to peer-to-peer: With the overhead that will have to be expended on greater security and on fighting terrorism, researchers and ordinary businesses are going to have to get along on less than we've had for quite some time. We're going to have to learn to do more with fewer resources. I hope peer-to-peer can help us do this.
Academic environments are ideal for experimenting with peer-to-peer and benefiting from peer-to-peer. You have an open attitude toward information, well-educated staff who can adapt to new tools, a variety of projects that require information exchange, and a willingness to expend time and effort in order to save money.
The Internet2 project, in particular, overcomes many barriers that are holding back the deployment of peer-to-peer products in current corporate environments. Internet2 is a good test bed for basic research that can benefit peer-to-peer.
When people ask me whether peer-to-peer is really anything new and whether the term has any value, I say, "Sure it's new, and sure the term has value, because these systems have created all kinds of new problems." Perhaps these problems will be solved by Sun Microsystems or by the Intel Peer-to-Peer Working Group. But perhaps they'll be solved by Internet2 researchers: I can only ask you to try.
Clearly, I want some of you to experiment with developing and using peer-to-peer systems. But I know that universities have already faced legal problems with one slice of the peer-to-peer world, file sharing. Further legal challenges are likely to emerge.
Peer-to-peer advances the key premise that new value comes from sharing information and building on it. Naturally it comes up against copyright issues, a problem that I don't trivialize because, after all, I work for a publisher.
Most changes that affect businesses are social as well as technological. Businesses that try to hold back technological change find themselves at odds with society as a whole, as is proven by the various digital copyright battles going on now.
These companies are stuck in regressive defense mechanisms because of sheer panic. This is the same risk our whole country now faces in its reaction to terrorism, and the lack of creative adaptivity will bring the copyright holders low in the end. Recently they tipped their hand, introducing a bill into Congress that would utterly halt normal technological evolution and try to freeze current social relations in hardware. (Governments have tried to do things like this before.) But in contradiction to their fears, social change is usually slow enough that there is plenty of time for an entrepreneurial business to adapt. Physical music records will continue to be in demand for decades to come; the same goes for physical books and other media.
Of course, this doesn't make it OK to throw cease-and-desist orders in the trash can. You certainly have to understand your legal responsibilities. If students or staff are using your computers to share material that's copyrighted by someone else, you have legal liability. The much-maligned Digital Millennium Copyright Act (DMCA) actually makes your life easier on this specific issue because it provides a procedure that your system administrators can go through to protect you. But few universities understand the procedure. So learn more about the law.
Content producers may ask you to go beyond what you are legally required to do. They may approach you with various studio-friendly initiatives and tell you that you have a moral obligation to help them restrict digital distribution. Your answer to them should be, "No." Say to them, "Your industry is going through a historic upheaval, and it is up to you to figure out how you're going to deal with it. It's not our job." However, as university administrators, you do have a responsibility to protect the privacy of your students, faculty, and staff. You have an obligation to protect their freedom of speech to the extent allowed by law.
So you should watch out for copyright holders snooping around your networks or trying to suppress activities that are legal. Don't think I'm arbitrarily speculating. The Recording Industry Association of America has already declared that it might try to crack into computer systems to prevent the transmission of unauthorized music files, and its representatives were level-headed enough during discussions of the Anti-Terrorism Act to offer an amendment that specifically gives copyright holders this right to be intruders.
Let's return to the happier subject of research topics: what peer-to-peer can do for you, and what you can do for peer-to-peer.
Research into distributed applications and infrastructure has very wide application. Centralized systems are evolving toward decentralization as they grow larger and scale upward. A well-known example is how the hosts file on the Internet became the Domain Name System. A more recent example concerns Web caching and the use of Akamai by large sites with high bandwidth demands. You might have heard that Akamai's founder and CTO, Daniel M. Lewin, was tragically lost on one of the hijacked planes last month. One observer pointed out that the rush to news sites after the tragedies proved the importance of his company's technology.
So centralized systems evolve toward decentralization. In an intriguing, complementary operation, decentralized or peer-to-peer sites are evolving toward centralization, also in response to growth and the need to scale upward. Gnutella now has superpeers. Freenet provides gateways, JXTA Search creates a hierarchy of servers, and so on.
Some of the activities I've seen on the Internet2 Web site under the Middleware directory touch on the problems that centralized as well as peer-to-peer projects face. It would be great for Internet2 developers to remember the peer-to-peer aspects of whatever they are researching and its potential applications to peer-to-peer. For a start, consider the possibility of symmetric exchanges over all your protocols and infrastructure. Here are some more specific topics.
First, I'll talk about naming and resource discovery. The only systems where you don't care about names are systems where you want to be anonymous. Gnutella and Freenet are famous for this characteristic, of course, and they have achieved something incredibly ground-breaking and mind-expanding: They provide content independent of its location. Later peer-to-peer systems have built on this innovation, which allows lots of good things. But most systems still want to find particular individuals or repositories for information--they want identification and resource discovery.
They achieve this through a shameless lapse away from decentralization. Identities are stored in a strictly centralized repository, as in instant messaging services. Some products, like XDegrees, Jibe, and Redmind, do some fancy distribution and breaking up of the namespace. The good old Domain Name System does this, in fact.
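To make the idea of breaking up a namespace concrete, here is a minimal sketch in the spirit of DNS delegation. It is not how XDegrees, Jibe, Redmind, or DNS itself is implemented; the `Zone` class, its methods, and the example names and address are all hypothetical, chosen only to show a root handing sub-namespaces to other registries.

```python
# Hypothetical sketch: a namespace split into zones, each owned by a
# different registry, loosely modeled on DNS delegation. All names are invented.

class Zone:
    def __init__(self, name):
        self.name = name        # e.g. "edu" or "example.edu"; "" for the root
        self.records = {}       # full name -> address held by this zone
        self.children = {}      # label -> delegated child Zone

    def delegate(self, label):
        """Hand the sub-namespace `label.<this zone>` to a new child zone."""
        child_name = f"{label}.{self.name}" if self.name else label
        child = Zone(child_name)
        self.children[label] = child
        return child

    def resolve(self, full_name):
        """Return (address or None, list of zones consulted)."""
        zone = self
        visited = [zone.name or "(root)"]
        labels = full_name.split(".")
        # Walk labels right to left, following delegations while they exist.
        while labels and labels[-1] in zone.children:
            zone = zone.children[labels.pop()]
            visited.append(zone.name)
        return zone.records.get(full_name), visited


root = Zone("")
edu = root.delegate("edu")
campus = edu.delegate("example")
campus.records["alice.example.edu"] = "192.0.2.10"   # made-up identity

address, path = root.resolve("alice.example.edu")
print(address, path)
```

No single zone holds every identity: the root knows only who is responsible for `edu`, and so on down. That is the partial decentralization the products above are reaching for.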
The Gartner Group speaks of a virtual namespace for peer-to-peer. I don't know what makes these names less real and more virtual than any other names. I think what the Gartner Group means is that these namespaces--instant messaging, Napster, and so on--tend to spring up ad hoc and opportunistically. This seems to me a weakness of current peer-to-peer systems, not a strength.
IPv6 will definitely help. It will, we hope, bring users' systems out into the open, eliminating the current Network Address Translation system that hides the users. But IPv6 is not enough to solve peer-to-peer's addressing problem. First, we can't wait until IPv6 is deployed in the larger world. Second, it is naive to think that every device will have a fixed, permanent address when IPv6 is deployed. To do so would overwhelm the world's routers; one of the major benefits advertised for IPv6, in fact, is that it makes renumbering easier. Finally, what we really want is names rather than numbers anyway. When I ask you to visit my Web site, I don't ask you to type 209.204.146.22 into your browser. Furthermore, I may log in from many places--work, home, a mobile phone, a train station--and I'm still me even though my address is different.
Identification and resource discovery is therefore one of the great problems you can work on in Internet2. I would like answers to the question: "What combination of centralization and decentralization works best for a particular application and information architecture?"
Partly because so many services are already offered through Web HTML forms and CGI, and partly because firewalls block any data not sent through port 80, the chief method of service delivery for the next couple of years will be Web services using HTTP and probably either XML-RPC or SOAP. These protocols and the programs that handle them are probably neither the most efficient nor the most flexible way to handle peer-to-peer communications. Some other protocols you can explore include JXTA, of course, the SCTP transport-level protocol, and the BEEP application-level protocol.
Security is the bogey man invoked by many people who want to debunk peer-to-peer. I'm not sure why there's so much hysteria around the supposed security problems of peer-to-peer. Most systems, and certainly commercial systems, are perfectly up to date on encryption, digital signatures, digests, and other standard elements of network security. I suppose the confusion sets in because the most famous peer-to-peer systems, like Napster and Freenet, are marvelously open and uncontrolled. To people who are unused to disruptive technologies, open and uncontrolled must mean insecure.
If peer-to-peer were inherently insecure, it would not be used by the McAfee company to distribute updates to its virus-detection software. McAfee ASaP is a service provided to large companies to let them distribute updates quickly throughout their organizations. Instead of making 10,000 individuals contact the McAfee Web site (a sure recipe for network overloads), a few initial systems contact the McAfee site, and they pass on the software to other systems in a chain. This is called rumor technology and is a form of peer-to-peer, the same architecture used by such content-delivery networks and streaming-media distributors as AllCast.
When you're fighting viruses, you're clearly concerned with security, and McAfee's use of a partially peer-to-peer system is a stunning endorsement of peer-to-peer's security. McAfee's rumor technology is not only more efficient than routine Web downloads, but more secure. Employees of each company have to go outside their corporate network only a few times to get the software. Most of the networking takes place inside the corporate network, presumably protected by a firewall and the general LAN architecture.
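The distribution pattern just described can be modeled in a few lines. This is only a rough simulation under my own assumptions, not McAfee's actual protocol: a handful of seed hosts fetch the update from outside, and in each round every updated host offers it to a few randomly chosen peers inside the network. The function name, parameters, and the random-push rule are all hypothetical.

```python
import random

def distribute(n_hosts, n_seeds, fanout, seed=0):
    """Simulate passing an update along inside one corporate network.

    n_seeds hosts fetch from the outside vendor site; every updated host
    then offers the file to `fanout` randomly chosen peers per round.
    Returns (external_downloads, internal_transfers, rounds).
    """
    rng = random.Random(seed)
    updated = set(range(n_seeds))   # hosts that fetched from outside
    internal = 0
    rounds = 0
    while len(updated) < n_hosts:
        rounds += 1
        newly = set()
        for host in updated:
            for peer in rng.sample(range(n_hosts), fanout):
                # Only hosts that still lack the update accept a transfer.
                if peer not in updated and peer not in newly:
                    newly.add(peer)
                    internal += 1
        updated |= newly
    return n_seeds, internal, rounds


external, internal, rounds = distribute(n_hosts=10_000, n_seeds=5, fanout=3)
print(f"{external} downloads crossed the firewall, "
      f"{internal} transfers stayed inside, in {rounds} rounds")
```

Whatever the round count turns out to be, the number of downloads that leave the corporate network equals the number of seeds, and every other host is served from inside.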
But peer-to-peer systems have to deal with the same security problems as traditional systems. There's denial of service, where computers can become overloaded with requests or with data. There's authentication, so you know who's sending you data. And there are larger trust issues. A centralized public-key infrastructure (PKI) is not necessarily any more robust than the peer-to-peer solution known as a "web of trust." I would not be surprised if authentication and trust become the greatest success of peer-to-peer. Eventually we may all move to adopt the web of trust as our preferred form of PKI.
There's lots and lots of room for research projects in architecture. What's the best structure to impose on the mass of internetworked computers for each combination of application and environment?
I have already mentioned metadata as an area for research. Metadata includes the kinds of categories people search for, what scales they use to measure one resource against another, or in a social sense, what brings people together.
Jabber and RDF are particularly promising ways to deploy metadata, but communities must somehow agree on tags. Then applications that exploit their potential need to be developed.
And finally, bandwidth issues, one of the fundamental features of Internet2. I've saved this for last among my technical topics because the popularity of file-sharing systems on college campuses and the negative reaction of many system administrators deserve a good share of time.
By decentralizing data and therefore redirecting users so they download data directly from other users' computers, Napster reduced the load on its servers to the point where it could cheaply support tens of millions of users. The same principle is used in many commercial peer-to-peer systems; I just mentioned it in relation to McAfee ASaP. In short, peer-to-peer can not only distribute files, it can also distribute the burden of supporting network connections. The overall bandwidth required on the Internet remains the same as in centralized systems, but bottlenecks are eliminated at central sites and, equally importantly, at their ISPs.
How much bandwidth does a simple peer-to-peer system like Napster save? Let's look at some rough estimates made by a company called CenterSpan, which makes a peer-to-peer content-sharing system called C-Star. They estimate that, if you put together Napster and the various Gnutella systems and all the knock-offs, you'd see about 3 billion songs traded every month. Sounds like a high number, but it's been replicated elsewhere and could be pretty accurate. If you delivered all those songs from a central server, you'd need 25,000 T1 lines costing 25 million dollars a month. Peer-to-peer has to be more efficient.
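The arithmetic behind an estimate like this is easy to check. The sketch below is my own back-of-the-envelope version, not CenterSpan's calculation: the 4 MB song size, the 30-day month, fully utilized lines, and the $1,000 monthly price per T1 (implied by dividing 25 million dollars by 25,000 lines) are all assumptions. Only the 3 billion songs figure comes from the estimate above, and 1.544 Mbit/s is the standard T1 line rate.

```python
SONGS_PER_MONTH = 3_000_000_000       # from the estimate quoted above
SONG_BYTES = 4_000_000                # assumption: roughly 4 MB per song
T1_BITS_PER_SEC = 1_544_000           # standard T1 line rate
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month, 100% utilization
T1_COST_PER_MONTH = 1_000             # assumption: $25M / 25,000 lines

def t1_lines_needed(songs, song_bytes):
    """Number of fully loaded T1 lines needed to move `songs` in one month."""
    total_bits = songs * song_bytes * 8
    bits_per_line = T1_BITS_PER_SEC * SECONDS_PER_MONTH
    return total_bits / bits_per_line

lines = t1_lines_needed(SONGS_PER_MONTH, SONG_BYTES)
monthly_cost = lines * T1_COST_PER_MONTH
print(f"{lines:,.0f} T1 lines, about ${monthly_cost:,.0f} per month")
```

Under these assumptions the result lands near 24,000 lines, in the same range as the 25,000 quoted. Real lines never run at full utilization, so a central server would need more.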
Many network administrators will now protest that Napster was a bandwidth hog and overloaded their campus networks. This is not a problem caused by peer-to-peer: The load would have been just as bad had all those students exchanged files over FTP or some other protocol. Music files are just plain big, and if it suddenly became the hippest thing on the planet to exchange PowerPoint presentations or 100-page PostScript files (like term papers), the load would be just the same.
I'm not surprised that colleges would complain about Napster bandwidth requirements because I hear the same wringing of hands over education in general. I hear there are too many applicants to top colleges. Excuse me, but wouldn't it be good to educate more students? Instead of saying there are too many applicants, why don't you work on increasing the availability of high-quality course offerings? I know you don't have tenure-track positions for all the people awarded doctorates, but it's not your job to offer everyone a position; it's your job to educate them.
College administrators have fallen into the same rut as telephone companies that are slow to roll out high-bandwidth lines, or the recording industry that is shutting down Napster. These institutions all find it more profitable to manage scarcity than to offer abundance.
I'll apply the same reasoning now to Napster. The reason tens of millions of people used it is that it opened up the wonderful universe of music. Napster was much more than a free source of popular tunes; it represented exploration, a striving to know the unknown, a widening of cultural horizons. Yes, I know most of the stuff traded over Napster was junk, but much of what Beethoven wrote was also junk; just ask any musicologist. The point is that you need to cut a wide swath to encourage new experiences and new sounds. Universities should be excited by the spirit of curiosity shown by Napster. It was a flowering of cultural opportunities never before seen in the world, and that's why there were so many downloads. Let's provide bandwidth for the material people want instead of complaining that they want it.
I know you have to pay for bandwidth, so you have to charge for bandwidth too, somehow. I'm sure your users like to have things cost-free, and the next best alternative to cost-free is flat-rate. If you have to move to some kind of chokepoints or metered pricing, I don't have a right to criticize you, but I'd like to offer a couple points of comparison for you to consider first.
One frequently made observation is that Internet access and use are much greater in the United States, where local calls are priced at flat rates, than in most countries where local calls are metered. Observers tend to conclude that flat-rate pricing encourages experimentation, and suggest that many innovative uses of the Internet arose within this environment of experimentation. The whole phenomenon called "surfing" was a critical phase in the growth of the World Wide Web.
A second fascinating historical point of comparison is the New York City subway system. When it was opened near the beginning of the 1900s, the city's leaders had to decide whether to base prices on how far people were riding, or to charge a nickel for every ride regardless of distance. They chose the latter, flat-rate system. Historians believe this led to the rapid spread of New Yorkers out of Manhattan and into the surrounding boroughs, creating a richer and more thriving city.
Peer-to-peer excites people because they can participate and make a difference. Even something as impersonal as SETI@home, where users downloaded software that performed calculations in the background, attracted millions of volunteers. And many said they did it because they felt like they were part of something. Just think how much more sense of ownership and pride can evolve around systems where you share ideas and content that have personal meaning to you.
University professors are already feeling anxious about students who share personal notes on the Web. Some professors have tried to force students to remove these notes, all the while uttering the same irrelevant shibboleths that plague peer-to-peer now: The professors claim to be worried about quality; they think they own intellectual property rights to the ideas they put forward in class--Not true!--and so forth. Everybody knows the professors are just scared to death at having their work exposed to scrutiny.
Peer-to-peer will up the ante even further. We have no idea what students will think of sharing next. Their experiments should be welcomed because they will make the university more transparent and force professors to teach better. Remember what happened--excuse me if this is trite--when Alexander Fleming discovered a foreign mold in one of his petri dishes. Instead of throwing it out, he started to research it, and discovered penicillin.
There's still a lot of innovation left for computer technology. The next time you need rapid interaction, efficient data sharing, and the combined processing of inputs from many different sources, I urge you to look at what peer-to-peer can offer.
Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.
Copyright © 2009 O'Reilly Media, Inc.