Editor's note: In part one of this two-part series on harnessing the idle processing power of distributed machines, Carlos Justiniano explained the current trends in this exciting technology area and drilled down into specifics such as client/server communication, protocols, server design, databases, and testing. In today's conclusion, he covers network failures, security, software updates, and backup.
Building network software can seem deceptively simple at times, especially for less experienced developers. In network programming, the complex actions of humans, computers, and networks can cause unexpected behavior that results in a catastrophic outcome, such as a server crash or, worse, a system crash. The key to surviving catastrophes is to plan for them.
Consider these issues:
Without careful planning, unexpected problems may ultimately destroy a distributed computing project that is ill-prepared to cope with them.
It's essential to distribute the same work units to many PeerNodes. If one PeerNode does not return a result, perhaps some other PeerNode will. Should all of the PeerNodes working on the same task fail to return a result, then the work unit is simply marked as incomplete and will be sent to another batch of PeerNodes at a later time. The use of redundancy creates a robust system, where the project is not dependent on whether or not an individual PeerNode completes a task.
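The bookkeeping behind this kind of redundancy can be sketched in a few lines. The following is a minimal illustration, not the ChessBrain implementation: the class names, the status strings, and the default of three copies per batch are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    """One task that is handed to several PeerNodes at once."""
    unit_id: str
    assigned: set = field(default_factory=set)    # PeerNode ids holding this unit
    results: dict = field(default_factory=dict)   # PeerNode id -> returned result
    status: str = "pending"                       # pending | complete | incomplete


class RedundantScheduler:
    """Hypothetical server-side tracker for redundantly distributed work units."""

    def __init__(self, redundancy=3):
        self.redundancy = redundancy   # assumed default: copies sent per batch
        self.units = {}
        self.retry_queue = []          # unit ids waiting for another batch

    def assign(self, unit_id, available_peers):
        """Hand the same unit to up to `redundancy` PeerNodes."""
        unit = self.units.setdefault(unit_id, WorkUnit(unit_id))
        batch = list(available_peers)[: self.redundancy]
        unit.assigned = set(batch)
        unit.status = "pending"
        return batch

    def record_result(self, unit_id, peer_id, result):
        """Store a result; one returned result is enough to complete the unit."""
        unit = self.units[unit_id]
        if peer_id in unit.assigned:
            unit.results[peer_id] = result
            unit.status = "complete"

    def expire(self, unit_id):
        """Call when the batch deadline passes. A unit with no results is
        marked incomplete and queued for a later batch of PeerNodes."""
        unit = self.units[unit_id]
        if not unit.results:
            unit.status = "incomplete"
            unit.assigned.clear()
            self.retry_queue.append(unit_id)
        return unit.status
```

The project never depends on any single PeerNode here: a unit completes as soon as any member of the batch reports back, and a silent batch simply sends the unit back to the queue.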
You must also take into account the potential failure of a SuperNode server. How will PeerNode clients respond if a SuperNode is no longer available? PeerNode clients must be designed to handle the case of unreachable SuperNodes. If PeerNode clients stopped working (because of a crash, for example) the moment a SuperNode server became unreachable, your entire network might fall apart.
One way to address this problem is to design your PeerNode so that it accepts connection failures and gracefully retries at a later time. Additionally, build your PeerNode client so that it's able to connect to Internet addresses by name, such as node01.distributedchess.net or node02.distributedchess.net. If a connection to node01.distributedchess.net fails, then the PeerNode will try node02. PeerNodes can also maintain a list of SuperNode servers and migrate to the next available server as needed. Should a SuperNode server fail, PeerNodes will behave like a swarm of bees changing direction on their way to another destination.
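A rough sketch of that retry-and-migrate behavior follows. The host names come from the example above; the port number, the number of passes, and the delay between passes are hypothetical values chosen only for illustration.

```python
import socket
import time

# Host names taken from the article; the port is a hypothetical placeholder.
SUPERNODES = ["node01.distributedchess.net", "node02.distributedchess.net"]
PORT = 8080


def connect_with_failover(hosts, port, connect=socket.create_connection,
                          rounds=3, delay=30.0, sleep=time.sleep):
    """Try each SuperNode in order; if all fail, wait and start over.

    Returns (host, connection) on success, or None after `rounds` full passes
    so the caller can keep working offline and retry at a later time.
    """
    for attempt in range(rounds):
        for host in hosts:
            try:
                return host, connect((host, port), timeout=10)
            except OSError:
                continue  # unreachable: migrate to the next server in the list
        if attempt < rounds - 1:
            sleep(delay)  # back off before the next pass over the list
    return None
```

Because name resolution failures and refused connections both surface as OSError in Python, a single except clause covers a dead DNS entry as well as a dead server. The `connect` and `sleep` parameters exist only so the function can be exercised without a live network.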
Security plays a vital role in many aspects of a DC project. Both project organizers and participants have valid concerns regarding security. Participants have concerns about their privacy and their machine's susceptibility to viruses, and many wonder if using DC software makes their machines easier targets for attackers. On the other side of the fence, project developers have concerns that attackers may find ways to tamper with results, invalidating received work units.
Developers are faced with securing a project from a number of vantage points. A first order of business is to examine points of vulnerability. Where can an attacker cause harm? Which aspects of the project can be exploited and otherwise abused by participants?
Project participants download PeerNode client software from a project's web site, so it makes sense that the project's web server is an obvious target. If an attacker can penetrate a site and replace the downloadable PeerNode clients with compromised versions, then many machines will become infected.
Fortunately, a considerable amount of research has been done to address server security. Intrusion detection systems (IDSes) use sophisticated monitoring techniques to detect potential security issues. An IDS can monitor TCP packets to identify when an attacker is performing a port scan or when a denial-of-service attack is underway. IDSes can also monitor system files and user patterns for unexpected behavior (such as a normal user acquiring root level access), and when core system configuration files are modified. You can configure an IDS to send you an email when a problem occurs. Think of it as an early warning system.
There are thousands (OK, maybe just hundreds) of tools that can be used to monitor network traffic. One such tool is the freely available Ethereal, which uses a packet-sniffing library in order to perform its higher-level functions, such as filtering and display. The same types of underlying tools are available to attackers who can use them to intercept and modify data while in transit.
Take, for example, an attacker wishing to disrupt a DC project. The attacker builds a Trojan software product that masquerades as a useful monitoring and statistics tracking system for end users. The malicious tool performs useful functions while slightly modifying the transmitted results prior to sending them on their way. The tampered data has the potential of completely invalidating the DC project -- resulting in a complete waste of time for all involved. We won't get into the many psychological reasons why some people consider this sort of behavior exciting, but suffice it to say that disrupting a high-profile DC project might offer an attacker icon status in certain circles.
The only hope of protecting your project is to make it difficult for an attacker to modify transmitted data. As with most things, there are easy and harder ways of doing things.
Data Hiding
Software developers are sometimes faced with the classical problem of space versus performance. The need to protect data may be sufficiently clear; however, the cost of doing so may be prohibitive. Hiding data, rather than fully encrypting it, and using strong validation techniques on both the server and client end, may offer a suitable compromise.
Data compression can effectively reduce bandwidth requirements, and has a positive side effect of masking the original contents. Applying byte transformations, such as XOR operations and weak reversible data ciphers, will further aid in data hiding. Clearly, data hiding is by no means as secure as data encryption, but may be suitable for use in certain settings.
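As a small illustration of the idea, the sketch below compresses a payload and then applies a repeating-key XOR. The function names and the sample key are assumptions for the example, and, as noted above, this only obscures the data; it is not encryption.

```python
import zlib
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Apply a repeating-key XOR. Running it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def hide(payload: bytes, key: bytes) -> bytes:
    """Compress first (saves bandwidth, masks content), then XOR."""
    return xor_bytes(zlib.compress(payload), key)


def reveal(blob: bytes, key: bytes) -> bytes:
    """Reverse the steps: undo the XOR, then decompress."""
    return zlib.decompress(xor_bytes(blob, key))
```

A PeerNode could call `hide(result_bytes, b"some-shared-key")` before sending, with the SuperNode calling `reveal` using the same key. Anyone who extracts the key from the client binary can undo the transformation, which is why it should be paired with validation on the server side.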
Data Encryption
Widely available implementations of popular data-encryption algorithms leave project developers with little reason not to apply some form of data protection. One popular algorithm is the Advanced Encryption Standard (AES), also known as Rijndael (pronounced "rain doll"). Rijndael is a symmetric block cipher that supports variable block and key sizes (the AES standard fixes the block size at 128 bits and allows 128-, 192-, or 256-bit keys). It was developed by Belgian cryptographers Joan Daemen and Vincent Rijmen as a replacement for the aging DES and Triple DES standards that are still commonly used to secure e-commerce. AES is currently used in hundreds of high-end encryption products and is a favorite among developers. Additionally, AES implementations can be found online.
For maximum security, where performance may be less of an issue, the use of public key cryptography is highly recommended. Public Key Infrastructure (PKI) systems use public key encryption to create digital certificates, which are managed by certificate authorities. Certificate Authorities (CAs) establish a trust hierarchy, which can be used to validate authenticity through association. The use of PKI would allow a SuperNode server to authenticate PeerNodes, and PeerNodes to validate that they are indeed communicating with an authentic SuperNode.
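One common way to get this kind of two-way certificate checking is TLS with client certificates. The sketch below uses Python's standard ssl module and is only one possible arrangement: the function names are invented for the example, and the certificate, key, and CA file paths are placeholders the project would have to supply.

```python
import ssl


def peernode_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Client side: verify the SuperNode's certificate and host name."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # Present the PeerNode's own certificate so the server can authenticate it.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def supernode_tls_context(cert_file=None, key_file=None, ca_file=None):
    """Server side: refuse PeerNodes that do not present a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The SuperNode's own certificate, signed by a CA the PeerNodes trust.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A PeerNode would then wrap its socket with `ctx.wrap_socket(sock, server_hostname=host)`, and the handshake fails if either side cannot be traced back to a trusted certificate authority.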
Detecting Software Tampering
Project contributors may acquire PeerNode client software from one of many locations. For example, a project team leader might download and place the client software on the team's web site along with specialized instructions for team members.
There is a certain degree of trust associated when the software is downloaded directly from the project's main web site. However, when project software is made available from different locations, project contributors may not be able to trust the origins and validity of the software. For all they know, the software could be a Trojan program. To address this concern, DC software is often posted on a project's site along with a cryptographic hash string, such as:
3402b30a24dc4d248c7c207e9632479a client21201-01-lin-i586.tgz
The string of numbers and letters is the output of a program called md5sum, which generates the string of alphanumeric characters when given a filename as input.
End users can download a project's client software and type:
md5sum client21201-01-lin-i586.tgz
The output is a string that should match the one posted on the download site.
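The same comparison can be scripted, which is handy for installers or team mirrors. This is a minimal sketch with invented function names; it defaults to MD5 only because that is what the example above uses, and a stronger algorithm such as SHA-256 can be passed in when the project publishes one.

```python
import hashlib
import hmac


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex, algorithm="md5"):
    """Compare the local digest with the string posted on the download site."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_digest(path, algorithm), expected)
```

For the file above, `matches_published("client21201-01-lin-i586.tgz", "3402b30a24dc4d248c7c207e9632479a")` would return True only if the download is byte-for-byte identical to what the project posted.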
For higher levels of security, some projects sign their files using a private key. Users wanting to validate that a file's digital signature is correct can retrieve the project's public key (available online via public key servers) and use it to validate the downloaded file. The signature below is an example of what a contributor might see posted on the project's site.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQA/ye8su9d1K+MjI6sRAtQXAKCgcXahYj1ZcptsXR10WCnSbKs2ggCeK/Qv
4THuyfGOeDEyHiHnHX9pkZw=
=Nj+a
-----END PGP SIGNATURE-----
The important thing concerning a cryptographic hash and a digital signature is that both techniques can be used to determine whether a file has been tampered with after it was posted. This allows the software to be distributed and for end users to validate that the file has not been tampered with. By far the most secure method involves the use of digital keys, and that technique is being used in an increasing number of projects.
There is a wealth of security information available on the Internet, and many open source projects demonstrate working implementations. As a DC project developer, you have the responsibility to explore security and protect both your project and your members.
As computer users, we've grown accustomed to automatic software updates. Now companies such as Microsoft, Apple, and Red Hat offer their customers software updates. They're not alone, as thousands of other software companies also offer updates.
Updates are released for any number of valid reasons, ranging from program fixes and new features, to the latest security Band-Aids. In all fairness, software complexity has increased while the time to market has decreased, causing products to be released to an unsuspecting public in a less-than-perfect state. In addition, network-connected software is under siege as attackers attempt to discover and later exploit security flaws. Rapidly distributing software updates has become the only real defense companies have against the unexpected.
A real issue facing developers is how to make software updates quick and painless for their customers. Long gone are the days when posting an update on your company's FTP or web site sufficed. Companies are seeing their products used by a wider demographic, where even a "one-click install" is one click too many.
The challenge affects all DC projects, to lesser and greater degrees. Long-running, static-type projects often do not require software updates. However, for projects that are highly dynamic, where the client code is being updated in response to ongoing research and development, the need to release continuous updates is far more critical.
An important consideration for any DC project involves determining network bandwidth utilization. It is important to consider bandwidth from both the SuperNode server's end as well as from a PeerNode's perspective.
Many popular DC projects package data into chunks that take PeerNodes days or sometimes weeks to process. In these cases, the actual network bandwidth utilization on the client side is negligible by average end-user network usage patterns. For example, accessing a single high-bandwidth web site with lots of graphics can use more bandwidth than most DC clients use in weeks.
The situation on the server side offers a very different perspective, where a single SuperNode server might service thousands (or, if truly popular, millions) of requests per day.
It is important to examine the frequency and size of data transmissions and to adequately plan for bandwidth requirements. One of the best ways to study the behavior of a network application is to use a network analyzer. Ethereal is a good tool for examining what is actually going through the wire; however, network-bandwidth measuring tools and load-test simulations should also be used to gain better insights into the application's true requirements.
Understanding the project's network requirements is necessary in order to choose the right server-hosting plan. Proper planning is essential, because if your project becomes successful, you may discover you cannot afford to pay for the bandwidth costs. Even if you do not pay for bandwidth, the bottom line is that someone, somewhere, does pay, and without proper analysis, your project may be terminated prematurely.
The best course of action is to plan for bandwidth, carefully consider your data protocol, and potentially, use data-compression techniques.
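A back-of-the-envelope estimate is a reasonable starting point before any load testing. The helper below is a sketch; the function name and every number in the example that follows are hypothetical, and real figures should come from measuring your own protocol.

```python
def monthly_transfer_gb(requests_per_day, request_bytes, response_bytes,
                        compression_ratio=1.0, days=30):
    """Estimate SuperNode traffic per month in gigabytes (1 GB = 10**9 bytes).

    compression_ratio is compressed size divided by original size, so 1.0
    means no compression and 0.4 means payloads shrink to 40 percent.
    """
    per_exchange = (request_bytes + response_bytes) * compression_ratio
    return requests_per_day * per_exchange * days / 1e9
```

For instance, a server handling 1,000,000 requests per day, each with a 500-byte request and a 1,500-byte response, moves 2 GB per day, or about 60 GB per month. If compression shrinks the payloads to 40 percent of their size, the same load drops to roughly 24 GB per month, which may be the difference between two hosting plans.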
Your favorite computer's days are numbered: it's just a matter of time before a key component, such as a hard drive or power supply, breaks down. Because distributed computation projects typically deal with vast amounts of data, it is absolutely vital that you develop a backup strategy.
It is virtually impossible (or exceedingly expensive) to guard against all data loss. The key is to minimize your risk exposure. For instance, if you perform a backup once per day, you may lose nearly a day's worth of data in the event of a hardware failure, whereas backing up once per hour limits the loss to about an hour's work. Data mirroring using redundant hardware is certainly the way to go if you can afford it; however, you'll still need a backup policy.
An overwhelming majority of individuals don't perform regular data backups. The reason is actually quite simple -- it's a chore to do so. The best way to ensure that data is backed up regularly is to automate the process. On UNIX-type systems, the process is greatly simplified using the crontab scheduling system and archive scripts. On other systems, you may need to explore backup software solutions.
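An archive script for this purpose can be very small. The sketch below is one possible version; the function name, the directory layout, and the crontab entry in the comment are all hypothetical and would need to match your own server.

```python
import tarfile
import time
from pathlib import Path

# Hypothetical crontab entry that runs a script built on this function
# at the top of every hour:
#   0 * * * * /usr/bin/python3 /opt/dcproject/backup.py


def make_backup(source_dir, backup_dir):
    """Write a timestamped .tar.gz of source_dir into backup_dir.

    backup_dir should live outside source_dir (ideally on another disk),
    otherwise each archive would be swept into the next one.
    """
    source = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)  # recurses into subdirectories
    return archive
```

Because each archive carries its own timestamp, earlier backups are never overwritten, so some separate cleanup of old archives will eventually be needed.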
Another key backup strategy is the concept of offsite storage. In addition to redundant storage (storage on multiple machines) and CD archives, I use a service called Xdrive as an offsite storage facility.
Protect your data. Hardware may be expendable, but loss of data may cripple your project. Yes, this is potentially one place where paranoia may really pay off.
We've skimmed the surface of many, but by no means all, of the technical considerations you might encounter. However, not all of the issues you'll encounter will be technical in nature. There is a very human aspect to distributed computing, and failure to understand the human elements will seriously jeopardize your project's long-term viability.
In the past, the notion that individuals would pay for, and allow their computers to participate in, research projects was foolhardy, at best. Times have changed. Today, millions of people participate in distributed computation projects. As a result, we are now able to tap a wealth of computing resources. However, there is one small catch: we must convince people to join our projects.
If you are interested in getting people to join your project, you need to create a value proposition offering an enjoyable and rewarding experience in exchange for participation. In addition, you need to consider how to retain members once they have joined. The best way to begin to address these issues is by understanding the underlying motivators that attract people to distributed computation projects.
You may be wondering what drives a person to contribute their time, energy, and the use of their computer to a distributed computation project. Although specific reasons vary, there are a few common themes that consistently appear in DC projects.
A Sense of Purpose
Some members are motivated by a deep sense of purpose. Projects such as FightAids@home and the University of Oxford's cancer research project offer individuals the opportunity to support noble research that might ultimately benefit millions of people.
A Sense of Community
Many active members enjoy being part of a community and collaborating with other people. Generally, people like to be involved in things that transcend them as individuals.
Competitive Opportunities and Peer Recognition
Members want to know that their contributions matter. All major distributed computing projects track member contributions and post the results on the project's web site. Members gain the respect of their peers and obtain subculture ranking within communities.
Entertainment
Participating in a distributed computing project can be entertaining in a number of ways. Meeting people and competing against them can be entertaining.
Successful projects understand the needs we've just examined and seek ways of promoting them within the context of the project.
Distributed computing projects have given birth to communities of enthusiasts who closely support projects. In turn, project web sites publicly display project statistics and member ranking (sometimes referred to as leaderboards) offering individuals a convenient way to compare their ranking against those of their peers. This has led community members to form teams, which compete against one another to see which group can make the most significant contributions to a project. Project organizers are eager to support competition because the results typically lead to teams recruiting more members and subsequently, more computers.
Distributed computing team members have adopted the moniker "DC Team," and members refer to themselves as "DC'ers." Many DC'ers take their hobby seriously, and many run two or more machines, with some running as many as 40 or more while participating in various projects.
When asked why he contributes to projects, DC'er Chris Harrell replied, "I like to think I solely pursue DC projects for the common good of mankind, but I cannot deny the fact that the project statistics are the main attraction for 99% of DC contributors." Chris is far from alone; for many DC'ers, interest in a project comes second to competing for public ranking.
When I started ChessBrain, a global project to build the world's largest distributed chess computer, I was surprised to discover contributors who had very little interest in the game of chess. This was my first introduction to a network economy where DC teams support research projects in exchange for an opportunity to compete against one another.
International teams, like the Dutch Power Cows and AnandTech, claim to have thousands of members. DC Teams have become a powerful force in helping to shape the future of distributed computation projects on the net, by providing a highly technical member base with access to thousands of machines. They are the unsung heroes of a new age.
Project participants have many projects to choose from, but don't mind exploring a project for a brief time, in order to get a sense of it. However, potential members won't waste their time participating in a project that doesn't appear worthwhile.
Before a significant number of people take interest in a new project, it must first establish a certain degree of credibility. Establishing credibility begins by clearly articulating goals and demonstrating the project's commitment to achieving a measurable result. The project must clearly communicate the message: "This project is worth your time!"
Most distributed computing projects maintain web sites, which articulate the project goals, present project status reports, and offer software download areas. In some cases, project web sites feature online forums where members can post feedback directly to the development team and other members. A project's community forum offers project leaders and members opportunities to publicly engage in conversations. The presence of a public forum can go a long way toward communicating the commitment and seriousness of a project.
One way of gaining credibility is through association. Some DC projects enjoy near-instant credibility when well-established institutions or well-known companies sponsor them.
In the process of building credibility, you must also establish a relationship based on trust. Generally, participants must believe in the credibility and trustworthiness of a project before downloading and running potentially malicious software on their machines and networks.
One of the surest ways of establishing trust is to engage in direct conversations with potential members via email, on a project web site, and on other public forums. Nothing says, "your voice matters" faster than a prompt reply to a member's inquiry. Although this isn't always possible, the goodwill generated is worth its weight in gold.
Open and honest communication is a tool that tears down relationship barriers, and helps foster healthy and productive relationships. This is a point that is often difficult to remember when coping with difficult people. Let's face it – public relations can be a difficult job. Freedom of networked speech often results in members pretty much saying whatever they want while publicly venting frustrations. These sorts of behavior can quickly erode a project's credibility as mob-like conditions lead others to join in. This is where, as a leader, you must exercise the most restraint. Months and possibly years of relationship building can quickly crumble as a result of an ill-prepared response.
It is important to remember that project contributors give freely of themselves and that it is difficult to run a distributed computation project without them. Exercising tempered restraint and maintaining an eternal state of gratitude is vital to maintaining a successful project.
The most important element in a distributed computation project remains the people and communities who join together to unlock the vast potential of distributed machines. To paraphrase the Matrix: They are the gate keepers. They are guarding all the doors. They are holding all the keys.
Years ago, Sun Microsystems promoted their marketing slogan: "The network is the computer." Although this still remains relevant in the context of distributed computing, I'd like to offer another mantra: "The people are the network." The machines are simply tools that allow us to touch, if for just a moment, the very limits of our imaginations.
Carlos Justiniano is a software architect with Y3K Secure Enterprise Software Inc., where he focuses on data security, communications, and distributed computing.
Copyright © 2009 O'Reilly Media, Inc.