Keying into the global computer

Most of the world's computer power sits idly on desktops, but projects that can harness it are yielding valuable information, find Gary Wilkinson and Jack Schofield.

A 19-year-old architectural student in the Czech Republic recently reached a landmark when his PC, running the SETI@Home screen-saver, made the project's half-billionth download. At the same time, millions more PCs in homes, offices and universities around the globe were "folding" proteins, helping with cancer and Aids research, analysing stock market trades, or calculating ever larger prime numbers. Welcome to the grid.

The success of SETI@Home, which uses about 3.5 million PCs to analyse telescope data in the Search for Extra-Terrestrial Intelligence, should not be measured by the fact that it hasn't found any. By showing that it is possible to harness the processing power of millions of PCs connected to the internet, it has encouraged the development of dozens of projects to do the same thing in more sophisticated ways, ushering in what could be a new era in computing.

Today, most of the world's computer power is not concentrated in giant data centres. It is sitting on desktops, where it spends most of its time doing nothing. While you are getting a cup of coffee, or at lunch, or just pondering your next move in Minesweeper, the computer keeps ticking over, "wasting" processing time. Find some way to harness these spare resources and there is the potential to construct a sort of "distributed computer" more powerful than any supercomputer. SETI@Home, launched in April 1999, takes the simplest approach. A central server takes data that has been collected from the Arecibo radio telescope in Puerto Rico, and splits it into chunks that can be downloaded over the internet.

When your PC is inactive, the SETI@Home screen-saver kicks in and starts work on this data, looking for patterns that could indicate an intelligent signal. The results are sent back in exchange for the next set of data until the project is completed. The project has exceeded director David Anderson's expectations: "Our original plans were based on the assumption that we'd get 50,000 to 100,000 participants. After one week we had 200,000 users, and it's increased steadily to 3.6m. Also, the project was originally planned to last two years; we're now in our third year and, since our user base is still solid, we're planning to continue the project and variants for at least another year."
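The split-download-return cycle described above can be sketched in a few lines of Python. This is an illustration only, not SETI@Home's actual software: the `Server` class, the `analyse` function (which simply reports the strongest sample) and the sample data are all hypothetical stand-ins, and the "network" here is just a function call.

```python
from collections import deque


def split_into_work_units(samples, unit_size):
    """Cut one long recording into fixed-size chunks ("work units")."""
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]


def analyse(unit):
    """Hypothetical stand-in for the client-side analysis: strongest sample."""
    return max(unit)


class Server:
    """Hypothetical central server: hands out work units, collects results."""

    def __init__(self, samples, unit_size):
        units = split_into_work_units(samples, unit_size)
        self.pending = deque(enumerate(units))  # (unit_id, data) pairs
        self.results = {}

    def next_unit(self):
        """Return the next unprocessed unit, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, result):
        self.results[unit_id] = result


def run_client(server):
    """One volunteer PC: fetch a unit, analyse it, send it back, repeat."""
    done = 0
    while (job := server.next_unit()) is not None:
        unit_id, unit = job
        server.submit(unit_id, analyse(unit))
        done += 1
    return done


server = Server(samples=[3, 1, 4, 1, 5, 9, 2, 6, 5, 3], unit_size=4)
completed = run_client(server)
```

All the coordination sits in the server; each client only ever sees its own chunk, which is why volunteers can join or drop out without the others noticing.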

No matter how sexy it is to think that your PC may be about to find evidence of an alien civilisation, there is also the opportunity to make a more immediate contribution to science. The Folding@Home project is not as big as SETI, with only 15,000 volunteers, but it is poised to make a big leap in popularity when it is included on the search engine Google's toolbar for Windows, which is currently being beta-tested. Proteins self-assemble, or "fold", into complex three-dimensional shapes that determine their function.

While the process is fundamental to biology, little is known about how it works on an atomic level. "Mis-folding", or changes in shape, can lead to disease: it is implicated in Alzheimer's and BSE, among others. Until now, it has been impossible to simulate folding: it would take one computer about 30 years.

However, a team led by Dr Vijay Pande at Stanford University has changed that. Pande has been researching protein folding for more than 10 years.

"With my move to Stanford, it was natural to think about new ways to perform research in the field that would make a significant contribution," he explains. Eventually, he realised you could use distributed computing to simulate protein folding.

"We have folded several proteins and the results are now either published, or awaiting publication. No other distributed computing project can state that." Pande's team started by folding the relatively simple beta hairpin molecule. "The beta hairpin was a natural candidate as it's small and we could compare with experiments," he says.

"With that 'proof of concept' successfully completed, we have moved on to more biomedically relevant proteins, such as the amyloid beta peptide that causes Alzheimer's disease." The collaboration with Google came after meeting company president Sergey Brin at Stanford, and Pande hopes to "significantly increase the number of participants, to hundreds of thousands of PCs".

Perhaps you want to do more than make a contribution to science - such as win money at the same time. The Great Internet Mersenne Prime Search (Gimps) has been running since 1996 and has found five very large prime numbers. Late last year, the project discovered the largest known prime number. Written in mathematical shorthand as 2^13,466,917 - 1 (or two multiplied by itself nearly 13 and a half million times, minus 1), it has more than 4m digits (think of roughly two and a half times the whole of a copy of the Guardian, but all text). You can buy a poster with the number printed on it in a very small font (and a free magnifying glass).

This is only the 39th Mersenne prime found. Mersenne primes are a special category of prime numbers named after Marin Mersenne, a 17th century French monk who studied them. They are relevant to number theory, but most participants join Gimps for fun, downloading numbers that are likely candidates to test. However, the Electronic Frontier Foundation is offering a $100,000 prize to the first person to discover a 10m-digit prime number. Your chance of finding it is only about 1 in 60,000, but those are better odds than the "new" Lotto.
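For readers who want to see what "testing a candidate" involves, here is a minimal sketch of the Lucas-Lehmer test, the standard primality check for numbers of the form 2^p - 1. The article does not say which test Gimps' software runs, so treat the choice as an assumption; the function names are ours, and real searches rely on heavily optimised arithmetic rather than plain Python integers. The second function checks the "more than 4m digits" figure.

```python
import math


def is_mersenne_prime(p):
    """Lucas-Lehmer test. p itself must be prime.

    For an odd prime p, 2**p - 1 is prime exactly when the sequence
    s -> s*s - 2 (mod 2**p - 1), started at 4, reaches 0 after p - 2 steps.
    """
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the test below needs odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Decimal digits in 2**p - 1, without computing the number itself."""
    return math.floor(p * math.log10(2)) + 1


# Prime exponents up to 31, filtered down to those that give Mersenne primes.
small_exponents = [p for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31)
                   if is_mersenne_prime(p)]
record_digits = mersenne_digits(13_466_917)
```

Each step squares a number millions of digits long, which is why a single candidate of record size can keep a PC busy for weeks.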

Unlike SETI@Home or Folding@Home, Gimps is designed to run all the time the testing computer is switched on, but in the background, at a low priority. It can take a while. The record-breaker was found on an 800MHz PC with 512 megabytes of memory after 42 days of continuous processing. There are many similar projects, including Genome@Home, Fight Aids@Home, PCP@Home (tackling Post's Correspondence Problem), a Generalized Fermat Prime Search, and Seventeen or Bust, "a distributed attack on the Sierpinski problem".

Art (The Electric Sheep Project) and games (The Distributed Chess Project) are also represented. For many more examples, see the link below. But the real question is whether the same approach can be used for business. Distributed computing is nothing new - it was a popular research topic in the 1980s - but typically the processors would be inside a single box, such as a supercomputer, or a clearly defined and controlled network of computers. Exploiting millions of machines whose reliability and owners are unknown is a different game.

However, the idea's potential is so huge that solutions are being sought. The basic idea means you stop thinking about running programs on a computer, and start running them on a network. It should work much like the National Grid that distributes electricity, but distributes computer processing power instead. Jobs are split up and tasks are assigned, according to need. The distributed computing technology for Gimps was developed by Entropia, a company that also develops grid computing for commercial enterprises.

President and chief executive John Wark sees a big future: "Over the next three years, we see grid computing becoming the standard computing infrastructure for high performance computing problems in industries like life sciences, chemical, oil and gas, engineering, etc. We will also see grid computing becoming increasingly integrated with the traditional business IT computing infrastructure." So in the future, while you are writing a memo on your PC, in the background it may be, unknown to you, helping to analyse your company's sales, holding a tiny part of your main stock database or even processing your wages.

At the Folding@Home project, Pande is also enthusiastic: "While it's bleeding edge scientific computing right now, it will become more mainstream, especially as others see what one can accomplish, given new algorithms that are suited to distributed computing." Big computer companies such as Compaq, Hewlett-Packard and IBM are working to commercialise grid computing, often in collaboration with governments who want to reduce the cost of their huge computing infrastructure. At the end of April, Gordon Brown, the chancellor, opened the National e-Science Centre (NeSC) in Edinburgh, to develop grid computing in the UK.

IBM is involved, and the company is also collaborating with the US Department of Energy to develop a massive academic grid that will link research supercomputers and off-site scientific instruments such as telescopes across America. The project is due to be completed by the end of this year. IBM's Brian Carpenter, a grid specialist, says: "The grid in a narrow sense came out of the scientific community, to do large scale e-science, but it will be much more than that.

"It will be the way to manage resources of any kind on the net, without you having to think about it." In a multi-vendor networked world, this requires a standards-based architecture, and Carpenter points to the Open Grid Services Architecture (OGSA). This is an evolving version of the Globus Toolkit from a project led by the Argonne National Laboratory, the University of Southern California and the University of Chicago's Distributed Systems Laboratory. The Globus Toolkit will also be used in Butterfly.net, announced at the beginning of May, which is being built by IBM's summer students.

This will be the first computing grid dedicated to online gaming. Its multiplayer games will, in theory, allow more than a million people to play at the same time. It is claimed the system will be more reliable by eliminating problems such as game-play lag caused by too many players trying to log on at the same time, or having to stop a game for maintenance or upgrades.

Also, Butterfly.net has been designed to allow players using different platforms - PlayStations, PCs and so on - to play each other in the same game. Initially, there are 50 servers, which will allow for 5,000 players. But there is no theoretical limit as new servers are added.

In the first stage, grid computing can be done the way SETI@Home does it. This isn't a peer-to-peer system, because all the control resides at the centre, with the server or host computer. It's just like home working, such as knitting: jobs are dropped off, processed and collected. There's nothing intrinsically wrong with centralised control: it can be very efficient. However, the real revolution will only come when grid computing works on a democratic peer-to-peer basis over the internet, so that anyone can play.

From that point of view, SETI@Home may look much less important in 10 years' time than peer-to-peer projects such as Jabber (distributed instant messaging), Groove (collaboration), Freenet (document distribution) and Free Haven (anonymous storage). Unless, of course, the aliens say hello, and Anderson is philosophical about that. "We'll probably hear a signal within 50 years," he says. "SETI is not for the impatient!"

© Guardian News & Media 2008
Published: 6/12/2002
 