Virginia Tech Power Mac G5 supercomputer cluster to run on Mac OS X beta

“Using several new technologies and more than 1,000 dual-processor Power Mac G5 computers, Virginia Tech University is building a supercomputer cluster that is likely to rank among the fastest in the world,” reports Jay Lyman for TechNewsWorld.

Lyman reports, “In addition to the G5 machines, the university said it is using a beta version of the latest release of OS X, new networking hardware from Mellanox and Cisco, and cutting-edge configuration and cooling technologies to build the powerful cluster for a fraction of the price of a traditional supercomputer. ‘The total price tag is probably a factor of 10 lower than a machine in this class in the past,’ Virginia Tech College of Engineering dean Hassan Aref told TechNewsWorld.”

“The latest announcement highlights the departure from monolithic mainframe supercomputing to less expensive, grid-like configurations, Yankee Group senior analyst Dana Gardner told TechNewsWorld. ‘This is further evidence of going away from symmetrical multiprocessing mainframes and moving more to a distributed grid of relatively low-cost nodes,’ he said. Gardner, who said OS X’s BSD kernel roots and Linux and Unix heritage make it ideal for the Virginia Tech cluster, indicated the key to the newer grid approach is the technology layer above systems,” Lyman reports.

Full article here.

12 Comments

  1. Glad to see Macs are in the running. Another “left for dead” company is Cray, which was featured in the latest issue of Business 2.0 talking about the disastrous merger with SGI and the bad blood between the two camps.

    Either way, technology is moving in the right direction: prices are going down and more innovation is happening. I’d rather see different technologies fighting it out than just one (ahem, winblows).

  2. Cool! They have adopted the “Borg” philosophy: when one unit goes down, it doesn’t affect the whole collective.

    Actually, SETI@home [setiathome.berkeley.edu] was the first to do this with personal computers on a huge scale. Although their network runs over the slow Internet, that is offset by the vast number of CPUs working in concert. As of today, they have used 1.6 million YEARS of computer processing time, and their current cruising speed is 55 TeraFLOPS. [setiathome.berkeley.edu/totals.html] I’m not sure how that speed compares to the Virginia Tech G5 cluster, but I am very sure it costs a lot less (free).

    A super-computer with the cool Mac OS GUI: Does computing get any better?

  3. In other words, they’re using Panther, and not waiting for the final release before they get started. Sounds like Panther must be coming along quite well then. (Plus I’m sure Apple will upgrade them for free when Panther’s done.)

    Good press for the G5–and (later on) Panther.

  4. Folding@home is faster than all the world’s supercomputers put together.

    It’s a great project, but a different issue: it’s distributed computing and limited to one app… vs. installing a cluster (in one place, WITH fast interconnects) that VA Tech owns and can use for anything they want.

    Question: has anything similar been done with Wintel boxes? I know the G5 is faster, but surely some massive clusters on this scale have been built using Pentiums or Xeons. I’d be interested in any info on other clusters of this size.

  5. Nagromme

    Look at the supercomputer list Apple is trying to get on. Of the 500, I believe over a hundred are Intel based:

    The highest-ranked cluster (at #3) is at Lawrence Livermore and is Intel Xeon based (7.634 Tflops, 2304 processors).

    IBM has #4 and #5 with Power3 chips (7.304 Tflops), but neither is a cluster (they are SP systems).

    IBM has #6 with a Xeon cluster (1920 processors).

    #8 and #11 are also Intel based.

    The highest overall is the Earth Simulator by NEC (35.86 Tflops!!) with NEC-built chips (5120 processors), which is in the lead by a whopping margin (next up is the #2 HP AlphaServer ASCI Q at 13.88 Tflops with 8192 processors).

    Apple is nowhere on the list (and has never made it onto the list since its inception).

    For completeness…
    The highest AMD entry is at #80 (Athlon MP 1.4GHz chips, 512 processors), called the HELICS.
    And to show how fast things can change…
    The HELICS originally cost US$1.15 million and was initially ranked #35 in June 2002; one year later it is at #80.

    Which pretty much means that even if Apple cracks the top 10, if they don’t start landing large customers or updating their tech faster (where are those damn PBs?!!!), they’ll be bumped out very fast. This is also probably why they are so intent on making the Fall list: chances are they’d have no thunder left by the June list (10 Tflops will probably rank in the late teens by then).

    AMD has a lot of major projects lined up to show off its new chips, and there is currently a race between HP and IBM (each with funding) to bump the Earth Simulator out of the top spot. So if Apple makes this cut, they’ll probably land at #4 or #5 (if they succeed in getting 10 Tflops). The reason they won’t be #3 is that the current #4, ASCI White, needs to be retested and will probably score over 10 Tflops, if the current #5, SEABORG (which jumped in Tflops after being retested and has a similar, but smaller, architecture to White’s), is any gauge. And this doesn’t take into account any other new entries.

  6. I wonder why Panther, and not Jaguar, is being used on this system. Maybe Panther is better at sharing resources over a network than Jaguar.

    Anyway, would it be possible to network all these G5 PowerMacs using the FW 800 ports? I’m no expert in networking, but FireWire seems like a more convenient option between several Macs than Ethernet, which needs hubs and switches.

  7. Virginia Tech’s cluster contains only 1100 dual-CPU machines. Its peak performance will thus be limited to 4 Gflops per 2GHz PPC970, or 8.8 Tflops at the max. I’d be surprised if it scored higher than about 4 Tflops on a real benchmark like LINPACK (a back-of-the-envelope version of that arithmetic follows below).
    The cluster would thus still be almost an order of magnitude slower than NEC’s Earth Simulator.
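
    The sketch, in Python; note that the 4 Gflops figure for a 2GHz PPC970 is the comment’s own assumption, not a confirmed spec:

        # Peak-performance estimate using the comment's own assumptions.
        machines = 1100          # dual-CPU Power Mac G5s in the cluster
        cpus = machines * 2      # 2200 PPC970 processors in total
        gflops_per_cpu = 4.0     # assumed peak for one 2GHz PPC970
        peak_tflops = cpus * gflops_per_cpu / 1000
        print(peak_tflops)       # -> 8.8, the ceiling quoted above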

  8. Sol wrote: “Anyway, would it be possible to network all these G5 PowerMacs by using the FW 800 ports?”

    FireWire would be a ridiculously low-speed and inefficient technology with which to connect fast CPUs. The Earth Simulator, for example, consists of 640 eight-CPU nodes fully interconnected by 100 Gigabits/sec crossbar — i.e. virtual dedicated point-to-point — links.
    A FW800 bus runs at 800 Megabits/sec, shared by all the machines connected to it; such a scheme wouldn’t even begin to compare (see the quick arithmetic below). Switched gigabit Ethernet would be much better — though still quite slow compared to the E.S.
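
    To put numbers on the sharing argument, a rough sketch; the machine count comes from the article, the link speeds from the comment, and the single shared bus is a deliberately extreme simplification:

        # A single shared FW800 bus versus a dedicated crossbar link per node.
        machines = 1100                    # cluster size from the article
        fw800_bus_mbps = 800               # one FW800 bus, shared by all attached machines
        share_mbps = fw800_bus_mbps / machines
        print(round(share_mbps, 2))        # -> 0.73 Mbps per machine on one shared bus
        es_link_mbps = 100_000             # ~100 Gbits/sec dedicated Earth Simulator link
        print(round(es_link_mbps / share_mbps))  # -> the dedicated link is ~137500x that share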

  9. Actually, FW800 has higher throughput and lower latency, and is better suited to this type of system, than gigabit Ethernet. While the nominal rate of gigabit is higher (1000 Mbps vs 800 Mbps), there is significant overhead from the TCP/IP stack (a rough estimate of the wire-level cost follows below).

    Still, FW800 comes nowhere near what can be achieved with InfiniBand and its 10 Gbps nominal rate.
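
    For scale, the framing cost of TCP/IP over Ethernet can be estimated as follows; the frame sizes are the standard ones rather than figures from the thread, and this deliberately ignores the CPU-side cost of running the TCP/IP stack, which is arguably the larger issue:

        # Rough wire efficiency of TCP/IP over Ethernet with a standard 1500-byte MTU.
        mtu = 1500              # bytes of IP packet carried per Ethernet frame
        eth_overhead = 38       # preamble 8 + header 14 + FCS 4 + inter-frame gap 12
        ip_tcp_headers = 40     # 20-byte IP header + 20-byte TCP header
        payload = mtu - ip_tcp_headers
        on_wire = mtu + eth_overhead
        print(round(1000 * payload / on_wire))  # -> ~949 Mbps of payload on gigabit Ethernet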

  10. David wrote: “While the nominal rate of gigabit is higher (1000 Mbps vs 800 Mbps), there is significant overhead from the TCP/IP stack”

    Um, no. It’s possible to encapsulate MPI packets directly in Ethernet frames that are then switched directly; there’s more to Ethernet than just TCP/IP, you know. (A minimal MPI sketch follows at the end of this comment.)
    Besides, the software path length and resulting latency of IP over FireWire is no different from IP over Ethernet’s. And how would you interconnect all these machines using FireWire? The number of ports on the rare FireWire switches on the market isn’t exactly competitive with the port counts available on gigabit Ethernet switches…

    As for InfiniBand, a Paceline “enterprise” 4100 switch using IBM’s latest InfiniBlue chipset has eight full-duplex 10 Gbps ports and delivers an aggregate *theoretical* throughput of 160 Gigabits/sec.
    The Earth Simulator’s *single-stage* crossbar has 640 full-duplex *128 Gbps* ports delivering, um, 164 Terabits/sec.
    After protocol overhead and error correction, each 128 Gbit/s channel delivers about 100 Gbit/sec of *effective* data throughput, or 200 Gbps aggregate per node. The inter-node bandwidth is thus already 4 times larger than, e.g., the bandwidth between a Power Mac’s PPC970 CPU and its local 400MHz DDR RAM!
    Note also that the Power Mac’s DDR RAM banks can deliver 3.2 GBytes/sec of data to the PPC970. The RAM in each node of the Earth Simulator can deliver 256 GBytes/sec of data to the CPUs.

    Building scalar parallel machines from inexpensive chips like the Pentium, IBM Power, PPC970 or the Opteron, with slow interconnects, may be all well and good, but at the end of the day you get the performance and throughput you pay for, especially if the computational problem you have happens to need some serious sharing of data across nodes.
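
    On the MPI point above: MPI itself doesn’t care what it runs over; whether it rides raw Ethernet frames, IP, or InfiniBand is the MPI implementation’s business. A minimal sketch of a node-to-node exchange using mpi4py, one common Python MPI binding (illustrative only, not anything the thread’s clusters actually ran):

        # Run with, e.g.: mpiexec -n 2 python exchange.py
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        if rank == 0:
            data = list(range(1000))        # payload to ship to the neighbour node
            comm.send(data, dest=1, tag=0)  # how this hits the wire is the MPI layer's choice
        elif rank == 1:
            data = comm.recv(source=0, tag=0)
            print(f"rank 1 received {len(data)} items")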
