MacDailyNews - Where Mac news comes first

US Army’s ‘MACH 5’ Apple supercomputer offers unmatched price/performance
Thursday, June 24, 2004 - 10:55 AM EST

"The Army Research and Development Command will use a giant cluster of Apple Computer Inc.'s G5 servers [Xserves] to build one of the fastest supercomputers in the world to research the aerodynamics of hypersonic flight," Brian Robinson reports for Federal Computer Week.

"The MACH 5 (Multiple Advanced Computers for Hypersonic research) supercomputer, announced earlier this week, will use 1,566 of the 64-bit dual-processor servers and is expected to top 25 teraflops per second when it comes online later this year. The fastest supercomputer in the world now is Japan's Earth Simulator with a maximum performance of just less than 36 teraflops," Robinson reports.

"MACH 5 will cost $5.8 million to construct, a fraction of the price purpose-built supercomputers bring. The Earth Simulator cost around $350 million. Apple won the Army contract after a competition among half a dozen companies based on such things as power requirements, cooling needs and floor space requirements, as well as performance," Robinson reports.

Full article here.

Reader Feedback:

Jun 24, 04 - 11:02 am Comment from: Matthew24

OSX plus 970PPC: Unbeatable.

Jun 24, 04 - 11:04 am Comment from: Sputnik

Yeah, but what if they get a virus.

Jun 24, 04 - 11:06 am Comment from: giofoto

Incredible throughput. It amazes me.

Jun 24, 04 - 11:06 am Comment from: Jon E Wunnut

Yeah, what Sputnik said.

Jun 24, 04 - 11:07 am Comment from: giofoto

What virus....there are none.

Jun 24, 04 - 11:14 am Comment from: webbyswim

wow! we got sputnik and one-nut posting on the same article!

hey guys, how's it going? must be a slow news day today. nothing witty yet to have fun with...

Jun 24, 04 - 11:15 am Comment from: West

Sputnik you must have the windows head cold

Jun 24, 04 - 11:16 am Comment from: G-Spank

I think Sputnik was joking, thereby showing how much better a solution this is than windows, on top of the speed to $ performance.

Jun 24, 04 - 11:17 am Comment from: numb nuts

How many super cluster gigaflops does it take to model how much force is required to beat a terrorist into submission without leaving any tell-tale marks for the Red Cross inspection?

Jun 24, 04 - 11:20 am Comment from: DakRoland

This is really cool. I can't wait to hear how well it does, but hopefully it won't be Classified.

Jun 24, 04 - 11:21 am Comment from: Viridian

"Yeah, but what if they get a virus."

Then they can call Sophos.

Jun 24, 04 - 11:22 am Comment from: Al

Just buy off the Red Cross. They are always looking for donations.

Jun 24, 04 - 11:26 am Comment from: NoPCZone

Good News for Apple & IBM
When the MACH 5 and VT Clusters are online later this year Apple should have 2 of the 10 fastest Supercomputers in the world. Why is it good for IBM? IBM is already a huge player in this area and the G5 processor is an IBM product. The PPC design is making major strides these days and there will be more to come.

Jun 24, 04 - 11:53 am Comment from: Aryugaetu

No one here caught the obvious error?!
"... is expected to top 25 teraflops per second"

1 flop = 1 instruction per second
1 teraflop = 1 trillion instructions per second
1 teraflop per second = 1 trillion instructions per second per second

I hardly think the author is talking about the rate at which instruction execution is being accelerated, but was rather being ignorantly redundant.

It was correctly used later, "...just less than 36 teraflops".

Jun 24, 04 - 12:01 pm Comment from: JB

Way to go Apple! Go Army!

Jun 24, 04 - 12:16 pm Comment from: Ace

Here's an idea. They should get the Apple supercomputer to finish developing Longhorn for Windows and charge em that hefty 56 billion bank account they got.

BTW, where's my AI software? There is now the hardware out there to support massive AI but no sign of it yet. The closest we have is ReadIris for scanning to digital or a Speech Recognition piece of crap software (from any company). I'll have to do it myself, damn it!

Jun 24, 04 - 12:19 pm Comment from: Aryugaetu

By the way, talking about numbers...

Given:
25 teraflops = 25,000,000,000,000 instructions per second
Speed of light = 186,000 miles per second
1 mile = 5280 feet
A computer user sits 2 feet from the monitor

Then:
The Army's Supercomputer will be able to perform 50,915 instructions in the time it takes the light to go from the monitor to the user's eyes.

Or, to put it another way, the computer can average 1 instruction in the time it takes a photon to travel 1/2100th of an inch (.00047", .01 mm).

Technically, it actually takes 200 times this long to do 1 instruction, but it can do over 200 instructions at once. Am I the only one amazed at this feat?

Waiting for the user to type a character must seem like an eternity to the computer.

...and the speed of light doesn't seem so fast any more.
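The arithmetic in this comment can be re-run with a few lines of Python. This is only a sketch using the comment's own rounded inputs (186,000 miles per second, 2 feet of viewing distance) and treating 25 teraflops as 25 trillion operations per second; the variable names are made up for illustration. With these inputs the count comes out slightly below the quoted 50,915, which looks like a rounding difference.

```python
# Figures quoted in the comment, with its rounded speed of light.
FLOPS = 25e12                    # 25 teraflops, treated as operations per second
LIGHT_MILES_PER_SEC = 186_000    # rounded speed of light from the comment
FEET_PER_MILE = 5280
VIEWING_DISTANCE_FEET = 2        # monitor-to-eye distance assumed in the comment

light_feet_per_sec = LIGHT_MILES_PER_SEC * FEET_PER_MILE

# Time light needs to cover the viewing distance, in seconds.
travel_time = VIEWING_DISTANCE_FEET / light_feet_per_sec

# Operations the cluster completes during that time.
ops_in_flight = FLOPS * travel_time

# Distance light covers during a single operation.
inches_per_op = light_feet_per_sec * 12 / FLOPS
mm_per_op = inches_per_op * 25.4

print(f"{ops_in_flight:,.0f} operations while light travels 2 feet")
print(f"{inches_per_op:.5f} in ({mm_per_op:.3f} mm) of light travel per operation")
```

The result is an aggregate across the whole cluster, which is the point the comment makes when it says the machine does many operations at once.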

Jun 24, 04 - 12:31 pm Comment from: artiom

stop showing off lol

Jun 24, 04 - 12:34 pm Comment from: King Mel

This cluster is optimized differently than the Va Tech cluster. The MACH 5 employs a larger number of nodes but with Gigabit Ethernet interconnects. Clearly the focus is on batch processing CPU-intensive tasks with less massive data transfer requirements. I liken this to the SETI@Home approach, where it can take six hours to work through a 350KB data packet. The Big Mac has roughly 70% of the number of nodes as MACH 5, but uses InfiniBand interconnection.

I am not familiar with the testing methodology for the supercomputer cluster ranking, but Big Mac may be better in some of the tests and worse in others. It will be interesting to see the results. I will also be interested in how long it takes to set up MACH 5 and start performing useful work.

Jun 24, 04 - 12:38 pm Comment from: Simple1

That information is freaking amazing Aryugaetu!! thx for the info. well Apple is sure wowing the people who said they couldn't compete with the wintel powers!

Jun 24, 04 - 12:53 pm Comment from: the sputnik appreciation society

We just wanted to post our appreciation for the ironic humour of Sputnik's many posts in various threads. And what's even funnier is the response of the posters who totally don't see it and think he's trolling. Fantastic! Get a humour transplant guys!!

Jun 24, 04 - 12:59 pm Comment from: jfbiii

So when is somebody going to take the next step: spend $50 million, build 10 clusters, and cluster them together.

Jun 24, 04 - 01:55 pm Comment from: webbyswim

i do enjoy sput's and one-nut's commentary. but today is a slow news day- thus the wit not top notch. maybe tomorrow!

Jun 24, 04 - 02:08 pm Comment from: MacBuddy

See, I made a claim one time the Sput was a sarcastic smartass. And I got 'corrected'.

Well I think he's MDN's 'Don Rickles'. Either way, his comments are far too outrageous to be taken seriously.

Jun 24, 04 - 02:21 pm Comment from: shadowself

Aryugaetu,

Just to be a stickler for accuracy

flop == FLoating-point OPeration (the original definition back in the 70s when a 12 MFLOP/s Cray-1 was considered extremely fast)

It has since come to be understood as
flop == FLoating-point OPeration per second

So really either is correct depending upon how far back you want to go.

Main point, however, is that it is specifically floating point operations, not instructions. The measure of instructions is MIPS (million instructions per second) or GIPS or TIPS. Of course there was the argument for years between IBM and DEC about how many instructions it took to perform a specific operation. The general consensus in the community back then was that it took two DEC VAX instructions (on average) to perform what one IBM 360/3090/etc. instruction did.

Also the PPC chip has a Multiply-Add-Fuse instruction which does several operations with one instruction.

This is why the supercomputing community tries to standardize on the floating point operations done. Who cares how many instructions or memory moves or NOPS (no-ops) are done in the process of getting the work done? What matters to them is how fast it can accurately calculate the final answer.

While I don't believe the LINPACK benchmark is a great one (my personal favorite when I was actively in that field years ago was SLALOM), it is much better than counting the number of instructions performed.

--- Just my 3 cents from an old hacker (not cracker).

Jun 24, 04 - 02:37 pm Comment from: Aryugaetu

Thank you, Shadowself, I am always willing to learn something new.

Jun 24, 04 - 02:42 pm Comment from: mike

Aryugaetu- you need help.. your posts are by far the geekiest

Jun 24, 04 - 03:05 pm Comment from: Less is More

Hypersonic means Mach 5 or more, so it's a very cool acronym, even if its application (weaponry) is less cool than "Earth Simulator." Whatever, nice coup for Apple. Congrats. Apple should build one for itself to simulate the mind of Windoze users (marketing purposes). Let's see ... hmm, a cluster of three G3s should do it and save the G4s and 5s the ignominy of dealing with that.

Jun 24, 04 - 03:17 pm Comment from: Dan

Somehow 25 TF seems a bit high when you consider that Big Mac, with 1,100 nodes, only did 10 TF. One might expect 12 or 13 TF if performance scales linearly with node count, maybe as high as 15 with improved system performance and better optimized code.

Of course it's not entirely beyond reason since the theoretical performance of a G5 is 4 flops per clock cycle. So a 2*2GHz node could theoretically do 16 GigaFlops. That would make the theoretical peak of a 1,566 node cluster around 25.1 TF.

Hmm - The theoretical peak is about 25 TF but in practice it'll probably do 12 or 13 TF. Still nothing to sneeze at but certainly not as high as 25.

Jun 24, 04 - 03:17 pm Comment from: shadowself

jfbiii,

It is a matter of diminishing returns.
Except for the very few, very specific, very processor intensive tasks which can be very highly decoupled and "parallelized" (some computational fluid dynamic modelling is like this, many Monte Carlo based simulations are like this, however most applications are not) adding a second processor does not double the computational throughput. Having 2 XServes tied together is not twice as fast as having one.

Except in those rare cases, 1566 XServes are not even close to 1500 times as fast as one XServe.

Theoretical Peak Performance (TPP in "supercomputerese") for 1,566 XServes is about 37 to 38 TFLOP. I will be very pleasantly surprised if the Mach5 team reaches their Peak Performance (PP in "supercomputerese") goal of 25 TFLOP on the LINPACK benchmark. I will actually be pleasantly surprised if the PP is over 20 TFLOP.

In common supercomputer applications roll-off in additional capability is not too severe, but even at an assumed 67% effectiveness (what the Mach5 team is expecting) the addition of another 1,566 machines would take a significant hit in effective throughput.

Making the extremely gross assumption that the Mach5 team's scaling continues on to higher clustering this becomes--- in very, very approximate terms... (performance given in TFLOP and $$ in millions)
Processors    TPP    PP    $$     PP/$$
1,566          37    25    5.8    4.3
3,132          75    42    11     3.8
6,264         150    64    22     2.9
12,528        300    94    44     2.1

Doing the same thing with VT's scaling factor
Processors    TPP    PP      $$     PP/$$
1,100          26    10.5    5.4    1.9
2,200          53    15      10     1.5
4,400         106    19      20     0.9
8,800         211    26      40     0.6
17,600        422    36      80     0.4

In reality the Peak Performance (PP) numbers in these tables are probably overly optimistic.

True, even the VT cluster scaled up to beat the Earth Simulator is much less expensive than the Earth Simulator was, but you can see that the roll-off in performance is significant compared to the more modest 1,000 to 2,000 machine systems.
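The PP/$$ column in the two tables above is simply PP divided by cost, and the roll-off also shows up as a falling PP/TPP ratio. The sketch below recomputes both from the table values as posted. The row layout and the function name `price_performance` are illustrative assumptions, not anything from the original posts. The last two VT entries appear to be truncated rather than rounded, so the recomputed values can differ from the listed ones by about 0.05.

```python
# Each row: (machines, TPP in TFLOP, PP in TFLOP, cost in $ millions, listed PP/$$)
mach5_rows = [
    (1566, 37, 25, 5.8, 4.3),
    (3132, 75, 42, 11, 3.8),
    (6264, 150, 64, 22, 2.9),
    (12528, 300, 94, 44, 2.1),
]
vt_rows = [
    (1100, 26, 10.5, 5.4, 1.9),
    (2200, 53, 15, 10, 1.5),
    (4400, 106, 19, 20, 0.9),
    (8800, 211, 26, 40, 0.6),
    (17600, 422, 36, 80, 0.4),
]


def price_performance(rows):
    """Return (machines, PP/TPP efficiency, PP per $ million) for each row."""
    return [(n, pp / tpp, pp / cost) for n, tpp, pp, cost, _listed in rows]


for label, rows in (("MACH 5 scaling", mach5_rows), ("VT scaling", vt_rows)):
    print(label)
    for machines, efficiency, pp_per_million in price_performance(rows):
        print(f"  {machines:>6,} machines: {efficiency:5.1%} of TPP, "
              f"{pp_per_million:.2f} TFLOP per $ million")
```

The first MACH 5 row gives an efficiency of roughly 68%, in line with the 67% effectiveness figure mentioned in the comment.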

Jun 24, 04 - 03:35 pm Comment from: Jon E Wunnut

For a bunch of tree huggin hippies, you guys are very hurtful.

Jun 24, 04 - 03:40 pm Comment from: Nobody

"Hypersonic means Mach 5 or more, so it's a very cool acronym, even if its application (weaponry) is less cool than 'Earth Simulator.'" - Less is More

Not necessarily. Once the military developed this technology to connect computers together. Then, it became the Internet. The same thing happened with a bunch of satellites to pinpoint the enemy's and military's locations. Today, GPS is one of the most important navigational tools.

Science is science. It's neither good nor evil. Once upon a time, NASA dreamt of making a hypersonic airplane to reach Mach 25. While the project died, I think the dream is still alive and one day perhaps, one can travel to the furthest corner of the earth in an hour. When that happens, probably research like this one contributes a lot to it.

Jun 24, 04 - 03:49 pm Comment from: tom

I'm sorry, I lost track of all the flops somewhere. I'm still trying to count all the viruses, trojans, worms, etc. released for windoze machines so far this month.

wink

Jun 24, 04 - 03:51 pm Comment from: iSteve

I think Apple should give them a buy-one-at-regular-price, get-one-free sale price - basically doubling the size. Screw #2 - go for #1. The few million dollars it would cost Apple is cheap advertising.

Jun 24, 04 - 04:44 pm Comment from: Yuk

Yeah BUT. Apple has always had the mentality of buy two for one..In other words, charge you two times for one...

i think this would be a great system for M$ Windows HPC Edition smile

EEEeeeeekkk!!

Jun 24, 04 - 04:45 pm Comment from: Jayplus

iSteve, I second that. Nothing would be better than Apple being able to claim that the #1 supercomputer in the world is a cluster of OSX/G5 machines. IBM would eat it up! Take that MS! Screw you!!

Jun 24, 04 - 05:27 pm Comment from: Less is More

Your point, Nobody, is what? The acronym is not cool? Simulating hypersonic projectiles is cooler than simulating natural phenomena? Did I say science was good or evil? The knowledge gained from any kind of research may lead to advances in other fields, such as GPS, as you say, but it dudn't have anything to do with cool [a subjective term if ever there was any]. I'd rather simulate the flight dynamics of SpaceShipOne's feathered wings at high altitude than how a hypersonic projectile penetrates various surfaces. I find one activity cooler than the other, even if from a scientific standpoint, both may be cool for you.

Jun 24, 04 - 08:34 pm Comment from: Sol

5.8 million dollars is not a lot of money for the US Army or any of its contractors. This MACH 5 system sounds like an experiment to test the viability of an OS X cluster. If it delivers the goods then bigger and better systems will probably be built with XServe G5s.

Jun 24, 04 - 09:33 pm Comment from: Nobody

My point is, the application of MACH 5 is to do scientific research. Just because it's done by the military and you don't see the peaceful purpose for it now, it isn't automatically less cool than research done by non-military. You may have a distaste for anything done by the military, but lots of state-of-the-art technology used for the benefit of the public now originated from military research.

Jun 24, 04 - 09:43 pm Comment from: Joe McConnel

blah blah blah........where is the headless g5 imac?

maybe a dual g4 headless imac?

anything that is worth buying? +$2k cheese graters and +$3k xservers aren't it. So says the non buying public.

Jun 24, 04 - 10:30 pm Comment from: sjk

Science is science. It's neither good nor evil.

"Technology is a whore: it doesn’t know how to say no"

Jun 24, 04 - 11:13 pm Comment from: AjaxBruno

Sol is right, 5.8 mil is chump change for the Army. They just ordered helmets to the tune of 80 million for 230,000 units. Not that helmets are not important technology; I'm just pointing out that the Army certainly knows how to spend when they've still got unwritten checks in their book.

Jun 25, 04 - 01:30 am Comment from: Less is More

Nobody,
If I remember correctly, the Germans and Japanese did a lot of scientific research in the big war with Jews and Chinese; as did the US with its above-ground nuclear tests in the Pacific. I just dropped a casual comment two posts ago that certain fields of research were cooler than others. I didn't imply anything about the ethics or value of the various fields of research. You just assumed that I was injecting some editorial content, and responded to that assumption. Maybe you are overly sensitive to that topic or you don't read properly. So let me put it this way:

I'd rather do research on how the instinct to propagate the human species ~ to reproduce, to mate ~ affects adolescent behaviour than to study the effects of varying fiber intake on bowel movement. For me, one is cooler than the other. Note I didn't say necessary, valuable, ethical, equivalent, important ... just cool.

Jun 25, 04 - 05:52 am Comment from: Luke_in_Oz

Aryugaetu:

...and the speed of light doesn't seem so fast any more

ummm, not that up with my quantum physics, but I'm sure that no one has proven that ANYTHING can travel faster than light? (And if they have it is only a theory - yet to be proven)

I'd say that the speed of light is still the ultimate measure of speed of a "thing".

When talking of instructions, they are talking about the "volume" of information, not the speed.

I have limited knowledge of physics, so may I explain my point with an analogy:

Say I pass a piece of paper to you in 1 sec, and it has 1000 words on it. I now pass you a dictionary in 1 second, it has 40,000+ words.

The speed is the same - the amount of data is significantly larger.

These PCs are not doing things FASTER than the speed of light, they are simply passing more "words" (instructions) along a path at a given speed. I'd say that is the reason for terms such as bandwidth etc.

I'd love to know if Apple has succeeded in proving Einstein (and many other great minds) incorrect!

Just my 2 cents. It doesn't make it any less amazing an achievement, I'm just loath to credit Apple with redefining the physical laws of the universe as we know it.

Cheers,

Luke

Jun 25, 04 - 08:17 am Comment from: Sal

shadowself wrote:
"Theoretical Peak Performance (TPP in "supercomputerese") for 1,566 XServes is about 37 to 38 TFLOP"

The PPC970 has two independent floating-point units. Each FP unit can execute a multiplication and addition simultaneously (fused multiply-add, i.e. something like a := a + b * c)
At 2GHz, each PPC970 can thus execute 4 billion multiplication and 4 billion addition operations per second.
The TPP of a 1,566 dual-CPU 2GHz Xserve cluster is thus 1566*2*8 ~= 25 TFLOPS

Jun 25, 04 - 09:15 am Comment from: Less is More

The speed of enlightenment is related to the level of intelligence of the subject.

Jun 25, 04 - 10:59 am Comment from: Nobody

Point taken.

Jun 25, 04 - 11:44 pm Comment from: shadowself

Sal,

Ah but you forget the Multiply-Add-Fuse instruction -- two FLOPs in a single clock cycle.

Also it is possible to do 64 bit floating point (not easy, but possible) in the vector processor.

After taking these into account, I stand by my TPP number for the XServe. This is "Theoretical" Peak Performance after all.

Jun 26, 04 - 02:45 am Comment from: Sal

shadowself:
> Ah but you forget the Multiply-Add-Fuse instruction -- two FLOPs in a single clock cycle.


Let's see: fused multiply-add (a:=a+b*c) two FLOPs per pipeline clock cycle.
Execution pipeline clocked at 2GHz.
This means 4GFLOPs (2GHz times 2 FLOP/cycle)
There are two independent floating-point units per CPU
Peak performance per CPU is thus 4GFLOPs times two = 8GFLOPs

A cluster of 1566 dual-CPU Xserves contains 3,132 CPUs.
The theoretical peak performance is thus... drum roll ... 3132x8 ~= 25TFLOPs
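The chain of multiplications above is easy to check mechanically. The following is a sketch that takes the figures exactly as stated in this comment (2 GHz clock, two FLOPs per fused multiply-add, two floating-point units per CPU, two CPUs per Xserve, 1,566 Xserves); the constant names are invented for illustration.

```python
CLOCK_HZ = 2e9          # execution pipeline clocked at 2 GHz
FLOPS_PER_FMA = 2       # a fused multiply-add counts as one multiply plus one add
FPUS_PER_CPU = 2        # two independent floating-point units per CPU
CPUS_PER_XSERVE = 2     # dual-CPU Xserve
XSERVES = 1566

# Peak rate of one floating-point unit, then of one CPU.
flops_per_fpu = CLOCK_HZ * FLOPS_PER_FMA
flops_per_cpu = flops_per_fpu * FPUS_PER_CPU

# Total CPU count and theoretical peak of the whole cluster.
total_cpus = XSERVES * CPUS_PER_XSERVE
peak_flops = total_cpus * flops_per_cpu

print(f"{flops_per_cpu / 1e9:.0f} GFLOPs per CPU")
print(f"{total_cpus:,} CPUs")
print(f"{peak_flops / 1e12:.3f} TFLOPs theoretical peak")
```

This counts only the scalar floating-point units, which is the assumption behind the 25 TFLOPs figure; any contribution from the vector unit is left out.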


> Also it is possible to do 64 bit floating point (not easy, but possible) in the vector processor.


Possible, but it would be quite cumbersome and slow.
AltiVec natively supports 32-bit floating-point numbers, with a 23-bit mantissa.
A 64-bit FP number has a 52-bit mantissa. To maintain precision, most any FPU with 64-bit support -- be it from AMD, Motorola, IBM, Intel, Sun... -- computes intermediary results with 80-bit numbers before reducing them to 64-bit. Don't expect much performance piecing together e.g. 32-bit floating point instructions and arithmetic shifts in a vector processor to try to construct a 52-bit or 64-bit intermediary mantissa...


> After taking these into account, I stand by my TPP number for the XServe

I'm afraid your numbers are irrelevant.

Aug 17, 04 - 06:07 am Comment from: zubro

To my knowledge, internet started in Geneva
http://public.web.cern.ch/public/
to connect scientists together as the ring of the accelerator was so large...
I might be wrong..
