What is Delaying the Adoption of EPIC Computers Like Itanium?
The first computers were nothing more than a few hundred gates, so
all they could implement was a "reduced instruction set". As technology
improved, the instruction sets grew and we entered an era of CISC,
"Complex Instruction Set Computers". The problem with CISC is that
these chips need a lot of circuitry to decode their instructions, so
designers went back to basics and built modern "Reduced Instruction
Set Computers" or RISC. Ten years ago the battle between CISC and
RISC was supposed to end, and the winner was supposed to be EPIC
computers like the Intel Itanium.
Simple Computers
When computers were built from simple components (tubes, transistors,
and simple integrated circuits) every "bit" was expensive. Every
component also added its own probability of failure so large complex
designs had failure rates too high to make them commercially viable.
Instruction sets were very limited and commercial computers would
have only 10 to 20 instructions. Programmers understood these instructions
so well they remembered the hex values and could write in machine
code. Instruction sets grew in size and complexity as the semiconductor
industry learned how to put more transistors on an integrated chip.
Simple computers have not gone away; they are now called embedded
computers and they live on in TV remotes, microwave ovens, automobiles,
toys, and practically anything with a keypad. Soon these simple computers
will show up as RFID "smart dust" that replaces the bar codes on everything
you buy. Programmers who can program in hex will never be without
a job.
Complex Instruction Set Computers (CISC)
Moore's Law drove Large Scale Integration (LSI) to Very Large Scale
Integration (VLSI) and soon there were several million transistors
available on the CPU chip to implement more complex instructions.
Memory chips also used the same technology but several million transistors
only provided systems with a few megabytes. Memory systems were
slow because they needed lots of interconnected chips for capacity
and they needed to spread them out to keep them cool. CPU designers
had to be very careful with memory use and when they fetched an
instruction from memory they wanted it to do a lot before they had
to get the next instruction.
The solution was a large, complex instruction set. Current instruction
sets are so complete that simple programming languages like C or
Fortran only need one or two CPU instructions for every line of high-level
language. The downside is that the instruction set is too large
for the average programmer to remember, so to get the most from a
CISC you need a high-level-language compiler. Programmers writing
compilers have to know all the instructions, but everyone else can forget
how to program in assembler or machine code.
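As a rough illustration (the instruction counts vary by compiler and
target, and the assembly in the comments is representative rather than
the output of any particular compiler), one line of C that increments
an array element can map to a single CISC read-modify-write instruction,
where a classic RISC machine needs a separate load, add, and store:

    /* One line of high-level language, two instruction-set styles. */
    void bump(int *a, int i)
    {
        a[i] += 1;
        /* CISC (x86 style): one read-modify-write instruction, roughly
         *     addl $1, (%rdi,%rsi,4)
         * RISC (MIPS style): three simple instructions, roughly
         *     lw   t0, 0(t1)      load the element
         *     addi t0, t0, 1      add one
         *     sw   t0, 0(t1)      store it back
         */
    }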
Microcoded CPUs
The complex instructions required ever more complex decode and
execution units on the CPU chip, and just designing and testing all
the connections became unmanageable. The solution was a CPU within a CPU:
a simple CPU with its own memory and simple instruction set that
implemented the complex instructions. The instruction "add register
one to register two" becomes a little program that runs on the little
CPU on the chip. This way designers could add new, even more complex
instructions just by writing little programs in the microcode
language used by the little CPU. All CISC chips are now microcoded
and some (not Intel's) will actually let you write and load microcode
so you can add new instructions to the CPU.
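A minimal sketch of the idea in C; the micro-operation names and the
toy sequencer are invented for illustration and are not any vendor's
real microcode:

    #include <stdio.h>

    /* Invented micro-operations for a toy microcoded CPU. */
    enum micro_op { UOP_READ_R1, UOP_READ_R2, UOP_ALU_ADD, UOP_WRITE_R2, UOP_END };

    /* The "little program" behind one complex instruction:
       "add register one to register two". */
    static const enum micro_op add_r1_r2[] = {
        UOP_READ_R1,   /* latch register 1 onto ALU input A */
        UOP_READ_R2,   /* latch register 2 onto ALU input B */
        UOP_ALU_ADD,   /* add the two inputs */
        UOP_WRITE_R2,  /* write the result back to register 2 */
        UOP_END
    };

    int main(void)
    {
        int r1 = 3, r2 = 4, in_a = 0, in_b = 0, alu = 0;
        /* The micro-sequencer steps through the little program. */
        for (const enum micro_op *upc = add_r1_r2; *upc != UOP_END; upc++) {
            switch (*upc) {
            case UOP_READ_R1:  in_a = r1;          break;
            case UOP_READ_R2:  in_b = r2;          break;
            case UOP_ALU_ADD:  alu  = in_a + in_b; break;
            case UOP_WRITE_R2: r2   = alu;         break;
            default:                               break;
            }
        }
        printf("r2 = %d\n", r2);  /* prints 7 */
        return 0;
    }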
Reduced Instruction Set Computers (RISC)
When advances in semiconductors increased the capacity and speed
of memory chips the memory system was no longer a bottleneck. Now
CPU designers could throw away all the microcoding circuitry and
drive the little microcoded CPU directly. More important, without
all that extra circuitry they could make the chips smaller and increase
the clock speed. Early RISC systems had clock speeds three or four
times faster than CISC systems. They needed more of the simple instructions
to do useful work but even with the extra memory fetch cycles the
computer system was still two or three times faster for the same
price.
Pipelining was one more advantage of a RISC CPU. Since there are
a limited number of simple instructions the program flow is more
predictable and you can do the fetch, decode, execute functions
of the CPU in parallel. This depends on the fact that most programs
have long stretches of code without any branching. If you hit a
branch (is z = 0 ?) you have to flush the pipeline and figure out
which path to take before you start pipelining again.
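A back-of-the-envelope model in C makes the win concrete; the three-stage
split, the branch rate, and the flush penalty are illustrative assumptions,
not measurements of any real chip:

    #include <stdio.h>

    /* Toy model: three pipeline stages (fetch, decode, execute),
       each taking one clock cycle. */
    int main(void)
    {
        const long stages = 3;
        const long instructions = 1000;
        const long branches = 100;              /* assume 1 in 10 branches   */
        const long flush_penalty = stages - 1;  /* refill cost after a flush */

        long unpipelined = instructions * stages;
        long pipelined   = instructions + (stages - 1)  /* fill the pipe once */
                         + branches * flush_penalty;    /* flush per branch   */

        printf("unpipelined: %ld cycles\n", unpipelined); /* 3000 cycles */
        printf("pipelined:   %ld cycles\n", pipelined);   /* 1202 cycles */
        return 0;
    }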
The RISC designers had one more advantage: since the instruction
decode and execution units are simple you can put a bunch of them
on the CPU chip. When the program hits a branch you can just load
both paths in parallel and throw away the results from the path
not taken. As the RISC CPU reads ahead it creates a tree of these
code branches. Some CPUs have enough extra units to manage a tree
with four or five branches before they have to wait for the results
of the first branch. Keeping track of this branching table and the
status of all the units requires a little administrative CPU. At
some point the administrative CPU is doing more work than the user's
program so this strategy has a limit.
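In software terms the trick looks like the sketch below: evaluate both
sides of the branch, then keep one result and discard the other. This is
only a loose analogy; on a real chip the spare hardware units do both
computations at once:

    #include <stdio.h>

    /* Stand-ins for the work on each side of "is z == 0 ?". */
    static int path_taken(int x)     { return x * 2; }
    static int path_not_taken(int x) { return x + 100; }

    int main(void)
    {
        int z = 0, x = 21;
        int a = path_taken(x);       /* one execution unit works this path */
        int b = path_not_taken(x);   /* another unit works the other path  */
        int result = (z == 0) ? a : b;  /* once z is known, one result is
                                           kept and the other thrown away  */
        printf("result = %d\n", result);
        return 0;
    }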
RISC vs. CISC: a Battle of Heat and Light
If there were no other limits a RISC CPU would always be faster
than a CISC CPU for a given semiconductor technology. At higher
clock speeds there are serious problems with providing clean power
and removing heat from the active areas of the chip surface. When
the clock speeds get to the gigahertz range the speed of light starts
to limit how long a signal takes to cross the chip surface. With
these limits you cannot just increase the clock speed for a RISC
CPU to give it an advantage over a CISC CPU. At very high clock
speeds the RISC CPU is forced to run at the same speed as the CISC
CPU and the RISC advantage disappears entirely. While the battle
raged, the CISC designers were not sleeping: they were optimizing
their microcode so common instructions had fewer lines in their
microprograms, and they were implementing pipelining. The net result
is that CISC CPUs have the advantage at high clock speeds.
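The speed-of-light limit is easy to put numbers on. Assuming on-chip
signals travel at very roughly half the vacuum speed of light (a generous
rule of thumb; real wire delays are usually worse):

    #include <stdio.h>

    int main(void)
    {
        double c = 3.0e8;             /* speed of light, m/s        */
        double clock_hz = 3.0e9;      /* a 3 GHz clock              */
        double period_s = 1.0 / clock_hz;
        double wire_speed = c / 2.0;  /* rough on-chip signal speed */
        double reach_m = wire_speed * period_s;

        printf("clock period: %.0f ps\n", period_s * 1e12);          /* ~333 ps */
        printf("signal reach per tick: %.1f cm\n", reach_m * 100.0); /* ~5 cm   */
        /* A signal can barely cross a few-centimetre die and return in
           one tick, so simply raising the clock stops paying off. */
        return 0;
    }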
Explicitly Parallel Instruction Set Computers (EPIC)
The battle between RISC and CISC had created a number of advances
in CPU design that made EPIC CPUs possible. The extra decode and
execution units in RISC CPUs were able to do parallel actions. The
user of the RISC CPU only saw one thread of execution but the little
administrative CPU on the RISC chip was managing multiple parallel
threads.
What if the administrative CPU on the RISC chip ran microcode like
the little CPU on the CISC chip? The user of the CPU could then explicitly
control all the parallel units from a high-level language. In one
clock tick the CPU can be executing several independent threads
of parallel activity. This means the clock is no longer the only limit
to raw CPU speed. The CPU speed is now the clock speed times the
number of parallel threads or "issues".
A cheap six issue one gigahertz CPU can run as fast as an expensive
six gigahertz CPU. If you want more parallel threads you just add
more units. If you have a hundred floating point units on the chip
surface you could organize them as an array processor and your Excel
spreadsheet runs blindingly fast. If you have a hundred bit-blitter
units you can move pixels around on your computer screen faster
than taking the time to load the problem into a separate video card.
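The arithmetic behind these claims is simple, and the "explicit" part of
EPIC is that the compiler packs independent operations into one wide
instruction word ahead of time. The six-slot bundle below is a made-up
illustration, not the real IA-64 encoding:

    #include <stdio.h>

    /* Made-up six-slot bundle: six independent operations the compiler
       has proven can all run in the same clock tick. */
    struct bundle {
        const char *slot[6];
    };

    int main(void)
    {
        struct bundle b = {{ "add r1,r2", "mul r3,r4", "load r5,[r6]",
                             "store [r7],r8", "fadd f1,f2", "cmp r9,0" }};
        double clock_ghz = 1.0;
        int issues = 6;

        /* 1 GHz x 6 issues = 6 billion operations per second, the same
           peak as a single-issue 6 GHz CPU - if the slots can be filled. */
        printf("peak rate: %.0f billion ops/s\n", clock_ghz * issues);
        for (int i = 0; i < issues; i++)
            printf("slot %d: %s\n", i, b.slot[i]);
        return 0;
    }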
So When Do I Get One?
If EPIC computers have so many advantages, why don't I have one on
my desktop? The simple answer is that they are very difficult to build
and program properly. In 1989 the now-defunct Digital Equipment
built a 64-bit EPIC computer called the Alpha. They had the hardware
for two years before they could get a compiler that worked. When
they started shipping Alphas in 1992 they would only do one "issue".
By the next year they had a two-issue compiler, and suddenly all
the old Alphas ran twice as fast. Both HP and Intel had EPIC design
teams, but Digital was well ahead of the pack in shipping a commercial
product. In 1994 HP dropped their EPIC project in favour of the
Intel IA-64 design. In 1997, as part of a long, painful decline of missed
deadlines and problems trying to build faster EPIC designs, Digital sold its
Alpha technology and chip foundry to Intel for $700 million. Intel
still makes six-issue Alphas, but HP stopped selling new
systems after October 26, 2006.
If Intel had Alpha technology in 1997 why aren't they shipping
EPIC computers in volume in 2004? The major reason is they thought
EPIC was easy and they wanted to ship a "made in Intel" design.
Like Digital, Intel found it was harder than it looked. The problem is
an Itanium is more than just a bunch of decode and execute units - it is
a very complex machine.
The latest generation of Itanium 2 systems is finally showing the promised performance.
Meanwhile AMD has added 32 extra bits to the old IA-32
architecture and is selling 64-bit 8086 chips. It looks like Intel
has given in and done the same, adding 32 bits to its Xeon
chips. It is very likely the distraction of 64-bit 8086 chips will
once again delay volume shipments of Itanium chips.
2005 Update - Cell Architecture Trumps EPIC
The main advantage of EPIC computers is the number of "issues" or threads of activity
they can run. An eight issue EPIC with lots of work packets to keep it busy could run eight times
as fast as a single threaded RISC or CISC chip. Now IBM, Sony and Toshiba have announced
a Cell processor technology
that gets multiple threads of activity from slave CPUs managed by a modified PowerPC master.
The PowerPC actually runs two threads of activity and the eight slaves (called SPEs or
Synergistic Processing Elements) can each run one thread. The PowerPC is only responsible
for starting and stopping activity on the SPEs, so in a well-behaved application the PowerPC
threads can also do useful work.
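A minimal sketch of that master/worker shape using ordinary POSIX threads;
the threads merely stand in for the PowerPC master and the SPEs, and this
is not the actual Cell SDK API:

    #include <pthread.h>
    #include <stdio.h>

    #define WORKERS 8   /* stand-ins for the eight SPEs */

    /* Each "SPE" runs one thread of activity on its own work packet. */
    static void *worker(void *arg)
    {
        int id = *(int *)arg;
        printf("worker %d: crunching its work packet\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[WORKERS];
        int id[WORKERS];

        /* The "PowerPC master" only starts and stops the workers... */
        for (int i = 0; i < WORKERS; i++) {
            id[i] = i;
            pthread_create(&t[i], NULL, worker, &id[i]);
        }
        /* ...so between start and join it can do useful work of its own. */
        printf("master: doing useful work while the workers run\n");
        for (int i = 0; i < WORKERS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }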
One ominous feature is that each of the SPEs has its own serial number (a GUID, or
globally unique identifier), which it will need to do
GRID processing. GRID processing allows
computational work packages to be scheduled on any CPU registered with the GRID. The
result is a massively parallel computer which could be scattered across the internet
(like the SETI project). Registration requires that CPUs have a unique identifier, so each
of the SPEs has its own serial number to register with the GRID. Of course these serial
numbers could also be used to register software licenses or even track the activities
of the owner. There was a huge protest when Intel started to put serial numbers in
its Pentium chips - so much so that Intel backed down and removed the serial numbers.
For a detailed look at Cell, the Wikipedia article
provides an excellent analysis. The chips (with only two SPEs) should first start showing up in Sony PlayStation 3
units under the Christmas tree in 2006. Game developers and GRID academics are likely to get Cell
workstations first, and news about Cell Linux should start to appear as a regular headline.
2006 Update - More Delays for Intel EPIC - Cell Pulls Ahead
The Itanium project remains stalled at a two-issue core. The Tanglewood project to create a four-issue core has
been delayed until at least 2008. As another sign of bad news, the Tanglewood project has been renamed Tukwila.
Troubled projects often try to reinvent themselves with a name change long before a product release, and this
appears to be the case with Tanglewood.
On the Cell processor front, Linus Torvalds has included Cell support in the Linux kernel version 2.6.16, so
it should quickly show up in your favorite Fedora, Debian, and SuSE distributions. With the Cell being multi-threaded,
cheap, simple, and supported by many vendors, it probably means the EPIC design that started with the DEC Alpha
will fade into the museum of dead-end architectures.