Intel i860
I was going through some old boxes in the attic the other day, and I ran into a tray full of these.
That’s a chip called the i860. Intel made them from about 1989 until the early ’90s. If you Google the 860, you’ll find a lot of complaining about how hard it was to program and how it never lived up to its theoretical performance numbers. I actually loved programming the 860. Once you figured out how to use it, it really was fast, and programming it was always an adventure.
The basic layout of the 860 was this. It was an LIW (long instruction word) architecture with two pipelines. One pipeline was for an integer ALU and the second for a pair of floating point ALUs (an adder and a multiplier). The floating point units could also be used to do multiple, simultaneous 8-bit integer operations (like MMX in the Pentium). The pipeline in the floating point unit was 4 instructions deep, and that pipeline was completely exposed. What does that mean? Consider this example (from an early edition of Hennessy and Patterson’s Computer Architecture: A Quantitative Approach):
    PFADD.SS F4, F2, F3
    PFSUB.DD F10, F8, F6
    PFMUL.DD F16, F12, F14
    PFADD.SS F19, F17, F18
    PFADD.SS F22, F20, F21
At the end of this, register F22 contains the result of adding registers F2 and F3. Wait, say what?!?!
That first instruction started an add going through the floating point adder. The second instruction started a subtract in the same unit and advanced the pipeline. The third instruction started a multiply going in the floating point multiplier, but didn’t advance the adder’s pipeline. The fourth instruction started another add going in the first unit, and advanced its pipeline. The final instruction started yet another add (F20 + F21) on the first unit, and placed the result coming out of that unit (the F2 + F3 sum from the first instruction) into its destination register (F22). Got it?
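If it helps, here is a toy Python model of that behavior. It is only a sketch under some loud assumptions: the names (ExposedPipe, run, PROGRAM, demo) are made up for illustration, each unit is modeled as holding three results in flight so that the fourth instruction issued to a unit delivers the first one's result (which is what the example above implies), the .SS/.DD precision suffixes are ignored, and a slot that has nothing in it yet simply leaves the destination register alone. It is not a model of the real chip's timing.

```python
from collections import deque


class ExposedPipe:
    """One functional unit whose pipeline only advances when a new op is issued."""

    def __init__(self, depth):
        # None marks a stage that holds nothing meaningful yet.
        self.stages = deque([None] * depth)

    def issue(self, value):
        """Push a new in-flight result; return whatever falls out the far end."""
        self.stages.append(value)
        return self.stages.popleft()


def run(program, regs, depth=3):
    # depth=3 is an assumption chosen to match the five-instruction example.
    adder = ExposedPipe(depth)
    multiplier = ExposedPipe(depth)
    units = {
        "PFADD": (adder, lambda a, b: a + b),
        "PFSUB": (adder, lambda a, b: a - b),
        "PFMUL": (multiplier, lambda a, b: a * b),
    }
    for mnemonic, dest, src1, src2 in program:
        unit, op = units[mnemonic]
        # Start the new operation; the destination receives the OLD result
        # leaving the pipe, not the result of this instruction.
        emerging = unit.issue(op(regs[src1], regs[src2]))
        if emerging is not None:  # simplification: empty slots write nothing
            regs[dest] = emerging
    return regs


# The example from the text, with the precision suffixes dropped.
PROGRAM = [
    ("PFADD", "F4", "F2", "F3"),
    ("PFSUB", "F10", "F8", "F6"),
    ("PFMUL", "F16", "F12", "F14"),
    ("PFADD", "F19", "F17", "F18"),
    ("PFADD", "F22", "F20", "F21"),
]


def demo():
    regs = {f"F{i}": 0.0 for i in range(32)}
    regs["F2"], regs["F3"] = 1.5, 2.25
    return run(PROGRAM, regs)


if __name__ == "__main__":
    result = demo()
    print(result["F4"], result["F22"])
```

In this model F22 ends up holding 3.75 (that is, F2 + F3), while F4, the nominal destination of the first add, is never written with that sum. The multiply sits in its own pipe and does not push anything through the adder.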
To make things more complicated, you could do muladds which connected the two units. When you did this, you had complete control over how the two units were connected. You could decide on the fly that you wanted the output of the adder to go into an input of the multiplier or vice versa.
In addition, each of those lines could have a ‘;’ and another instruction for the integer unit. The resulting assembler could be confusing and hard to read, but when you combine all of that with a blazing (for the time) 40 MHz clock, you can really push some pixels.
We used these chips at Stardent to build a small desktop machine. The code name for the project was Warbler. A couple of us started it in a backroom because the company’s other small desktop machine (Stiletto) was running way late and was looking like it was too complex to land before the company ran out of cash. We needed a backup plan.
Warbler was a really simple design. There was a motherboard with an 860 on it. On top of that sat the AGC. That was a simple 24 bpp frame buffer. Here’s Kerry holding the first one.
On top of that sat the AGM. This card had two more 860s which acted as slaves to the one on the motherboard. This was basically the GPU in modern terms. Each of the processors owned every other scanline. Here’s Andy holding the first AGM.
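The every-other-scanline split is easy to picture in a few lines of Python. This is just an illustration: the function names are hypothetical, and which processor took the even lines is an assumption, not something I can confirm from the design.

```python
NUM_RASTERIZERS = 2  # the two 860s on the AGM


def owner(y):
    """Which rasterizer draws scanline y (assumption: processor 0 gets even lines)."""
    return y % NUM_RASTERIZERS


def scanlines_for(cpu, y_min, y_max):
    """Scanlines in [y_min, y_max] that the given processor is responsible for."""
    # Step forward from y_min to the first line this processor owns.
    first = y_min + ((cpu - y_min) % NUM_RASTERIZERS)
    return range(first, y_max + 1, NUM_RASTERIZERS)


if __name__ == "__main__":
    # A primitive covering scanlines 3..8 is split between the two processors.
    for cpu in range(NUM_RASTERIZERS):
        print(cpu, list(scanlines_for(cpu, 3, 8)))
```

Each processor only ever touches its own lines, so the two can rasterize the same primitive at the same time without stepping on each other.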
The clever part of the design was the way we did the memory management on the AGM. All of the memory in the system was on a single bus, but there were bus splitters which could isolate each of the AGM’s 860s and some local memory. Each of these two processors could operate independently drawing into the scanlines it owned. When one of those 860s took a page fault, it sent an interrupt to the one on the motherboard and put itself to sleep. Then the main processor would close the bus splitter, swap pages of memory around, update the slave processor’s TLB, and then open the bus splitter and start the slave processor again. This meant that you had virtual memory all the way down to the rasterization stage! That made it a pretty sweet environment for writing graphics code.
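The page-fault handshake described above can be sketched as a little Python state machine. Everything here is hypothetical scaffolding (the Rasterizer and Host classes, the free-page list, the log strings); the only thing taken from the real design is the order of the steps. I am also assuming that "closing" the splitter joins the slave's local bus segment back onto the main bus, and the sketch skips eviction entirely.

```python
class Rasterizer:
    """Stands in for one of the AGM's 860s and its local memory."""

    def __init__(self, name):
        self.name = name
        self.tlb = {}                 # virtual page -> physical page
        self.asleep = False
        self.splitter_closed = False  # False: isolated, running on local memory

    def touch(self, vpage, host):
        """Access a virtual page, faulting to the host if it is not mapped."""
        if vpage not in self.tlb:
            self.asleep = True        # the slave parks itself ...
            host.handle_fault(self, vpage)  # ... and interrupts the main 860
        return self.tlb[vpage]


class Host:
    """Stands in for the 860 on the motherboard."""

    def __init__(self, free_pages):
        self.free_pages = list(free_pages)
        self.log = []

    def handle_fault(self, slave, vpage):
        self.log.append("interrupt")
        slave.splitter_closed = True      # join the slave's segment to the main bus
        self.log.append("close splitter")
        ppage = self.free_pages.pop(0)    # simplification: no eviction policy
        self.log.append("swap pages")
        slave.tlb[vpage] = ppage
        self.log.append("update TLB")
        slave.splitter_closed = False     # isolate the slave again
        self.log.append("open splitter")
        slave.asleep = False
        self.log.append("restart")


if __name__ == "__main__":
    host = Host(free_pages=[7, 9])
    agm0 = Rasterizer("agm0")
    print(agm0.touch(42, host))  # first touch faults and gets mapped
    print(host.log)
    print(agm0.touch(42, host))  # second touch hits the TLB, no fault
```

The point of the sketch is that the rasterizing processor never needs to know about paging: it stalls, the main processor fixes things up, and the access simply completes.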
While Kerry and Andy were putting these cards together, Abe, John, and I were cranking out code. First we took the C version of the graphics pipeline from one of our earlier machines and compiled it for the 860. The compilers weren’t able to take advantage of all of the features of the 860, but it gave us a starting point to work from. Next we would start picking out hotspots and rewriting them in assembler. After a while we got pretty good at it. Within about 6 months we had the entire pipeline running, including features like texture mapping, antialiasing, volume rendering and depth peeling. I used to describe programming the 860 as doing crossword puzzles for a living because of the way you had to get the different units and pipeline stages lined up to get things to work. There were days where I’d get 10 instructions written and go home feeling like I’d accomplished a lot. We don’t have to worry about those kinds of details today, and we can certainly write code faster, but I’m not sure that I really enjoy it as much as I enjoyed programming the Warbler.
We did manage to ship the machine in May of 1991 (less than 3 months after the first hardware in those pictures!), but it wasn’t enough. We sold quite a few of them (as the Vistra 800), but we ran out of money and closed the company in the fall. A couple of us went to Tokyo for a while as we shut things down and sold pieces off. The last 150 Warblers went to sit in a warehouse at Logan to wait for their own trip to Japan, but that never happened. After a year or so, John and I got a call asking whether we wanted them. I don’t know what happened to them in the end. I still have one, but it’s been a long time since I booted it up.