Thursday, May 25, 2006

Bandwidth: the Final Frontier

Bharath said I should write about why I think FPGAs are being seen in more new networking devices. Not having much exposure to this use of FPGAs, I will instead attempt to address another area where these amazing devices are seeing ever more use: high speed DSP.

Traditional DSP manufacturers (like ADI and TI) long ago realized the limitations of the usual von Neumann architecture that's so common in commodity computer systems; the most important one (from a DSP point of view) is memory bandwidth. The single bus over which both instructions and data must be fetched is the stumbling block when it comes to systems designed to crunch vast amounts of data down to manageable chunks. One way this is attacked is with small blocks of high-speed cache memory, but that approach doesn't help DSP applications where the data must be fetched from a high-speed sensor and constantly floods the cache, negating its benefits. These factors all resulted in the familiar DSP chips of the 90s, the ADI SHARC and the TI C6xx series. The ADI design, in particular, is interesting in that it relies on all the main memory being a high-speed type residing on-die; typical SHARC designs did not use external RAM. This guarantees single-cycle execution and the benefits of the Harvard architecture (simultaneous data/program access), without the associated explosion in pin count (back in the days when BGAs were as exotic as Leprechauns). However, once the data is inside one of these chips, options are limited by the number of ALUs and multiplier units, usually not more than 5-10 operations per clock cycle.
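To make the bandwidth pressure concrete, here's a minimal C sketch of the inner loop at the heart of most DSP kernels (the function name and tap count are mine, purely for illustration). Every tap costs two memory reads plus one multiply-accumulate; on a von Neumann machine all of that queues up on a single bus, while a Harvard machine like the SHARC fetches the coefficient and the sample in parallel, in one cycle.

```c
/* Hypothetical 16-tap dot product: the inner loop of most DSP kernels.
   Each iteration needs one coefficient fetch, one sample fetch and one
   multiply-accumulate.  On a von Neumann machine all three contend for
   one bus; a Harvard machine fetches coef[i] and sample[i] in parallel. */
long dot16(const int *coef, const int *sample)
{
    long acc = 0;
    for (int i = 0; i < 16; i++)
        acc += (long)coef[i] * sample[i];   /* 2 reads + 1 MAC per tap */
    return acc;
}
```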

Now that we're used to desktop CPU speeds in the several-GHz range, a 40 MHz chip seems quite puny. However, it's still enough to perform quite complex DSP tasks like FFTs, and this is indeed what DSPs were destined to do for most of their lives. One thing they could not do well, though, is very simple processing (e.g. an FIR filter) at extremely high sample rates (e.g. 50-100 MHz). In the few cases where it could be done, the memory bandwidth would be strained to near breaking point, and the DSP's cost could not justify its application to such a 'trivial' task. The gap was filled by ASSPs like the famous Graychip GC4016, a digital downconverter. These take the vast amounts of data spewing from ADCs and crunch them down for DSPs. The price you pay, though, is flexibility. Replace the ASSP with an FPGA and you now have a device that can handle both the reduction of vast quantities of data and the complex algorithms required to further process the reduced data. What's more, you are now free to reprogram the data reduction filters as your design changes (not always a good thing, though). These design changes can be as simple as slapping an extra bank of memory on some unused I/Os if you find you don't have enough memory bandwidth. This is simply not an option with most traditional CPUs and DSPs, which are limited to one (or at most two) buses. Exceptions exist, as we'll soon see.
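As a rough illustration of the 'trivial' data reduction such an ASSP or FPGA performs at the ADC rate, here's a toy decimate-by-4 boxcar filter in C. The function and its 4-sample average are invented for this sketch; the GC4016's actual processing (mixing, CIC and programmable FIR stages) is considerably more sophisticated.

```c
/* Toy decimate-by-4 boxcar: average every 4 ADC samples into one output.
   This is the flavour of data reduction an ASSP or FPGA does at the
   front end, cutting the rate seen by the DSP by 4x.  Purely
   illustrative, not the GC4016's algorithm. */
int decimate4(const short *in, int n_in, short *out)
{
    int n_out = 0;
    for (int i = 0; i + 4 <= n_in; i += 4) {
        int sum = in[i] + in[i + 1] + in[i + 2] + in[i + 3];
        out[n_out++] = (short)(sum / 4);   /* one output per 4 inputs */
    }
    return n_out;   /* output rate = input rate / 4 */
}
```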

Desktop chips have attacked the DSP problem in several ways. The most obvious is the inclusion of vector instructions like MMX, SSE and AltiVec, which address the ALU/multiplier restriction. Cache control instructions can be used to set up "fences" in memory; the fenced-off portions are not cached. The idea is that you can set up your sensor to DMA data into a fenced region of memory. The CPU will then not bother trying to cache this, since it will only be accessed once anyway. This prevents flushing of more important program variables. Still, in very high speed applications, a CPU will spend most of its time reading data in and out, not performing calculations. ASICs and FPGAs, however, are designed to stream data through, preventing this phenomenon. Simple example: a radar receiver FPGA I designed runs at a rather tame 80 MHz, but performs 1920 million multiplies/sec since it runs 24 multipliers in parallel. Few CPUs would be capable of sustaining this performance (peak performance doesn't cut it) or getting the memory bandwidth required to do it. Even if they could, it would be a tremendous waste of a desktop CPU to simply have it do something like digital downconversion.
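The throughput figure above is nothing more than clock rate times multiplier count, but it's worth writing down because it shows why parallel hardware wins even at a tame clock. A quick sketch (the function name is mine):

```c
/* Sustained multiply throughput of fully parallel hardware: every
   multiplier produces a result on every clock edge, so throughput is
   simply clock_hz * n_multipliers.  For the radar receiver described
   above: 80 MHz with 24 parallel multipliers. */
long long multiplies_per_sec(long long clock_hz, int n_multipliers)
{
    return clock_hz * n_multipliers;
}
```

A CPU, by contrast, must fetch, compute and store over shared buses, so its sustained rate sits well below the peak figure on the datasheet.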

One promising technique desktop CPUs have started to adopt is the on-die memory controller. Memory bandwidth then scales with the number of CPUs, giving the system designer an additional degree of freedom to increase the available memory bandwidth.

How does all this tie in with networking applications? Well, like I said, I'm not the best person to comment, but I can see some parallels in the requirement for vast bandwidth in and out of the CPU in line-speed applications. FPGA costs have been getting ever smaller (thanks to our good friend, Gordon Moore :-), and inclusion of hard-IP like Xilinx' Rocket-I/O is going to make FPGAs a more viable alternative to NPs, especially for multi-gigabit applications and switch-fabrics.

Monday, May 22, 2006

Flexibility, a blessing?

The electronics industry at large sort of revels in the fact that it's flexible and ever changing, that we can build products that, with the click of a mouse, turn into something completely different. I sometimes wonder, though, if this is entirely a good thing, at least for the blokes who make these things.

My argument is simple. Ninety-nine percent of electronic products today contain some form of programmable element (usually a microprocessor), which sits on top of other hardware. While flexibility in the form of a re-flashable microprocessor certainly helps everyone (most importantly, the end user), the MPU has traditionally relied upon a layer of bedrock, the hardware on which it runs, which remains relatively unchanged. These days, though, programmable logic has changed much of this, and we have systems where the hardware, a previously impermeable and immutable layer, now shifts around like quicksand beneath developers' feet. This hardware-du-jour phenomenon has made designing embedded systems doubly difficult, with constant bickering between the hardware guys, the software guys and (oh horrors!) management. Instruments designed with such an arrangement must now additionally tag all data with a 'hardware version' number, sometimes one for each chip the data passes through.
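The tagging itself is simple; the pain is in the bookkeeping. A hypothetical data header (all field names invented for illustration) might carry one version number per chip the data passed through:

```c
/* Hypothetical instrument data header: one version field per chip the
   data passed through, so offline processing knows exactly which
   hardware produced it.  Field names are invented for this sketch. */
struct data_header {
    unsigned short fpga_version; /* receiver FPGA bitstream revision  */
    unsigned short ddc_version;  /* downconverter config revision     */
    unsigned short fw_version;   /* MPU firmware revision             */
    unsigned int   n_samples;    /* number of payload samples to follow */
};
```

Every time any one of those layers is reflashed, the offline tools grow another special case keyed on these fields.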

On the other hand, it can be a good thing from the "the more data the merrier" school of thought. No, I don't mean more work, I only mean that the more data you preserve (or the closer you preserve data to a sensor), the better you can process it offline. The only trouble, of course, is bandwidth.

Debugging such an application can invoke some rather extreme displays of hair pulling, mostly when you need to figure out if the problem lies in software or hardware. I have a hard enough time doing it alone; I wonder how the guys at various companies with 'hardware teams' and 'software teams' see eye-to-eye :)

In the end, though, decisions postponed in the name of "flexibility" are usually postponed out of sheer laziness, and the hope that someone else will pick up the slack. Eventually no-one does, and another deadline whooshes by. When will it stop??? :-(

Friday, May 19, 2006

The Trouble with Perfection

After another few busy weeks at the radar site where I work, I've come to the conclusion that so many others have come to before: perfection is a moving target and not really worth it in the end. Our task is a formidable one, by any standards: take a research radar that's mostly in pieces, put it together for a rather large multi-organization project, and oh yes, do it in two months or so. All this sounds easy enough, except that there are only two engineers and a tech working on it. Tensions ran high as component shipments got delayed, cables broke, computers refused to boot, FPGAs started dancing to the beat of their own drummer... our normally-pessimistic site manager had a field day on this one!

To top it all, there's always this sense of aiming for "perfection", where things have to look and work "just right." Words that no self-respecting engineer should ever use; this stuff is strictly reserved for upper management and marketing droids. Unfortunately, yours truly (engineer, don't know about the self-respecting bit) started aiming a liiittle too high and hoped to build, test and validate a working radar receiver FPGA with all sorts of fancy features in about... heh... a week! Surprise surprise, after days of simulations and a 45-minute PAR run, I got a bitstream of pure junk. With only a day left, I decided it was best to drop the whizbang features and go back to last year's design, with tweaks. By some good fortune, Neptune was ascending when I did my last PAR run, and I have a working receiver :-) Lesson learned: sometimes aiming for mediocrity has its benefits, at least it meets deadlines!

Well anyway, now you know why there's not been much by way of posts here. As if you cared...

In other news, Xilinx has finally decided to update their tried-and-tested architecture based on 4-input LUTs: they've released the Virtex-5 series with 6-input LUTs! Man, this is just what I've been waiting for. For someone who eschews synthesizers and likes to place BELs, this is a dream. I only wish Maxwell's laws would make it possible to get these in non-BGA packages so poor schmoes like me can actually solder them :/

After all the excitement, I did manage to squeeze in some telly, and, heh, I watched re-runs of "Mad About You." Fun stuff.

Friday, May 05, 2006


Tick tock!

Time. The final frontier. The one thing we can almost simultaneously have too much and too little of. Both equally tragic. There are times when I think I'll never see the end of something, can't wait for it to be over, wishing I could reach escape velocity and break free from time's grasp. There are other times I find myself mulling over "if I spend two minutes a day waiting for a bus, that means I waste a whole hour every month. A whole hour!!! DAMN!!!". For completely different reasons, though, there are times when I wish time would stop whooshing by, so I could smell the roses, as the saying goes. Once in a while, I get a chance, and guess what, I smell manure instead of the roses that grow in it. Bleah.

Tick tock, tick tock. Every minute I sweep under the rug is a ticking little time bomb, and the ticking's driving me nuts! Where's that light at the end of the tunnel when you need one?

This post is brought to you by Bleary Red Eyes incorporated. BRE, where the insomniac workaholics (and slave labourers) go. Crunchy goodness in every bite.