It’s no secret that the bulk of simulation time is spent on the testbench side. You can throw as many cores as you want at the DUT when you’re running RTL tests, but let’s face it: the testbench itself is what’s bottlenecking you. Say it takes thirty minutes to run the testbench side and thirty minutes to run the tests themselves. Multi-core parallelism can massively shrink the DUT time, but at the end of the day, those RTL tests will still take at least thirty minutes per iteration.
Without changing how the testbench itself runs, total simulation time can only asymptotically approach the testbench time. The same is true at the GLS stage. With a thirty-minute testbench and, say, five hours and thirty minutes of tests, you’re looking at six hours per iteration, six times longer than our RTL example. In that RTL example, spreading the DUT over an arbitrarily high number of cores gets you close to a 2X speedup. For GLS, the same unlimited multi-core parallelism could, in this simplified example, bring you down to roughly a thirty-minute runtime, which is about a 12X speedup. Sounds great, right?
That looks like a huge win, but remember that the testbench doesn’t care how many cores the tests run on. Those processes can’t be parallelized, so the testbench still takes thirty minutes per iteration no matter how many cores you have, and you can never push the run time below that.
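To make the limit explicit, this is just Amdahl’s law applied to simulation. Writing $T_{tb}$ for the serial testbench time, $T_{dut}$ for the parallelizable test time, and $N$ for the number of cores:

$$\text{speedup}(N) = \frac{T_{tb} + T_{dut}}{T_{tb} + T_{dut}/N} \;\longrightarrow\; \frac{T_{tb} + T_{dut}}{T_{tb}} \quad \text{as } N \to \infty$$

Plugging in the numbers above: RTL gives $(0.5 + 0.5)/0.5 = 2$X, GLS gives $(0.5 + 5.5)/0.5 = 12$X, and in both cases the floor is that thirty minutes of testbench time.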
Once you get to the DFT ATPG side of things, the tests being run are enormous. Fault analysis takes ages; in this case, around three weeks, which is functionally eons in dev-time terms. Running those tests multi-core shrinks that down a lot, but the DFT ATPG step isn’t what’s pushing your schedule past its deadlines. The real issue is the GLS stage and that obnoxious little half-hour of testbench time that has to be re-run over and over again.
With modern SoCs, the issue becomes even more complicated. Chips nowadays have to be tested with full-chip, comprehensive real-world use cases. Consider, for instance, the use case of viewing a video while uploading it. On the surface, it seems fairly simple, but when you move down to the system level, lots of things are happening: the video buffer has to be converted to MPEG4 format with a certain resolution using whatever graphics processor is available, then that has to be transmitted through the modem via a communications processor—and while that’s happening, it has to be decoded using another available graphics processor so the video can be displayed on the screen.
It takes a very heavy testbench to drive all those events, so wouldn’t it be great if that block of testbench time could be shrunk down, too?
Perspec Can Help
Luckily, the Cadence Verification Suite has a tool for that. Enter Perspec: the tool for creating use-case C tests. Most designs nowadays include cores sourced from third parties. Those cores are already proven by their vendors, so tests aren’t really written for them. Perspec leverages those embedded cores to drive the test activity from within the design. That activity can be accelerated as part of the DUT while simultaneously off-loading the compiled-code simulator, because the non-acceleratable (NACC) testbench portion becomes much smaller. The embedded cores do the work of pushing the tests instead of the testbench, which takes a huge amount of data off the buses between the testbench and the design. By using the CPUs to exercise the design, Perspec effectively turns your heavyweight real-world testbench into a lightweight one, which means you can actually cut that testbench time down to size and maybe panic a little bit less when you start into the fault detection phase.
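To give a feel for what “driving the test from within the design” means, here is a minimal, hypothetical sketch of the kind of bare-metal C routine an embedded core might execute. The register names, addresses, and the encode-then-transmit flow are made up for illustration; they are not actual Perspec output, and real tests would come from the SoC’s own register map.

#include <stdint.h>

/* Hypothetical memory-mapped registers; real addresses come from the SoC register map. */
#define GPU_ENCODE_CTRL   ((volatile uint32_t *)0x40001000u)
#define GPU_ENCODE_STATUS ((volatile uint32_t *)0x40001004u)
#define MODEM_TX_CTRL     ((volatile uint32_t *)0x40002000u)
#define MODEM_TX_STATUS   ((volatile uint32_t *)0x40002004u)

#define START_BIT 0x1u
#define DONE_BIT  0x1u

/* Kick off an MPEG4 encode on a graphics processor, then hand the result
 * to the modem, polling status registers from inside the design instead of
 * having the testbench drive every transaction over the bus. */
static void encode_and_transmit(void)
{
    *GPU_ENCODE_CTRL = START_BIT;              /* start the encode job   */
    while ((*GPU_ENCODE_STATUS & DONE_BIT) == 0)
        ;                                      /* wait for encode done   */

    *MODEM_TX_CTRL = START_BIT;                /* start the transmission */
    while ((*MODEM_TX_STATUS & DONE_BIT) == 0)
        ;                                      /* wait for TX done       */
}

int main(void)
{
    encode_and_transmit();
    return 0;
}

The point is simply that the stimulus lives in the cores’ own instruction streams, so the external testbench only has to load the program and collect the results.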
Before, when using multi-core simulation, you were stuck running the testbench on a single core, because only a small amount of testbench code can be parallelized. This gave rise to the “two knobs” notion mentioned in an earlier blog, where one knob was the number of single-core machines dedicated to improving throughput on the testbench side, and the second was the number of cores a given DUT was parallelized over. Engineers could only turn that first knob so far, since machines are expensive. Thanks to Perspec, though, the use-case tests can reduce execution time without simply throwing more machines at the problem, which leaves more resources for twisting up the “cores” knob.
To look at the Chalk Talk for Perspec, check here.
To see how others are using Perspec, look here.