The Current State
It seems to be a fact of life that software has bugs and, unfortunately, our software is no exception. In most cases, however, it is not the bug itself that causes you grief. Rather, it is the fact that analyzing the problem, finding a workaround, and shipping the fix to you sometimes take a long time and require a lot of deep interaction between the user and Cadence R&D developers. Although the process sounds simple in an ideal world, in reality there are a few hard problems involved:
- The secrecy issue: The tools, especially in the EDA world, work on proprietary data and designs. These IP cores are a company's keys to future success; they are its magic sauce. Therefore, it is almost impossible for Cadence R&D to routinely obtain source files for a customer's project in order to re-create the issue. By the same token, we do not bring our source code into the customer environment and debug there. An additional obstacle is that Cadence might not even be legally allowed to access parts of the design, since they are third-party IP that is shared with the customer but not with Cadence.
- The size issue: SoC chips today have grown to be very large, and almost every customer has developed complex flows to manage all the dependencies. It is very hard to collect all the dependencies correctly and recreate the same setup, environment, or flow within Cadence.
- The manual effort issue: Building a test case manually may take a skilled person (one of our engineers or one of the customer's) days to weeks of precious time, all of it spent solving the first two issues.
One way to solve these issues is to make the failing environment as small as possible and remove all design parts that are not strictly needed to recreate the issue. A tool that can perform such a task automatically gives the debug engineer a much better handle on the tool issues and makes it possible to fix them within a reasonable turnaround time. Furthermore, an automatic approach is easily accepted by customers, since no engineering resources are blocked and the often-seen deadlock of "we need a test case" versus "we have no time to create one" is broken.
Test Case Optimizer (TCO)
TCO is a small, generic utility that is shipped in recent IUS (11.x and up) installations, and allows you to automatically create a test case from a failing flow invocation. The term generic here means that it does not require any special tool support, design style, or specific options, and that it is not limited to special kinds of failures. TCO attempts to strip down the input of the failing flow to the bare minimum while still exposing the original tool failure. The important point is that TCO only preserves the failure signature and is otherwise free to remove any functionality/legacy information present in the input data. With this approach, TCO has proven very successful at stripping down simulation source files in languages such as SystemVerilog, Verilog, VHDL, C, and SystemC. A reduction of the overall source input data size by more than 99% is common. Since TCO does not preserve functionality (other than the same crash signature), most, if not all, legacy code is typically removed from the input. Since the remaining code is typically free from proprietary sections, the result of a TCO run can be shipped to Cadence without exposing IP information. Very often TCO can remove so much from the input that complex flows end up as just one or two plain tool invocations that illustrate the original failure!
And, best of all for the engineer, TCO runs automatically after a short setup and will report back when finished.
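To make the idea of "remove input while the failure signature survives" concrete, here is a minimal sketch of a greedy line-removal loop. It is not TCO's actual algorithm, which this article does not describe; the function names `still_fails` and `reduce_file` are hypothetical, and the oracle simply greps for the token `BUG` where a real setup would run the failing flow.

```bash
#!/bin/bash
# Illustrative greedy reducer. NOT TCO's implementation, only a sketch of the idea.

# Hypothetical oracle: returns 0 when the "failure" is still present.
# A real oracle would run the failing flow and check its log instead.
still_fails() {
    grep -q 'BUG' "$1"
}

# reduce_file FILE: try deleting each line in turn and keep the deletion
# only when the oracle still reports the failure.
reduce_file() {
    local file=$1 candidate i total
    candidate=$(mktemp)
    i=1
    total=$(( $(wc -l < "$file") ))
    while [ "$i" -le "$total" ]; do
        sed "${i}d" "$file" > "$candidate"
        if still_fails "$candidate"; then
            cp "$candidate" "$file"   # deletion kept; index i now holds the next line
            total=$((total - 1))
        else
            i=$((i + 1))              # line is needed for the failure; move on
        fi
    done
    rm -f "$candidate"
}

# Small demo on a three-line file.
demo=$(mktemp)
printf 'a\nb BUG\nc\n' > "$demo"
reduce_file "$demo"
cat "$demo"    # prints: b BUG
rm -f "$demo"
```

The sketch only ever deletes text, so whatever survives can be traced back to the original input, which is the same property the article attributes to TCO's results.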
A Simple TCO Example
The following example uses a single, small IUS internal error and an old (9.2) version of the simulator to illustrate the flow, but TCO is known to handle multi-megabyte test cases.
module sub#( int MAXPORT = 4 ) (
    input  bit          clk,
    input  logic        a,
    output logic        chk,
    input  logic [31:0] p[MAXPORT:0]
);
  always @(posedge clk)
    chk <= a && !$past(a);
endmodule

module test_m();
  localparam MAXPORT = 8;
  bit clk = 0;
  always #5 clk = ~clk;
  logic port;
  logic chk;
  logic [31:0] count1[MAXPORT:0];

  sub#(MAXPORT) sub_1( .a( port ), .p( count1 ), .* );

  initial begin
    @(posedge clk); port <= 1;
    @(posedge clk);
    @(posedge clk); port <= 0;
    @(posedge clk);
    @(posedge clk); port <= 1;
    @(posedge clk); port <= 0;
    @(posedge clk); port <= 1;
    @(posedge clk);
  end

  sequence sts_ast (port);
    logic [31:0] count1[MAXPORT:0] ;
    ( chk, count1[port]+=1 );
  endsequence

  assert property( @(posedge clk) sts_ast( port ) );

  initial #100 $finish(2);
endmodule
When the example is run using a 9.2 version of IUS, the simulation ends with this message:
uwes@vl-uwe[none]~/src/tco/aet_debugging_crashes_using_tco/tco_lab/lab1$ irun test.sv -quiet
ncvlog: *F,INTERR: INTERNAL EXCEPTION
-----------------------------------------------------------------
The tool has encountered an unexpected condition and must exit.
Contact Cadence Design Systems customer support about this problem and provide enough information to help us reproduce it, including the logfile that contains this error message.
  TOOL: ncvlog 09.20-s055
  HOSTNAME: vl-uwe
  OPERATING SYSTEM: Linux 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64
  MESSAGE: p1_5_xabvinstances: default VST
-----------------------------------------------------------------
csi-ncvlog - CSI: Cadence Support Investigation, sending details to ncvlog.err
csi-ncvlog - CSI: investigation complete, send ncvlog.err to Cadence Support
irun: *E,VLGERR: An error occurred during parsing. Review the log file for errors with the code *E and fix those identified problems to proceed. Exiting with code (status 255).
At this point there are usually not many options left to try, so let's see how TCO helps us here.
The following steps outline the TCO setup used to reduce the example.
Script the Failing Tool Invocation and Failure Analysis
Essentially, all we need to do here is put the failing command into a script file that invokes the failing flow. After the flow has finished, the script reports back to TCO whether the failure has been seen or not. Here we simply start the flow with the irun command and, once it has finished, check the logfile for the failure signature.
#!/bin/bash
#
# this script should evaluate the configuration and should return [0,2]
#
#
#set -x
FAILURE_PRESENT=0
FAILURE_NOT_PRESENT=2

irun -quiet test.sv -clean > /dev/null
grep p1_5_xabvinstances irun.log > /dev/null
if [ $? = 0 ]; then
    exit ${FAILURE_PRESENT}
else
    exit ${FAILURE_NOT_PRESENT}
fi
Now the invocation can be tested: if the failure is present, the return status of the script should be 0.
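The signature check at the heart of the test script can also be factored into a small function, which makes it easy to try against saved logfiles before starting a long reduction run. The following is a sketch only: the function name `check_signature` is hypothetical, and only the 0/2 return convention and the signature string `p1_5_xabvinstances` come from the script above. Treating a missing logfile as "failure not present" is an assumption of this sketch.

```bash
#!/bin/bash
# check_signature LOGFILE SIGNATURE   (hypothetical helper, not part of TCO)
# Returns 0 (FAILURE_PRESENT) when SIGNATURE occurs in LOGFILE and
# 2 (FAILURE_NOT_PRESENT) otherwise, including when the log cannot be read.
check_signature() {
    local logfile=$1 signature=$2
    if [ -r "$logfile" ] && grep -qF -- "$signature" "$logfile"; then
        return 0
    fi
    return 2
}
```

A test script could then end with `check_signature irun.log p1_5_xabvinstances; exit $?` after running the flow, and the same function can be pointed at an archived log to confirm that the chosen signature is specific enough.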
Run TCO
To simplify TCO startup, a startup script of just a few lines is typically used. Essentially TCO needs the following two things in order to run:
- The set of files to work on. These are typically all, or a subset, of the input files for the failing flow.
- The invocation script we built in the first step, which allows TCO to run the test case automatically.
#!/bin/sh
tco=`ncroot`/tools/tco/bin/tco.sh
#
# collect list of files we want to work on
#
find . -name \*.sv -o -name \*.svh -o -name \*.v -o -name \*.psl > list
# TCO wants to write the files so make them writable
chmod +w `cat list`
${tco} -codebase . --fileset list --testscript ./run_test.sh --inplace --timeout 25
Invoked through this script, TCO runs only briefly on this example (27 seconds according to the log). The final status for the example is a reduction of roughly 75%.
2014-08-20 14:16:52,781 INFO - new/original size ~206/800
2014-08-20 14:16:52,784 INFO - reduction is 74.3%
2014-08-20 14:16:52,796 INFO - result is in .
2014-08-20 14:16:52,796 INFO - took 27.000 seconds to perform
2014-08-20 14:16:52,797 INFO - final status FAILURE_NOT_PRESENT:62 FAILURE_PRESENT:13 CONFIG_REJECTED:24 reduction: 74.50% removed=596bytes currentsize=204bytes
2014-08-20 14:16:52,797 INFO - finished
2014-08-20 14:16:58,458 INFO - TCO run completed.
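The reduction figure in the final status line follows directly from the byte counts: 596 of the original 800 bytes were removed, leaving 204. As a quick sanity check, the arithmetic can be reproduced with a one-line helper. The function name `reduction_percent` is hypothetical and not part of TCO.

```bash
#!/bin/bash
# reduction_percent ORIGINAL_BYTES CURRENT_BYTES   (hypothetical helper)
# Prints the share of bytes removed as a percentage with two decimals.
reduction_percent() {
    awk -v orig="$1" -v cur="$2" \
        'BEGIN { printf "%.2f\n", (orig - cur) * 100 / orig }'
}

reduction_percent 800 204    # prints: 74.50
```

The slightly different 74.3% reported a few lines earlier in the log is based on the approximate intermediate size (~206 bytes) rather than the final 204 bytes.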
Compare the Results
Once TCO has finished, the reduced results can be analyzed. Keep in mind that TCO never creates or deletes files, never renames any literals, and never injects new code (it only removes code). That allows you to relate the reduced code back to the original sources. If you now ask which part of this code is causing the tool failure, the answer is simply the line that no other line requires. In this example, it's the marked line. Since we now know which line causes the issue, we can now easily:
- Attempt to find a workaround by rewriting the offending construct/statement
- Disable the offending fragment until we get a fix (and also emit a message that this needs to be closed)
- Ship a small test case from which most other/legacy/secret code has been removed
- Reproduce the error outside of big flows, since the dependencies have been removed from the test case
module test_m();
localparam MAXPORT = 8;sequence sts_ast (port);
logic [31:0] count1[MAXPORT:0] ;
( chk, count1[port]+=1 ); // <<<<<<<<<<<<<<< no other line depends upon this one, therefore this is triggering the tool issue
endsequence
assert property( @(posedge clk) sts_ast( port ) );endmodule
Summary
TCO is a great utility which lets you create test cases for tool failures. As a customer, you can create test cases without a big investment of manual effort or engineering time. The result is typically acceptable even for secrecy-concerned customers, or it can be made acceptable with minimal effort. For a vendor, the TCO approach provides test cases which can be run and debugged locally without any overhead, resulting in much faster workarounds and/or fixes. In summary, TCO is a tool you should be aware of because it provides value when you need it most. Cool, isn't it?
References
Cadence online help: search TCO
U.Simm: "Rapid creation of reduced tool failure scenarios," CTC 2013
Uwe Simm