Our previous posts in this series covered measuring parameters, switches, and profiling, as well as performing profile analysis. In this final post, we will examine how to analyze a basic profiler report: the stream count, the most active modules, and the summary section. We will also discuss techniques for tuning settings and managing access levels.
Analyzing Basic Profile
Stream Count
Let's take a look at some basic profile information. The report is generated by the profiler when you run the design with the -profile switch. This part of the output file covers stream count. Streams are individually generated code streams, such as non-blocking assignments, always statements, anonymous continuous assignments, and continuous assignments, across different modules and statement types. For each stream, the profiler report lists the percentage of hits, the number of instances, and the stream name.
In this report, you can analyze the percentage of hits for each stream, which represents the share of total simulation time spent in that stream. For instance, the first non-blocking assignment in the report accounts for over 15% of the simulation time. The always statement that follows uses 13.9%, and the next non-blocking assignment uses 13%. These entries deserve attention because they are the potential bottlenecks to address when improving performance.
The report also gives the number of instances of each type of structure. For example, there can be thousands of instances of non-blocking assignments.
This report shows which processes consume the highest percentage of time and therefore where the potential bottlenecks are. Because it also gives the file name and line number for each stream, you can go straight to the code behind a bottleneck. For instance, you can inspect the sensitivity list of an always statement and evaluate how often it activates. Through this analysis, you can determine which processes are being executed and resolve the performance issue.
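As a minimal sketch of the triage described above, the following Python snippet ranks stream entries by their percentage of hits and keeps those above a threshold. Everything in it is an assumption made for illustration: the record layout, the names `StreamEntry` and `find_hotspots`, the file names, line numbers, instance counts, and the 10% threshold. It does not parse the real report, whose format should be taken from the tool's output.

```python
from dataclasses import dataclass


@dataclass
class StreamEntry:
    percent_hits: float  # share of total simulation time for this stream
    instances: int       # number of instances of this stream
    kind: str            # e.g. "non-blocking assignment", "always statement"
    location: str        # file name and line number (hypothetical values below)


def find_hotspots(entries, threshold=10.0):
    """Return entries at or above `threshold` percent hits, largest first."""
    hot = [e for e in entries if e.percent_hits >= threshold]
    return sorted(hot, key=lambda e: e.percent_hits, reverse=True)


# Illustrative data only, loosely modeled on the figures quoted in the text.
entries = [
    StreamEntry(13.9, 1200, "always statement", "fifo.v:88"),
    StreamEntry(2.4, 40, "continuous assignment", "alu.v:17"),
    StreamEntry(15.2, 3000, "non-blocking assignment", "regs.v:42"),
    StreamEntry(13.0, 2500, "non-blocking assignment", "regs.v:57"),
]

for e in find_hotspots(entries):
    print(f"{e.percent_hits:5.1f}%  {e.kind:<26} {e.location}")
```

The location field is what makes the list actionable: each hotspot points at a line of source you can open and inspect.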
Most Active Modules
The "most active modules" section of the report offers a different perspective. You may be surprised to find that an “always” statement whose individual instances each take only 0.1% of the simulation time accounts for 11.6% of the total.
You may be curious why this occurs. The explanation is that this is a highly active module. While one instance takes up only 0.1% of the time, there can be numerous instances, and their cumulative percentage adds up significantly.
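The arithmetic behind this effect can be sketched in a few lines of Python. The module names and the instance count of 116 are hypothetical, chosen only so that 0.1% per instance sums to the 11.6% quoted above; `module_totals` is an illustrative helper, not part of any tool.

```python
from collections import defaultdict


def module_totals(instance_hits):
    """Sum per-instance percentages by module name."""
    totals = defaultdict(float)
    for module, percent in instance_hits:
        totals[module] += percent
    return dict(totals)


# Hypothetical: 116 instances of one module, each using 0.1% of the time,
# next to a single instance of a larger block using 5%.
instance_hits = [("small_cell", 0.1)] * 116 + [("big_block", 5.0)]

totals = module_totals(instance_hits)
for module, percent in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{module:<12} {percent:5.1f}%")
```

Ranked per instance, the small module looks negligible. Ranked per module, it outweighs the larger block, which is the view the "most active modules" section gives you.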
Analyzing Summary Section
In the summary section, you will find a breakdown of hits between your test bench and your design. It shows the percentage of hits in RTL versus TB/SV, which helps you identify whether the design or the test bench takes more time.
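To make the comparison concrete, here is a small Python sketch that turns raw hit counts into the kind of percentage split the summary reports. The hit counts and the helper name `split_percentages` are hypothetical; the real summary already lists the percentages for you.

```python
def split_percentages(hits_by_category):
    """Convert raw hit counts into percentages of the total."""
    total = sum(hits_by_category.values())
    if total == 0:
        raise ValueError("no hits recorded")
    return {name: 100.0 * hits / total for name, hits in hits_by_category.items()}


# Hypothetical hit counts for the two categories named in the summary.
hits = {"RTL": 7200, "TB/SV": 2800}

shares = split_percentages(hits)
dominant = max(shares, key=shares.get)

for name, share in shares.items():
    print(f"{name:<6} {share:5.1f}%")
print(f"Start tuning in: {dominant}")
```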
We have observed that lower levels of abstraction spread the hits across more entries in the stream count. Even for gate-level simulations, the profile still provides helpful information, such as the logic primitives involved, their time usage, and the number of hits.
SystemVerilog Coding Idea – App Note
We have an App Note written by verification experts to help you pinpoint the cause of hotspots. It offers advice and coding techniques for better performance, covering specific semantics, data structures, UVM guidelines, randomization, and assertions, among other topics. For instance, the app note recommends avoiding bit-blasting of vectors, working with arrays, and choosing different data structures to improve performance. We also offer blogs on these topics, which you can access via the Cadence support website. These resources can help you find better-performing alternatives for your coding practices.
Tuning Settings and Access Levels
Now let us look at the access levels and the recommended settings for them. There are three levels: read, write, and connect.
With read access, you can examine nets, regs, and variables. You can also set PLI callbacks that retrieve the values of these objects. This level of access is needed for tasks such as waveform dumping.
With write access, you can assign values to simulation objects using Tcl commands and VPI deposits, and manipulate variables interactively. Note that write access automatically includes read access.
Connectivity access is required for driver tracing and for getting driver and load information about a specific net, reg, or variable. Connectivity access automatically provides both write and read access.
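The way the three levels build on each other can be modeled in a short Python sketch. This is a conceptual model only, not a tool API: the names `Access`, `IMPLIED`, `effective_access`, and `minimal_level` are hypothetical, and the mapping simply restates the rules above (write includes read; connect includes write and read).

```python
from enum import Flag, auto


class Access(Flag):
    READ = auto()
    WRITE = auto()
    CONNECT = auto()


# What each requested level effectively grants, per the description above.
IMPLIED = {
    Access.READ: Access.READ,
    Access.WRITE: Access.WRITE | Access.READ,
    Access.CONNECT: Access.CONNECT | Access.WRITE | Access.READ,
}


def effective_access(requested):
    """Expand a requested level into everything it grants."""
    granted = Access(0)
    for level in (Access.READ, Access.WRITE, Access.CONNECT):
        if level in requested:
            granted |= IMPLIED[level]
    return granted


def minimal_level(needs_deposit=False, needs_driver_tracing=False):
    """Pick the least access that covers the stated needs."""
    if needs_driver_tracing:
        return Access.CONNECT
    if needs_deposit:
        return Access.WRITE
    return Access.READ  # enough for value reads such as waveform dumping


print(effective_access(Access.WRITE))
print(minimal_level(needs_deposit=True))
```

Asking for a higher level than you need therefore drags in everything below it, which is why choosing the minimal level matters.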
There are several ways to enable access, but doing so can disable tool optimizations, such as dead-code elimination. To minimize the impact on performance, limit access as much as possible.
It is recommended to:
- Avoid “access rwc” or “access rw.” Instead, opt for “access r” for wavedump, “nocellaccess” for gate-level simulations, and “accessreg.”
- Rather than granting write access generically, use an “afile” for write access and restrict the write to the specific portion of the design that needs it.
“lwdgen” and “createdebugdb” do not require “access c.” If you are using the latest lightweight database, there is no need for connect access, because the database provides driver tracing automatically.
To learn more, watch the on-demand recording of the webinar Best Practices to Achieve the Highest Performance using Xcelium Logic Simulator.