November 29, 2021

Some tips for avoiding mistakes in Verilog design

This is a list of mistakes that are commonly made in design. They often make a design unreliable or slow. To improve both the reliability and the performance of your design, make sure it passes all of the checks below.

Reliability

Use the global clock buffer (BUFG) for clock signals
A clock that is not routed through a global clock buffer will introduce skew.

Use only one clock edge to register data
Registering data on both edges of the clock is unreliable because the duty cycle can drift, shifting one or both edges. If you use only one edge of the clock, you reduce the risk posed by this drift.
The problem can also be addressed by letting the CLKDLL automatically correct the clock to a 50% duty cycle. Otherwise, it is strongly recommended that you use only one clock edge.

Do not generate clocks internally, except for clocks generated with a CLKDLL or DCM
This includes gated clocks and divided clocks. Instead, use a clock enable, or generate the different clock with a CLKDLL or DCM.
For a purely synchronous design, use only one clock whenever possible.
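
As a rough illustration of the clock-enable alternative, the sketch below updates a register once every four cycles without creating a divided clock. The module name ce_divider, the port names, and the divide-by-four ratio are all assumptions chosen for the example, not part of the original checklist.

```verilog
// Hypothetical example: update dout at 1/4 of the clk rate using a
// clock enable instead of an internally divided clock.
module ce_divider (
    input  wire       clk,   // the single global clock
    input  wire       rst,   // synchronous reset, active high
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg  [1:0] div_cnt;
    wire       ce = (div_cnt == 2'd3);  // high for one cycle out of four

    // Free-running 2-bit counter that produces the enable
    always @(posedge clk) begin
        if (rst)
            div_cnt <= 2'd0;
        else
            div_cnt <= div_cnt + 2'd1;
    end

    // Every flip-flop stays on clk; ce only decides when dout updates
    always @(posedge clk) begin
        if (rst)
            dout <= 8'd0;
        else if (ce)
            dout <= din;
    end
endmodule
```

Because every register is still clocked by clk, the design stays in a single clock domain.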

Do not generate asynchronous control signals internally
This applies to signals such as asynchronous resets and sets. Internally generated asynchronous control signals can glitch. Generate a synchronous reset/set signal instead, and decode it one clock cycle ahead of the time it is required.
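
One possible way to decode a synchronous clear one cycle early is sketched below for a modulo-10 counter. The module name mod10_counter, the signal clr, and the count range are assumptions made for illustration.

```verilog
// Hypothetical example: a counter that counts 0..9. The clear is
// decoded at count 8 and registered, so it is active during count 9
// and clears the counter synchronously on the following edge.
module mod10_counter (
    input  wire       clk,
    input  wire       rst,   // synchronous reset, active high
    output reg  [3:0] cnt
);
    reg clr;                 // registered synchronous clear

    always @(posedge clk) begin
        if (rst) begin
            clr <= 1'b0;
            cnt <= 4'd0;
        end else begin
            clr <= (cnt == 4'd8);    // decode one cycle ahead
            if (clr)
                cnt <= 4'd0;         // synchronous clear
            else
                cnt <= cnt + 4'd1;
        end
    end
endmodule
```

The clear comes straight from a flip-flop, so no decode glitch can reach the counter.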

Do not use multiple clocks that have no fixed phase relationship
You may not always be able to avoid this. When you cannot, make sure you use an appropriate synchronization circuit wherever a signal crosses clock domains.
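
A common synchronization circuit for a single-bit signal is the two-flip-flop synchronizer sketched below. The module and port names are assumptions. It is only suitable for single-bit, slowly changing signals; multi-bit buses typically need a handshake or an asynchronous FIFO instead.

```verilog
// Hypothetical example: two-flip-flop synchronizer that brings a
// single-bit signal from another clock domain into clk_dst.
module sync_2ff (
    input  wire clk_dst,    // destination-domain clock
    input  wire async_in,   // signal launched from another clock domain
    output wire sync_out    // version that is safe to use on clk_dst
);
    reg [1:0] sync_ff = 2'b00;

    // The first stage may go metastable; the second stage gives it
    // a full clock period to settle before the value is used.
    always @(posedge clk_dst)
        sync_ff <= {sync_ff[0], async_in};

    assign sync_out = sync_ff[1];
endmodule
```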

Constrain the paths between clocks that have no fixed phase relationship
Again, you may not always be able to avoid using such clocks; many designs need them. In these cases, make sure you have properly constrained the paths that cross clock domains.

Do not use internal latches
Internal latches confuse timing analysis and often introduce additional clock signals. A latch behaves like combinatorial logic while its gate is open (transparent) and like a synchronous element once the gate closes, which is what confuses timing analysis. Internal latches also often introduce gated clocks, and gated clocks can glitch and make the design unreliable.
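
Latches are often inferred by accident when a combinatorial always block does not assign its output on every path. The sketch below shows one way to avoid this with a default assignment. The module name and ports are assumptions for illustration.

```verilog
// Hypothetical example: a 3-input multiplexer written so that no
// latch is inferred for the unused select value.
module mux_no_latch (
    input  wire [1:0] sel,
    input  wire [7:0] a, b, c,
    output reg  [7:0] y
);
    always @* begin
        y = 8'd0;            // default assignment covers sel == 2'b11
        case (sel)
            2'b00: y = a;
            2'b01: y = b;
            2'b10: y = c;
        endcase
    end
endmodule
```

Without the default assignment, y would have to hold its previous value when sel is 2'b11, and the synthesis tool would infer a latch.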

Performance

Logic-level delay should not exceed 50% of the timing budget
The logic-level delay of each path can be found in the logic-level timing report or the post-layout timing report. After a detailed analysis of each path, the timing analyzer generates statistics on the path delay. Check whether the total logic-level delay exceeds 50% of your timing budget.

Use the IOB registers
The IOB registers provide the fastest clock-to-output and input-to-clock delays. There are some restrictions, however. For input registers, there can be no combinatorial logic between the pin and the register. For output registers, there can be no combinatorial logic between the register and the pin. For three-state outputs, all registers in the IOB must use the same clock and reset signals, and the IOB three-state register must be active low to be placed in the IOB. The three-state buffer is active low, so no inverter is needed between the register and the three-state buffer. You must also tell the software to use the IOB registers. A global implementation option selects IOB registers for inputs, for outputs, or for both inputs and outputs. The default value is off.
You can also enable the IOB registers in the synthesis tool or in the user constraints file (UCF). The UCF syntax is: INST <instance_name> IOB = TRUE;

Use a fast slew rate for critical outputs
The slew rate can be selected for LVCMOS and LVTTL outputs. A fast slew rate reduces output delay but increases ground bounce, so choose it only after careful consideration.

Pipeline the logic
If your design can tolerate increased latency, pipelining the combinatorial logic can improve performance. Xilinx FPGAs contain a large number of registers: there is one register for each four-input function generator. Use these registers to increase data throughput at the cost of latency.
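
As a small sketch of pipelining, the module below adds four operands in two register stages instead of one long combinatorial chain. The module name, operand widths, and two-stage split are assumptions picked for the example.

```verilog
// Hypothetical example: four-operand adder split into two pipeline
// stages. The result appears two clock cycles after the inputs.
module add4_pipelined (
    input  wire       clk,
    input  wire [7:0] a, b, c, d,
    output reg  [9:0] sum
);
    reg [8:0] ab, cd;    // stage 1 partial sums (9 bits hold the carry)

    always @(posedge clk) begin
        ab  <= a + b;    // stage 1
        cd  <= c + d;    // stage 1
        sum <= ab + cd;  // stage 2, uses last cycle's partial sums
    end
endmodule
```

Each register-to-register path now contains a single adder, so the clock can run faster while a new set of operands is still accepted every cycle.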

Optimize code for the four-input lookup table (LUT) architecture
Remember that each lookup table can implement a combinatorial logic function of up to four inputs. If a function needs more inputs, keep in mind the number of lookup tables required to implement it.

Use case statements instead of if-then-else statements
A complex if-then-else statement usually generates priority-decoding logic, which increases the combinatorial delay on those paths. A case statement describing the same logic will usually generate parallel logic with less delay. Verilog users can use the compiler directive synopsys parallel_case.
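
The sketch below shows one place where the parallel_case directive is commonly applied: a case statement that tests one-hot select bits. The module name and ports are assumptions, and the example only behaves as intended if sel really is one-hot. The directive is a comment that simulators ignore, so a select value with more than one bit set can make simulation and synthesis disagree.

```verilog
// Hypothetical example: multiplexer with a one-hot select.
// ASSUMPTION: exactly one bit of sel is high at any time.
module onehot_mux (
    input  wire [3:0] sel,
    input  wire [7:0] a, b, c, d,
    output reg  [7:0] y
);
    always @* begin
        y = 8'd0;                    // default avoids an inferred latch
        case (1'b1) // synopsys parallel_case
            sel[0]: y = a;
            sel[1]: y = b;
            sel[2]: y = c;
            sel[3]: y = d;
        endcase
    end
endmodule
```

When the case items are already mutually exclusive, for example a case on a binary-encoded select, the directive is not needed.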

Use one or more Core Generator blocks
Core Generator blocks are optimized for the Xilinx architecture. Many blocks are user-configurable, including size, width, and pipeline delay. Look at the critical paths in your design: can you generate a core in the Core Generator to improve critical-path performance?

Keep each finite state machine (FSM) at its own level of hierarchy
To let the synthesis tool fully optimize your FSM, the FSM must be optimized in its own block. Otherwise, the synthesis tool will optimize the FSM logic together with the surrounding logic. The FSM block must not include any arithmetic logic, data path logic, or other combinatorial logic that is unrelated to the state machine.

Use two processes or always blocks for finite state machines
The next-state decode logic and the output decode logic must be placed in separate processes or always blocks. This prevents the synthesis tool from sharing resources between the output decode logic and the next-state decode logic.
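
A minimal sketch of this coding style follows, with the state register, the next-state decode, and the output decode each in their own always block. The module name, states, and ports are assumptions made up for the example.

```verilog
// Hypothetical example: three-state FSM with separate always blocks
// for next-state decode and output decode.
module simple_fsm (
    input  wire clk,
    input  wire rst,       // synchronous reset, active high
    input  wire start,
    input  wire finish,
    output reg  busy,
    output reg  done
);
    localparam [1:0] IDLE = 2'd0, RUN = 2'd1, DONE = 2'd2;

    reg [1:0] state, next_state;

    // State register
    always @(posedge clk) begin
        if (rst)
            state <= IDLE;
        else
            state <= next_state;
    end

    // Next-state decode logic
    always @* begin
        next_state = state;              // hold state by default
        case (state)
            IDLE:    if (start)  next_state = RUN;
            RUN:     if (finish) next_state = DONE;
            DONE:    next_state = IDLE;
            default: next_state = IDLE;  // recover from unused encoding
        endcase
    end

    // Output decode logic, kept apart from the next-state logic
    always @* begin
        busy = (state == RUN);
        done = (state == DONE);
    end
endmodule
```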

Use one-hot encoding for finite state machines (FSMs)
One-hot encoding usually gives the highest-performance state machine in a register-rich FPGA.

Register the outputs of each leaf-level block
A leaf-level block is a block in which logic is inferred, while a structural-level block only instantiates lower-level blocks; together they establish the hierarchy. If the outputs of the leaf-level blocks are registered, the synthesis tool can preserve the hierarchy, which makes static timing analysis of the code easier. Registering the boundaries also gives the blocks a well-defined timing relationship.

Use the data flow with appropriate pin-location constraints
Data flow in Xilinx devices runs horizontally, for two reasons: the carry chains run vertically, and the three-state buffer lines, like the direct connections between blocks, run horizontally. To take advantage of the data flow, address and data pins must be placed on the left or right side of the chip. Because the carry chains run from bottom to top, the least significant bit should be placed at the bottom. Control signals should be placed on the top and bottom of the chip.

Consider different counter styles
Binary counters are relatively slow. If a binary counter is on a critical path, consider a different counter style: LFSR, prescaler, or Johnson.
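
As one illustration of an alternative counter style, the sketch below is a 4-bit Johnson counter. The module name and width are assumptions. Its next-state logic is a single inverter with no carry chain, at the cost of only 8 states from 4 flip-flops.

```verilog
// Hypothetical example: 4-bit Johnson (twisted-ring) counter.
// From reset it cycles through 8 states:
// 0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000, then back to 0000.
module johnson4 (
    input  wire       clk,
    input  wire       rst,   // synchronous reset, active high
    output reg  [3:0] q
);
    always @(posedge clk) begin
        if (rst)
            q <= 4'b0000;
        else
            q <= {q[2:0], ~q[3]};  // shift left, feed back inverted MSB
    end
endmodule
```

The reset matters here: states outside the sequence above, such as 0101, form a separate cycle, so the counter should be started from a known state.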

Make the design hierarchical, divided into functional blocks and technology-specific blocks
The design must be divided into functional blocks: first the top-level functional blocks, then the lower-level blocks. Technology-specific logic should also be placed in its own blocks. A hierarchical design is more readable, easier to debug, and easier to reuse.

Duplicate high-fanout nets
This can be controlled by your synthesis tool. For tighter control over the duplication, however, you can duplicate the register yourself.

Use the four global constraints
Constrain the design globally with four constraints: the period of each clock, input offset, output offset, and pin-to-pin delay. You may have other constraints for multicycle paths, false paths, and critical paths, but you should always start by specifying the four global constraints.
