👀 Problem

Design verification (DV) checks the correctness of hardware designs. The verification process takes in inputs, or test stimuli, passes them into the hardware design-under-test (DUT), and compares the result to expected outputs from a software golden model. The scope of the verification, also known as a coverage plan, is defined and agreed upon in advance, ensuring all functional properties of the design are tested.

A coverage plan in DV specifies a list of cover points to be tested, which are particular outputs and machine states that the verification process needs to cover. Each cover point is associated with a coverage bin that counts how many times the cover point has been exercised. The goal of the verification process is to achieve a 100% functional coverage rate against the coverage plan, meaning that every coverage bin has a non-zero value.
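As a minimal sketch of these definitions, the Python below models a coverage plan for a toy 8-bit adder. The adder, the bin names, and the helper functions (`golden_add`, `record`, `coverage_rate`) are all hypothetical and chosen only for illustration; they are not taken from LLM4DV.

```python
from collections import Counter

# Hypothetical coverage plan for a toy 8-bit adder. Each cover point is a
# condition over the inputs (a, b) and the result r. Illustrative only.
COVERAGE_PLAN = {
    "zero_result": lambda a, b, r: r == 0,
    "overflow": lambda a, b, r: a + b > 0xFF,
    "max_operand": lambda a, b, r: a == 0xFF or b == 0xFF,
}

def golden_add(a, b):
    """Software reference model: 8-bit wrapping addition."""
    return (a + b) & 0xFF

def record(bins, a, b, result):
    """Increment the bin of every cover point exercised by this stimulus."""
    for name, is_hit in COVERAGE_PLAN.items():
        if is_hit(a, b, result):
            bins[name] += 1

def coverage_rate(bins):
    """Fraction of bins in the plan that have a non-zero count."""
    covered = sum(1 for name in COVERAGE_PLAN if bins[name] > 0)
    return covered / len(COVERAGE_PLAN)

bins = Counter()
for a, b in [(0, 0), (1, 2), (200, 100)]:
    record(bins, a, b, golden_add(a, b))
```

With these three stimuli, `zero_result` and `overflow` are each hit once and `max_operand` is never hit, so `coverage_rate(bins)` is 2/3. Reaching 100% would require at least one stimulus with an operand equal to `0xFF`.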

Generating effective test stimuli has been a major challenge in reaching 100% coverage. For a simple design, verification can be done with individual directed tests, in which test stimuli (inputs for the DUT) are written by hand. More complex designs require a large number of stimuli to exercise as much of the design's functionality as possible. Traditionally, constrained-random testing (CRT) has been used to generate vast numbers of random but valid test stimuli in an attempt to "hit" the bins. However, CRT is inefficient at hitting bins with complicated conditions, which often necessitates extensive human engineering involvement in the test stimuli design process.
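The contrast between CRT and directed tests can be sketched in a few lines of Python. The two bins, the 8-bit operand constraint, and the function names below are assumptions made for illustration and do not describe any real testbench.

```python
import random

# Hypothetical bins for a toy 8-bit adder (illustrative only).
BINS = {
    # Easy: roughly half of all uniformly random operand pairs overflow.
    "any_overflow": lambda a, b: a + b > 0xFF,
    # Hard: exactly 1 of the 65,536 possible operand pairs hits this bin.
    "both_max": lambda a, b: a == 0xFF and b == 0xFF,
}

def hit_counts(stimuli):
    """Count how many times each bin is hit by a list of (a, b) stimuli."""
    counts = {name: 0 for name in BINS}
    for a, b in stimuli:
        for name, condition in BINS.items():
            if condition(a, b):
                counts[name] += 1
    return counts

def crt_stimuli(num_tests, seed=0):
    """Constrained-random stimuli: operands are random but kept in 0..255."""
    rng = random.Random(seed)
    return [(rng.randrange(256), rng.randrange(256)) for _ in range(num_tests)]

random_counts = hit_counts(crt_stimuli(1000))    # many tests, no targeting
directed_counts = hit_counts([(0xFF, 0xFF)])     # one hand-written test
```

A single directed stimulus hits `both_max` immediately, whereas each random stimulus has only a 1 in 65,536 chance of doing so. Real cover points often depend on sequences of inputs and internal state, which makes the gap larger still.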


💭 Proposal

In this project, we are interested in extending a benchmarking framework named LLM4DV (Large Language Model for Design Verification). The framework was developed over the summer in collaboration between the University of Cambridge and lowRISC CIC. Currently, we have LLM results for several CPU hardware components and for a full IBEX CPU. This project will extend the framework with further testing on the IBEX CPU, and may also extend the testing to a GPU and a systolic array accelerator that we have built in our lab.

https://github.com/lowRISC/ibex


🛫 Plan

Basic

Extensions

AutoPrompt: Eliciting Knowledge from Language Models with...

Reinforcement learning from human feedback