Generate Kernels through predefined use cases
After signing in to KernelGen, you will see the welcome page.

On the welcome page, perform the following steps to generate Triton Kernels:
Click a predefined use case in the Use Case section.
This example uses the predefined use case named ReLU.
The operator definitions of the ReLU use case are automatically populated in the text box.
KernelGen searches a list of GitHub repositories for code snippets similar to the operator definitions. For more information about the list of GitHub repositories, see Repository List.
Select one or more repository URLs to use as references, or select the direct generation checkbox, and then click Next.
In the confirmation dialogue box, click Confirm.
On the operator definition and configuration page, configure the operator definition parameters and the maximum number of iterations that KernelGen runs while trying to pass the correctness test:
In the Operator Definition section, select an operator testing device from the Testing Device dropdown list. The default is Nvidia.
Configure the other operator definition parameters as needed.
In the KernelGen Configuration section, set the maximum number of iterations that KernelGen runs while trying to pass the correctness test. The default is 5.
Click Next.
In the confirmation dialogue box, click Confirm.
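
The role of the maximum iteration count can be pictured as a bounded retry loop: a candidate is produced, checked against a reference, and regenerated until it passes or the limit is reached. The following is a minimal sketch in plain Python, not KernelGen's actual implementation. The names relu_reference, passes_correctness, generate_until_correct, and propose are hypothetical, and Python lists stand in for device tensors.

```python
from typing import Callable, List, Optional, Tuple

Kernel = Callable[[List[float]], List[float]]


def relu_reference(xs: List[float]) -> List[float]:
    """Reference ReLU: max(x, 0) for each element."""
    return [max(x, 0.0) for x in xs]


def passes_correctness(candidate: Kernel, inputs: List[List[float]],
                       tol: float = 1e-6) -> bool:
    """Compare the candidate against the reference on every test input."""
    for xs in inputs:
        got, want = candidate(xs), relu_reference(xs)
        if len(got) != len(want):
            return False
        if any(abs(g - w) > tol for g, w in zip(got, want)):
            return False
    return True


def generate_until_correct(propose: Callable[[int], Kernel],
                           inputs: List[List[float]],
                           max_iterations: int = 5
                           ) -> Tuple[Optional[Kernel], int]:
    """Try up to max_iterations candidates.

    Returns (kernel, attempts_used) for the first candidate that passes,
    or (None, max_iterations) if none passes.
    """
    for attempt in range(1, max_iterations + 1):
        candidate = propose(attempt)
        if passes_correctness(candidate, inputs):
            return candidate, attempt
    return None, max_iterations


def propose(attempt: int) -> Kernel:
    # Hypothetical generator: the first two candidates are wrong
    # (identity function), the third implements ReLU correctly.
    if attempt < 3:
        return lambda xs: list(xs)
    return lambda xs: [x if x > 0 else 0.0 for x in xs]


kernel, attempts = generate_until_correct(propose, [[-1.0, 0.0, 2.5]])
```

In this sketch the third candidate passes, so a limit of 5 is sufficient, whereas a limit of 2 would end without a passing kernel. A higher limit gives generation more chances to succeed at the cost of a longer run.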

On the Kernel generation and test page, check the generation status on the KernelGen panel on the right.
After the statuses of the Kernel, CUDA Implementation, Correctness Test, and Speedup Ratio Test change to Completed, and the Correctness Test passes (its indicator turns green), click View Details. The Speedup Ratio Details table lists the speedup for each scenario and the overall speedup.
If the speedup meets your performance criteria, close the Speedup Ratio Details table, and then click Download Kernel. To reuse the correctness test and speedup ratio test, click the Correctness Test and Speedup Ratio Test sections and copy the corresponding code.
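
To help interpret the Speedup Ratio Details table, the sketch below shows one common way such figures are computed. It rests on assumptions: a per-scenario speedup is taken as baseline time divided by generated-kernel time, and the overall figure is taken as the geometric mean of the per-scenario values. KernelGen may define either differently. The scenario names and timings are illustrative placeholders, not measured results.

```python
import math
from typing import Dict, List, Tuple


def speedup(baseline_ms: float, kernel_ms: float) -> float:
    """Speedup ratio: values above 1.0 mean the kernel is faster."""
    return baseline_ms / kernel_ms


def overall_speedup(ratios: List[float]) -> float:
    """Geometric mean of the ratios (an assumed aggregation rule)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (baseline_ms, kernel_ms) timings per scenario.
timings: Dict[str, Tuple[float, float]] = {
    "small_input": (2.0, 1.0),
    "large_input": (8.0, 1.0),
}

ratios = {name: speedup(b, k) for name, (b, k) in timings.items()}
overall = overall_speedup(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"overall: {overall:.2f}x")
```

The geometric mean is used here because it treats a 2x gain and a 2x loss symmetrically, which an arithmetic mean of ratios does not.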
