Compile Options
The options listed below are used with the compile_options dictionary argument of the @polyblocks_jit_* decorators. As an example:
@polyblocks_jit_torch(compile_options={
    'mm_conv_precision_mode': 'fp16',
    'target': 'nvgpu'
})
def compute(x, y):
    return x @ y
When using the PolyBlocks compiler from a terminal/shell, supply the options listed below with underscores replaced by hyphens; for example, user_device_memory becomes -user-device-memory on the command line.
General options
device_one_time_load
    Load the device kernel only once (the first time), irrespective of how many times the kernel is launched.

mm_conv_precision_mode
    Precision for add/mul in convolution and matmul; possible values are fp16, bf16, fp32, int32, mixed_fp16_fp32, mixed_bf16_fp32, and mixed_int8_int32. The default mode for GPUs with tensor cores is mixed_fp16_fp32, with float16 and bfloat16 input data types supported. For the 'cpu' target, and for GPU targets with tensor core targeting disabled, the default behavior is to maintain the existing precision as dictated by the input/output types of a particular matmul/convolution operation.

mlir_disable_fast_math
    Disable fast math when mapping math ops to lower-level target intrinsics. Fast math is enabled by default.

num_iters
    The number of iterations for which the spec will be run. Each run uses the same input. Meant for testing. Default: 1.

on_gpu_tensors
    Generate libraries while expecting input and output tensors to already be present on the GPU (polyblock_gen_library has to be specified as well).

polyblocks_disable_affine_fusion
    Disable affine fusion for PolyBlocks code generation. Affine fusion is enabled by default.

polyblocks_gpu_no_tensor_core
    Disable targeting tensor cores for PolyBlocks GPU compilation. Tensor cores are enabled by default.

polyblocks_disable_reduce_reduce_fusion
    Disable fusion across multiple reduction operators (e.g., matmul-softmax-matmul).

polyblocks_disable_unsafe_math_canonicalizations
    Disable canonicalizations of math operations that are not guaranteed to be numerically stable.

strict_target
    Ensure that all kernels are compiled for the specified target. Disabled by default.

target
    The hardware class to target. The supported targets are 'cpu' (CPUs), 'nvgpu' (NVIDIA GPUs), and 'amdgpu' (AMD GPUs). The default target is 'cpu'.

torch_polyblocks_select_graphs
    Selectively run only the PyTorch graphs mentioned in this list via PolyBlocks.

torch_polyblocks_filter_graphs
    Selectively skip the PyTorch graphs mentioned in this list; they are not run via PolyBlocks.
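Several of these options can be combined in a single compile_options dictionary. A minimal sketch, assuming PyTorch, a CUDA-capable machine, and that polyblocks_jit_torch is importable from a polyblocks module (the import path is an assumption):

import torch
from polyblocks import polyblocks_jit_torch  # import path is an assumption

# Target an NVIDIA GPU and request mixed-precision tensor-core matmuls.
@polyblocks_jit_torch(compile_options={
    'target': 'nvgpu',
    'mm_conv_precision_mode': 'mixed_fp16_fp32',
})
def attention_like(a, b, c):
    # matmul-softmax-matmul: the reduce-reduce fusion pattern noted above.
    return torch.softmax(a @ b, dim=-1) @ c

a = torch.randn(64, 64, device='cuda', dtype=torch.float16)
b = torch.randn(64, 64, device='cuda', dtype=torch.float16)
c = torch.randn(64, 64, device='cuda', dtype=torch.float16)
out = attention_like(a, b, c)

Here float16 inputs are used to match the mixed_fp16_fp32 mode, which supports float16 and bfloat16 input data types.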
AOT-specific options
aot
    Generate libraries from PolyBlocks/MLIR compilation. Disabled by default.

aot_name
    Use the specified name for the main generated function. This name is also used as the prefix for the emitted artifacts. Default: polyblocks_mlir_artifact.

device
    Specify a particular target device/chip to compile or cross-compile for (in AOT mode). Examples for 'nvgpu' include 'rtx3090', 'rtx4090', 'a10', 'a40', 'a100', and 'orin_nano'. Any number of chip configurations can be added to the JSON file 'devices.json'.

device_one_time_load
    Enabled by default; load the device kernel only once (the first time), irrespective of how many times the kernel is launched.

on_gpu_tensors
    Generate libraries while expecting input and output tensors to already be present on the GPU.

user_device_memory
    Generate libraries while expecting the user to provide a pool of device-allocated memory through the trailing argument of the generated function; the generated code does not perform any device allocations but simply uses the supplied pool of memory. Whenever user_device_memory is specified, a caller of the generated function is also expected to pass a pointer to the relevant CUDA stream (cudaStream_t) as the trailing argument; as a result, the last two arguments of the generated library's signature have to be the device memory buffer and the CUDA stream pointer whenever -user-device-memory is specified. See the sketch after this list.
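To make the user_device_memory calling convention concrete, here is a hedged Python sketch that loads a generated library via ctypes. The artifact file name (derived from the default aot_name, polyblocks_mlir_artifact), the leading tensor arguments, and the pool size are all assumptions, since the actual signature depends on the compiled function; only the trailing (device memory buffer, CUDA stream) pair is dictated by user_device_memory:

import ctypes
import torch

# File name is an assumption based on the default aot_name prefix.
lib = ctypes.CDLL('./polyblocks_mlir_artifact.so')

# Leading arguments are illustrative; the real ones depend on the compiled spec.
x = torch.randn(1024, device='cuda')
y = torch.empty(1024, device='cuda')

# User-provided pool of device memory (size is an assumption) and the stream.
pool = torch.empty(1 << 20, dtype=torch.uint8, device='cuda')
stream = torch.cuda.current_stream().cuda_stream  # raw cudaStream_t value

lib.polyblocks_mlir_artifact(
    ctypes.c_void_p(x.data_ptr()),
    ctypes.c_void_p(y.data_ptr()),
    ctypes.c_void_p(pool.data_ptr()),  # trailing arg 1: device memory buffer
    ctypes.c_void_p(stream),           # trailing arg 2: CUDA stream pointer
)
torch.cuda.synchronize()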