Compile Options

The following options are passed through the compile_options dictionary argument of the @polyblocks_jit_* decorators. For example:

@polyblocks_jit_torch(compile_options={
    'mm_conv_precision_mode': 'fp16',
    'target': 'nvgpu'
})

When using the polyblocks compiler from a terminal/shell, supply the options listed below with their underscores replaced by hyphens.
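The underscore-to-hyphen rule can be sketched as a small helper. This is an illustration only: to_cli_flag is a hypothetical function, not part of PolyBlocks. The single leading hyphen follows the -user-device-memory spelling used later in this document, and the -name=value form for options that take a value is an assumption.

```python
def to_cli_flag(name, value=None):
    """Convert a compile_options key into a hyphenated command-line flag.

    Assumptions (not confirmed by the PolyBlocks documentation):
    flags take a single leading hyphen, and a value is attached
    as -name=value.
    """
    flag = "-" + name.replace("_", "-")
    if value is None:
        return flag
    return f"{flag}={value}"


print(to_cli_flag("user_device_memory"))              # -user-device-memory
print(to_cli_flag("mm_conv_precision_mode", "fp16"))  # -mm-conv-precision-mode=fp16
```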

General options

  • device_one_time_load Load the device kernel only once (on the first launch), irrespective of how many times the kernel is launched.

  • mm_conv_precision_mode Precision used for the add/mul operations of convolution and matmul. Possible values are fp16, fp32, int32, mixed_fp16_fp32, and mixed_int8_int32. The default mode for GPUs with tensor cores is mixed_fp16_fp32. For ‘cpu’ targets, and for ‘gpu’ targets with tensor core targeting disabled, the default behavior is to keep the precision dictated by the input/output types of the particular matmul/convolution operation.

  • mlir_disable_fast_math Disable fast math while mapping math ops to lower-level target intrinsics. Fast math is enabled by default.

  • num_iters The number of iterations for which the spec will be run. Each run uses the same input. Meant for testing. Default: 1.

  • on_gpu_tensors Generate libraries while expecting input and output tensors to already be present on the GPU (polyblock_gen_library has to be specified as well).

  • polyblocks_disable_affine_fusion Disable affine fusion for PolyBlocks codegen. Affine fusion is enabled by default.

  • polyblocks_gpu_no_tensor_core Disable targeting tensor cores for PolyBlocks GPU compilation. Tensor cores are enabled by default.

  • polyblocks_disable_reduce_reduce_fusion Disable fusion across multiple reduction operators (e.g. matmul-softmax-matmul).

  • polyblocks_disable_unsafe_math_canonicalizations Disable canonicalizations for math operations that are not guaranteed to be numerically stable.

  • strict_target Ensure that all kernels are compiled for the specified target. Disabled by default.

  • target The hardware to target: this can be nvgpu, amdgpu, or cpu to target NVIDIA GPUs, AMD GPUs, or CPUs respectively. The default target is ‘cpu’.

  • torch_polyblocks_select_graphs Run only the PyTorch graphs named in this list through PolyBlocks.

  • torch_polyblocks_filter_graphs Skip the PyTorch graphs named in this list instead of running them through PolyBlocks.
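Since target and mm_conv_precision_mode accept only the values listed above, a compile_options dictionary can be checked before it is handed to a decorator. The sketch below is a hypothetical helper, not part of PolyBlocks; it covers only those two options, and the value sets are copied from this list.

```python
# Allowed values as documented in the option list above.
ALLOWED_VALUES = {
    "target": {"nvgpu", "amdgpu", "cpu"},
    "mm_conv_precision_mode": {
        "fp16", "fp32", "int32", "mixed_fp16_fp32", "mixed_int8_int32",
    },
}


def check_compile_options(options):
    """Return a list of problems; an empty list means none were found.

    Hypothetical helper for illustration. Options other than the two
    in ALLOWED_VALUES are passed over without any check.
    """
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in options and options[key] not in allowed:
            problems.append(
                f"{key}={options[key]!r} is not one of {sorted(allowed)}"
            )
    return problems


compile_options = {"mm_conv_precision_mode": "fp16", "target": "nvgpu"}
print(check_compile_options(compile_options))  # []
```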

AOT-specific options

  • aot Generate libraries from PolyBlocks/MLIR compilation. Disabled by default.

  • aot_name Use the specified name for the main generated function. This name is also used as the prefix for the emitted artifacts. Default: polyblocks_mlir_artifact.

  • device_one_time_load Load the device kernel only once (on the first launch), irrespective of how many times the kernel is launched. Enabled by default.

  • on_gpu_tensors Generate libraries while expecting input and output tensors to already be present on the GPU.

  • user_device_memory Generate libraries while expecting the user to provide a pool of device-allocated memory as an argument of the generated function; the generated code performs no device allocations and simply uses the supplied pool. Whenever user_device_memory (-user-device-memory on the command line) is specified, the caller is also expected to pass a pointer to the relevant CUDA stream (cudaStream_t) as the trailing argument. As a result, the last two arguments of the generated library’s signature are the device memory buffer and the CUDA stream pointer.

  • device Specify the target device when cross-compiling.
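The argument order that user_device_memory implies can be illustrated with a short sketch. It assumes the device memory pool comes second to last and the CUDA stream pointer last, which is one reading of the description above. The helper name and the placeholder pointer values are hypothetical, and no PolyBlocks or CUDA API is called.

```python
def with_trailing_args(tensor_args, device_pool_ptr, stream_ptr):
    """Append the two trailing arguments in the assumed order:
    device memory pool first, CUDA stream pointer last."""
    return (*tensor_args, device_pool_ptr, stream_ptr)


# Placeholder values standing in for real tensors and pointers.
args = with_trailing_args(("in0", "out0"), 0x1000, 0x2000)
print(args[-2:])  # (4096, 8192)
```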