Compile Options
The following options are passed through the compile_options dictionary argument of the @polyblocks_jit_* decorators. For example:
@polyblocks_jit_torch(compile_options={
'mm_conv_precision_mode': 'fp16',
'target': 'nvgpu'
})
When invoking the polyblocks compiler from a terminal/shell, supply the options listed below with their underscores replaced by hyphens.
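The renaming rule can be sketched in a few lines of Python. The helper name to_cli_option_name is hypothetical and not part of PolyBlocks; it only illustrates the underscore-to-hyphen mapping. The single leading hyphen follows the -user-device-memory spelling used further down this page, and how option values are written on the command line is not covered here.

```python
def to_cli_option_name(option: str) -> str:
    """Map a compile_options key to its command-line spelling.

    Hypothetical helper for illustration only: it replaces underscores
    with hyphens and prepends a single hyphen.
    """
    return "-" + option.replace("_", "-")


# Show the mapping for a few option names taken from this page.
for key in ("mm_conv_precision_mode", "user_device_memory", "target"):
    print(key, "->", to_cli_option_name(key))
```

Option names without underscores, such as target, only gain the leading hyphen.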
General options
device_one_time_load
Load the device kernel only once (on first launch), regardless of how many times the kernel is launched.
mm_conv_precision_mode
Precision used for the add/mul operations of convolutions and matmuls. Possible values are fp16, fp32, int32, mixed_fp16_fp32, and mixed_int8_int32. The default mode for GPUs with tensor cores is mixed_fp16_fp32. For ‘cpu’, and for ‘gpu’ with tensor core targeting disabled, the default behavior is to keep the existing precision, as dictated by the input/output types of the particular matmul/convolution operation.
mlir_disable_fast_math
Disable fast math while mapping math ops to lower-level target intrinsics. Fast math is enabled by default.
num_iters
The number of iterations for which the spec will be run. Each run uses the same input. Meant for testing. Default: 1.
on_gpu_tensors
Generate libraries that expect the input and output tensors to already be present on the GPU (polyblock_gen_library has to be specified as well).
polyblocks_disable_affine_fusion
Disable affine fusion for PolyBlocks codegen. Affine fusion is enabled by default.
polyblocks_gpu_no_tensor_core
Disable targeting tensor cores for PolyBlocks GPU compilation. Tensor cores are enabled by default.
polyblocks_disable_reduce_reduce_fusion
Disable fusion across multiple reduction operators (e.g. matmul-softmax-matmul).
strict_target
Ensure that all kernels are compiled for the specified target. Disabled by default.
target
The hardware to target: nvgpu, amdgpu, or cpu, for NVIDIA GPUs, AMD GPUs, or CPUs respectively. The default target is ‘cpu’.
torch_polyblocks_select_graphs
Selectively run PyTorch graphs mentioned in this list via PolyBlocks.
torch_polyblocks_filter_graphs
Selectively skip PyTorch graphs mentioned in this list via PolyBlocks.
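As a rough sketch, the documented values for target and mm_conv_precision_mode can be checked before a dictionary is handed to a decorator. The helper check_compile_options is hypothetical and not part of PolyBlocks; the accepted values are copied from the descriptions above, and only string-valued options are checked because this page does not state the value types of the flag-style options.

```python
# Accepted values, as listed in the option descriptions on this page.
PRECISION_MODES = {"fp16", "fp32", "int32", "mixed_fp16_fp32", "mixed_int8_int32"}
TARGETS = {"nvgpu", "amdgpu", "cpu"}


def check_compile_options(options: dict) -> list:
    """Return a list of problems found in a compile_options dictionary.

    Hypothetical helper for illustration; PolyBlocks may perform its own
    (different) validation.
    """
    problems = []
    # 'cpu' is the documented default target.
    target = options.get("target", "cpu")
    if target not in TARGETS:
        problems.append(f"unknown target: {target!r}")
    # No default is assumed here: the documented default depends on the target.
    mode = options.get("mm_conv_precision_mode")
    if mode is not None and mode not in PRECISION_MODES:
        problems.append(f"unknown mm_conv_precision_mode: {mode!r}")
    return problems


compile_options = {
    "target": "nvgpu",
    "mm_conv_precision_mode": "mixed_fp16_fp32",
}
print(check_compile_options(compile_options))  # prints []
```

A misspelled value such as 'gpu' for target would be reported as a problem instead of being passed on silently.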
AOT-specific options
aot
Generate libraries from PolyBlocks/MLIR compilation. Disabled by default.
aot_name
Use the specified name for the main generated function. This name is also used as the prefix for the emitted artifacts. Default: polyblocks_mlir_artifact.
device_one_time_load
Load the device kernel only once (on first launch), regardless of how many times the kernel is launched. Enabled by default.
on_gpu_tensors
Generate libraries that expect the input and output tensors to already be present on the GPU.
user_device_memory
Generate libraries that expect the user to provide a pool of device-allocated memory through a trailing argument of the generated function; the generated code performs no device allocations and simply uses the supplied pool. Whenever user_device_memory (-user-device-memory on the command line) is specified, the caller must also pass a pointer to the relevant CUDA stream (cudaStream_t) as the trailing argument. As a result, the last two arguments of the generated library’s signature are the device memory buffer and the CUDA stream pointer.
device
Specify the target device when cross-compiling.
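The effect of the AOT options on the generated entry point can be summarized with a small sketch. The helper describe_aot_entry_point is hypothetical and not part of PolyBlocks, and passing True for flag-style options is an assumption, since this page does not state their value type. The default name and the order of the trailing arguments are taken from the descriptions above.

```python
def describe_aot_entry_point(options: dict) -> dict:
    """Summarize the generated function implied by a set of AOT options.

    Hypothetical helper for illustration only.
    """
    # aot_name names the main generated function and prefixes the artifacts.
    name = options.get("aot_name", "polyblocks_mlir_artifact")
    trailing = []
    if options.get("user_device_memory"):
        # Documented order: device memory buffer, then CUDA stream pointer.
        trailing = ["device memory buffer", "CUDA stream pointer (cudaStream_t)"]
    return {"function_name": name, "trailing_arguments": trailing}


aot_options = {
    "aot": True,                 # assumption: flag-style options take True
    "aot_name": "my_model",      # hypothetical name
    "target": "nvgpu",
    "user_device_memory": True,  # assumption: flag-style options take True
}
print(describe_aot_entry_point(aot_options))
```

Without user_device_memory, the sketch reports no extra trailing arguments, matching the description that the generated code then manages device allocations itself.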