Compile Options
The following options are used with the compile_options dictionary argument of the @polyblocks_jit_* decorators. As an example:
@polyblocks_jit_torch(compile_options={
    'mm_conv_precision_mode': 'fp16',
    'target': 'nvgpu'
})
When invoking the PolyBlocks compiler from a terminal/shell, supply the options listed below on the command line with underscores replaced by hyphens.
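The underscore-to-hyphen rule can be sketched as a small helper. This is a minimal illustration, not part of PolyBlocks: the function name to_cli_flags is hypothetical, and the single leading dash and the name=value syntax are assumptions (the only flag spelling shown on this page is -user-device-memory). Check the compiler's own help output for the exact syntax.

```python
def to_cli_flags(compile_options):
    """Convert a compile_options dict into command-line style flags.

    Assumed conventions (not confirmed by the PolyBlocks docs):
    a single leading dash, `-name=value` for valued options, and a
    bare `-name` for options set to True.
    """
    flags = []
    for name, value in compile_options.items():
        # Documented rule: underscores become hyphens on the command line.
        flag = "-" + name.replace("_", "-")
        if value is True:
            flags.append(flag)               # boolean switch (assumed form)
        elif value is False or value is None:
            continue                         # option not enabled: emit nothing
        else:
            flags.append(f"{flag}={value}")  # valued option (assumed form)
    return flags


flags = to_cli_flags({
    "mm_conv_precision_mode": "fp16",
    "target": "nvgpu",
    "user_device_memory": True,
})
print(" ".join(flags))
```

Keeping the options in a single dictionary and deriving the flags from it means the decorator and command-line spellings cannot drift apart.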
General options
device_one_time_load
Load the device kernel only once (the first time), irrespective of how many times the kernel is launched.
mm_conv_precision_mode
Precision for add/mul for convolution and matmul. Possible values are fp16, fp32, int32, mixed_fp16_fp32, and mixed_int8_int32. The default mode for GPUs with tensor cores is mixed_fp16_fp32, with float16 and bfloat16 input data types supported. For ‘cpu’ and ‘gpu’ with tensor core targeting disabled, the default behavior is to maintain the existing precision as dictated by the input/output types of a particular matmul/convolution operation.
mlir_disable_fast_math
Disable fast math while mapping math ops to lower-level target intrinsics. Fast math is enabled by default.
num_iters
The number of iterations for which the spec will be run. Each run uses the same input. Meant for testing. Default: 1.
on_gpu_tensors
Generate libraries while expecting input and output tensors to already be present on the GPU (polyblock_gen_library has to be specified as well).
polyblocks_disable_affine_fusion
Disable affine fusion for PolyBlocks codegen. Affine fusion is enabled by default.
polyblocks_gpu_no_tensor_core
Disable targeting tensor cores for PolyBlocks GPU compilation. Tensor cores are enabled by default.
polyblocks_disable_reduce_reduce_fusion
Disable fusion across multiple reduction operators (e.g., matmul-softmax-matmul).
polyblocks_disable_unsafe_math_canonicalizations
Disable canonicalizations for math operations that are not guaranteed to be numerically stable.
strict_target
Ensure that all kernels are compiled for the specified target. Disabled by default.
target
The hardware to target: this can be nvgpu, amdgpu, or cpu to target NVIDIA GPUs, AMD GPUs, or CPUs respectively. The default target is ‘cpu’.
torch_polyblocks_select_graphs
Selectively run PyTorch graphs mentioned in this list via PolyBlocks.
torch_polyblocks_filter_graphs
Selectively skip PyTorch graphs mentioned in this list via PolyBlocks.
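The value constraints given above for mm_conv_precision_mode and target can be checked before a dictionary is passed to a decorator. The sketch below is a hypothetical helper (check_compile_options is not a PolyBlocks API) and encodes only the values named in this section.

```python
# Values taken from the option descriptions in this section.
PRECISION_MODES = {"fp16", "fp32", "int32", "mixed_fp16_fp32", "mixed_int8_int32"}
TARGETS = {"nvgpu", "amdgpu", "cpu"}


def check_compile_options(options):
    """Return a list of problems found in a compile_options dict (empty if none)."""
    problems = []
    mode = options.get("mm_conv_precision_mode")
    if mode is not None and mode not in PRECISION_MODES:
        problems.append(f"unknown mm_conv_precision_mode: {mode!r}")
    target = options.get("target", "cpu")  # documented default target is 'cpu'
    if target not in TARGETS:
        problems.append(f"unknown target: {target!r}")
    return problems


ok = check_compile_options({"mm_conv_precision_mode": "fp16", "target": "nvgpu"})
bad = check_compile_options({"mm_conv_precision_mode": "fp8", "target": "gpu"})
print(ok)   # no problems reported for documented values
print(bad)  # one message per unrecognized value
```

A check like this catches a misspelled mode or target early, before any compilation is attempted.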
AOT-specific options
aot
Generate libraries from PolyBlocks/MLIR compilation. Disabled by default.
aot_name
Use the specified name for the main generated function. This name is also used as the prefix for the emitted artifacts. Default: polyblocks_mlir_artifact.
device_one_time_load
Load the device kernel only once (the first time), irrespective of how many times the kernel is launched. Enabled by default.
on_gpu_tensors
Generate libraries while expecting input and output tensors to already be present on the GPU.
user_device_memory
Generate libraries while expecting the user to provide a pool of device-allocated memory through a trailing argument of the generated function; generated code does not perform any device allocations but simply uses the supplied pool of memory. Whenever user_device_memory is specified, a user of the generated function is also expected to pass a pointer to the relevant CUDA stream (cudaStream_t) as the final argument. As a result, the last two arguments of the generated library’s signature have to be the device memory buffer and the CUDA stream pointer whenever -user-device-memory is specified.
device
Specify the target device when cross-compiling.
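The calling convention described for user_device_memory can be illustrated with a small sketch. It is neither generated code nor a PolyBlocks API: build_call_args is a hypothetical helper, the pointer values are placeholder integers, passing boolean options as True is an assumption, and the order (device memory buffer, then CUDA stream pointer) is one reading of the description above.

```python
def build_call_args(tensor_args, options, device_pool=None, stream=None):
    """Assemble the argument list for a generated AOT function (illustrative only).

    When 'user_device_memory' is set, the caller must supply the device
    memory pool and the CUDA stream pointer as the last two arguments.
    """
    args = list(tensor_args)
    if options.get("user_device_memory"):
        if device_pool is None or stream is None:
            raise ValueError(
                "user_device_memory requires both a device memory pool "
                "and a CUDA stream pointer"
            )
        args += [device_pool, stream]  # assumed order: pool first, stream last
    return args


# Option names come from this section; boolean values are an assumption.
aot_options = {"aot": True, "aot_name": "my_kernel", "user_device_memory": True}

# Placeholder integers stand in for a real device pointer and a cudaStream_t pointer.
args = build_call_args(["input0", "output0"], aot_options,
                       device_pool=0x1000, stream=0x2000)
print(args)
```

Without user_device_memory, the helper returns only the tensor arguments, mirroring the case where generated code performs its own device allocations.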