PolyBlocks Technology#

What technologies does PolyBlocks use?#

The PolyBlocks compiler engine is based on the MLIR infrastructure. For certain analyses and transformations, PolyBlocks also relies on the Integer Set Library (isl). In addition, PolyBlocks uses LLVM for all of its lower-level code generation. Within MLIR, PolyBlocks heavily uses polyhedral and affine abstractions for its transformations. Please see Compiler Overview for more details.

How does PolyBlocks differ from other ML compilers?#

To the best of our knowledge, the PolyBlocks compiler is the only one that supports all three major AI frameworks (TensorFlow, PyTorch, and JAX) in a turnkey manner: a single-line annotation is added to an otherwise unmodified specification written in the respective framework. This is true for both its JIT and AOT capabilities. While TensorFlow/XLA, TorchInductor, and JAX JIT (XLA-based) are the standard compilers shipped with TensorFlow, PyTorch, and JAX, respectively, they differ from PolyBlocks in the intermediate representations (IRs) and compilation techniques they use.

XLA and TorchInductor share no compiler optimization infrastructure between them beyond the lower LLVM layer. Strictly speaking, they are not “full” compilers: they rely heavily on vendor libraries of hand-written, tuned kernels (e.g., cuDNN, cuBLAS, CUTLASS, and OpenBLAS) as well as other hand-written custom kernels. The same is true of NVIDIA TensorRT, which primarily maps to expert-written kernels with some elements of compilation (graph rewriting, fusion, and potentially other code generation). All of these systems are limited in their ability to represent and perform complex transformations for parallelism and locality, either because of the IRs they employ or because of the fragmentation across multiple IRs throughout their stacks.

On the other hand, PolyBlocks is 100% code-generating and uses MLIR for its entire stack above LLVM. It uses the same compiler engine across AI programming frameworks and hardware targets. The MLIR abstractions PolyBlocks uses allow more complex fusion and tiling than is possible with other frameworks. Please see Compiler Overview for more details.
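Conceptually, the single-annotation workflow resembles a caching JIT decorator. The sketch below is a plain-Python illustration of that general pattern only; the decorator name and behavior here are hypothetical and are not the actual PolyBlocks API:

```python
import functools

def jit_compile(fn):
    """Hypothetical one-line JIT annotation: "compile" a function the first
    time it is called with a given argument-type signature, then reuse the
    cached artifact. (Stands in for a real compiler pipeline.)"""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        key = tuple(type(a).__name__ for a in args)  # specialize per type signature
        if key not in cache:
            # A real JIT would lower fn's computation to optimized code here.
            cache[key] = fn
        return cache[key](*args)

    return wrapper

@jit_compile
def saxpy(a, x, y):
    # Elementwise a*x + y: the kind of computation an ML compiler would fuse.
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0], [3.0, 4.0]))  # the model code itself is unchanged
```

The point of the pattern is that the annotated function's body stays exactly as the user wrote it; all compilation happens behind the decorator.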

How does PolyBlocks differ from IREE?#

While IREE is also based on the MLIR infrastructure and aimed at code generation, PolyBlocks takes a drastically different approach in its choice of MLIR dialects, abstractions, and transformations for its core optimizations. Based on the performance comparisons we have performed, PolyBlocks’ optimizations are significantly more powerful, advanced, and complete in breadth and depth. Unlike IREE, PolyBlocks does not map to any hand-written kernels or micro-kernels for CPU or GPU compilation. Please see Compiler Overview for more details.

Does PolyBlocks perform auto-tuning under the hood?#

No. One of our goals is to avoid resorting to auto-tuning or search this early in the development cycle, so PolyBlocks is 100% model-driven at this point, and compilation is thus fast. The code generation pipeline does account for the target and its characteristics: for example, the generated code could differ for an NVIDIA GPU with 128 SMs versus one with 72 SMs, or one with a different amount of on-chip shared memory.
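As an illustration of what “model-driven” means, a compiler can derive parameters such as tile sizes directly from hardware characteristics rather than by empirical search. The heuristic below is a made-up sketch of that idea, not the actual PolyBlocks cost model:

```python
def pick_tile_size(shared_mem_bytes, elem_size=4):
    """Return the largest power-of-two tile size T such that two T x T
    operand tiles (of elem_size-byte elements) fit in shared memory.
    Purely a function of the hardware model: no auto-tuning runs needed."""
    t = 1
    # Double T as long as the doubled tile pair would still fit.
    while 2 * (2 * t) ** 2 * elem_size <= shared_mem_bytes:
        t *= 2
    return t

# Different shared-memory capacities yield different generated-code parameters:
print(pick_tile_size(shared_mem_bytes=164 * 1024))  # a large-shared-memory SM
print(pick_tile_size(shared_mem_bytes=48 * 1024))   # a smaller shared memory
```

Because the parameter is computed analytically from the target description, compiling for a different GPU changes the generated code without any search over candidate configurations.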

State of Support#

Which of TensorFlow, PyTorch, and JAX has the best coverage with PolyBlocks?#

While the goal is to support all three frameworks, most of our testing has been with PyTorch and TensorFlow, covering a good set of deep learning and non-deep-learning workloads. This page maintains a sample list of PyTorch models from HuggingFace that are known to compile successfully with PolyBlocks.

For TensorFlow, we support a subset of the TensorFlow operators, close to the subset supported by the TF XLA JIT. Anything handled by TF/XLA but not by TF/PolyBlocks can be considered a missing feature or a bug to file. There are also a handful of operations that PolyBlocks supports but XLA does not (e.g., tfa.image.translate). With PyTorch, anything supported by torch.compile should also compile with PolyBlocks.

Which among TensorFlow, PyTorch, and JAX is best with PolyBlocks for performance?#

The PolyBlocks compiler uses the same engine with all three frameworks. We expect models that are algorithmically equivalent to perform similarly regardless of the front-end used. Any significant difference is to be treated as a performance bug.

What hardware features of the NVIDIA GPUs are supported?#

NVIDIA GPU tensor cores, on-chip shared memory, and reduction primitives are all supported. Tensor cores are supported on architectures from Volta through Ampere and Ada Lovelace. Hopper is also supported, but Hopper-based hardware is not available on the playground. The following data types are supported with tensor cores:

  • FP16 with FP32 accumulate

  • FP16 with FP16 accumulate

  • INT8 with INT32 accumulate
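The accumulate precision matters for accuracy, not just speed. The pure-Python sketch below emulates FP16 rounding with the struct module’s half-precision format to show why FP32 accumulation is often preferred for FP16 inputs; it is an illustration only and is unrelated to how PolyBlocks emits tensor-core instructions:

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

def dot_fp16_accumulate(xs, ys):
    # FP16 inputs, FP16 products, FP16 accumulator: every step is rounded to
    # half precision, so small addends are lost once the running sum grows.
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = to_fp16(acc + to_fp16(to_fp16(x) * to_fp16(y)))
    return acc

def dot_fp32_accumulate(xs, ys):
    # FP16 inputs and products, but a wide accumulator (Python's float stands
    # in for FP32 here): small addends continue to contribute.
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += to_fp16(x) * to_fp16(y)
    return acc

xs = [0.1] * 4096
# The narrow accumulator stalls well below the true sum; the wide one does not.
print(dot_fp16_accumulate(xs, xs), dot_fp32_accumulate(xs, xs))
```

This is why the FP16-with-FP32-accumulate mode exists on the hardware: products stay in FP16 for throughput while the reduction is carried out at a wider precision.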

What features of AMD GPUs are supported?#

PolyBlocks supports the special matrix-multiplication units on AMD RDNA 3 GPUs in FP16-with-FP32-accumulate mode. We have successfully tested several DL and non-DL workloads on AMD GPUs. PolyBlocks targets these units natively via LLVM and, unlike some other existing approaches, does not go through external frameworks such as SPIR-V.

Does PolyBlocks support the training of machine learning models?#

Currently, we recommend using DL models in inference mode only. Training is also known to work with PolyBlocks, but it has been tested only to a limited extent.

Using, trying, or licensing PolyBlocks#

How can I use PolyBlocks?#

PolyBlocks is available for use in the following ways:

  • Playground: this is suitable for academic and non-commercial users.

  • Sources or binary release licensing: Please see the section below on licensing PolyBlocks.

Under what license can PolyBlocks be made available?#

There are a few options for licensing PolyBlocks, depending on the licensee’s objectives.

  • Release license for binaries: The release license provides a non-exclusive license to use PolyBlocks binaries to JIT or AOT-compile code with the freedom for unlimited use of the generated code/optimized binaries on the hardware of choice and models/data of choice. The generated code can be used perpetually, and its derivatives can be similarly used. The release license has two flavors: one for perpetual use with renewable updates and support, and the other for use restricted to the license period, which is also renewable. The release license does not deliver PolyBlocks compiler engine sources but provides a license for the sources of its Python-side compiler driver and interfacing with frontends. This license suits entities interested in accelerating their AI/ML workloads on commodity hardware. PolyBlocks binary releases can be delivered in the form of Docker containers or through a private GitHub repository with tracking and support via GitHub issues.

  • License of the sources: For entities interested in building derivative compilers using PolyBlocks, we are open to providing an ownership-like license of the sources with perpetuity and the freedom to create and use derivatives in the licensee’s products. Updates and support can be provided for desired time periods. This licensing is suitable in particular for hardware vendors who are interested in building a PolyBlocks-powered compiler stack. The binary release can be a way to experiment with PolyBlocks on available hardware before licensing its sources.

In either case, licensing costs depend on the framework × target combination from among {PyTorch, JAX, TensorFlow} × {NVIDIA GPUs, AMD GPUs, CPUs} that a subscriber is interested in, as well as on the time period for release updates and support. Please get in touch with software@polymagelabs.com for more information.

How does the PolyBlocks team contribute to the open-source infrastructure it is based on?#

MLIR, the base infrastructure that PolyBlocks depends on, is open source and available as an LLVM sub-project under the Apache 2.0 license. PolyBlocks’ authors regularly contribute improvements and fixes to the MLIR infrastructure. However, the PolyBlocks compiler engine and its compiler drivers for PyTorch, TensorFlow, and JAX are not open source.

PolyBlocks uses a specific approach to compilation, code generation, and transformation. On the other hand, MLIR is now increasingly taking the form of a core infrastructure with all the tools and utilities to build a compiler like the PolyBlocks compiler. Hence, we do not see PolyBlocks becoming a part of the MLIR infrastructure. We only expect generic transformation and analysis utilities to be upstreamed to MLIR through the usual LLVM contribution and review process.

Can I evaluate PolyBlocks on hardware different from the ones available on the playground?#

The playground is self-hosted at PolyMage Labs. To experiment with the hardware of your choice, please see the licensing section above: downloadable binary releases can be licensed, which is especially suitable if you want to try PolyBlocks on your own systems and proprietary models/data in a commercial setting. We are also open to receiving complimentary state-of-the-art hardware to expand the playground and host it on more devices/systems.

PolyBlocks Playground#

Can I load confidential data or run confidential/proprietary workloads on the playground?#

No. Please see the terms of use on the sign-up page for the playground.

I have not yet received access to the playground. How long would the wait be?#

The playground is self-hosted. We are limited by the amount of accelerator hardware we have collectively on the playground system. We will inform you of any delays in our ability to provide access.

Playground Known Issues#

Please see the separate document on known issues with the Playground.