PolyBlocks Technology

What technologies does PolyBlocks use?

The PolyBlocks compiler engine is based on the MLIR infrastructure. For certain analyses and transformations, PolyBlocks also relies on the Integer Set Library. In addition, PolyBlocks uses LLVM for all its lower-level code generation. Within MLIR, PolyBlocks heavily uses polyhedral and affine abstractions for its transformations. Please see Compiler Overview for some more details.

How does PolyBlocks differ from other ML compilers?

To the best of our knowledge, the PolyBlocks compiler is the only one that supports the three major AI frameworks - TensorFlow, PyTorch, and JAX - in a turnkey manner: a single-line annotation is added to an otherwise unmodified specification written in the respective framework. This is true for both its JIT and AOT capabilities. TensorFlow/XLA, TorchInductor, and JAX JIT (XLA-based) are the standard compilers available with TensorFlow, PyTorch, and JAX, respectively, but they differ from PolyBlocks in the intermediate representations (IRs) and compilation techniques they use. XLA and TorchInductor do not share compiler optimization infrastructure beyond the lower LLVM layer. Strictly speaking, neither is a “full” compiler: both rely heavily on vendor libraries of hand-written, tuned kernels (e.g., cuDNN, cuBLAS, CUTLASS, OpenBLAS) and on other hand-written custom kernels. This is also the case with NVIDIA TensorRT, which primarily maps to hand-written or expert-written kernels with some elements of compilation (graph rewriting, fusion, and potentially other code generation). All of these systems are limited in their ability to represent and perform complex transformations for parallelism and locality, either because of the IR they employ or because their stacks are fragmented across multiple IRs.

On the other hand, PolyBlocks is 100% code-generating and uses MLIR for its entire stack above LLVM. It uses the same compiler engine across AI programming frameworks and hardware targets. The MLIR abstractions PolyBlocks uses allow more complex fusion and tiling than is possible with other frameworks. Please see Compiler Overview for more details.
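The "single-line annotation" usage model described above can be pictured with a small sketch. The decorator name `hypothetical_jit` and its `target` parameter are assumptions made up for illustration; they are not the PolyBlocks API, and the stand-in below does no compilation at all. The point is only that the function body stays untouched and one decorator line is added.

```python
import functools


def hypothetical_jit(target="cpu"):
    """Stand-in for a compiler-provided JIT decorator (hypothetical name).

    A real JIT would trace the function, compile it for `target`, cache the
    result, and dispatch to the compiled code. This placeholder simply calls
    the original Python function so that the example is runnable.
    """
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper.target = target  # record the requested target for inspection
        return wrapper
    return decorate


@hypothetical_jit(target="nvgpu")  # the only line added to the user's code
def saxpy(a, x, y):
    # Unmodified user code: elementwise a * x + y.
    return [a * xi + yi for xi, yi in zip(x, y)]


result = saxpy(2.0, [1.0, 2.0], [10.0, 20.0])
```

Removing the decorator line gives back the original program, which is what makes this style of integration turnkey.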

How does PolyBlocks differ from IREE?

While IREE is also based on the MLIR infrastructure and aimed at code generation, PolyBlocks’ approach is drastically different in its choice of MLIR dialects, abstractions, and transformations for its core optimizations. Based on the performance comparisons we performed, PolyBlocks’ optimizations are significantly more powerful, advanced, and complete in breadth and depth. Unlike IREE, PolyBlocks does not employ any hand-written kernels or micro-kernels to map to for CPU or GPU compilation. Please see Compiler Overview for more details.

Does PolyBlocks perform auto-tuning under the hood?

No. One of our goals is to avoid resorting to auto-tuning or searching this early in the development cycle. So, PolyBlocks is 100% model-driven at this point. Compilation is thus fast. The code generation pipeline does account for the target and its characteristics: for example, the code generated for an NVIDIA GPU with 128 SMs could differ from the code generated for one with 72 SMs, or for one with a different amount of on-chip shared memory.
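To make "model-driven" concrete, here is a toy sketch of picking a matmul tile size from target characteristics with a closed-form rule instead of a timed search. Everything in it is an assumption for illustration: the `GpuTarget` fields, the rule that two square operand tiles must fit in shared memory, and the example hardware numbers. It is not PolyBlocks' actual cost model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuTarget:
    """Hypothetical target description; field names are illustrative."""
    num_sms: int
    shared_mem_bytes: int  # shared memory usable by one thread block


def choose_tile(target, elem_bytes=2, align=16):
    """Largest square tile T (a multiple of `align`) such that one T x T tile
    of each of the two matmul operands fits in shared memory."""
    max_elems_per_tile = target.shared_mem_bytes // (2 * elem_bytes)
    t = math.isqrt(max_elems_per_tile)
    return t - t % align


def num_waves(m, n, tile, target):
    """Number of rounds needed to run all output tiles, one tile per SM."""
    blocks = math.ceil(m / tile) * math.ceil(n / tile)
    return math.ceil(blocks / target.num_sms)


# Two made-up targets that differ in SM count and shared memory size.
big = GpuTarget(num_sms=128, shared_mem_bytes=64 * 1024)
small = GpuTarget(num_sms=72, shared_mem_bytes=48 * 1024)

tile_big = choose_tile(big)
tile_small = choose_tile(small)
waves_big = num_waves(4096, 4096, tile_big, big)
waves_small = num_waves(4096, 4096, tile_small, small)
```

No kernel is compiled or timed here: the decision follows directly from the target description, which is why a model-driven pipeline can emit different code per target while keeping compilation fast.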

State of Support

Which among TensorFlow, PyTorch, and JAX is best supported by PolyBlocks in terms of coverage?

While the goal is to support all three in terms of coverage, we have sufficiently tested TensorFlow and PyTorch with a good set of non-DL workloads and DL vision models, including classification and segmentation models. We have also tried a few transformer-based models, including BERT-base and MaxViT. For TensorFlow, we support a subset of the TensorFlow operators. The subset supported is close to the subset supported by the TF XLA JIT. Anything handled by TF/XLA but not by TF/PolyBlocks can be considered a missing feature or a bug to file. There are also a handful of operations that PolyBlocks supports that XLA doesn’t (e.g., tfa.image_translate).

Which among TensorFlow, PyTorch, and JAX is best with PolyBlocks for performance?

The PolyBlocks compiler uses the same engine with all three frameworks. We expect models that are algorithmically equivalent to perform similarly regardless of the front-end used. Any significant difference is to be treated as a performance bug.

What hardware features of the NVIDIA GPUs are supported?

NVIDIA GPU tensor cores, on-chip shared memory, and reduction primitives are all supported. Tensor cores from Volta through Ampere and Lovelace are supported. Hopper is also supported, but hardware based on it is unavailable on the playground. The following data types are supported with tensor cores:

  • FP16 with FP32 accumulate.

  • FP16 with FP16 accumulate.

  • INT8 with INT32 accumulate.
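The difference between the two FP16 modes listed above is the precision of the running sum. The sketch below emulates it in plain Python by rounding the accumulator after every addition, using the standard `struct` module's half-precision (`e`) and single-precision (`f`) formats. It assumes a simple sequential accumulation; real tensor cores accumulate in a different order, so this illustrates only the numerical effect, not the hardware behavior.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product with FP16 inputs; `round_acc` sets the accumulator precision."""
    acc = 0.0
    for ai, bi in zip(a, b):
        prod = to_fp16(ai) * to_fp16(bi)  # FP16 operands, product kept exact
        acc = round_acc(acc + prod)       # round the running sum each step
    return acc


n = 4096
ones = [1.0] * n

fp16_acc = dot(ones, ones, to_fp16)  # stalls once the sum reaches 2048
fp32_acc = dot(ones, ones, to_fp32)  # reaches the exact answer, 4096
```

FP16 has a 10-bit mantissa, so above 2048 adjacent representable values are 2 apart and adding 1.0 rounds back to the same number. FP32 accumulation avoids this at the cost of wider accumulators, which is the trade-off behind choosing between the two modes.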

What features of AMD GPUs are supported?

PolyBlocks supports the special matmul units on AMD RDNA 3 GPUs in the FP16 with FP32 accumulate mode. We have successfully tested several DL and non-DL workloads on AMD GPUs. PolyBlocks supports the special matmul units natively via LLVM and, unlike other existing approaches, does not go through external frameworks such as SPIR-V.

Does PolyBlocks support the training of machine learning models?

Currently, we recommend using DL models only in inference mode. Training is also known to work with PolyBlocks, but it has been tested only to a limited extent.

Using or trying PolyBlocks

How can I use PolyBlocks?

PolyBlocks is available for use in the following ways:

  • Playground: this is suitable for academic and non-commercial users.

  • Docker release: For entities interested in accelerating their AI/ML workloads, PolyBlocks binary releases are available as Docker containers. Please contact info@polymagelabs.com for more information. Licensing costs depend on the framework x target combination from among {PyTorch, JAX, TensorFlow} x {NVIDIA GPUs, AMD GPUs, CPUs} that a subscriber is interested in and also on the time period for release updates and support. A release evaluation license is also available.

  • Sources: For commercial vendors interested in building new/derivative compilers using PolyBlocks, we can license PolyBlocks under a non-exclusive license with source code, perpetuity, and the freedom to create and use derivatives in the vendors’ products. Please email info@polymagelabs.com for more information. The Docker binary release mentioned above can be used for initial experimentation and evaluation.

Will the PolyBlocks compiler’s source code be available? Under what license?

This depends on the nature of the entity interested in PolyBlocks and the use case.

  • For hardware vendors: yes, but only under a license. We are open to providing an ownership-like license with perpetuity, source code, and the freedom to create and use derivatives in the vendors’ products, along with release updates for a desired time period. Please email info@polymagelabs.com for more information.

  • For academic users, sources are not available at this time. In the future, we may work on a license to allow free-of-cost use while being able to publish the outcomes of such activity. Until then, academic users can use the playground.

MLIR, the base infrastructure that PolyBlocks depends on, is already open source and available as an LLVM sub-project under the Apache 2.0 license with LLVM exceptions. PolyBlocks’ authors regularly contribute improvements and fixes to the MLIR infrastructure. However, the PolyBlocks compiler engine and its compiler drivers for PyTorch, TensorFlow, and JAX are not open-source.

PolyBlocks uses a specific approach to compilation, code generation, and transformation. On the other hand, MLIR is now increasingly taking the form of a core infrastructure with all the tools and utilities to build a compiler like the PolyBlocks compiler. Hence, we do not see PolyBlocks becoming a part of the MLIR infrastructure. We only expect generic transformation and analysis utilities to be upstreamed to MLIR through the usual LLVM contribution and review process.

Will the PolyBlocks compiler’s binaries be available for use outside the playground?

Binary releases for use outside the playground can be made available under the PolyBlocks release license. Please email info@polymagelabs.com for more information. In summary, the release license provides a perpetual, non-exclusive license to use PolyBlocks to JIT or AOT-compile code with the freedom for unlimited use of the generated code/optimized binaries on the hardware of choice and models/data of choice. The generated code can be used perpetually, and its derivatives can be similarly used.

Can I evaluate PolyBlocks on hardware different from the ones available on the playground?

The playground is self-hosted at PolyMage Labs. To experiment on your own hardware or hardware of choice, please contact info@polymagelabs.com for a downloadable binary release, which is available as a Docker image. This is especially suitable if you want to try PolyBlocks on your own systems and proprietary models/data in a commercial setting. We are also open to receiving complementary state-of-the-art hardware to expand the playground and host it on more devices/systems.

PolyBlocks Playground

Can I load confidential data or run confidential/proprietary workloads on the playground?

No. Please see the terms of use on the sign-up page for the playground.

I have not yet received access to the playground. How long would the wait be?

The playground is self-hosted, so access is limited by the total amount of accelerator hardware available on the playground system. We will inform you of any delays in our ability to provide access.

Playground Known Issues

Please see the separate document on known issues with the Playground.