Known Issues#

PolyBlocks is under active development, and we are constantly working to address usability, coverage, and performance issues. Known issues are listed below. Any issues you find can be reported either on the polyblocks-playground-users Discord channel or via email to polyblocks@polymagelabs.com. Please see Questions and Issues as well.

NVIDIA GPUs#

On the benchmarks and models we tried, we did not see any major functionality gaps for NVIDIA GPUs. The TF32 data type is not supported with PolyBlocks. Instead, mixed precision fp16/fp32 and pure fp16 are supported.

AMD GPUs#

  • Workloads running in FP16 with FP16 accumulate mode on the matmul units will produce incorrect results.

General: common to all programming frameworks#

  • Grouped convolutions are not supported.

  • TensorFloat32 (TF32) is not supported. Instead, mixed precision fp16/fp32 and pure fp16 are supported.

  • AOT (library generation) is supported for TensorFlow and PyTorch, but not for JAX yet.

TensorFlow#

On a large number of benchmarks we tried, we did not see any functionality issues for TensorFlow workloads.

PyTorch#

The following issues are known for PyTorch models:

  • Certain models, when run in inference mode, may yield the error RuntimeError: Inference tensors do not track version counter. This will be addressed soon; a possible workaround is sketched below.
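
Until this is fixed, a common PyTorch-level workaround may help: avoid creating inference tensors in the first place by running under torch.no_grad() rather than torch.inference_mode(), or clone any tensor produced under inference mode before passing it to the compiled model. The sketch below is only an illustration of that idea; compiled_model is a hypothetical stand-in for whatever callable the PolyBlocks PyTorch integration returns in your setup.

    import torch

    # Hypothetical stand-in for the actual PolyBlocks-compiled model.
    compiled_model = torch.nn.Conv2d(3, 16, 3, padding=1).cuda()

    x = torch.randn(1, 3, 224, 224, device='cuda')

    # torch.no_grad() does not create "inference tensors", so the version
    # counter error cannot arise from the inputs.
    with torch.no_grad():
        out = compiled_model(x)

    # A tensor produced under torch.inference_mode() is an inference tensor;
    # cloning it outside that context yields a regular tensor again.
    with torch.inference_mode():
        y = x * 0.5
    out = compiled_model(y.clone())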

JAX#

  • Although we have conducted only a limited amount of testing with JAX, we expect a level of functionality and performance similar to that with TensorFlow.

  • One may not see the expected performance improvement when timing end-to-end (from Python) with JAX, because not all of the Python-side overhead in our compiler driver has been optimized away (a timing sketch follows this list). However, a comparison of GPU kernel execution times (using nsys from the JupyterHub terminal) will reveal the improved performance.

  • Quantization is not supported for JAX/PolyBlocks yet.

  • Care must be taken when passing model parameters to functions that are JIT'ed via PolyBlocks: they should always be passed as arguments to the function being JIT'ed. If they are not passed explicitly and are instead obtained from the enclosing scope, all of the parameters are captured as constants in the resultant IR, which hurts both compilation time and performance. When they are passed explicitly, they are captured as function parameters, which is better for compilation time and performance. For example:

      # Correct usage: the model parameters (`variables`) are passed
      # explicitly as a function argument.
      @polyblocks_jit_jax(compile_options={'target': 'nvgpu'})
      def ConvBiasRelu(variables, img):
          out = cnn.apply(variables, img)
          return out

      img = np.random.random_sample(input_shape).astype(np.float16)
      variables = jax_init_lazy_random(cnn, img)
      ConvBiasRelu(variables, img)

      # Incorrect usage: `variables` is captured from the enclosing scope,
      # so all of the parameters get baked into the IR as constants.
      img = np.random.random_sample(input_shape).astype(np.float16)
      variables = jax_init_lazy_random(cnn, img)

      @polyblocks_jit_jax(compile_options={'target': 'nvgpu'})
      def ConvBiasRelu(img):
          out = cnn.apply(variables, img)
          return out

      ConvBiasRelu(img)
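
As mentioned in the list above, end-to-end Python timings include the compiler-driver overhead on the Python side, so they understate the GPU-side gains. If you still want to measure from Python, the sketch below is one way to do it fairly: warm up both a PolyBlocks-JIT'ed function and a plain jax.jit baseline so that one-time compilation is excluded. It reuses cnn, variables, and img from the example above and assumes the PolyBlocks call returns only after its results are ready; if it does not, kernel-level profiling with nsys remains the more reliable comparison.

    import time
    import jax

    # PolyBlocks-compiled version (same decorator as in the example above).
    @polyblocks_jit_jax(compile_options={'target': 'nvgpu'})
    def conv_bias_relu_pb(variables, img):
        return cnn.apply(variables, img)

    # Plain JAX baseline.
    conv_bias_relu_jax = jax.jit(lambda variables, img: cnn.apply(variables, img))

    # Warm up both so that one-time JIT compilation is excluded from the timings.
    conv_bias_relu_pb(variables, img)
    conv_bias_relu_jax(variables, img).block_until_ready()

    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        conv_bias_relu_pb(variables, img)
    pb_ms = (time.perf_counter() - start) / runs * 1e3

    start = time.perf_counter()
    for _ in range(runs):
        conv_bias_relu_jax(variables, img).block_until_ready()
    jax_ms = (time.perf_counter() - start) / runs * 1e3

    print(f"PolyBlocks: {pb_ms:.3f} ms/iter, jax.jit: {jax_ms:.3f} ms/iter")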

Playground#

NVIDIA GPU and AMD GPU within the same notebook#

It is currently not possible to execute PolyBlocks JIT'ed code on both the NVIDIA GPU and the AMD GPU within the same Python process. Executing on the NVIDIA GPU and then on the AMD GPU, or vice versa, will lead to a runtime failure when executing on the second device. Please restart the JupyterHub kernel when switching execution from the NVIDIA GPU to the AMD GPU or vice versa.

Out-of-memory issues executing on GPUs#

We have a single NVIDIA GPU with 24 GB of memory and a similar AMD GPU, both shared by all users. In addition, JupyterHub's IPython kernel holds on to GPU memory for the notebook a user is experimenting with. One can thus frequently experience out-of-memory issues when many users have open processes; this can be verified by running nvidia-smi or rocm-smi from the terminal. To alleviate or overcome this issue, please consider the following:

  • Limit the amount of GPU memory you use to 4-5 GB whenever possible (a sketch of how to cap framework memory use follows this list).

  • Log off the playground when you don’t plan to use it.

  • Check the memory usage of other users' processes, and email polyblocks@polymagelabs.com if idle processes occupying the GPU are making it impossible to run any experiments; the admin can then clean those up.

  • Restart your JupyterHub kernel or your JupyterHub server (Stop/Start from the control panel) if the real cause is your own (stray) processes.
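
As referenced in the list above, one way to stay within a small GPU-memory budget is to cap how much memory the framework itself reserves; TensorFlow and JAX in particular grab large pools by default. The sketch below shows standard framework-level knobs with illustrative limits; these are not PolyBlocks-specific settings, and only the part for the framework you are actually using applies.

    import os

    # JAX: cap the preallocated pool at ~20% of GPU memory.
    # This must be set before jax is imported.
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.2"

    # TensorFlow: expose the GPU as a logical device capped at 4 GB.
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])

    # PyTorch: restrict this process to ~20% of the device's memory.
    import torch
    if torch.cuda.is_available():
        torch.cuda.set_per_process_memory_fraction(0.2)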