
STOICC: The Sparse Tile MOsaIC Compiler

We introduce STOICC, a novel tile-based sparsity compiler that enables multiple sparse formats to coexist in the same matrix.

Inspector-Executor Tools for Sparsity

Compilers [1] must optimize sparse matrix operations to achieve good performance. This is commonly done in two phases:

  • The Inspector: Examines the sparsity pattern of the data and performs reordering, compression, and scheduling.
  • The Executor: Uses the compressed data and the schedule to perform the SpMM operation.

Figure: Inspector/Executor framework

STOICC Pipeline

The STOICC Inspector takes a sparse matrix as input. It examines the sparsity pattern to assign a sparsity type to each tile, compresses and reorders the data, and creates a schedule. An Executor written in Triton then uses this schedule together with the data to perform the SpMM operation. Finally, the Sparse Compiler, our modified Triton compiler, lowers the executor code to the GPU.
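
As a rough illustration of the tile-typing step, the sketch below splits a matrix into tiles, labels each tile as "2:4" or "dense", and compresses the 2:4 tiles. Everything here is an assumption made for illustration: the helper names (`is_two_four`, `compress_two_four`, `inspect_tiles`), the dictionary-based plan, and the use of plain integer positions as metadata rather than the packed form a GPU kernel would consume. It is not the STOICC Inspector's actual interface.

```python
from typing import Dict, List, Tuple

Tile = List[List[float]]


def is_two_four(tile: Tile) -> bool:
    """True if every group of 4 consecutive row elements has <= 2 nonzeros."""
    for row in tile:
        if len(row) % 4 != 0:
            return False
        for g in range(0, len(row), 4):
            if sum(1 for v in row[g:g + 4] if v != 0) > 2:
                return False
    return True


def compress_two_four(tile: Tile) -> Tuple[Tile, List[List[int]]]:
    """Store exactly 2 values per group of 4, plus their in-group positions."""
    values, meta = [], []
    for row in tile:
        row_vals, row_meta = [], []
        for g in range(0, len(row), 4):
            group = row[g:g + 4]
            keep = [p for p, v in enumerate(group) if v != 0]
            # Pad with zero-valued positions so each group keeps 2 entries.
            for p in range(4):
                if len(keep) == 2:
                    break
                if p not in keep:
                    keep.append(p)
            keep.sort()
            row_vals += [group[p] for p in keep]
            row_meta += keep
        values.append(row_vals)
        meta.append(row_meta)
    return values, meta


def inspect_tiles(matrix: Tile, th: int, tw: int) -> Dict[Tuple[int, int], tuple]:
    """Assign a type to each th x tw tile and compress it accordingly."""
    plan = {}
    for r in range(0, len(matrix), th):
        for c in range(0, len(matrix[0]), tw):
            tile = [row[c:c + tw] for row in matrix[r:r + th]]
            if is_two_four(tile):
                plan[(r // th, c // tw)] = ("2:4", compress_two_four(tile))
            else:
                plan[(r // th, c // tw)] = ("dense", tile)
    return plan


# Usage: the left tile satisfies 2:4, the right tile is fully dense.
M = [[1, 0, 2, 0, 1, 2, 3, 4],
     [0, 5, 0, 0, 5, 6, 7, 8]]
tile_plan = inspect_tiles(M, th=2, tw=4)
```

A per-tile plan of this kind is what lets different formats coexist in one matrix: the executor can dispatch each tile to a dense or a 2:4 code path based on its label.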

Figure: STOICC pipeline

Results

We benchmark STOICC on an NVIDIA A100 (80 GB) GPU using a mixture of dense tiles and 2:4 sparse tiles, with the 2:4 tiles selected at random. For example, “50% 2:4 Tiles” means that half of the tiles are pruned to 2:4 sparsity while the other half remain dense. Because each 2:4 tile is 50% sparse, the overall sparsity of the matrix is 25%. The matrix sizes are taken from those used in the OPT family of models.
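
The relationship between the fraction of 2:4 tiles and the overall sparsity can be written as a one-line calculation. The helper name `overall_sparsity` is hypothetical, and the sketch assumes that all tiles have the same size and that dense tiles contain no zeros.

```python
def overall_sparsity(frac_two_four_tiles: float) -> float:
    """Overall fraction of zeros when a given fraction of tiles is 2:4.

    Assumes equal-sized tiles: each 2:4 tile is 50% sparse and each
    dense tile is 0% sparse.
    """
    return frac_two_four_tiles * 0.5 + (1.0 - frac_two_four_tiles) * 0.0


half = overall_sparsity(0.5)   # the "50% 2:4 Tiles" setting
full = overall_sparsity(1.0)   # every tile pruned to 2:4
```

Under these assumptions, the mixed-tile configurations span overall sparsity levels from 0% (all dense) to 50% (all 2:4).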

Figure: Heterogeneous results, batch size 16
Figure: Heterogeneous results, batch size 32

We also compare STOICC performance with 100% 2:4 tiles against the CUTLASS 2:4 sparse kernel integrated in PyTorch.

Figure: 2:4 comparison against CUTLASS 2:4, batch size 2k
Figure: 2:4 comparison against CUTLASS 2:4, batch size 4k


References

