STOICC: The Sparse Tile MOsaIC Compiler
We introduce STOICC, a novel tile-based sparsity compiler that enables multiple sparse formats to coexist in the same matrix.
Inspector-Executor Tools for Sparsity
Compilers [1] must optimize sparse matrix operations to run efficiently. This is commonly done in two phases:
- The Inspector: Inspects the data's sparsity pattern and performs reordering, compression, and scheduling.
- The Executor: Uses the compressed data and the schedule to perform the sparse matrix-matrix multiplication (SpMM).
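The two phases can be illustrated with a minimal Python/NumPy sketch. It assumes CSR as the compressed format and leaves out reordering and scheduling. The names `inspect_csr` and `execute_csr` are hypothetical and are not STOICC's inspector or executor. The structural point is that the inspector runs once, and the executor can reuse its output for many dense right-hand sides.

```python
import numpy as np

def inspect_csr(A):
    """Inspector (hypothetical): compress a dense-stored sparse matrix to CSR arrays."""
    row_ptr, col_idx, values = [0], [], []
    for row in A:
        nz = np.flatnonzero(row)          # column indices of nonzeros in this row
        col_idx.extend(nz.tolist())
        values.extend(row[nz].tolist())
        row_ptr.append(len(col_idx))      # end offset of this row's nonzeros
    return (np.array(row_ptr, dtype=np.int64),
            np.array(col_idx, dtype=np.int64),
            np.array(values, dtype=A.dtype))

def execute_csr(csr, B):
    """Executor (hypothetical): SpMM C = A @ B using the inspector's CSR output."""
    row_ptr, col_idx, values = csr
    n_rows = len(row_ptr) - 1
    C = np.zeros((n_rows, B.shape[1]), dtype=np.result_type(values, B))
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Row i of C is the weighted sum of the rows of B selected by row i's nonzeros.
        C[i] = values[start:end] @ B[col_idx[start:end]]
    return C

A = np.array([[0.0, 2.0, 0.0],
              [1.0, 0.0, 3.0]])
csr = inspect_csr(A)                      # inspect once
for B in (np.eye(3), np.arange(6.0).reshape(3, 2)):
    C = execute_csr(csr, B)               # reuse the compressed data for each B
```

A production inspector would also reorder rows and build a load-balanced schedule here. This sketch only shows the compression step.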
STOICC Pipeline
The STOICC Inspector takes a sparse matrix as input and inspects its sparsity pattern in three steps: it assigns a sparsity type to each tile, compresses and reorders the data, and creates a schedule. An Executor written in Triton then uses this schedule together with the data to perform the SpMM operation. Finally, the Sparse Compiler, our modified Triton compiler, lowers the executor code to the GPU.
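The per-tile inspection step can be sketched as follows. This is a hypothetical illustration, not STOICC's implementation. It makes three assumptions:

- The tile size is fixed, square, and divisible by 4.
- The 2:4 pattern is checked along each row, which is the reduction dimension of A in C = A @ B.
- All-zero tiles are dropped from the schedule; every other tile is recorded as "dense" or "2:4".

The executor below computes every scheduled tile with an ordinary dense product. A real executor would instead dispatch on the recorded format, for example to sparse tensor-core instructions for 2:4 tiles.

```python
import numpy as np

def is_2_4(tile):
    """True if every group of 4 consecutive entries in each row has at most 2 nonzeros."""
    rows, cols = tile.shape
    groups = tile.reshape(rows, cols // 4, 4)
    return bool(np.all(np.count_nonzero(groups, axis=-1) <= 2))

def inspect_tiles(A, tile=4):
    """Hypothetical tile inspector: returns a schedule of (tile_row, tile_col, format)."""
    m, k = A.shape
    assert tile % 4 == 0 and m % tile == 0 and k % tile == 0
    schedule = []
    for ti in range(m // tile):
        for tk in range(k // tile):
            blk = A[ti * tile:(ti + 1) * tile, tk * tile:(tk + 1) * tile]
            if not blk.any():
                continue  # all-zero tile: no work, so it is left out of the schedule
            fmt = "2:4" if is_2_4(blk) else "dense"
            schedule.append((ti, tk, fmt))
    return schedule

def execute_tiles(A, B, schedule, tile=4):
    """Hypothetical executor: C = A @ B, visiting only the scheduled tiles of A."""
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.result_type(A, B))
    for ti, tk, fmt in schedule:
        blk = A[ti * tile:(ti + 1) * tile, tk * tile:(tk + 1) * tile]
        # A real executor would dispatch on `fmt`; this sketch uses a dense
        # product for both "dense" and "2:4" tiles.
        C[ti * tile:(ti + 1) * tile] += blk @ B[tk * tile:(tk + 1) * tile]
    return C

A = np.zeros((8, 8))
A[0:4, 0:4] = 1.0      # fully dense tile
A[4:8, 0::4] = 2.0     # two nonzeros in every group of 4 along each row
A[4:8, 1::4] = 3.0
B = np.arange(16.0).reshape(8, 2)

schedule = inspect_tiles(A)
print(schedule)        # [(0, 0, 'dense'), (1, 0, '2:4'), (1, 1, '2:4')]
C = execute_tiles(A, B, schedule)
```

Because the format is decided per tile rather than per matrix, dense and 2:4 tiles can sit side by side in the same schedule.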
Results
We benchmark STOICC on an NVIDIA A100 (80 GB) GPU using matrices that mix dense tiles and 2:4 sparse tiles, with the 2:4 tiles selected at random. For example, “50% 2:4 Tiles” means that half of the tiles are pruned to 2:4 sparsity while the other half remain dense. Because each 2:4 tile is 50% sparse, the overall sparsity of the whole matrix is 25%. The matrix sizes are chosen based on those used in the OPT family of models.
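Every 2:4 tile has exactly half of its entries set to zero, and dense tiles have none. A matrix in which a fraction p of the tiles are 2:4 therefore has overall sparsity 0.5 · p, so p = 0.5 gives 25%. The sketch below builds such a mixed matrix and checks that figure. Two details are assumptions, not part of the benchmark description:

- Dense values are drawn uniformly from [1, 2), so no dense entry is accidentally zero.
- Selected tiles are pruned by magnitude, keeping the two largest-magnitude entries in every group of four along a row.

The helper names `prune_2_4` and `make_mixed_matrix` are hypothetical.

```python
import numpy as np

def prune_2_4(tile):
    """Hypothetical magnitude pruning: zero the 2 smallest-|x| entries of every row group of 4."""
    rows, cols = tile.shape
    groups = tile.reshape(rows, cols // 4, 4).copy()
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]   # 2 smallest magnitudes per group
    np.put_along_axis(groups, drop, 0.0, axis=-1)
    return groups.reshape(rows, cols)

def make_mixed_matrix(m, k, tile, frac_24, seed=0):
    """Hypothetical generator: prune a random fraction `frac_24` of tiles to 2:4."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(1.0, 2.0, size=(m, k))   # strictly nonzero, so dense tiles stay dense
    n_tc = k // tile
    n_tiles = (m // tile) * n_tc
    chosen = rng.choice(n_tiles, size=round(frac_24 * n_tiles), replace=False)
    for t in chosen:
        ti, tk = divmod(int(t), n_tc)
        rs = slice(ti * tile, (ti + 1) * tile)
        cs = slice(tk * tile, (tk + 1) * tile)
        A[rs, cs] = prune_2_4(A[rs, cs])
    return A

A = make_mixed_matrix(256, 256, tile=64, frac_24=0.5)
sparsity = 1.0 - np.count_nonzero(A) / A.size
print(sparsity)  # 0.25
```

The same arithmetic gives 50% overall sparsity when every tile is 2:4, which is the configuration used for the comparison against the 2:4 kernel in PyTorch.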
We also compare STOICC's performance on matrices with 100% 2:4 tiles against the CUTLASS 2:4 sparse kernel integrated into PyTorch.