A Coordinated Tiling and Batching Framework for Efficient GEMM on GPUs

A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications

A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs

A Round-Efficient Distributed Betweenness Centrality Algorithm

A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms

Adaptive Sparse Matrix-Matrix Multiplication on the GPU

AGGREGATHOR: Byzantine Machine Learning via Robust Gradient Aggregation

An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines

Beyond Data and Model Parallelism for Deep Neural Networks

Beyond Human-Level Accuracy: Computational Challenges in Deep Learning

BPPSA: Scaling Back-propagation by Parallel Scan Algorithm

Bridging the Gap between Deep Learning and Sparse Matrix Format Selection

Cache-Tries: Concurrent Lock-Free Hash Tries with Constant-Time Operations

Checking Linearizability Using Hitting Families

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques

Communication-avoiding parallel minimum cuts and connected components

Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances

Corrected trees for reliable group communication

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs