DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang et al. (11 total)

2022-06-30

SC22: International Conference for High Performance Computing, Networking, Storage and Analysis

10.1109/SC41404.2022.00051

494 citations

Abstract

The landscape of transformer model inference is increasingly diverse in model size, model characteristics, latency and throughput requirements, hardware requirements, etc. With such diversity, designing a versatile inference system is challenging. DeepSpeed-Inference addresses these challenges by (1) a multi-GPU inference solution to minimize latency while maximizing throughput for both dense and sparse transformers when the model fits in aggregate GPU memory, and (2) a heterogeneous inference s...
