no code implementations • 15 Mar 2024 • Hyungjun Oh, Kihong Kim, JaeMin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo
This paper presents ExeGPT, a distributed system designed for constraint-aware LLM inference.
no code implementations • 3 Oct 2021 • Hyungjun Oh, Hyeongju Kim, Jiwon Seo
In data-parallel training, we reorder the gradient computations to maximize the overlap of computation and parameter communication; in pipeline-parallel training, we prioritize critical gradient computations to reduce pipeline stalls. We evaluate our optimizations on twelve neural networks, including a lightweight computer vision model (MobileNet) and large NLP models (BERT and GPT-3), with up to forty-eight V100 GPUs. Our scheduling algorithms effectively improve the performance of single-GPU training as well as data- and pipeline-parallel training. Compared to the respective state-of-the-art training systems, throughput is substantially improved for single-GPU, data-parallel, and pipeline-parallel training.
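The data-parallel idea in this abstract can be illustrated with a minimal sketch: launch each layer's gradient communication as soon as that gradient is ready during the backward pass, instead of waiting for the whole pass to finish. All names here (`fake_all_reduce`, `backward_with_overlap`) are hypothetical stand-ins, not the paper's actual system or API.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_all_reduce(layer_id, log):
    """Stand-in for an asynchronous gradient all-reduce (simulated latency)."""
    time.sleep(0.01)
    log.append(("reduced", layer_id))

def backward_with_overlap(num_layers=4):
    log = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = []
        # Backward visits layers last-to-first; each layer's communication
        # is issued immediately, overlapping with the remaining backward
        # computation rather than running after it.
        for layer in reversed(range(num_layers)):
            log.append(("grad", layer))                    # compute this layer's gradient
            futures.append(pool.submit(fake_all_reduce, layer, log))
        for f in futures:
            f.result()                                     # synchronize before the optimizer step
    return log

log = backward_with_overlap()
```

The reordering described in the abstract goes further than this sketch: it chooses *which* gradient computations to run first so that communication of the gradients the next step depends on starts as early as possible.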