no code implementations • 17 May 2024 • Jiaxiang Yu, Yiyang Liu, Ruiyang Fan, Guobing Sun
To solve this kind of problem, we propose a new data augmentation method named MixCut.
1 code implementation • 22 Apr 2024 • Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica
Within this space, we show that GPU cost and performance are not linearly related, and we identify three key LLM service characteristics that significantly affect which GPU type is the most cost-effective: model request size, request rate, and latency service-level objective (SLO).