14 Feb 2024 • Sudarsanan Rajasekaran, Sanjoli Narang, Anton A. Zabreyko, Manya Ghobadi
We present MLTCP, a technique to augment today's congestion control algorithms to accelerate DNN training jobs in shared GPU clusters.