1 code implementation • 12 Dec 2023 • Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Erping Li
Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization.
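To see why E2E back-propagation keeps activations alive, consider a minimal sketch (illustrative only, not the paper's method): each layer's forward pass caches its inputs and activations, and the backward pass cannot run without that cache, so memory grows with network depth.

```python
# Minimal sketch: a two-layer scalar network whose forward pass must cache
# intermediate activations so the backward pass can reuse them.
# All names here are illustrative.

def forward(x, w1, w2):
    h = max(0.0, w1 * x)   # hidden activation (ReLU)
    y = w2 * h             # output
    cache = (x, h)         # held in memory until backward -> footprint grows with depth
    return y, cache

def backward(dy, w2, cache):
    x, h = cache           # backward needs the cached activations
    dw2 = dy * h           # gradient w.r.t. w2
    dh = dy * w2           # gradient flowing into the hidden unit
    dw1 = (dh if h > 0 else 0.0) * x   # ReLU gate, then gradient w.r.t. w1
    return dw1, dw2

y, cache = forward(2.0, 0.5, 3.0)
dw1, dw2 = backward(1.0, 3.0, cache)
print(y, dw1, dw2)   # 3.0 6.0 1.0
```

With many layers, every `cache` tuple must persist until its layer's backward step, which is the memory cost (and the sequential dependency limiting model parallelism) that the abstract refers to.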