no code implementations • 5 May 2024 • Shixiong Qi, K. K. Ramakrishnan, Myungjin Lee
We leverage shared-memory processing to achieve high-performance communication for hierarchical aggregation, which is commonly adopted to speed up federated learning (FL) aggregation at scale.
no code implementations • 8 Aug 2020 • Aditya Dhakal, Junguk Cho, Sameer G. Kulkarni, K. K. Ramakrishnan, Puneet Sharma
Spatial sharing of the GPU enables multiplexing several DNNs on the same GPU and can improve GPU utilization, thus improving throughput and lowering latency.