no code implementations • 19 Apr 2024 • Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens
Characterizing and predicting the training performance of modern machine learning (ML) workloads on systems whose compute and communication are spread across CPUs, GPUs, and network devices is key to optimization and planning, yet it is a complex goal to achieve.
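As a minimal sketch of what such a prediction looks like (not the paper's actual model), assume a roofline-style cost per device and full overlap between compute and communication; all names and numbers below are illustrative:

```python
# Illustrative only: a toy step-time estimate for distributed training,
# assuming a roofline-style cost per device and that compute and
# communication fully overlap.

def step_time_estimate(flops, peak_flops, comm_bytes, link_bw):
    """Estimated time (seconds) for one training step."""
    compute_time = flops / peak_flops    # time in GPU kernels
    comm_time = comm_bytes / link_bw     # time in collectives (e.g. all-reduce)
    return max(compute_time, comm_time)  # with overlap, the bottleneck wins

# 10 TFLOPs of work on a 100 TFLOP/s GPU vs. a 2 GB all-reduce over a
# 25 GB/s link: 0.1 s compute vs. 0.08 s communication => compute-bound.
print(step_time_estimate(10e12, 100e12, 2e9, 25e9))  # 0.1
```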
1 code implementation • 23 May 2023 • Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zheng, Brian Coutinho, Saeed Rashidi, Changhai Man, Tushar Krishna
Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware.
1 code implementation • 3 May 2023 • Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu, Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu
In this work, we explore a "pre-train, and search" paradigm for efficient sharding.
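As a minimal sketch of that paradigm (not the paper's actual algorithm), assume a cost model has been pre-trained offline to predict the cost of a set of co-located tables; the search stage can then query it instead of profiling every candidate on hardware. The `cost_model` interface and the greedy strategy below are hypothetical simplifications:

```python
# Illustrative sketch of "pre-train, and search" for embedding-table
# sharding. Stage 1 (offline, not shown): pre-train cost_model on many
# (placement, measured cost) samples so it transfers across tasks.
# Stage 2 (online): search placements using only the learned model.

def greedy_search(tables, num_devices, cost_model):
    """Place each table on the device with the lowest predicted cost."""
    plan = [[] for _ in range(num_devices)]
    for t in sorted(tables, key=cost_model.table_cost, reverse=True):
        best = min(range(num_devices),
                   key=lambda d: cost_model.device_cost(plan[d] + [t]))
        plan[best].append(t)
    return plan  # plan[d] = tables assigned to device d
```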
no code implementations • 16 Dec 2022 • Mingyu Liang, Wenyin Fu, Louis Feng, Zhongyi Lin, Pavani Panakanti, Shengbao Zheng, Srinivas Sridharan, Christina Delimitrou
We evaluate our methodology on several production AI models, and show that benchmarks generated with Mystique closely resemble original AI models, both in execution time and system-level metrics.
1 code implementation • 5 Oct 2022 • Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu
Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains a challenging problem because of 1) the operation fusion of embedding tables and 2) the requirement to generalize to unseen placement tasks with different numbers of tables and/or devices.
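A toy illustration of the first difficulty (the cost numbers and the fused-kernel saving below are made up): because tables co-located on a device are fused into a single lookup operation, the cost of a set of tables is not the sum of per-table costs, so a placement policy must score sets rather than individual tables.

```python
# Illustrative only: fusion makes device cost sub-additive, so per-table
# costs cannot simply be summed when evaluating a placement.

def fused_cost(tables):
    """Hypothetical cost of tables fused into one kernel on a device."""
    individual = sum(t["cost"] for t in tables)
    launch_savings = 0.2 * (len(tables) - 1)  # fewer kernel launches
    return max(individual - launch_savings, 0.0)

a, b = {"cost": 1.0}, {"cost": 1.0}
assert fused_cost([a, b]) < fused_cost([a]) + fused_cost([b])
```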
1 code implementation • 12 Aug 2022 • Daochen Zha, Louis Feng, Bhargav Bhushanam, Dhruv Choudhary, Jade Nie, Yuandong Tian, Jay Chae, Yinbin Ma, Arun Kejariwal, Xia Hu
This poses a significant design challenge for distributed systems, named embedding table sharding: how to partition the embedding tables to balance the cost across devices. The task is non-trivial because 1) it is hard to measure the cost efficiently and precisely, and 2) the partition problem is known to be NP-hard.
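For intuition on the second point (this is a classical heuristic, not the paper's method): with per-table costs known, balancing them across devices is multiway number partitioning, and a standard greedy baseline is longest-processing-time-first (LPT).

```python
# Illustrative only: balanced sharding as multiway number partitioning
# (NP-hard). LPT assigns each table, in descending cost order, to the
# currently least-loaded device.
import heapq

def lpt_shard(table_costs, num_devices):
    """Return per-device (load, device id, table indices) tuples."""
    heap = [(0.0, d, []) for d in range(num_devices)]
    heapq.heapify(heap)
    for i, c in sorted(enumerate(table_costs), key=lambda x: -x[1]):
        load, d, tabs = heapq.heappop(heap)
        heapq.heappush(heap, (load + c, d, tabs + [i]))
    return heap

print(lpt_shard([5.0, 4.0, 3.0, 3.0, 2.0, 1.0], 2))
# -> both devices end with load 9.0: costs {5, 3, 1} vs. {4, 3, 2}
```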
no code implementations • 19 Jan 2022 • Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens
We show that our general performance model not only achieves low prediction error on DLRM, which has highly customized configurations and is dominated by multiple factors, but also yields comparable accuracy on other compute-bound ML models targeted by most previous methods.
no code implementations • 4 May 2021 • Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal
Our method leverages structured sparsification to reduce computational cost without hurting model capacity at the end of offline training, so that a full-size model is available in the recurring training stage to learn new data in real time.
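A minimal sketch of the idea, assuming PyTorch and a simple L1-norm channel criterion (both are illustrative choices, not necessarily the paper's): whole output channels are masked during offline training, while the underlying dense weights are kept so the full-size model can resume learning in the recurring stage.

```python
# Illustrative only: structured (channel-wise) sparsification that keeps
# the dense weights intact for later recurring training.
import torch
import torch.nn as nn

def channel_mask(layer: nn.Linear, keep_ratio: float) -> torch.Tensor:
    """Keep the output channels (weight rows) with the largest L1 norm."""
    scores = layer.weight.abs().sum(dim=1)        # one score per channel
    k = max(1, int(keep_ratio * scores.numel()))
    mask = torch.zeros_like(scores)
    mask[scores.topk(k).indices] = 1.0
    return mask.unsqueeze(1)                      # broadcast over inputs

layer = nn.Linear(64, 32)
mask = channel_mask(layer, keep_ratio=0.5)
sparse_weight = layer.weight * mask  # cheap compute path for offline training
# layer.weight itself is untouched: the masked channels still exist, so the
# full-size model is available when recurring training starts on new data.
```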