Search Results for author: Jiyan Yang

Found 24 papers, 4 papers with code

AutoML for Large Capacity Modeling of Meta's Ranking Systems

no code implementations • 14 Nov 2023 • Hang Yin, Kuang-Hung Liu, Mengying Sun, Yuxin Chen, Buyun Zhang, Jiang Liu, Vivek Sehgal, Rudresh Rajnikant Panchal, Eugen Hotaj, Xi Liu, Daifeng Guo, Jamey Zhang, Zhou Wang, Shali Jiang, Huayu Li, Zhengxing Chen, Wen-Yen Chen, Jiyan Yang, Wei Wen

The large scale of the models and the tight production schedule require AutoML to outperform human baselines using only a small number of model evaluation trials (around 100).

Hyperparameter Optimization • Neural Architecture Search

Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale

no code implementations • 14 Nov 2023 • Wei Wen, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Hang Yin, Weiwei Chu, Kaveh Hassani, Mengying Sun, Jiang Liu, Xu Wang, Lin Jiang, Yuxin Chen, Buyun Zhang, Xi Liu, Dehua Cheng, Zhengxing Chen, Guang Zhao, Fangqiu Han, Jiyan Yang, Yuchen Hao, Liang Xiong, Wen-Yen Chen

In industrial systems, such as the ranking systems at Meta, it is unclear whether NAS algorithms from the literature can outperform production baselines because of: (1) scale - Meta ranking systems serve billions of users, (2) strong baselines - the baselines are production models optimized by hundreds to thousands of world-class engineers for years since the rise of deep learning, (3) dynamic baselines - engineers may have established new and stronger baselines during NAS search, and (4) efficiency - the search pipeline must yield results quickly in alignment with the productionization life cycle.

Neural Architecture Search

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

no code implementations • 12 Jul 2023 • Xuewei Wang, Qiang Jin, Shengyu Huang, Min Zhang, Xi Liu, Zhengli Zhao, Yukun Chen, Zhengyu Zhang, Jiyan Yang, Ellie Wen, Sagar Chordia, Wenlin Chen, Qin Huang

In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e., ads clicks and ads quality events) and their task relations.

Multi-Task Learning

AdaTT: Adaptive Task-to-Task Fusion Network for Multitask Learning in Recommendations

1 code implementation • 11 Apr 2023 • Danwei Li, Zhengyu Zhang, Siyang Yuan, Mingze Gao, Weilin Zhang, Chaofei Yang, Xi Liu, Jiyan Yang

However, MTL research faces two challenges: 1) effectively modeling the relationships between tasks to enable knowledge sharing, and 2) jointly learning task-specific and shared knowledge.

Multi-Task Learning

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction

no code implementations • 11 Mar 2022 • Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang, Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li, Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen

To overcome the challenge brought by DHEN's deeper and multi-layer structure in training, we propose a novel co-designed training system that can further improve the training efficiency of DHEN.

Click-Through Rate Prediction

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

no code implementations • 12 Apr 2021 • Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, Yinbin Ma, Junjie Yang, Ellie Wen, Hong Li, Lin Yang, Chonglin Sun, Whitney Zhao, Dimitry Melts, Krishna Dhulipala, KR Kishore, Tyler Graf, Assaf Eisenman, Kiran Kumar Matam, Adi Gangidi, Guoqiang Jerry Chen, Manoj Krishnan, Avinash Nayak, Krishnakumar Nair, Bharath Muthiah, Mahmoud khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Ajit Mathews, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, Vijay Rao

Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers.

CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

no code implementations • 5 Nov 2020 • Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark C. Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu

To the best of our knowledge, this paper is the first to perform a data-driven, in-depth analysis of applying partial recovery to recommendation models, and it identifies a trade-off between accuracy and performance.

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data

no code implementations • 16 Oct 2020 • Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, Arun Kejariwal

To the best of our knowledge, this is the first work to provide in-depth analysis and discussion of applying pruning to online recommendation systems with non-stationary data distribution.

Recommendation Systems

Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction

no code implementations • 29 Jun 2020 • Qingquan Song, Dehua Cheng, Hanning Zhou, Jiyan Yang, Yuandong Tian, Xia Hu

Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems, driving personalized experience for billions of consumers.

Click-Through Rate Prediction • Learning-To-Rank +2

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

no code implementations • 20 Mar 2020 • Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

Large-scale training is important to ensure high performance and accuracy of machine-learning models.

Distributed, Parallel, and Cluster Computing • 68T05, 68M10 • H.3.3; I.2.6; C.2.1

ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training

no code implementations • 7 Mar 2020 • Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu, Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou

Recommendation systems are often trained with a tremendous amount of data, and distributed training is the workhorse to shorten the training time.

Click-Through Rate Prediction • Recommendation Systems

Post-Training 4-bit Quantization on Embedding Tables

no code implementations • 5 Nov 2019 • Hui Guan, Andrey Malevich, Jiyan Yang, Jongsoo Park, Hector Yuen

Continuous representations have been widely adopted in recommender systems where a large number of entities are represented using embedding vectors.

Quantization • Recommendation Systems

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

6 code implementations • 25 Sep 2019 • Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings.

Click-Through Rate Prediction • Collaborative Filtering +1

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

6 code implementations • 4 Sep 2019 • Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition.

Recommendation Systems
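
The idea of complementary partitions can be illustrated with a minimal sketch. One simple pair of complementary partitions is the quotient and the remainder of the category index by a fixed bucket count: every category receives a distinct (quotient, remainder) pair, so two small tables can stand in for one large one. The function names, the elementwise-product combination, and the table sizes below are assumptions made for illustration and are not taken from this listing.

```python
import math
import random


def make_tables(num_categories, num_buckets, dim, seed=0):
    """Build two small tables (hypothetical helper, pure Python)."""
    rng = random.Random(seed)
    num_quotients = math.ceil(num_categories / num_buckets)

    def table(rows):
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]

    # One row per quotient value and one row per remainder value.
    return table(num_quotients), table(num_buckets)


def partition_ids(category, num_buckets):
    """Map a category index to its (quotient, remainder) pair."""
    return category // num_buckets, category % num_buckets


def embed(category, quotient_table, remainder_table):
    """Combine the two partial embeddings by elementwise product (assumed choice)."""
    q, r = partition_ids(category, len(remainder_table))
    return [a * b for a, b in zip(quotient_table[q], remainder_table[r])]


# Usage: 1000 categories are covered by 32 + 32 rows instead of 1000 rows.
quotient_table, remainder_table = make_tables(num_categories=1000, num_buckets=32, dim=4)
vector = embed(123, quotient_table, remainder_table)
```

Because the index pair is unique per category, no two categories are forced to share the same pair of rows, while the total number of stored rows grows roughly with the square root of the category count when the bucket count is chosen near that square root.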

A Study of BFLOAT16 for Deep Learning Training

no code implementations • 29 May 2019 • Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey

In this paper, we discuss the flow of tensors and various key operations in mixed precision training, and delve into details of operations, such as the rounding modes for converting FP32 tensors to BFLOAT16.

Image Classification • Language Modelling +3
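
As a rough illustration of what a rounding mode means when converting FP32 to BFLOAT16, the sketch below compares plain truncation with round-to-nearest-even on the raw bit patterns. BFLOAT16 keeps the upper 16 bits of the FP32 encoding (sign, 8 exponent bits, 7 mantissa bits). The helper names are hypothetical, and nothing here is taken from the paper's own implementation.

```python
import struct


def fp32_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_fp32(bits):
    """Interpret a 32-bit pattern as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bf16_truncate(x):
    """Drop the low 16 bits (round toward zero)."""
    return fp32_bits(x) >> 16


def to_bf16_rne(x):
    """Round to nearest, ties to even, on the low 16 bits."""
    bits = fp32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep NaN a NaN by forcing a mantissa bit that survives the shift.
        return (bits >> 16) | 0x0040
    # Adding 0x7FFF plus the lowest kept bit sends exact ties to the even neighbour.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_to_float(b):
    """Widen a 16-bit BFLOAT16 pattern back to a Python float."""
    return bits_to_fp32(b << 16)


# Usage: a value exactly halfway between two BFLOAT16 neighbours.
x = 1.0 + 2.0 ** -7 + 2.0 ** -8
truncated = bf16_to_float(to_bf16_truncate(x))
rounded = bf16_to_float(to_bf16_rne(x))
```

Truncation always moves toward zero, so its error has a consistent sign, whereas round-to-nearest-even splits ties in both directions.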

Feature-distributed sparse regression: a screen-and-clean approach

no code implementations • NeurIPS 2016 • Jiyan Yang, Michael W. Mahoney, Michael Saunders, Yuekai Sun

Most existing approaches to distributed sparse regression assume the data is partitioned by samples.

Distributed Computing • regression

Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

1 code implementation • 5 Jul 2016 • Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael W. Mahoney, Prabhat

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms.

Distributed, Parallel, and Cluster Computing • G.1.3; C.2.4

Sub-sampled Newton Methods with Non-uniform Sampling

no code implementations • NeurIPS 2016 • Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney

As second-order methods prove to be effective in finding the minimizer to high precision, in this work, we propose randomized Newton-type algorithms that exploit non-uniform sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity.

Second-order methods

Tensor machines for learning target-specific polynomial features

no code implementations • 7 Apr 2015 • Jiyan Yang, Alex Gittens

Recent years have demonstrated that using random feature maps can significantly decrease the training and testing times of kernel-based algorithms without significantly lowering their accuracy.

Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning

no code implementations • 12 Feb 2015 • Jiyan Yang, Yin-Lam Chow, Christopher Ré, Michael W. Mahoney

We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., $\ell_2$ and $\ell_1$ regression problems.

regression

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

no code implementations • 10 Feb 2015 • Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

and demonstrate that $\ell_1$ and $\ell_2$ regression problems can be solved to low, medium, or high precision in existing distributed systems on up to terabyte-sized data.

regression

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

no code implementations • 29 Dec 2014 • Haim Avron, Vikas Sindhwani, Jiyan Yang, Michael Mahoney

These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., the Gaussian kernel).
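
For context, the plain Monte Carlo construction can be sketched as follows for the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$, whose integral representation averages $\cos(w^\top (x - y))$ over frequencies $w$ drawn from a normal distribution with standard deviation $1/\sigma$ per coordinate. This is only the standard random-feature baseline with independently sampled frequencies, not the quasi-Monte Carlo sequences the paper studies; all function names and parameter values are illustrative assumptions.

```python
import math
import random


def sample_frequencies(num_samples, dim, sigma, seed=0):
    """Draw frequencies w ~ N(0, sigma^-2 I) for the Gaussian kernel."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
            for _ in range(num_samples)]


def feature_map(x, frequencies):
    """Map x to paired cos/sin features so inner products estimate the kernel."""
    scale = 1.0 / math.sqrt(len(frequencies))
    projections = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in frequencies]
    return ([scale * math.cos(p) for p in projections]
            + [scale * math.sin(p) for p in projections])


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value, used as the reference."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Usage: compare the Monte Carlo estimate with the exact kernel value.
x, y, sigma = [0.3, -0.1, 0.5], [0.0, 0.4, 0.2], 1.0
frequencies = sample_frequencies(num_samples=4000, dim=3, sigma=sigma)
estimate = dot(feature_map(x, frequencies), feature_map(y, frequencies))
exact = gaussian_kernel(x, y, sigma)
```

The inner product of the two feature vectors equals the sample mean of $\cos(w^\top (x - y))$, so its error shrinks at the usual Monte Carlo rate as more frequencies are drawn.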

Random Laplace Feature Maps for Semigroup Kernels on Histograms

no code implementations • CVPR 2014 • Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron, Michael W. Mahoney

With the goal of reducing the training and testing complexity of nonlinear kernel methods, several recent papers have proposed explicit embeddings of the input data into low-dimensional feature spaces, where fast linear methods can instead be used to generate approximate solutions.

Event Detection • Image Classification

Quantile Regression for Large-scale Applications

no code implementations • 1 May 2013 • Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

Our empirical evaluation illustrates that our algorithm is competitive with the best previous work on small to medium-sized problems, and that in addition it can be implemented in MapReduce-like environments and applied to terabyte-sized problems.

regression
