Search Results for author: Wentao Wu

Found 22 papers, 13 papers with code

Pre-training on High Definition X-ray Images: An Experimental Study

1 code implementation • 27 Apr 2024 • Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang

Existing X-ray based pre-trained vision models are usually conducted on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224).

Decoder Miscellaneous


State Space Model for New-Generation Network Alternative to Transformers: A Survey

1 code implementation • 15 Apr 2024 • Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, YaoWei Wang, Yonghong Tian, Jin Tang

In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM.


Budget-aware Query Tuning: An AutoML Perspective

no code implementations • 29 Mar 2024 • Wentao Wu, Chi Wang

We further extend our study from tuning a single query to tuning a workload with multiple queries, and we call this generalized problem budget-aware workload tuning (WT), which aims for minimizing the execution time of the entire workload.

AutoML


TablePuppet: A Generic Framework for Relational Federated Learning

1 code implementation • 23 Mar 2024 • Lijie Xu, Chulin Xie, Yiran Guo, Gustavo Alonso, Bo Li, Guoliang Li, Wei Wang, Wentao Wu, Ce Zhang

In this paper, we formalize this problem as relational federated learning (RFL).

Federated Learning


Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception

1 code implementation • 15 Dec 2023 • Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai Shi, Jin Tang

To address this issue, we propose a novel vehicle-centric pre-training framework called VehicleMAE, which incorporates the structural information including the spatial structure from vehicle profile information and the semantic structure from informative high-level natural language descriptions for effective masked vehicle appearance reconstruction.


VeCLIP: Improving CLIP Training via Visual-enriched Captions

1 code implementation • 11 Oct 2023 • Zhengfeng Lai, Haotian Zhang, BoWen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao

For example, VeCLIP achieves up to +25.2% gain in COCO and Flickr30k retrieval tasks under the 12M setting.

Retrieval Text Retrieval +1


ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges

no code implementations • 25 Aug 2023 • Tarique Siddiqui, Wentao Wu

The scale and complexity of workloads in modern cloud services have brought into sharper focus a critical challenge in automated index tuning -- the need to recommend high-quality indexes while maintaining index tuning scalability.


MOFI: Learning Image Representations from Noisy Entity Annotated Images

1 code implementation • 13 Jun 2023 • Wentao Wu, Aleksei Timofeev, Chen Chen, BoWen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang

Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.

Image Classification Image Retrieval +3


Stochastic Gradient Descent without Full Data Shuffle

1 code implementation • 12 Jun 2022 • Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang

In this paper, we first conduct a systematic empirical study on existing data shuffling strategies, which reveals that all existing strategies have room for improvement -- they all suffer in terms of I/O performance or convergence rate.

Computational Efficiency


Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines

1 code implementation • 23 Apr 2022 • Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, Ce Zhang

We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training.

BIG-bench Machine Learning Fairness


VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition

3 code implementations • 19 Jul 2021 • Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Bolin Ding, Yaliang Li, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang, Bin Cui

End-to-end AutoML, which automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model selection, and hyper-parameter tuning, has attracted intensive interest from both academia and industry.

AutoML Feature Engineering +1


OpenBox: A Generalized Black-box Optimization Service

6 code implementations • 1 Jun 2021 • Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, Bin Cui

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design.

Experimental Design Transfer Learning


Towards Demystifying Serverless Machine Learning Training

1 code implementation • 17 May 2021 • Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang

The appeal of serverless (FaaS) has triggered a growing interest in how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML).

BIG-bench Machine Learning


A Data Quality-Driven View of MLOps

no code implementations • 15 Feb 2021 • Cedric Renggli, Luka Rimanic, Nezihe Merve Gürel, Bojan Karlaš, Wentao Wu, Ce Zhang

Developing machine learning models can be seen as a process similar to the one established for traditional software development.

BIG-bench Machine Learning


Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise

2 code implementations • 16 Oct 2020 • Cedric Renggli, Luka Rimanic, Luka Kolar, Wentao Wu, Ce Zhang

In our experience of working with domain experts who are using today's AutoML systems, a common problem we encountered is what we call "unrealistic expectations" -- when users are facing a very challenging task with a noisy data acquisition process, while being expected to achieve startlingly high accuracy with machine learning (ML).

AutoML BIG-bench Machine Learning


Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

1 code implementation • 11 May 2020 • Bojan Karlaš, Peng Li, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data.

BIG-bench Machine Learning


Data Science through the looking glass and what we found there

no code implementations • 19 Dec 2019 • Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Matteo Interlandi, Avrilia Floratou, Konstantinos Karanasos, Wentao Wu, Ce Zhang, Subru Krishnan, Carlo Curino, Markus Weimer

The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners.


Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter

no code implementations • 1 Jun 2019 • Frances Ann Hubis, Wentao Wu, Ce Zhang

In particular, ease.

Management Model Selection


Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

no code implementations • 1 Mar 2019 • Cedric Renggli, Bojan Karlaš, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, Ce Zhang

Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development.

2k BIG-bench Machine Learning


Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads

no code implementations • 24 Aug 2017 • Tian Li, Jie Zhong, Ji Liu, Wentao Wu, Ce Zhang

We ask, as a "service provider" that manages a shared cluster of machines among all our users running machine learning workloads, what is the resource allocation strategy that maximizes the global satisfaction of all our users?

Bayesian Optimization BIG-bench Machine Learning +4


MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data?

no code implementations • 29 Jul 2017 • Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang

We then compare the performance of the top winning code available from Kaggle with that of running machine learning clouds from both Azure and Amazon on mlbench.

BIG-bench Machine Learning Binary Classification +1


Revisiting Differentially Private Regression: Lessons From Learning Theory and their Consequences

no code implementations • 20 Dec 2015 • Xi Wu, Matthew Fredrikson, Wentao Wu, Somesh Jha, Jeffrey F. Naughton

Perhaps more importantly, our theory reveals that the most basic mechanism in differential privacy, output perturbation, can be used to obtain a better tradeoff for all convex-Lipschitz-bounded learning tasks.

Learning Theory regression

