1 code implementation • 26 May 2024 • Bangzheng Li, Ningshan Ma, Zifan Wang
We found that R3 significantly outperforms PPO in Minigrid environments with sparse rewards and discrete action spaces, such as DoorKeyEnv and CrossingEnv; moreover, the margin by which our method improves over the PPO baseline grows with the complexity of the environment.
no code implementations • 3 Apr 2024 • Siyi Wang, Zifan Wang, Xinlei Yi, Michael M. Zavlanos, Karl H. Johansson, Sandra Hirche
Considering non-stationary environments in online optimization enables the decision-maker to adapt effectively to changes and improve its performance over time.
1 code implementation • 5 Mar 2024 • Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Ruoyu Wang, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs.
1 code implementation • 6 Feb 2024 • Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks
Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods.
no code implementations • 17 Jan 2024 • Yunze Liu, Changxi Chen, Zifan Wang, Li Yi
This paper introduces a novel approach named CrossVideo, which aims to enhance self-supervised cross-modal contrastive learning in the field of point cloud video understanding.
no code implementations • 1 Jan 2024 • Zifan Wang, Junyu Chen, Ziqing Chen, Pengwei Xie, Rui Chen, Li Yi
We further introduce a distillation-friendly demonstration generation method that automatically generates a million high-quality demonstrations suitable for learning.
no code implementations • 13 Dec 2023 • Zifan Wang, Zhuorui Ye, Haoran Wu, Junyu Chen, Li Yi
To tackle this challenging problem, we properly model the synergetic relationship between future forecasting and semantic scene completion through a novel network named SCSFNet.
no code implementations • 26 Nov 2023 • Zhihang Li, Zhao Song, Zifan Wang, Junze Yin
Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss.
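As background on the kind of update such an analysis concerns, a regularized Newton step on a toy logistic loss can be sketched as follows; the toy problem, the regularization strength, and the exact Newton step are illustrative assumptions, not this paper's algorithm or setting.

```python
# A minimal regularized-Newton sketch on a toy logistic-regression loss.
# Illustrates the kind of second-order update whose convergence such analyses
# study; it is not this paper's algorithm or setting.
import numpy as np

def loss_grad_hess(theta, X, y, lam):
    """Regularized logistic loss with its gradient and Hessian."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)) + 0.5 * lam * theta @ theta
    grad = X.T @ (p - y) / len(y) + lam * theta
    hess = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(len(theta))
    return loss, grad, hess

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(size=200) > 0).astype(float)

theta, lam = np.zeros(5), 1e-2
for _ in range(10):
    loss, g, H = loss_grad_hess(theta, X, y, lam)
    theta -= np.linalg.solve(H, g)    # (approximate) Newton step
    print(round(float(loss), 4))      # regularized training loss decreases each iteration
```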
no code implementations • 22 Nov 2023 • Chi Zhang, Zifan Wang, Ravi Mangal, Matt Fredrikson, Limin Jia, Corina Pasareanu
They improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results when performing tasks such as code summarization and identifying code vulnerabilities.
1 code implementation • 6 Nov 2023 • Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner
As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner.
no code implementations • 13 Oct 2023 • Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson
There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?
1 code implementation • 4 Oct 2023 • Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson
A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training.
1 code implementation • 2 Oct 2023 • Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.
Ranked #3 on Question Answering on TruthfulQA
no code implementations • 22 Sep 2023 • Zifan Wang, Kotaro Funakoshi, Manabu Okumura
This work proposes PMAN (Prompting-based Metric on ANswerability), a novel automatic evaluation metric to assess whether the generated questions are answerable by the reference answers for the QG tasks.
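The core idea, judging answerability by prompting a language model, can be sketched as below; the prompt wording, the yes/no scoring rule, and the ask_llm helper are placeholders rather than the actual PMAN protocol.

```python
# Sketch of a prompting-based answerability check. The prompt wording, the
# yes/no scoring, and the ask_llm helper are placeholders; the actual PMAN
# prompt and protocol are those defined in the paper.
def ask_llm(prompt: str) -> str:
    """Hypothetical helper returning an LLM completion for `prompt`."""
    raise NotImplementedError("wire this to an LLM API of your choice")

def answerability_score(question: str, reference_answer: str) -> float:
    prompt = (
        f"Question: {question}\n"
        f"Answer: {reference_answer}\n"
        "Does the answer above actually answer the question? Reply Yes or No."
    )
    reply = ask_llm(prompt).strip().lower()
    return 1.0 if reply.startswith("yes") else 0.0

# Averaging answerability_score over a system's generated questions gives a
# corpus-level answerability metric for question generation.
```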
16 code implementations • 27 Jul 2023 • Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer).
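As a rough illustration of that objective (not the paper's greedy coordinate gradient algorithm), one can score a candidate suffix by the loss of an affirmative target continuation and search over token substitutions; the model, prompt, target string, and random-swap search below are stand-ins.

```python
# Illustrative suffix search against an affirmative target continuation.
# The model, prompt, target, and crude random token-swap search are stand-ins;
# the paper's method uses a gradient-guided greedy coordinate search.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Give step-by-step instructions for the restricted task."
target = " Sure, here is how"                               # affirmative response prefix
suffix_ids = tok(" ! ! ! ! !", return_tensors="pt").input_ids[0]

def target_loss(suffix_ids):
    """Cross-entropy of the target tokens given prompt + suffix."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
    target_ids = tok(target, return_tensors="pt").input_ids[0]
    ids = torch.cat([prompt_ids, suffix_ids, target_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = model(ids).logits[0]
    pred = logits[-len(target_ids) - 1:-1]                  # positions that predict the target
    return torch.nn.functional.cross_entropy(pred, target_ids).item()

best = target_loss(suffix_ids)
for _ in range(200):                                        # crude random substitution search
    cand = suffix_ids.clone()
    cand[torch.randint(len(cand), (1,))] = torch.randint(tok.vocab_size, (1,))
    loss = target_loss(cand)
    if loss < best:
        best, suffix_ids = loss, cand
print(tok.decode(suffix_ids), best)
```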
no code implementations • 23 Mar 2023 • Zifan Wang, Yulong Gao, Siyi Wang, Michael M. Zavlanos, Alessandro Abate, Karl H. Johansson
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL.
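For readers unfamiliar with that distinction, a quantile-based update that maintains an estimate of the return distribution rather than its mean is sketched below; this is the generic quantile-regression idea, not the algorithm analyzed in the paper.

```python
# Maintain quantile estimates of a random return Z instead of a single
# expected value (generic distributional-RL idea, not the paper's algorithm).
import numpy as np

n_quantiles = 51
taus = (np.arange(n_quantiles) + 0.5) / n_quantiles    # quantile midpoints
theta = np.zeros(n_quantiles)                          # quantile estimates of the return

def quantile_update(theta, sampled_return, lr=0.05):
    """One quantile-regression (pinball-loss) step toward a sampled return."""
    grad = np.where(sampled_return < theta, 1.0 - taus, -taus)
    return theta - lr * grad

rng = np.random.default_rng(0)
for _ in range(5000):
    theta = quantile_update(theta, rng.normal(1.0, 2.0))   # sampled random return

print("mean of learned distribution:", theta.mean())       # close to the expected return 1.0
print("lower-tail (~10%) quantile:", theta[5])             # risk information the mean hides
```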
no code implementations • 8 Mar 2023 • Yichuan Deng, Zhao Song, Zifan Wang, Han Zhang
The kernel method, which is commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms.
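For context, the textbook kernel-PCA construction that sentence refers to looks roughly as follows (RBF kernel, double-centering, eigendecomposition); the paper's algorithmic and analytical contributions are not reflected in this sketch.

```python
# Textbook kernel PCA with an RBF kernel: center the kernel matrix in feature
# space, eigendecompose it, and project onto the leading components.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T       # pairwise squared distances
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one         # double-centering
    vals, vecs = np.linalg.eigh(Kc)                    # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                                 # projections of the data

X = np.random.randn(100, 5)
print(kernel_pca(X).shape)                             # (100, 2)
```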
2 code implementations • NeurIPS 2023 • Kai Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson
We show that fast ways of bounding the Lipschitz constant of conventional ResNets are loose, and address this by designing a new residual block, leading to the Linear ResNet (LiResNet) architecture.
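A minimal illustration of that looseness, under the simplifying assumption that the residual branch is a single linear map, is sketched below; the paper's actual LiResNet block and its bounding procedure may differ.

```python
# Why composing per-branch bounds is loose for residual blocks: for a block
# x -> x + W x, the generic bound 1 + ||W||_2 can be much larger than the
# exact Lipschitz constant ||I + W||_2. (Simplified illustration only.)
import torch

torch.manual_seed(0)
W = -0.5 * torch.eye(64) + 0.01 * torch.randn(64, 64)

naive_bound = 1.0 + torch.linalg.matrix_norm(W, ord=2)             # generic residual bound
exact_linear = torch.linalg.matrix_norm(torch.eye(64) + W, ord=2)  # exact for a linear block

print(float(naive_bound), float(exact_linear))   # the naive bound is substantially larger
```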
no code implementations • 26 Jan 2023 • Matt Fredrikson, Kaiji Lu, Saranya Vijayakumar, Somesh Jha, Vijay Ganesh, Zifan Wang
Recent techniques that integrate solver layers into Deep Neural Networks (DNNs) have shown promise in bridging a long-standing gap between inductive learning and symbolic reasoning techniques.
no code implementations • CVPR 2023 • Zifan Wang, Nan Ding, Tomer Levinboim, Xi Chen, Radu Soricut
Recent research in robust optimization has shown an overfitting-like phenomenon in which models trained against adversarial attacks exhibit higher robustness on the training set compared to the test set.
no code implementations • 6 Sep 2022 • Zifan Wang, Yi Shen, Zachary I. Bell, Scott Nivison, Michael M. Zavlanos, Karl H. Johansson
Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions.
1 code implementation • 1 Jun 2022 • Ravi Mangal, Zifan Wang, Chi Zhang, Klas Leino, Corina Pasareanu, Matt Fredrikson
We present the cascade attack (CasA), an adversarial attack against cascading ensembles, and show that: (1) there exists an adversarial input for up to 88% of the samples where the ensemble claims to be certifiably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11% when it claims to be certifiably robust and accurate on 97% of the test set.
no code implementations • 24 May 2022 • Zifan Wang, Yuhang Yao, Chaoran Zhang, Han Zhang, Youjie Kang, Carlee Joe-Wong, Matt Fredrikson, Anupam Datta
Second, our analytical and empirical results demonstrate that feature attribution methods cannot capture the nonlinear effect of edge features, while existing subgraph explanation methods are not faithful.
no code implementations • 16 Mar 2022 • Zifan Wang, Yi Shen, Michael M. Zavlanos
To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions.
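The estimation scheme can be sketched as follows, with illustrative step size, smoothing radius, sample count, and projection set; this is a generic one-point zeroth-order CVaR routine, not the paper's exact algorithm or tuning.

```python
# Generic one-point zeroth-order CVaR optimization sketch: estimate CVaR at a
# randomly perturbed point from sampled costs, turn it into a one-point
# gradient estimate, and take a projected step. All constants are illustrative.
import numpy as np

def empirical_cvar(costs, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of sampled costs."""
    var = np.quantile(costs, alpha)
    return costs[costs >= var].mean()

def cost(x, rng):
    """Toy random cost whose distribution depends on the decision x."""
    return np.sum((x - 1.0) ** 2) + rng.normal(0.0, 0.5)

rng = np.random.default_rng(0)
d, delta, lr = 3, 0.2, 0.002
x = np.zeros(d)

for t in range(2000):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                                   # random unit direction
    samples = np.array([cost(x + delta * u, rng) for _ in range(50)])
    grad_hat = (d / delta) * empirical_cvar(samples) * u     # one-point gradient estimate
    x = np.clip(x - lr * grad_hat, -5.0, 5.0)                # projected step

print(x)   # noisy iterate; practical schemes anneal delta and the step size
```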
no code implementations • 29 Jan 2022 • Zifan Wang, Michael F Hyland, Younghun Bahk, Navjyoth JS Sarma
Shared-ride mobility services that incorporate traveler walking legs aim to reduce vehicle-kilometers-travelled (VKT), vehicle-hours-travelled (VHT), request rejections, fleet size, or some combination of these factors, compared to door-to-door (D2D) shared-ride services.
1 code implementation • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2021 • Dixi Yao, Liyao Xiang, Zifan Wang, Jiayu Xu, Chao Li, Xinbing Wang
Experimental results show that our system not only adapts well to, but also draws on the varying contexts, delivering a practical and efficient solution to edge-cloud model training.
Ranked #2 on Recommendation Systems on MovieLens 1M (Precision metric)
no code implementations • ICLR 2022 • Emily Black, Zifan Wang, Matt Fredrikson, Anupam Datta
Counterfactual examples are one of the most commonly-cited methods for explaining the predictions of machine learning models in key areas such as finance and medical diagnosis.
1 code implementation • 20 Mar 2021 • Zifan Wang, Matt Fredrikson, Anupam Datta
Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's ground-truth class.
2 code implementations • 16 Feb 2021 • Klas Leino, Zifan Wang, Matt Fredrikson
We show that widely-used architectures can be easily adapted to this objective by incorporating efficient global Lipschitz bounds into the network, yielding certifiably-robust models by construction that achieve state-of-the-art verifiable accuracy.
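To make that construction concrete, the sketch below certifies a prediction using a crude global Lipschitz bound (the product of layer spectral norms); the paper's actual architecture and certification procedure are tighter and more refined than this.

```python
# Certification with a crude global Lipschitz bound: the product of the linear
# layers' spectral norms bounds the network's Lipschitz constant (ReLU is
# 1-Lipschitz), and a prediction is certified at L2 radius eps when the logit
# margin exceeds what such a perturbation could change.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def global_lipschitz(net):
    lip = 1.0
    for m in net:
        if isinstance(m, nn.Linear):
            lip *= torch.linalg.matrix_norm(m.weight, ord=2).item()
    return lip

def certified(x, eps):
    top2 = net(x).topk(2, dim=-1).values
    margin = float(top2[..., 0] - top2[..., 1])
    # an L2 perturbation of size eps moves each logit difference by at most
    # sqrt(2) * L * eps under this per-logit bound
    return margin > (2 ** 0.5) * global_lipschitz(net) * eps

print(certified(torch.randn(1, 784), eps=0.1))
```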
no code implementations • NeurIPS 2021 • Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
While “attention is all you need” may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear.
no code implementations • 28 Sep 2020 • Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
While “attention is all you need” may be proving true, we do not yet know why: attention-based transformer models such as BERT are superior, but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain.
no code implementations • 17 Sep 2020 • Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta
Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but it faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of action reconstruction functions that mimic the behavior of a network in deep RL.
1 code implementation • NeurIPS 2020 • Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta
Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.
1 code implementation • 6 May 2020 • Zifan Wang, Yilin Yang, Ankit Shrivastava, Varun Rawal, Zihao Ding
We show that the model's vulnerability to tiny distortions results from its reliance on high-frequency features, the target features of adversarial (black- and white-box) attackers, when making predictions.
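One common way to probe such reliance, sketched below with a placeholder input and cutoff radius, is to low-pass filter inputs in the Fourier domain and compare the model's predictions on the original and filtered images; this illustrates the style of frequency analysis described, not the paper's exact protocol.

```python
# Frequency-domain probe: keep only the low-frequency content of an image via
# an FFT mask, then compare model predictions on the original and filtered
# input. The input, cutoff radius, and model call are placeholders.
import numpy as np

def low_pass(image, radius=16):
    """Zero out all spatial frequencies farther than `radius` from the center."""
    f = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= radius ** 2
    f = f * (mask[..., None] if image.ndim == 3 else mask)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f, axes=(0, 1)), axes=(0, 1)))

image = np.random.rand(224, 224, 3).astype(np.float32)     # stand-in input image
filtered = low_pass(image, radius=16)
# pred_full, pred_low = model(image), model(filtered)       # compare the two predictions
print(np.abs(image - filtered).mean())                      # energy removed by the filter
```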
no code implementations • 19 Feb 2020 • Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson
In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.
9 code implementations • 3 Oct 2019 • Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, Xia Hu
Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks, and the reason why the network makes specific decisions.