Search Results for author: Kevin Swersky

Found 50 papers, 27 papers with code

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

no code implementations • 27 May 2024 • Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

Building on this core model, we propose a greedy algorithm that grows the architecture into high-resolution end-to-end models, while preserving the integrity of the pre-trained representation, stabilizing training, and reducing the need for large high-resolution datasets.

Paper
Add Code

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

no code implementations • 8 Mar 2024 • Gemini Team, Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-Baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, James Molloy, Jilin Chen, Michael Isard, Paul Barham, Tom Hennigan, Ross Mcilroy, Melvin Johnson, Johan Schalkwyk, Eli Collins, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, Clemens Meyer, Gregory Thornton, Zhen Yang, Henryk Michalewski, Zaheer Abbas, Nathan Schucher, Ankesh Anand, Richard Ives, James Keeling, Karel Lenc, Salem Haykal, Siamak Shakeri, Pranav Shyam, Aakanksha Chowdhery, Roman Ring, Stephen Spencer, Eren Sezener, Luke Vilnis, Oscar Chang, Nobuyuki Morioka, George Tucker, Ce Zheng, Oliver Woodman, Nithya Attaluri, Tomas Kocisky, Evgenii Eltyshev, Xi Chen, Timothy Chung, Vittorio Selo, Siddhartha Brahma, Petko Georgiev, Ambrose Slone, Zhenkai Zhu, James Lottes, Siyuan Qiao, Ben Caine, Sebastian Riedel, Alex Tomala, Martin Chadwick, Juliette Love, Peter Choy, Sid Mittal, Neil Houlsby, Yunhao Tang, Matthew Lamm, Libin Bai, Qiao Zhang, Luheng He, Yong Cheng, Peter Humphreys, Yujia Li, Sergey Brin, Albin Cassirer, Yingjie Miao, Lukas Zilka, Taylor Tobin, Kelvin Xu, Lev Proleev, Daniel Sohn, Alberto Magni, Lisa Anne Hendricks, Isabel Gao, Santiago Ontanon, Oskar Bunyan, Nathan Byrd, Abhanshu Sharma, Biao Zhang, Mario Pinto, Rishika Sinha, Harsh Mehta, Dawei Jia, Sergi Caelles, Albert Webson, Alex Morris, Becca Roelofs, Yifan Ding, Robin Strudel, Xuehan Xiong, Marvin Ritter, Mostafa Dehghani, Rahma Chaabouni, Abhijit Karmarkar, Guangda Lai, Fabian Mentzer, Bibo Xu, Yaguang Li, Yujing Zhang, Tom Le Paine, Alex Goldin, Behnam Neyshabur, Kate Baumli, Anselm Levskaya, Michael Laskin, Wenhao Jia, Jack W. Rae, Kefan Xiao, Antoine He, Skye Giordano, Lakshman Yagati, Jean-Baptiste Lespiau, Paul Natsev, Sanjay Ganapathy, Fangyu Liu, Danilo Martins, Nanxin Chen, Yunhan Xu, Megan Barnes, Rhys May, Arpi Vezer, Junhyuk Oh, Ken Franko, Sophie Bridgers, Ruizhe Zhao, Boxi Wu, Basil Mustafa, Sean Sechrist, Emilio Parisotto, Thanumalayan Sankaranarayana Pillai, Chris Larkin, Chenjie Gu, Christina Sorokin, Maxim Krikun, Alexey Guseynov, Jessica Landon, Romina Datta, Alexander Pritzel, Phoebe Thacker, Fan Yang, Kevin Hui, Anja Hauth, Chih-Kuan Yeh, David Barker, Justin Mao-Jones, Sophia Austin, Hannah Sheahan, Parker Schuh, James Svensson, Rohan Jain, Vinay Ramasesh, Anton Briukhov, Da-Woon Chung, Tamara von Glehn, Christina Butterfield, Priya Jhakra, Matthew Wiethoff, Justin Frye, Jordan Grimstad, Beer Changpinyo, Charline Le Lan, Anna Bortsova, Yonghui Wu, Paul Voigtlaender, Tara Sainath, Shane Gu, Charlotte Smith, Will Hawkins, Kris Cao, James Besley, Srivatsan Srinivasan, Mark Omernick, Colin Gaffney, Gabriela Surita, Ryan Burnell, Bogdan Damoc, Junwhan Ahn, Andrew Brock, Mantas Pajarskas, Anastasia Petrushkina, Seb Noury, Lorenzo Blanco, Kevin Swersky, Arun Ahuja, Thi Avrahami, Vedant Misra, Raoul de Liedekerke, Mariko Iinuma, Alex Polozov, Sarah York, George van den Driessche, Paul Michel, Justin Chiu, Rory Blevins, Zach Gleicher, Adrià Recasens, Alban Rrustemi, Elena Gribovskaya, Aurko Roy, Wiktor Gworek, Sébastien M. R. Arnold, Lisa Lee, James Lee-Thorp, Marcello Maggioni, Enrique Piqueras, Kartikeya Badola, Sharad Vikram, Lucas Gonzalez, Anirudh Baddepudi, Evan Senter, Jacob Devlin, James Qin, Michael Azzam, Maja Trebacz, Martin Polacek, Kashyap Krishnakumar, Shuo-Yiin Chang, Matthew Tung, Ivo Penchev, Rishabh Joshi, Kate Olszewska, Carrie Muir, Mateo Wirth, Ale Jakse Hartman, Josh Newlan, Sheleem Kashem, Vijay Bolina, Elahe Dabir, Joost van Amersfoort, Zafarali Ahmed, James Cobon-Kerr, Aishwarya Kamath, Arnar Mar Hrafnkelsson, Le Hou, Ian Mackinnon, Alexandre Frechette, Eric Noland, Xiance Si, Emanuel Taropa, Dong Li, Phil Crone, Anmol Gulati, Sébastien Cevey, Jonas Adler, Ada Ma, David Silver, Simon Tokumine, Richard Powell, Stephan Lee, Kiran Vodrahalli, Samer Hassan, Diana Mincu, Antoine Yang, Nir Levine, Jenny Brennan, Mingqiu Wang, Sarah Hodkinson, Jeffrey Zhao, Josh Lipschultz, Aedan Pope, Michael B. Chang, Cheng Li, Laurent El Shafey, Michela Paganini, Sholto Douglas, Bernd Bohnet, Fabio Pardo, Seth Odoom, Mihaela Rosca, Cicero Nogueira dos Santos, Kedar Soparkar, Arthur Guez, Tom Hudson, Steven Hansen, Chulayuth Asawaroengchai, Ravi Addanki, Tianhe Yu, Wojciech Stokowiec, Mina Khan, Justin Gilmer, Jaehoon Lee, Carrie Grimes Bostock, Keran Rong, Jonathan Caton, Pedram Pejman, Filip Pavetic, Geoff Brown, Vivek Sharma, Mario Lučić, Rajkumar Samuel, Josip Djolonga, Amol Mandhane, Lars Lowe Sjösund, Elena Buchatskaya, Elspeth White, Natalie Clay, Jiepu Jiang, Hyeontaek Lim, Ross Hemsley, Zeyncep Cankara, Jane Labanowski, Nicola De Cao, David Steiner, Sayed Hadi Hashemi, Jacob Austin, Anita Gergely, Tim Blyth, Joe Stanton, Kaushik Shivakumar, Aditya Siddhant, Anders Andreassen, Carlos Araya, Nikhil Sethi, Rakesh Shivanna, Steven Hand, Ankur Bapna, Ali Khodaei, Antoine Miech, Garrett Tanzer, Andy Swing, Shantanu Thakoor, Lora Aroyo, Zhufeng Pan, Zachary Nado, Jakub Sygnowski, Stephanie Winkler, Dian Yu, Mohammad Saleh, Loren Maggiore, Yamini Bansal, Xavier Garcia, Mehran Kazemi, Piyush Patil, Ishita Dasgupta, Iain Barr, Minh Giang, Thais Kagohara, Ivo Danihelka, Amit Marathe, Vladimir Feinberg, Mohamed Elhawaty, Nimesh Ghelani, Dan Horgan, Helen Miller, Lexi Walker, Richard Tanburn, Mukarram Tariq, Disha Shrivastava, Fei Xia, Qingze Wang, Chung-Cheng Chiu, Zoe Ashwood, Khuslen Baatarsukh, Sina Samangooei, Raphaël Lopez Kaufman, Fred Alcober, Axel Stjerngren, Paul Komarek, Katerina Tsihlas, Anudhyan Boral, Ramona Comanescu, Jeremy Chen, Ruibo Liu, Chris Welty, Dawn Bloxwich, Charlie Chen, Yanhua Sun, Fangxiaoyu Feng, Matthew Mauger, Xerxes Dotiwalla, Vincent Hellendoorn, Michael Sharman, Ivy Zheng, Krishna Haridasan, Gabe Barth-Maron, Craig Swanson, Dominika Rogozińska, Alek Andreev, Paul Kishan Rubenstein, Ruoxin Sang, Dan Hurt, Gamaleldin Elsayed, Renshen Wang, Dave Lacey, Anastasija Ilić, Yao Zhao, Adam Iwanicki, Alejandro Lince, Alexander Chen, Christina Lyu, Carl Lebsack, Jordan Griffith, Meenu Gaba, Paramjit Sandhu, Phil Chen, Anna Koop, Ravi Rajwar, Soheil Hassas Yeganeh, Solomon Chang, Rui Zhu, Soroush Radpour, Elnaz Davoodi, Ving Ian Lei, Yang Xu, Daniel Toyama, Constant Segal, Martin Wicke, Hanzhao Lin, Anna Bulanova, Adrià Puigdomènech Badia, Nemanja Rakićević, Pablo Sprechmann, Angelos Filos, Shaobo Hou, Víctor Campos, Nora Kassner, Devendra Sachan, Meire Fortunato, Chimezie Iwuanyanwu, Vitaly Nikolaev, Balaji Lakshminarayanan, Sadegh Jazayeri, Mani Varadarajan, Chetan Tekur, Doug Fritz, Misha Khalman, David Reitter, Kingshuk Dasgupta, Shourya Sarcar, Tina Ornduff, Javier Snaider, Fantine Huot, Johnson Jia, Rupert Kemp, Nejc Trdin, Anitha Vijayakumar, Lucy Kim, Christof Angermueller, Li Lao, Tianqi Liu, Haibin Zhang, David Engel, Somer Greene, Anaïs White, Jessica Austin, Lilly Taylor, Shereen Ashraf, Dangyi Liu, Maria Georgaki, Irene Cai, Yana Kulizhskaya, Sonam Goenka, Brennan Saeta, Ying Xu, Christian Frank, Dario de Cesare, Brona Robenek, Harry Richardson, Mahmoud Alnahlawi, Christopher Yew, Priya Ponnapalli, Marco Tagliasacchi, Alex Korchemniy, Yelin Kim, Dinghua Li, Bill Rosgen, Kyle Levin, Jeremy Wiesner, Praseem Banzal, Praveen Srinivasan, Hongkun Yu, Çağlar Ünlü, David Reid, Zora Tung, Daniel Finchelstein, Ravin Kumar, Andre Elisseeff, Jin Huang, Ming Zhang, Ricardo Aguilar, Mai Giménez, Jiawei Xia, Olivier Dousse, Willi Gierke, Damion Yates, Komal Jalan, Lu Li, Eri Latorre-Chimoto, Duc Dung Nguyen, Ken Durden, Praveen Kallakuri, Yaxin Liu, Matthew Johnson, Tomy Tsai, Alice Talbert, Jasmine Liu, Alexander Neitz, Chen Elkind, Marco Selvi, Mimi Jasarevic, Livio Baldini Soares, Albert Cui, Pidong Wang, Alek Wenjiao Wang, Xinyu Ye, Krystal Kallarackal, Lucia Loher, Hoi Lam, Josef Broder, Dan Holtmann-Rice, Nina Martin, Bramandia Ramadhana, Mrinal Shukla, Sujoy Basu, Abhi Mohan, Nick Fernando, Noah Fiedel, Kim Paterson, Hui Li, Ankush Garg, Jane Park, DongHyun Choi, Diane Wu, Sankalp Singh, Zhishuai Zhang, Amir Globerson, Lily Yu, John Carpenter, Félix de Chaumont Quitry, Carey Radebaugh, Chu-Cheng Lin, Alex Tudor, Prakash Shroff, Drew Garmon, Dayou Du, Neera Vats, Han Lu, Shariq Iqbal, Alex Yakubovich, Nilesh Tripuraneni, James Manyika, Haroon Qureshi, Nan Hua, Christel Ngani, Maria Abi Raad, Hannah Forbes, Jeff Stanway, Mukund Sundararajan, Victor Ungureanu, Colton Bishop, Yunjie Li, Balaji Venkatraman, Bo Li, Chloe Thornton, Salvatore Scellato, Nishesh Gupta, Yicheng Wang, Ian Tenney, Xihui Wu, Ashish Shenoy, Gabriel Carvajal, Diana Gage Wright, Ben Bariach, Zhuyun Xiao, Peter Hawkins, Sid Dalmia, Clement Farabet, Pedro Valenzuela, Quan Yuan, Ananth Agarwal, Mia Chen, Wooyeol Kim, Brice Hulse, Nandita Dukkipati, Adam Paszke, Andrew Bolt, Kiam Choo, Jennifer Beattie, Jennifer Prendki, Harsha Vashisht, Rebeca Santamaria-Fernandez, Luis C. Cobo, Jarek Wilkiewicz, David Madras, Ali Elqursh, Grant Uy, Kevin Ramirez, Matt Harvey, Tyler Liechty, Heiga Zen, Jeff Seibert, Clara Huiyi Hu, Andrey Khorlin, Maigo Le, Asaf Aharoni, Megan Li, Lily Wang, Sandeep Kumar, Norman Casagrande, Jay Hoover, Dalia El Badawy, David Soergel, Denis Vnukov, Matt Miecnikowski, Jiri Simsa, Praveen Kumar, Thibault Sellam, Daniel Vlasic, Samira Daruki, Nir Shabat, John Zhang, Guolong Su, Jiageng Zhang, Jeremiah Liu, Yi Sun, Evan Palmer, Alireza Ghaffarkhah, Xi Xiong, Victor Cotruta, Michael Fink, Lucas Dixon, Ashwin Sreevatsa, Adrian Goedeckemeyer, Alek Dimitriev, Mohsen Jafari, Remi Crocker, Nicholas FitzGerald, Aviral Kumar, Sanjay Ghemawat, Ivan Philips, Frederick Liu, Yannie Liang, Rachel Sterneck, Alena Repina, Marcus Wu, Laura Knight, Marin Georgiev, Hyo Lee, Harry Askham, Abhishek Chakladar, Annie Louis, Carl Crous, Hardie Cate, Dessie Petrova, Michael Quinn, Denese Owusu-Afriyie, Achintya Singhal, Nan Wei, Solomon Kim, Damien Vincent, Milad Nasr, Christopher A. Choquette-Choo, Reiko Tojo, Shawn Lu, Diego de Las Casas, Yuchung Cheng, Tolga Bolukbasi, Katherine Lee, Saaber Fatehi, Rajagopal Ananthanarayanan, Miteyan Patel, Charbel Kaed, Jing Li, Shreyas Rammohan Belle, Zhe Chen, Jaclyn Konzelmann, Siim Põder, Roopal Garg, Vinod Koverkathu, Adam Brown, Chris Dyer, Rosanne Liu, Azade Nova, Jun Xu, Alanna Walton, Alicia Parrish, Mark Epstein, Sara McCarthy, Slav Petrov, Demis Hassabis, Koray Kavukcuoglu, Jeffrey Dean, Oriol Vinyals

In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio.

Ranked #20 on Code Generation on HumanEval

Code Generation Math Word Problem Solving +1

Paper
Add Code

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

no code implementations • 11 Dec 2023 • Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times.

Math

Paper
Add Code
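
The three steps quoted in the ReST$^{EM}$ entry above (generate and filter with binary feedback, fine-tune, repeat) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions: `TabularPolicy`, `rest_em_loop`, and the arithmetic reward are hypothetical stand-ins invented for illustration, a table of counts replaces a language model, and nothing here is taken from the paper's implementation.

```python
import random


class TabularPolicy:
    """Hypothetical stand-in for a language model: a table of answer counts."""

    def __init__(self, answers):
        self.answers = list(answers)
        self.counts = {}  # (problem, answer) -> number of times trained on

    def _weights(self, problem):
        # Every answer starts with weight 1; training adds to the weight.
        return [1 + self.counts.get((problem, a), 0) for a in self.answers]

    def sample(self, problem, rng):
        return rng.choices(self.answers, weights=self._weights(problem))[0]

    def prob(self, problem, answer):
        weights = self._weights(problem)
        return weights[self.answers.index(answer)] / sum(weights)

    def fine_tune(self, examples):
        # "Fine-tuning" here just increments counts for the kept samples.
        for problem, answer in examples:
            key = (problem, answer)
            self.counts[key] = self.counts.get(key, 0) + 1


def rest_em_loop(policy, problems, reward, num_iters=3, samples_per_problem=8, seed=0):
    """Toy version of the generate / filter / fine-tune / repeat loop."""
    rng = random.Random(seed)
    for _ in range(num_iters):
        kept = []
        for problem in problems:
            for _ in range(samples_per_problem):
                answer = policy.sample(problem, rng)      # (1) generate a sample
                if reward(problem, answer) == 1:          # (1) filter with binary feedback
                    kept.append((problem, answer))
        policy.fine_tune(kept)                            # (2) fine-tune on kept samples
    return policy                                         # (3) the loop repeats num_iters times


def is_correct_sum(problem, answer):
    # Hypothetical binary reward: 1 if the answer equals the sum of the pair.
    return 1 if sum(problem) == answer else 0


problems = [(2, 2), (3, 4)]
policy = rest_em_loop(TabularPolicy(range(10)), problems, is_correct_sum)
```

Because only samples with reward 1 are ever used for training, the toy policy's probability of a correct answer can only stay the same or increase across iterations.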

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

no code implementations • 8 Nov 2023 • C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, Peter J. Liu, Roman Novak, Yundi Qian, Noah Fiedel, Jascha Sohl-Dickstein

We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment.

Language Modelling

Paper
Add Code

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

1 code implementation • 29 Sep 2023 • Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet

We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models.

198

Paper
Code

Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single

no code implementations • 21 Apr 2023 • Paul Vicol, Zico Kolter, Kevin Swersky

We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single.

Hyperparameter Optimization

Paper
Add Code

Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks

1 code implementation • 1 Nov 2022 • Sadegh Mahdavi, Kevin Swersky, Thomas Kipf, Milad Hashemi, Christos Thrampoulidis, Renjie Liao

In this paper, we study the OOD generalization of neural algorithmic reasoning tasks, where the goal is to learn an algorithm (e.g., sorting, breadth-first search, and depth-first search) from input-output pairs using deep neural networks.

Data Augmentation Out-of-Distribution Generalization

Paper
Code

CUF: Continuous Upsampling Filters

no code implementations • CVPR 2023 • Cristina Vasconcelos, Cengiz Oztireli, Mark Matthews, Milad Hashemi, Kevin Swersky, Andrea Tagliasacchi

Neural fields have rapidly been adopted for representing 3D signals, but their application to more classical 2D image-processing has been relatively limited.

Image Super-Resolution

Paper
Add Code

Learning to Improve Code Efficiency

no code implementations • 9 Aug 2022 • Binghong Chen, Daniel Tarlow, Kevin Swersky, Martin Maas, Pablo Heiber, Ashish Naik, Milad Hashemi, Parthasarathy Ranganathan

To automatically learn these hints from the dataset, we propose a novel discrete variational auto-encoder, where each discrete latent variable represents a different learned category of code-edit that increases performance.

Paper
Add Code

Pre-training helps Bayesian optimization too

1 code implementation • 7 Jul 2022 • Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully.

Bayesian Optimization

Paper
Code

Data-Driven Offline Optimization For Architecting Hardware Accelerators

1 code implementation • ICLR 2022 • Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine

An alternative paradigm is to use a "data-driven", offline approach that utilizes logged simulation data, to architect hardware accelerators, without needing any form of simulations.

Computer Architecture and Systems

33,133

Paper
Code

Pre-trained Gaussian processes for Bayesian optimization

4 code implementations • 16 Sep 2021 • Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully.

Bayesian Optimization Gaussian Processes

Paper
Code

Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks

1 code implementation • 12 Feb 2021 • Yujun Yan, Milad Hashemi, Kevin Swersky, Yaoqing Yang, Danai Koutra

We are the first to take a unified perspective to jointly explain the oversmoothing and heterophily problems at the node level.

Ranked #6 on Node Classification on Non-Homophilic (Heterophilic) Graphs on Cornell (48%/32%/20% fixed splits)

Node Classification on Non-Homophilic (Heterophilic) Graphs

Paper
Code

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

1 code implementation • 8 Feb 2021 • Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables.

Paper
Code

Apollo: Transferable Architecture Exploration

no code implementations • 2 Feb 2021 • Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami, James Laudon

We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective value (up to 25% improvements).

Paper
Add Code

Human 3D keypoints via spatial uncertainty modeling

no code implementations • 18 Dec 2020 • Francis Williams, Or Litany, Avneesh Sud, Kevin Swersky, Andrea Tagliasacchi

We introduce a technique for 3D human keypoint estimation that directly models the notion of spatial uncertainty of a keypoint.

Keypoint Estimation

Paper
Add Code

No MCMC for me: Amortized sampling for fast and stable training of energy-based models

1 code implementation • ICLR 2021 • Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.

Paper
Code

Learned Hardware/Software Co-Design of Neural Accelerators

no code implementations • 5 Oct 2020 • Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin

The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning.

Bayesian Optimization

Paper
Add Code

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

no code implementations • ICML 2020 • Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

We develop several scalable techniques to solve the matching problem, and also draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.

Fairness Recommendation Systems

Paper
Add Code

An Imitation Learning Approach for Cache Replacement

1 code implementation • ICML 2020 • Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn

While directly applying Belady's is infeasible since the future is unknown, we train a policy conditioned only on past accesses that accurately approximates Belady's even on diverse and complex access patterns, and call this approach Parrot.

Imitation Learning

33,128

Paper
Code

Big Self-Supervised Models are Strong Semi-Supervised Learners

8 code implementations • NeurIPS 2020 • Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.

Ranked #5 on Semi-Supervised Image Classification on ImageNet - 1% labeled data

Self-Supervised Image Classification Semi-Supervised Image Classification

3,961

Paper
Code

Neural Execution Engines: Learning to Execute Subroutines

1 code implementation • NeurIPS 2020 • Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms.

Learning to Execute

Paper
Code

SentenceMIM: A Latent Variable Language Model

1 code implementation • 18 Feb 2020 • Micha Livne, Kevin Swersky, David J. Fleet

MIM learning encourages high mutual information between observations and latent variables, and is robust against posterior collapse.

Ranked #1 on Question Answering on YahooCQA (using extra training data)

Language Modelling Question Answering +1

Paper
Code

NEURAL EXECUTION ENGINES

no code implementations • ICLR 2020 • Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

Turing complete computation and reasoning are often regarded as necessary precursors to general intelligence.

Paper
Add Code

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

4 code implementations • ICLR 2020 • Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky

In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y).

413

Paper
Code

MIM: Mutual Information Machine

1 code implementation • 8 Oct 2019 • Micha Livne, Kevin Swersky, David J. Fleet

Experiments show that MIM learns representations with high mutual information, consistent encoding and decoding distributions, effective latent clustering, and data log likelihood comparable to VAE, while avoiding posterior collapse.

Clustering Decoder

Paper
Code

High Mutual Information in Representation Learning with Symmetric Variational Inference

no code implementations • 4 Oct 2019 • Micha Livne, Kevin Swersky, David J. Fleet

We introduce the Mutual Information Machine (MIM), a novel formulation of representation learning, using a joint distribution over the observations and latent state in an encoder/decoder framework.

Decoder Representation Learning +2

Paper
Add Code

Learning Execution through Neural Code Fusion

no code implementations • ICLR 2020 • Zhan Shi, Kevin Swersky, Daniel Tarlow, Parthasarathy Ranganathan, Milad Hashemi

In this work, we propose a new approach to use GNNs to learn fused representations of general source code and its execution.

Transfer Learning

Paper
Add Code

Flexibly Fair Representation Learning by Disentanglement

no code implementations • 6 Jun 2019 • Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel

We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes.

Disentanglement Fairness +1

Paper
Add Code

Learning Sparse Networks Using Targeted Dropout

2 code implementations • 31 May 2019 • Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights.

Network Pruning Neural Network Compression

257

Paper
Code

Graph Normalizing Flows

1 code implementation • NeurIPS 2019 • Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.

Graph Neural Network

Paper
Code

Neural Networks for Modeling Source Code Edits

no code implementations • 4 Apr 2019 • Rui Zhao, David Bieber, Kevin Swersky, Daniel Tarlow

In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files.

Paper
Add Code

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

13 code implementations • ICLR 2020 • Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle

Few-shot classification refers to learning a classifier for new classes given only a few examples.

Ranked #7 on Few-Shot Image Classification on Meta-Dataset Rank

Few-Shot Image Classification General Classification +1

742

Paper
Code

Targeted Dropout

1 code implementation • NIPS Workshop CDNNRIA 2018 • Aidan N. Gomez, Ivan Zhang, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

Neural networks are extremely flexible models due to their large number of parameters, which is beneficial for learning, but also highly redundant.

257

Paper
Code

Learning Memory Access Patterns

no code implementations • ICML 2018 • Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan

In this paper, we demonstrate the potential of deep learning to address the von Neumann bottleneck of memory performance.

BIG-bench Machine Learning

Paper
Add Code

Meta-Learning for Semi-Supervised Few-Shot Classification

9 code implementations • ICLR 2018 • Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, Richard S. Zemel

To address this paradigm, we propose novel extensions of Prototypical Networks (Snell et al., 2017) that are augmented with the ability to use unlabeled examples when producing prototypes.

General Classification Meta-Learning

546

Paper
Code

An online sequence-to-sequence model for noisy speech recognition

no code implementations • 16 Jun 2017 • Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.

Noisy Speech Recognition speech-recognition

Paper
Add Code

Learning Hard Alignments with Variational Inference

no code implementations • 16 May 2017 • Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly

There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.

Hard Attention Image Captioning +5

Paper
Add Code

Prototypical Networks for Few-shot Learning

42 code implementations • NeurIPS 2017 • Jake Snell, Kevin Swersky, Richard S. Zemel

We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class.

Ranked #1 on Few-Shot Image Classification on CUB 200 50-way (0-shot)

Few-Shot Image Classification General Classification +3

2,562

Paper
Code
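
As a rough illustration of the few-shot setting described in the Prototypical Networks entry above, the sketch below builds one prototype per class as the mean of that class's support embeddings and classifies a query with a softmax over negative squared Euclidean distances. It assumes embeddings are already computed (the learned embedding network is omitted), and the function names `class_prototypes` and `class_probabilities` are hypothetical names chosen for this example.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """support: list of (embedding, label) pairs. Returns {label: mean embedding}."""
    groups = defaultdict(list)
    for embedding, label in support:
        groups[label].append(embedding)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def class_probabilities(query, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = {
        label: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in prototypes.items()
    }
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Two classes with two support examples each (toy 2-D embeddings).
support = [([0, 0], "a"), ([0, 2], "a"), ([4, 4], "b"), ([6, 4], "b")]
protos = class_prototypes(support)
probs = class_probabilities([1, 1], protos)
predicted = max(probs, key=probs.get)
```

New classes need no retraining in this sketch: supplying a different support set simply yields different prototypes.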

The Variational Fair Autoencoder

2 code implementations • 3 Nov 2015 • Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel

We investigate the problem of learning representations that are invariant to certain nuisance or sensitive factors of variation in the data while retaining as much of the remaining information as possible.

Ranked #4 on Sentiment Analysis on Multi-Domain Sentiment Dataset

General Classification Sentiment Analysis

Paper
Code

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

no code implementations • ICCV 2015 • Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov

One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images.

Zero-Shot Learning

Paper
Add Code

Scalable Bayesian Optimization Using Deep Neural Networks

4 code implementations • 19 Feb 2015 • Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat, Ryan P. Adams

Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations.

Ranked #156 on Image Classification on CIFAR-10

Bayesian Optimization Caption Generation +4

132

Paper
Code

Generative Moment Matching Networks

3 code implementations • 10 Feb 2015 • Yujia Li, Kevin Swersky, Richard Zemel

We consider the problem of learning deep generative models from data.

Generative Adversarial Network Two-sample testing

148

Paper
Code

Learning unbiased features

no code implementations • 17 Dec 2014 • Yujia Li, Kevin Swersky, Richard Zemel

Different forms of representation learning can be derived from alternative definitions of unwanted bias, e.g., bias to particular tasks, domains, or irrelevant underlying data dimensions.

Domain Adaptation Representation Learning +1

Paper
Add Code

Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

no code implementations • 14 Sep 2014 • Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, Michael A. Osborne

In practical Bayesian optimization, we must often search over structures with differing numbers of parameters.

Bayesian Optimization

Paper
Add Code

Freeze-Thaw Bayesian Optimization

1 code implementation • 16 Jun 2014 • Kevin Swersky, Jasper Snoek, Ryan Prescott Adams

In this paper we develop a dynamic form of Bayesian optimization for machine learning models with the goal of rapidly finding good hyperparameter settings.

Bayesian Optimization BIG-bench Machine Learning

Paper
Code

Input Warping for Bayesian Optimization of Non-stationary Functions

1 code implementation • 5 Feb 2014 • Jasper Snoek, Kevin Swersky, Richard S. Zemel, Ryan P. Adams

Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions.

Bayesian Optimization Gaussian Processes

1,540

Paper
Code

Multi-Task Bayesian Optimization

1 code implementation • NeurIPS 2013 • Kevin Swersky, Jasper Snoek, Ryan P. Adams

We demonstrate the utility of this new acquisition function by utilizing a small dataset in order to explore hyperparameter settings for a large dataset.

Ranked #93 on Image Classification on STL-10

Bayesian Optimization Gaussian Processes +1

1,540

Paper
Code

Learning Fair Representations

2 code implementations • International Conference on Machine Learning 2013 • Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork

We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly).

Classification Fairness +1

Paper
Code
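
The group-fairness condition quoted in the Learning Fair Representations entry above (the positive-classification rate in the protected group matches the rate in the whole population) can be checked numerically. The sketch below is an illustrative assumption, not the paper's algorithm: `positive_rate` and `group_fairness_gap` are hypothetical helper names, and predictions and group membership are given as 0/1 lists.

```python
def positive_rate(predictions):
    """Fraction of 0/1 predictions that are positive."""
    return sum(predictions) / len(predictions)


def group_fairness_gap(predictions, protected):
    """Absolute difference between the positive rate in the protected group
    and the positive rate in the population as a whole (0 means parity)."""
    group = [p for p, is_protected in zip(predictions, protected) if is_protected]
    return abs(positive_rate(group) - positive_rate(predictions))


# Parity holds: half of the protected group and half of everyone are positive.
fair_gap = group_fairness_gap([1, 0, 1, 0, 1, 0, 1, 0], [1, 1, 1, 1, 0, 0, 0, 0])

# Parity fails: the protected group is always positive, the population half the time.
unfair_gap = group_fairness_gap([1, 1, 0, 0], [1, 1, 0, 0])
```

This measures only the group criterion; the individual-fairness criterion in the same sentence (similar individuals treated similarly) would need a similarity measure between individuals, which is not sketched here.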

Cardinality Restricted Boltzmann Machines

no code implementations • NeurIPS 2012 • Kevin Swersky, Ilya Sutskever, Daniel Tarlow, Richard S. Zemel, Ruslan R. Salakhutdinov, Ryan P. Adams

The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features.

Paper
Add Code
