no code implementations • 25 Feb 2024 • Nadav Dym, Hannah Lawrence, Jonathan W. Siegel
Canonicalization provides an architecture-agnostic method for enforcing equivariance, with generalizations such as frame-averaging recently gaining prominence as a lightweight and flexible alternative to equivariant architectures.
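For intuition, here is a minimal sketch of frame averaging: symmetrizing an arbitrary backbone by averaging over a frame of group elements. The representation as orthogonal matrices and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def frame_average(f, frame, x):
    """Symmetrize an arbitrary map f over a frame of group elements.

    f     : callable mapping R^d -> R^d (any backbone network)
    frame : list of invertible matrices g (the frame F(x) for input x)
    x     : input vector

    Averaging g^{-1} f(g x) over the frame yields an equivariant map,
    regardless of the architecture of f.
    """
    outputs = [np.linalg.inv(g) @ f(g @ x) for g in frame]
    return np.mean(outputs, axis=0)
```

Canonicalization is the special case where the frame contains a single group element per input, so the average reduces to evaluating the backbone at one canonical representative.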
no code implementations • 26 Oct 2023 • Jonathan W. Siegel, Stephan Wojtowytsch
In the case of stochastic gradient descent, the summability of $\mathbb{E}[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to \inf f$ almost surely, an improvement on the almost-sure convergence along a subsequence which follows from the $O(1/n)$ decay estimate.
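The step from summability to almost-sure convergence is a standard argument using only Tonelli's theorem and the nonnegativity of $f(x_n) - \inf f$ (this is the generic reasoning, not necessarily the paper's exact proof):

```latex
\sum_{n} \mathbb{E}\big[f(x_n) - \inf f\big] < \infty
\;\Longrightarrow\;
\mathbb{E}\Big[\sum_{n} \big(f(x_n) - \inf f\big)\Big] < \infty
\;\Longrightarrow\;
\sum_{n} \big(f(x_n) - \inf f\big) < \infty \text{ a.s.}
\;\Longrightarrow\;
f(x_n) \to \inf f \text{ a.s.}
```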
no code implementations • 28 Jul 2023 • Ronald DeVore, Robert D. Nowak, Rahul Parhi, Jonathan W. Siegel
A new and more appropriate definition of model classes on domains is given by introducing the concept of weighted variation spaces.
no code implementations • 28 Jul 2023 • Jonathan W. Siegel
The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces.
no code implementations • 15 Jul 2023 • Jason M. Klusowski, Jonathan W. Siegel
We study the fundamental limits of matching pursuit, or the pure greedy algorithm, for approximating a target function by a sparse linear combination of elements from a dictionary.
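For reference, the pure greedy algorithm repeatedly selects the dictionary atom most correlated with the current residual and subtracts its projection. A minimal sketch, assuming a finite dictionary stored as unit-norm columns of a matrix `D`:

```python
import numpy as np

def matching_pursuit(f, D, n_iter):
    """Pure greedy algorithm: approximate f by a sparse combination of
    dictionary atoms (columns of D, assumed unit-norm).

    At each step, pick the atom most correlated with the residual and
    subtract its projection along that atom.
    """
    residual = f.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        correlations = D.T @ residual
        j = np.argmax(np.abs(correlations))    # best-matching atom
        coeffs[j] += correlations[j]           # projection coefficient
        residual -= correlations[j] * D[:, j]  # greedy residual update
    return coeffs, residual
```

The fundamental limits studied in the paper concern how fast the residual norm can decay in `n_iter` over all targets in the dictionary's variation space.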
no code implementations • 2 Feb 2023 • Jonathan W. Siegel
Specifically, we consider the question of how efficiently, in terms of the number of parameters, deep ReLU networks can interpolate values at $N$ datapoints in the unit ball which are separated by a distance $\delta$.
no code implementations • 25 Nov 2022 • Jonathan W. Siegel
We study the problem of how efficiently, in terms of the number of parameters, deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$, with error measured in the $L_p(\Omega)$ norm.
no code implementations • 9 Aug 2022 • Qingguo Hong, Jonathan W. Siegel, Qinyang Tan, Jinchao Xu
Our empirical studies also show that neural networks with the Hat activation function train significantly faster under stochastic gradient descent and Adam.
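One common definition of the hat function, assumed here for illustration (the paper's exact normalization may differ), is the piecewise-linear bump expressible with three ReLUs:

```python
import numpy as np

def hat(x):
    """Piecewise-linear hat function: 0 outside [0, 2], peak of 1 at x = 1.
    Equivalently relu(x) - 2*relu(x - 1) + relu(x - 2)."""
    return (np.maximum(x, 0)
            - 2 * np.maximum(x - 1, 0)
            + np.maximum(x - 2, 0))
```

This snippet only defines the activation; the training-speed comparison is the paper's empirical finding.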
no code implementations • 28 Jun 2021 • Jonathan W. Siegel, Jinchao Xu
We study the variation space corresponding to a dictionary of functions in $L^2(\Omega)$ for a bounded domain $\Omega\subset \mathbb{R}^d$.
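For context, the variation norm associated with a dictionary $\mathbb{D}$ is typically defined as the gauge of the closed convex hull of $\pm\mathbb{D}$; a standard formulation (assumed here, the paper's precise definition may differ) is:

```latex
\|f\|_{\mathcal{K}_1(\mathbb{D})}
  = \inf\big\{ t > 0 \;:\; f \in t\,\overline{\mathrm{conv}}(\pm\mathbb{D}) \big\},
\qquad
\mathcal{K}_1(\mathbb{D}) = \big\{ f \in L^2(\Omega) : \|f\|_{\mathcal{K}_1(\mathbb{D})} < \infty \big\}.
```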
no code implementations • 28 Jun 2021 • Jonathan W. Siegel, Jinchao Xu
In this article, we provide a solution to this problem by proving sharp lower bounds on the approximation rates for shallow neural networks, which are obtained by lower bounding the $L^2$-metric entropy of the convex hull of the neural network basis functions.
no code implementations • 29 Jan 2021 • Jonathan W. Siegel, Jinchao Xu
This result gives sharp lower bounds on the $L^2$-approximation rates, metric entropy, and $n$-widths for variation spaces corresponding to neural networks with a range of important activation functions, including ReLU$^k$ activation functions and sigmoidal activation functions with bounded variation.
no code implementations • 14 Dec 2020 • Jonathan W. Siegel, Jinchao Xu
We show that as the smoothness index $s$ of $f$ increases, shallow neural networks with ReLU$^k$ activation function obtain an improved approximation rate up to a best possible rate of $O(n^{-(k+1)}\log(n))$ in $L^2$, independent of the dimension $d$.
1 code implementation • 21 Aug 2020 • Jonathan W. Siegel, Jianhong Chen, Pengchuan Zhang, Jinchao Xu
The adaptive weighting we introduce corresponds to a novel regularizer based on the logarithm of the absolute value of the weights.
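As a hedged sketch of what such a penalty might look like (the epsilon, scaling, and function name are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def log_weight_penalty(weights, eps=1e-8, lam=1e-4):
    """Regularizer proportional to the sum of log|w| over the weights.

    eps avoids log(0) at exactly-zero weights; lam sets the penalty
    strength. Both are illustrative choices, not taken from the paper.
    """
    return lam * np.sum(np.log(np.abs(weights) + eps))
```

Because the logarithm grows much more slowly than any power, a penalty of this form favors driving small weights toward zero while barely penalizing large ones, which is the qualitative behavior a sparsity-inducing adaptive weighting targets.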
no code implementations • 4 Apr 2019 • Jonathan W. Siegel, Jinchao Xu
Our first result concerns the rate of approximation of a two-layer neural network with a polynomially decaying, non-sigmoidal activation function.