no code implementations • 5 Feb 2024 • Annie Liang, Thomas Jemielita, Andy Liaw, Vladimir Svetnik, Lingkang Huang, Richard Baumgartner, Jason M. Klusowski
Recently, several adjustments to marginal permutation importance that utilize feature knockoffs have been proposed to address this issue, such as the variable importance measure known as conditional predictive impact (CPI).
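For context, marginal permutation importance scores a feature by the loss increase after shuffling its column; knockoff-based variants such as CPI instead substitute a knockoff copy that preserves the feature's dependence on the others. A minimal sketch of the marginal variant only (the `model`, data, and `loss` are placeholders, not the paper's setup):

```python
import numpy as np

def permutation_importance(model, X, y, loss, n_repeats=10, rng=None):
    """Marginal permutation importance: average loss increase after
    shuffling one feature column at a time."""
    rng = np.random.default_rng(rng)
    base = loss(y, model.predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            # CPI-style methods would replace this shuffle with a
            # knockoff column for feature j instead.
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += loss(y, model.predict(Xp)) - base
    return scores / n_repeats
```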
no code implementations • 1 Jan 2024 • Xin Chen, Jason M. Klusowski
This paper introduces an iterative algorithm for training additive models that enjoys favorable memory storage and computational requirements.
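The excerpt does not specify the algorithm, but a classical iterative scheme for additive models is backfitting: cycle over coordinates, fitting each component to the partial residual left by the others. A minimal sketch under that assumption, with the one-dimensional `smoother` left as a placeholder:

```python
import numpy as np

def backfit(X, y, smoother, n_iter=20):
    """Backfitting for y ~ alpha + sum_j f_j(X[:, j]).
    `smoother(x, r)` returns fitted values of a 1-D fit of r on x."""
    n, d = X.shape
    alpha = y.mean()
    F = np.zeros((n, d))                   # current component fits f_j
    for _ in range(n_iter):
        for j in range(d):
            r = y - alpha - F.sum(axis=1) + F[:, j]   # partial residual
            F[:, j] = smoother(X[:, j], r)
            F[:, j] -= F[:, j].mean()      # center for identifiability
    return alpha, F
```

Only one column of fits needs updating per step, which is where the favorable memory and computation profile of such schemes comes from.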
no code implementations • 15 Oct 2023 • Matias D. Cattaneo, Jason M. Klusowski, William G. Underwood
Random forests are popular methods for classification and regression, and many different variants have been proposed in recent years.
no code implementations • 6 Oct 2023 • Jianqing Fan, Cheng Gao, Jason M. Klusowski
This paper addresses challenges in robust transfer learning stemming from ambiguity in Bayes classifiers and weak transferable signals between the target and source distributions.
no code implementations • 18 Sep 2023 • Xin Chen, Jason M. Klusowski, Yan Shuo Tan
In this paper, we learn these weights analogously by minimizing an estimate of the population risk subject to a nonnegativity constraint.
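Read literally, minimizing an empirical risk over nonnegative weights is a nonnegatively constrained least-squares problem. A minimal sketch using SciPy's NNLS solver (the prediction matrix `P` is a placeholder for whatever is being weighted):

```python
import numpy as np
from scipy.optimize import nnls

def fit_nonnegative_weights(P, y):
    """Minimize ||P w - y||^2 over w >= 0, where column k of P holds
    the predictions of the k-th base estimator on the training data."""
    w, residual_norm = nnls(P, y)
    return w
```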
no code implementations • 31 Aug 2023 • Matias D. Cattaneo, Jason M. Klusowski, Boris Shigida
In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory.
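For reference, the standard first-order correction from backward error analysis (a known result, stated here as context rather than this paper's contribution): gradient descent with step size $ h $, $ \theta_{k+1} = \theta_k - h\nabla f(\theta_k) $, is tracked to higher order than by the plain gradient flow by the modified ODE

```latex
\dot\theta \;=\; -\nabla\Bigl( f(\theta) + \tfrac{h}{4}\,\bigl\|\nabla f(\theta)\bigr\|^2 \Bigr),
```

whose extra term penalizes the squared gradient norm along the trajectory.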
no code implementations • 15 Jul 2023 • Jason M. Klusowski, Jonathan W. Siegel
We study the fundamental limits of matching pursuit, or the pure greedy algorithm, for approximating a target function by a sparse linear combination of elements from a dictionary.
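A minimal sketch of matching pursuit under the usual setup (dictionary columns normalized to unit norm; names are illustrative):

```python
import numpy as np

def matching_pursuit(D, f, n_steps):
    """Pure greedy algorithm: at each step pick the dictionary atom
    most correlated with the residual and subtract its projection.
    D has unit-norm columns; f is the target vector."""
    r = f.copy()
    coef = np.zeros(D.shape[1])
    for _ in range(n_steps):
        k = np.argmax(np.abs(D.T @ r))   # best-matching atom
        c = D[:, k] @ r                  # projection coefficient
        coef[k] += c
        r -= c * D[:, k]
    return coef, r
```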
no code implementations • 19 Nov 2022 • Matias D. Cattaneo, Jason M. Klusowski, Peter M. Tian
Decision tree learning is increasingly being used for pointwise inference.
no code implementations • 28 Apr 2021 • Jason M. Klusowski, Peter M. Tian
This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size, under natural $ \ell^0 $ and $ \ell^1 $ sparsity constraints.
no code implementations • 5 Nov 2020 • Jason M. Klusowski, Peter M. Tian
Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model.
no code implementations • 22 Jun 2020 • Ryan Theisen, Jason M. Klusowski, Michael W. Mahoney
Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes.
no code implementations • NeurIPS 2020 • Jason M. Klusowski
In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem.
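As a sketch of the sort of identity involved (standard least-squares algebra, not necessarily the paper's exact statement): letting $ R(t) $ denote the sum of squared errors in node $ t $ and $ \hat y $ the fitted values of the best decision stump there, the split leaves

```latex
R(t_L) + R(t_R) \;=\; \bigl(1 - \hat\rho^{\,2}(\hat y, Y)\bigr)\, R(t),
```

so the relative error reduction is exactly the squared Pearson correlation between the stump and the response within the node.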
no code implementations • 22 Oct 2019 • Ryan Theisen, Jason M. Klusowski, Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Classical results on the statistical complexity of linear models have commonly identified the norm of the weights $\|w\|$ as a fundamental capacity measure.
no code implementations • 24 Jun 2019 • Jason M. Klusowski
For binary classification and regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in the sum of squared errors (the impurity) along a particular variable.
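A minimal sketch of that split search for a single variable (an exhaustive scan over candidate thresholds; production implementations use incremental updates):

```python
import numpy as np

def best_split(x, y):
    """Scan split points on one variable, returning the threshold that
    maximizes the reduction in sum of squared errors (the impurity)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    total_sse = np.sum((ys - ys.mean()) ** 2)
    best_gain, best_thresh = 0.0, None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue                       # no valid threshold here
        left, right = ys[:i], ys[i:]
        sse = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2))
        gain = total_sse - sse
        if gain > best_gain:
            best_gain = gain
            best_thresh = 0.5 * (xs[i] + xs[i - 1])
    return best_thresh, best_gain
```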
no code implementations • 2 Feb 2019 • Andrew R. Barron, Jason M. Klusowski
For any ReLU network there is a representation in which the sum of the absolute values of the weights into each node is exactly $1$, and the input layer variables are multiplied by a value $V$ coinciding with the total variation of the path weights.
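The normalization rests on the positive homogeneity of the ReLU, $ \phi(au) = a\,\phi(u) $ for $ a \ge 0 $. In the single-hidden-layer case, for instance,

```latex
\sum_{k} c_k\, \phi\bigl(w_k^\top x - b_k\bigr)
= \sum_{k} c_k \,\|w_k\|_1\, \phi\Bigl(\tfrac{w_k^\top x - b_k}{\|w_k\|_1}\Bigr),
\qquad
V = \sum_{k} |c_k|\,\|w_k\|_1,
```

so the inner weights can be taken to sum to one in absolute value while $ V $ collects the total variation of the path weights.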
no code implementations • 10 Sep 2018 • Andrew R. Barron, Jason M. Klusowski
It has been experimentally observed in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations.
no code implementations • 7 May 2018 • Jason M. Klusowski
Random forests have become an important tool for improving accuracy in regression and classification problems since their inception by Leo Breiman in 2001.
no code implementations • 21 Feb 2018 • Jason M. Klusowski, Yihong Wu
Applied researchers often construct a network from a random sample of nodes in order to infer properties of the parent network.
no code implementations • 12 Jan 2018 • Jason M. Klusowski, Yihong Wu
Learning properties of large graphs from samples has been an important problem in statistical network analysis since the early work of Goodman (1949) and Frank (1978).
no code implementations • 29 Dec 2017 • W. D. Brinda, Jason M. Klusowski
The MDL two-part coding $ \textit{index of resolvability} $ provides a finite-sample upper bound on the statistical risk of penalized likelihood estimators over countable models.
no code implementations • 26 Apr 2017 • Jason M. Klusowski, Dana Yang, W. D. Brinda
We also show that the population EM operator for mixtures of two regressions is anti-contractive from the target parameter vector if the cosine of the angle between the input parameter vector (the current iterate) and the target parameter vector is too small, thereby establishing the necessity of our conic condition.
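For the symmetric two-component model $ y = R\,\langle \theta^{\star}, x\rangle + \varepsilon $ with $ R = \pm 1 $ equally likely, the sample EM operator has a closed form; a minimal sketch under that assumption (the noise scale `sigma` and the direct solve are illustrative choices):

```python
import numpy as np

def em_step(theta, X, y, sigma=1.0):
    """One EM update for a symmetric mixture of two linear regressions:
    soft-assign each point's sign with a tanh weight (E-step), then
    solve the weighted normal equations (M-step)."""
    w = np.tanh((X @ theta) * y / sigma**2)   # posterior label balance
    return np.linalg.solve(X.T @ X, X.T @ (w * y))
```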
no code implementations • 9 Feb 2017 • Jason M. Klusowski, Andrew R. Barron
Estimation of functions of $ d $ variables is considered using ridge combinations of the form $ \textstyle\sum_{k=1}^m c_{1, k} \phi(\textstyle\sum_{j=1}^d c_{0, j, k}x_j-b_k) $ where the activation function $ \phi $ has bounded value and bounded derivative.
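Such a ridge combination is a single-hidden-layer network with activation $ \phi $; a minimal sketch of its evaluation (shapes and names are illustrative, with `np.tanh` standing in for a bounded activation with bounded derivative):

```python
import numpy as np

def ridge_combination(x, C0, b, c1, phi=np.tanh):
    """Evaluate f(x) = sum_k c1[k] * phi(C0[:, k] @ x - b[k]).
    C0 is d-by-m (inner weights); b and c1 have length m."""
    return c1 @ phi(C0.T @ x - b)
```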
no code implementations • 7 Aug 2016 • Jason M. Klusowski, W. D. Brinda
In that method, the basin of attraction for valid initialization is required to be a ball around the truth.
no code implementations • 26 Jul 2016 • Jason M. Klusowski, Andrew R. Barron
We establish $ L^{\infty} $ and $ L^2 $ error bounds for functions of many variables that are approximated by linear combinations of ReLU (rectified linear unit) and squared ReLU ridge functions with $ \ell^1 $ and $ \ell^0 $ controls on their inner and outer parameters.
no code implementations • 5 Jul 2016 • Jason M. Klusowski, Andrew R. Barron
On the other hand, if the candidate fits are chosen from a discretization, we show that $ \mathbb{E}\|\hat{f} - f^{\star} \|^2 \leq \left(v^3_{f^{\star}}\frac{\log d}{n}\right)^{2/5} $.