no code implementations • 7 Feb 2024 • Meysam Alishahi, Jeff M. Phillips
We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework.
no code implementations • 8 Nov 2023 • Benwei Shi, Aditya Bhaskara, Wai Ming Tai, Jeff M. Phillips
We show that a constant-size constant-error coreset for polytope distance is simple to maintain under merges of coresets.
no code implementations • 5 Nov 2023 • Chin-Chia Michael Yeh, Yan Zheng, Menghai Pan, Huiyuan Chen, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang, Jeff M. Phillips, Eamonn Keogh
In this work, we propose a sketch for discord mining among multi-dimensional time series.
no code implementations • 5 Oct 2023 • Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips
A Content-based Time Series Retrieval (CTSR) system is an information retrieval system that lets users interact with time series arising from multiple domains, such as finance, healthcare, and manufacturing.
no code implementations • 28 Jun 2023 • Jeff M. Phillips, Hasan Pourmahmood-Aghababa
For a point set $X$ of size $n$ and a kernel $K$, a query with a point $p$ returns the vector $R_p \in \mathbb{R}^n$ whose $i$th coordinate is $(R_p)_i = K(p, x_i)$ for $x_i \in X$.
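As a concrete illustration of this query, the sketch below computes $R_p$ directly with NumPy. It assumes a Gaussian kernel $K(p, x) = \exp(-\|p - x\|^2/\sigma^2)$; the names `gaussian_kernel` and `kernel_range_query` are hypothetical, and any other kernel function can be passed in.

```python
import numpy as np


def gaussian_kernel(p, x, sigma=1.0):
    """Assumed kernel: K(p, x) = exp(-||p - x||^2 / sigma^2)."""
    diff = np.asarray(p, dtype=float) - np.asarray(x, dtype=float)
    return float(np.exp(-(diff @ diff) / sigma**2))


def kernel_range_query(p, X, kernel=gaussian_kernel):
    """Return R_p with (R_p)_i = K(p, x_i) for every point x_i (row) of X."""
    return np.array([kernel(p, x) for x in X])


X = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
R_p = kernel_range_query([0.0, 0.0], X)  # approx [1, exp(-1), exp(-25)]
```

Each query costs $n$ kernel evaluations, one per point of $X$.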
no code implementations • 5 Jun 2023 • Meysam Alishahi, Anna Little, Jeff M. Phillips
In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible.
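Learning the map is the hard part. The hedged sketch below only evaluates a candidate linear map $G$ through the mapped distance $\|Gx - Gy\|$. The constraint form (similar pairs within `upper`, dissimilar pairs at least `lower` apart) and all function names are illustrative assumptions, not the entry's formulation.

```python
import numpy as np


def mapped_distance(G, x, y):
    """Distance after the linear map G: ||G x - G y|| = ||G (x - y)||."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(G @ diff))


def fraction_satisfied(G, similar, dissimilar, upper, lower):
    """Fraction of pair constraints respected by G (hypothetical constraint form):
    similar pairs should end up within `upper`, dissimilar pairs at least `lower` apart."""
    ok = sum(mapped_distance(G, x, y) <= upper for x, y in similar)
    ok += sum(mapped_distance(G, x, y) >= lower for x, y in dissimilar)
    total = len(similar) + len(dissimilar)
    return ok / total if total else 1.0


similar = [([0.0, 0.0], [0.0, 5.0])]      # differ only in a noisy 2nd coordinate
dissimilar = [([0.0, 0.0], [3.0, 0.0])]
identity_score = fraction_satisfied(np.eye(2), similar, dissimilar, 1.0, 2.0)
projected_score = fraction_satisfied(np.diag([1.0, 0.0]), similar, dissimilar, 1.0, 2.0)
```

Here the identity map violates the similar-pair constraint, while a map that drops the noisy coordinate satisfies both.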
no code implementations • 26 May 2023 • Tao Yang, Cuize Han, Chen Luo, Parth Gupta, Jeff M. Phillips, Qingyao Ai
While previous studies have demonstrated the effectiveness of using user behavior signals (e.g., clicks) as both features and labels of learning-to-rank (LTR) algorithms, we argue that existing LTR algorithms that indiscriminately treat behavior and non-behavior signals in input features could lead to suboptimal performance in practice.
no code implementations • 23 Oct 2022 • Shibo Li, Jeff M. Phillips, Xin Yu, Robert M. Kirby, Shandian Zhe
However, this method queries only one pair of fidelity and input at a time, and hence risks bringing in strongly correlated examples, which reduces learning efficiency.
1 code implementation • 3 Sep 2022 • Hasan Pourmahmood-Aghababa, Jeff M. Phillips
We provide the first comprehensive study on how to classify trajectories using only their spatial representations, measured on 5 real-world data sets.
no code implementations • NeurIPS 2021 • Zhimeng Pan, Zheng Wang, Jeff M. Phillips, Shandian Zhe
Specifically, we use an embedding to represent each event type and model the event influence as an unknown function of the embeddings and time span.
no code implementations • 10 Jul 2021 • Jiahui Chen, Joe Breen, Jeff M. Phillips, Jacobus Van der Merwe
Network traffic classification that is widely applicable and highly accurate is valuable for many network security and management tasks.
no code implementations • 25 Jun 2021 • Michael Matheny, Jeff M. Phillips
For different classes of $\Phi$ we can provide either an $\Omega(|X|^{3/2 - o(1)})$ time lower bound for the exact solution with a reduction to APSP, or an $\Omega(|X| + 1/\varepsilon^{2-o(1)})$ lower bound for the approximate solution with a reduction to 3SUM.
1 code implementation • 6 Apr 2021 • Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang
To aid this, we present Visualization of Embedding Representations for deBiasing system ("VERB"), an open-source web-based visualization tool that helps users gain a technical understanding and visual intuition of the inner workings of debiasing techniques, with a focus on their geometric properties.
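One of the simplest geometric debiasing operations, linear projection, removes each word vector's component along a bias direction $u$: $v \mapsto v - \langle v, u\rangle u$. The sketch below illustrates it with toy vectors; it is not VERB's code, and taking $u$ as the normalized mean difference over definitional word pairs is an assumption.

```python
import numpy as np


def bias_direction(pairs):
    """Unit vector along the mean difference a - b over definitional word pairs."""
    diffs = np.array([np.asarray(a, float) - np.asarray(b, float) for a, b in pairs])
    v = diffs.mean(axis=0)
    return v / np.linalg.norm(v)


def linear_projection(vectors, u):
    """Remove from each row vector its component along the unit direction u."""
    V = np.atleast_2d(np.asarray(vectors, float))
    return V - np.outer(V @ u, u)


# Toy 3-d "embeddings"; real ones would come from GloVe, word2vec, etc.
u = bias_direction([([1.0, 0.2, 0.0], [-1.0, 0.2, 0.0])])   # -> [1, 0, 0]
debiased = linear_projection([[2.0, 3.0, 4.0]], u)          # -> [[0, 3, 4]]
```

After projection every vector is orthogonal to $u$, which is the geometric property a tool like this can visualize.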
1 code implementation • EMNLP 2021 • Sunipa Dev, Tao Li, Jeff M. Phillips, Vivek Srikumar
Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks.
1 code implementation • 5 Feb 2020 • Benwei Shi, Jeff M. Phillips
We provide a deterministic space-efficient algorithm for estimating ridge regression.
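The entry's algorithm is not described here, so the sketch below is only an exact, deterministic baseline for comparison: one pass over the rows that accumulates $A^T A$ and $A^T b$ and then solves $(A^T A + \lambda I)x = A^T b$. It uses $O(d^2)$ space regardless of the number of rows $n$; the space-efficient method in the entry is not reproduced, and `streaming_ridge` is a hypothetical name.

```python
import numpy as np


def streaming_ridge(rows, targets, lam):
    """Exact ridge solution x = (A^T A + lam I)^{-1} A^T b in one pass over the rows.
    Only the d x d Gram matrix and a length-d vector are stored, so space is O(d^2)
    regardless of the number of rows n."""
    gram, rhs = None, None
    for a, target in zip(rows, targets):
        a = np.asarray(a, dtype=float)
        if gram is None:
            gram = np.zeros((a.size, a.size))
            rhs = np.zeros(a.size)
        gram += np.outer(a, a)   # accumulate A^T A
        rhs += target * a        # accumulate A^T b
    if gram is None:
        raise ValueError("need at least one row")
    return np.linalg.solve(gram + lam * np.eye(gram.shape[0]), rhs)


rng = np.random.default_rng(0)
A = rng.normal(size=(500, 8))
b = rng.normal(size=500)
x = streaming_ridge(A, b, lam=1.0)
```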
no code implementations • 13 Oct 2019 • Yuwei Wang, Yan Zheng, Yanqing Peng, Chin-Chia Michael Yeh, Zhongfang Zhuang, Das Mahashweta, Bendre Mangesh, Feifei Li, Wei Zhang, Jeff M. Phillips
Embeddings are already essential tools for large language models and image analysis, and their use is being extended to many other research domains.
no code implementations • 13 Jun 2019 • Mingxuan Han, Michael Matheny, Jeff M. Phillips
Kulldorff's (1997) seminal paper on spatial scan statistics (SSS) has led to many methods considering different regions of interest, different statistical models, and different approximations while also having numerous applications in epidemiology, environmental monitoring, and homeland security.
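To make the scan concrete, the sketch below uses the Kulldorff (Poisson) discrepancy common in this line of work, $\phi(m, b) = m \ln\frac{m}{b} + (1 - m)\ln\frac{1 - m}{1 - b}$ for $m > b$ (else 0), where $m$ and $b$ are the fractions of measured and baseline points inside a region. As a simplifying assumption the regions are 1-D intervals scanned by brute force in $O(n^3)$ time; the toy data and function names are hypothetical.

```python
import math


def kulldorff(m, b):
    """Kulldorff (Poisson) discrepancy; 0 unless the region is over-represented (m > b)."""
    if m <= b:
        return 0.0
    if b == 0.0:
        return math.inf          # measured points in a region with no baseline points
    val = m * math.log(m / b)
    if m < 1.0:                  # the (1 - m) term vanishes at m = 1
        val += (1.0 - m) * math.log((1.0 - m) / (1.0 - b))
    return val


def best_interval(measured, baseline):
    """Brute-force scan of intervals [lo, hi] with endpoints at data coordinates;
    returns (score, lo, hi) with the largest discrepancy."""
    coords = sorted(set(measured) | set(baseline))
    best = (0.0, None, None)
    for i, lo in enumerate(coords):
        for hi in coords[i:]:
            m = sum(lo <= x <= hi for x in measured) / len(measured)
            b = sum(lo <= x <= hi for x in baseline) / len(baseline)
            score = kulldorff(m, b)
            if score > best[0]:
                best = (score, lo, hi)
    return best


measured = [4, 4, 4, 5, 5, 5, 0, 9]   # toy "case" locations on a line
baseline = list(range(10))           # toy "population" locations
score, lo, hi = best_interval(measured, baseline)
```

Practical scan statistics replace this exhaustive loop with approximations or smarter enumeration over richer region families.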
no code implementations • 9 Nov 2018 • Jeff M. Phillips, Wai Ming Tai
We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space.
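The entry's own sketch is not reproduced here. For orientation, a widely used embedding of this kind is random Fourier features (Rahimi and Recht), sketched below under the assumed kernel convention $k(x, y) = \exp(-\|x - y\|^2/(2\sigma^2))$: the inner product $\phi(x)\cdot\phi(y)$ is an unbiased estimate of $k(x, y)$, with error shrinking roughly as $1/\sqrt{D}$ in the number of features $D$.

```python
import numpy as np


def make_rff(d, D, sigma=1.0, seed=0):
    """Random Fourier features for k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    Returns phi with E[phi(x) . phi(y)] = k(x, y)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(d, D))  # frequencies drawn from the kernel's spectrum
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases

    def phi(X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return np.sqrt(2.0 / D) * np.cos(X @ W + b)

    return phi


phi = make_rff(d=3, D=20000, sigma=1.0, seed=1)
x, y = np.array([0.0, 0.5, 1.0]), np.array([0.3, 0.1, 0.8])
approx = phi(x)[0] @ phi(y)[0]
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)
```

The same `phi` (same frequencies and phases) must be used for every point so that inner products are comparable.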
no code implementations • 4 Jun 2018 • Sunipa Dev, Safia Hassan, Jeff M. Phillips
We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec).
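One classical way to align two embeddings whose rows correspond to the same words is to find the orthogonal matrix $R$ minimizing $\|AR - B\|_F$ (orthogonal Procrustes), solved by an SVD of $A^T B$. The sketch below shows only this rotation step; the techniques in the entry may also fit translation and scaling, and the function name is hypothetical.

```python
import numpy as np


def procrustes_rotation(A, B):
    """Orthogonal R minimizing ||A R - B||_F, where row i of A and row i of B
    embed the same word in the two embeddings."""
    U, _, Vt = np.linalg.svd(np.asarray(A, float).T @ np.asarray(B, float))
    return U @ Vt


rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))                  # embedding 1 (toy data)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # an unknown orthogonal transform
B = A @ Q                                      # embedding 2 = transformed embedding 1
R = procrustes_rotation(A, B)                  # recovers Q
```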
no code implementations • 30 Apr 2018 • Jeff M. Phillips, Pingfan Tang
We develop a new class of distances for objects including lines, hyperplanes, and trajectories, based on the distance to a set of landmarks.
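A minimal sketch of such a distance, assuming it is the root-mean-square difference of point-to-object distances over a fixed landmark set $Q$, $d_Q(\gamma, \gamma') = \big(\frac{1}{|Q|}\sum_{q \in Q} (\mathrm{dist}(q, \gamma) - \mathrm{dist}(q, \gamma'))^2\big)^{1/2}$, with trajectories given as 2-D polylines. The entry's exact definition (for example, for lines or hyperplanes) may differ, and the helper names are hypothetical.

```python
import numpy as np


def point_segment_dist(q, a, b):
    """Euclidean distance from point q to the segment [a, b]."""
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else float(np.clip((q - a) @ ab / denom, 0.0, 1.0))
    return float(np.linalg.norm(q - (a + t * ab)))


def point_polyline_dist(q, poly):
    """Distance from q to a polyline given as a (k, 2) array of vertices."""
    if len(poly) == 1:
        return float(np.linalg.norm(q - poly[0]))
    return min(point_segment_dist(q, poly[i], poly[i + 1]) for i in range(len(poly) - 1))


def landmark_distance(poly1, poly2, landmarks):
    """RMS difference of landmark-to-trajectory distances (assumed definition)."""
    L = np.asarray(landmarks, dtype=float)
    P1, P2 = np.asarray(poly1, dtype=float), np.asarray(poly2, dtype=float)
    v1 = np.array([point_polyline_dist(q, P1) for q in L])
    v2 = np.array([point_polyline_dist(q, P2) for q in L])
    return float(np.linalg.norm(v1 - v2) / np.sqrt(len(L)))


landmarks = [[0.0, 0.0], [1.0, 0.0], [0.5, 3.0]]
t1 = [[0.0, 0.0], [1.0, 0.0]]
t2 = [[0.0, 1.0], [1.0, 1.0]]
dist_12 = landmark_distance(t1, t2, landmarks)
```

Each object becomes a fixed-length vector of landmark distances, so standard Euclidean tools apply to objects of different shapes and sizes.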
no code implementations • 6 Feb 2018 • Jeff M. Phillips, Wai Ming Tai
When $d \geq 1/\varepsilon^2$, it is known that the size of the coreset can be $O(1/\varepsilon^2)$.
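The error a coreset $S \subset X$ is assumed to control here is the kernel density estimate gap $\max_q |\mathrm{KDE}_X(q) - \mathrm{KDE}_S(q)|$. The sketch below measures that gap on a finite set of queries for the simplest candidate, a uniform random sample, with an assumed Gaussian kernel; it is not the construction behind the size bounds quoted above, and the function names are hypothetical.

```python
import numpy as np


def kde(queries, points, sigma=1.0):
    """Gaussian KDE at each query q: mean over x of exp(-||q - x||^2 / sigma^2)."""
    Q = np.atleast_2d(np.asarray(queries, dtype=float))
    P = np.atleast_2d(np.asarray(points, dtype=float))
    sq = ((Q[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma**2).mean(axis=1)


def random_coreset(points, size, seed=0):
    """Uniform random sample without replacement, the simplest KDE coreset."""
    P = np.asarray(points, dtype=float)
    idx = np.random.default_rng(seed).choice(len(P), size=size, replace=False)
    return P[idx]


def max_kde_error(points, coreset, queries, sigma=1.0):
    """Largest |KDE_X(q) - KDE_S(q)| over the given query points."""
    return float(np.max(np.abs(kde(queries, points, sigma) - kde(queries, coreset, sigma))))


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
S = random_coreset(X, size=400, seed=1)   # about 4 / eps^2 points for eps = 0.1
err = max_kde_error(X, S, rng.normal(size=(50, 2)))
```

Checking a finite query set only lower-bounds the true worst-case error over all of $\mathbb{R}^d$.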
no code implementations • 11 Oct 2017 • Jeff M. Phillips, Wai Ming Tai
When the dimension $d$ is constant, we demonstrate much tighter bounds on the size of the coreset specifically for Gaussian kernels, showing that it is bounded by the size of the coreset for axis-aligned rectangles.
no code implementations • 13 Feb 2017 • Yan Zheng, Jeff M. Phillips
Kernel regression is an essential and ubiquitous tool for non-parametric data analysis, particularly popular among time series and spatial data.
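For reference, the standard Nadaraya-Watson form of kernel regression predicts $\hat y(q) = \sum_i K(q, x_i)\, y_i / \sum_i K(q, x_i)$. The sketch below assumes a Gaussian kernel with a hypothetical `bandwidth` parameter and does not implement the entry's own contribution.

```python
import numpy as np


def nadaraya_watson(queries, X, y, bandwidth=1.0):
    """Gaussian-kernel regression: sum_i K(q, x_i) y_i / sum_i K(q, x_i) per query q."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Q = np.asarray(queries, dtype=float).reshape(len(queries), -1)
    y = np.asarray(y, dtype=float)
    sq = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Subtracting each row's minimum rescales that row's weights by a common factor,
    # which cancels in the ratio but keeps far-away queries from underflowing to 0/0.
    W = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2.0 * bandwidth**2))
    return (W @ y) / W.sum(axis=1)


x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 1.0, 4.0, 9.0]
preds = nadaraya_watson([1.5, 100.0], x_train, y_train, bandwidth=0.5)
```

The bandwidth controls the bias-variance trade-off: small values track the data closely, large values average over wide neighborhoods.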
no code implementations • NeurIPS 2016 • Pingfan Tang, Jeff M. Phillips
And so on, if the composition involves more than two estimators.
no code implementations • 17 Feb 2016 • Di Chen, Jeff M. Phillips
A reproducing kernel can define an embedding of a data point into an infinite dimensional reproducing kernel Hilbert space (RKHS).
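Because the RKHS inner product satisfies $\langle \phi(a), \phi(b) \rangle = K(a, b)$, distances between mean embeddings $\Phi(P) = \frac{1}{|P|}\sum_{p \in P} \phi(p)$ of two point sets can be computed from kernel evaluations alone: $\|\Phi(P) - \Phi(Q)\|^2 = \kappa(P, P) + \kappa(Q, Q) - 2\kappa(P, Q)$, where $\kappa(A, B)$ is the average of $K(a, b)$ over $a \in A$, $b \in B$. A sketch with an assumed Gaussian kernel (function names hypothetical):

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Matrix of K(a, b) = exp(-||a - b||^2 / sigma^2) for rows a of A, b of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma**2)


def kernel_distance(P, Q, sigma=1.0):
    """RKHS distance between the mean embeddings of point sets P and Q."""
    P = np.atleast_2d(np.asarray(P, dtype=float))
    Q = np.atleast_2d(np.asarray(Q, dtype=float))
    sq = (gaussian_gram(P, P, sigma).mean() + gaussian_gram(Q, Q, sigma).mean()
          - 2.0 * gaussian_gram(P, Q, sigma).mean())
    return float(np.sqrt(max(sq, 0.0)))  # clip tiny negative rounding error


d_single = kernel_distance([[0.0]], [[1.0]])   # sqrt(2 - 2 exp(-1))
```

The computation never forms the infinite-dimensional embedding; its cost is quadratic in the set sizes.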
no code implementations • 16 Dec 2015 • Mina Ghashami, Daniel Perry, Jeff M. Phillips
Kernel principal component analysis (KPCA) provides a concise set of basis vectors which capture non-linear structures within large data sets, and is a central tool in data analysis and learning.
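Exact KPCA can be sketched in a few lines, assuming a Gaussian kernel: build the $n \times n$ Gram matrix, center it in feature space, and take its top eigenvectors. Storing and decomposing that $n \times n$ matrix is what makes the exact method expensive on large data sets. The function name is hypothetical.

```python
import numpy as np


def kernel_pca(X, k, sigma=1.0):
    """Exact Gaussian-kernel PCA: coordinates of the n input points on the top-k
    kernel principal components (rows = points, columns = components)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / sigma**2)                 # n x n Gram matrix
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                             # center the data in feature space
    vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    # Projection of point i onto component j is sqrt(lambda_j) * v_j[i].
    return vecs * np.sqrt(np.maximum(vals, 0.0))


rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))
Z = kernel_pca(X, k=2)
```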
no code implementations • 30 Oct 2015 • Jeff M. Phillips, Yan Zheng
We consider smoothed versions of geometric range spaces, so that an element of the ground set (e.g., a point) can be contained in a range with a non-binary value in $[0, 1]$.
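As an illustration of a non-binary range, the sketch below smooths a ball of radius $r$ with a linear ramp of width $w$: membership is 1 inside the ball, 0 beyond distance $r + w$, and interpolates in between. The ramp is a hypothetical choice; the entry's smoothing (for example, kernel-based) may differ, and the function names are illustrative.

```python
import numpy as np


def smoothed_ball_value(p, center, radius, width):
    """Membership in [0, 1]: 1 inside the ball, 0 beyond radius + width,
    linear ramp in between (a hypothetical smoothing choice)."""
    if width <= 0:
        raise ValueError("width must be positive")
    dist = float(np.linalg.norm(np.asarray(p, float) - np.asarray(center, float)))
    return float(np.clip((radius + width - dist) / width, 0.0, 1.0))


def smoothed_range_fraction(points, center, radius, width):
    """Average membership of a point set in the smoothed range."""
    return float(np.mean([smoothed_ball_value(p, center, radius, width) for p in points]))


pts = [[0.5, 0.0], [1.5, 0.0], [3.0, 0.0]]
frac = smoothed_range_fraction(pts, [0.0, 0.0], radius=1.0, width=1.0)  # (1 + 0.5 + 0) / 3
```

A point just outside the ball now contributes a fraction instead of jumping from 1 to 0, which is the non-binary containment the entry describes.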
no code implementations • 8 Jan 2015 • Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff
It performs $O(d \times \ell)$ operations per row and maintains a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ such that for any $k < \ell$, $\|A^TA - B^TB\|_2 \leq \|A - A_k\|_F^2 / (\ell-k)$ and $\|A - \pi_{B_k}(A)\|_F^2 \leq \big(1 + \frac{k}{\ell-k}\big) \|A-A_k\|_F^2$.
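The algorithm analyzed here is Frequent Directions. Below is a minimal NumPy sketch of its common doubled-buffer variant (an implementation choice assumed here, not code from the entry): rows fill a $2\ell \times d$ buffer; when it is full, an SVD is taken and every squared singular value is reduced by the $\ell$-th largest one, which zeroes at least half the buffer. One SVD per roughly $\ell$ rows gives the amortized $O(d\ell)$ cost per row. Function names are hypothetical.

```python
import numpy as np


def _shrink(B, ell):
    """SVD the buffer, subtract the ell-th largest squared singular value from all
    squared singular values (clipping at 0), and rebuild the buffer. Afterwards
    only rows 0 .. ell-2 can be nonzero."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    delta = s[ell - 1] ** 2
    s_shrunk = np.sqrt(np.maximum(s**2 - delta, 0.0))
    out = np.zeros_like(B)
    out[: len(s)] = s_shrunk[:, None] * Vt
    return out


def frequent_directions(A, ell):
    """Stream the rows of A (n x d) and return an ell x d sketch B with
    ||A^T A - B^T B||_2 <= ||A - A_k||_F^2 / (ell - k) for every k < ell."""
    A = np.asarray(A, dtype=float)
    d = A.shape[1]
    if not 1 <= ell <= d:
        raise ValueError("need 1 <= ell <= d")
    B = np.zeros((2 * ell, d))
    nxt = 0                            # index of the next zero row in B
    for row in A:
        if nxt == 2 * ell:             # buffer full: shrink, freeing rows ell-1 onward
            B = _shrink(B, ell)
            nxt = ell - 1
        B[nxt] = row
        nxt += 1
    return _shrink(B, ell)[:ell]


rng = np.random.default_rng(0)
A = rng.normal(size=(500, 30))
B = frequent_directions(A, ell=10)
cov_err = np.linalg.norm(A.T @ A - B.T @ B, 2)
```

Each shrink removes at least $\ell\delta$ of squared Frobenius mass while adding at most $\delta$ to the covariance error, which is where the $1/(\ell - k)$ factor comes from.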