Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge.
Robotics
We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra.
Data Structures and Algorithms Distributed, Parallel, and Cluster Computing
The LIO subsystem registers raw points (instead of feature points on, e.g., edges or planes) of a new scan to an incrementally-built point cloud map.
Robotics
Within academia and industry, there has been a need for expansive simulation frameworks that include model-based simulation of sensors, mobile vehicles, and the environment around them.
Robotics Signal Processing
We suggest and evaluate methods to enhance Channel Charting with model-based localization approaches: One approach involves using information derived from classical localization methods to map channel chart locations to physical positions after conventional training of the forward charting function.
Information Theory Signal Processing
The growing prevalence of online conferences and courses presents a new challenge in improving automatic speech recognition (ASR) with enriched textual information from video slides.
Sound Multimedia Audio and Speech Processing
Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems.
Sound Audio and Speech Processing
SAM2Act achieves a state-of-the-art average success rate of 86.8% across 18 tasks in the RLBench benchmark, and demonstrates robust generalization on The Colosseum benchmark, with only a 4.3% performance gap under diverse environmental perturbations.
Ranked #1 on Robot Manipulation on RLBench
Robot Manipulation Robotics
However, existing SR methods that typically rely on independently trained and concatenated networks may lead to inconsistent representations and poor speech quality, especially in out-of-domain scenarios.
Sound Audio and Speech Processing
The time delay neural network (TDNN) has proven efficient for speaker verification.
Sound Audio and Speech Processing