Trending Research

Empowering Robot Path Planning with Large Language Models: osmAG Map Topology & Hierarchy Comprehension with LLMs

hiyouga/llama-factory 13 Mar 2024

Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge.

Robotics

40,851
0.26 stars / hour

Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU

flashinfer-ai/flashinfer 9 Jan 2023

We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra.

Data Structures and Algorithms Distributed, Parallel, and Cluster Computing

2,083
0.23 stars / hour

FAST-LIVO: Fast and Tightly-coupled Sparse-Direct LiDAR-Inertial-Visual Odometry

hku-mars/fast-livo2 2 Mar 2022

The LIO subsystem registers raw points (instead of feature points on, e.g., edges or planes) of a new scan to an incrementally-built point cloud map.

Robotics

1,738
0.19 stars / hour

Cosys-AirSim: A Real-Time Simulation Framework Expanded for Complex Industrial Applications

Cosys-Lab/Cosys-AirSim 23 Mar 2023

Within academia and industry, there has been a need for expansive simulation frameworks that include model-based simulation of sensors, mobile vehicles, and the environment around them.

Robotics Signal Processing

120
0.17 stars / hour

Augmenting Channel Charting with Classical Wireless Source Localization Techniques

jeija/toa-aoa-augmented-channelcharting 4 Dec 2023

We suggest and evaluate methods to enhance Channel Charting with model-based localization approaches: One approach involves using information derived from classical localization methods to map channel chart locations to physical positions after conventional training of the forward charting function.

Information Theory Signal Processing

51
0.16 stars / hour

LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition

alibaba-damo-academy/FunASR 12 Jan 2024

The growing prevalence of online conferences and courses presents a new challenge in improving automatic speech recognition (ASR) with enriched textual information from video slides.

Sound Multimedia Audio and Speech Processing

8,203
0.15 stars / hour

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

olawod/freevc 11 Jun 2021

Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems.

Sound Audio and Speech Processing

646
0.15 stars / hour

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

sam2act/sam2act 30 Jan 2025

SAM2Act achieves a state-of-the-art average success rate of 86.8% across 18 tasks in the RLBench benchmark, and demonstrates robust generalization on The Colosseum benchmark, with only a 4.3% performance gap under diverse environmental perturbations.

Robot Manipulation Robotics

44
0.11 stars / hour

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

modelscope/ClearerVoice-Studio 17 Jan 2025

However, existing SR methods that typically rely on independently trained and concatenated networks may lead to inconsistent representations and poor speech quality, especially in out-of-domain scenarios.

Sound Audio and Speech Processing

2,233
0.11 stars / hour

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

alibaba-damo-academy/3D-Speaker 1 Mar 2023

Time delay neural network (TDNN) has been proven to be efficient for speaker verification.

Sound Audio and Speech Processing

1,636
0.09 stars / hour