SOSD (Searching on Sorted Data)

Introduced by Kipf et al. in SOSD: A Benchmark for Learned Indexes

SOSD is a collection of datasets for benchmarking the lookup performance of learned indexes.
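To make "lookup on sorted data" concrete, here is a minimal sketch of one very simple learned index: a single straight line from key to position, plus a bounded binary search around the prediction. It is an illustration only, not the method of any index evaluated in SOSD. The names `build_linear_index` and `lookup` are hypothetical, and the sketch assumes the key being looked up is present in the array.

```python
import bisect

def build_linear_index(keys):
    """Fit a line from key to position and record its worst-case error.

    `keys` must be a non-empty sorted list of non-negative integers.
    """
    n = len(keys)
    lo, hi = keys[0], keys[-1]
    # Slope maps the key range [lo, hi] onto positions [0, n - 1].
    slope = (n - 1) / (hi - lo) if hi > lo else 0.0
    max_err = 0
    for pos, key in enumerate(keys):
        pred = int(slope * (key - lo))
        max_err = max(max_err, abs(pred - pos))
    return lo, slope, max_err

def lookup(keys, index, key):
    """Return the first position whose key is >= `key` (a lower bound).

    Assumption: `key` occurs in `keys`, so the recorded error bound applies.
    """
    lo, slope, max_err = index
    n = len(keys)
    pred = min(max(int(slope * (key - lo)), 0), n - 1)
    # Only search the window that the error bound guarantees.
    left = max(pred - max_err, 0)
    right = min(pred + max_err + 1, n)
    return bisect.bisect_left(keys, key, left, right)

keys = [1, 3, 3, 7, 20, 21, 100, 1000]
index = build_linear_index(keys)
positions = [lookup(keys, index, k) for k in keys]
```

The smaller the recorded error, the narrower the final search window, which is why the shape of a dataset's CDF matters so much for this kind of index.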

SOSD currently includes eight datasets. Each dataset consists of 200 million 64-bit unsigned integers (keys) with very few duplicates, if any: amzn represents book sale popularity data. face is an upsampled version of a Facebook user ID dataset. logn and norm are sampled from a lognormal (0, 2) and a normal distribution, respectively. osmc consists of uniformly sampled OpenStreetMap locations represented as Google S2 CellIds. uden consists of dense integers. uspr consists of uniformly distributed sparse integers. wiki consists of Wikipedia article edit timestamps.

In addition, there are 32-bit versions of all datasets (except osmc and wiki) with similar CDFs. The 32-bit version of logn uses different parameters, (0, 1), to reduce the number of duplicates.
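The synthetic datasets described above (uden, uspr, logn) can be approximated at small scale with the sketch below. This is not the SOSD generator: the function names, the starting value of the dense keys, and in particular the way lognormal samples are scaled to integers are assumptions made for illustration.

```python
import random

def dense_keys(n):
    """uden-like keys: consecutive integers (starting at 0 is an assumption)."""
    return list(range(n))

def sparse_uniform_keys(n, rng, bits=64):
    """uspr-like keys: uniform draws from the full unsigned range, sorted."""
    return sorted(rng.getrandbits(bits) for _ in range(n))

def lognormal_keys(n, rng, mu=0.0, sigma=2.0, bits=64):
    """logn-like keys: lognormal(mu, sigma) samples mapped to unsigned integers.

    Assumed scaling: the largest sample is mapped near 2**bits - 1.
    """
    top = 2**bits - 1
    samples = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    scale = top / max(samples)
    # Clamp to guard against floating-point rounding above the maximum key.
    return sorted(min(int(s * scale), top) for s in samples)

def duplicate_count(keys):
    """Number of keys that repeat an earlier key."""
    return len(keys) - len(set(keys))

rng = random.Random(42)
logn64 = lognormal_keys(1000, rng, sigma=2.0, bits=64)
logn32 = lognormal_keys(1000, rng, sigma=1.0, bits=32)
uspr64 = sparse_uniform_keys(1000, rng)
```

Calling `duplicate_count` on keys generated with different `sigma` and `bits` values is one way to explore how the parameters and the key width affect the number of duplicates.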
