Search Results for author: Stefano Cavazzi

Found 2 papers, 1 paper with code

CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl

1 code implementation • 17 May 2024 • Ilya Ilyankou, James Haworth, Stefano Cavazzi

The Common Crawl (CC) corpus is the largest open web crawl dataset, containing 9.5+ petabytes of data captured since 2008.
