Parsing Millions of URLs per Second

17 Nov 2023 · Yagiz Nizipli, Daniel Lemire

URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast, standard-compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers that follow the WHATWG standard (e.g., Servo's rust-url) and up to eight times fewer instructions than the popular curl parser. The Node.js environment has adopted our C++ library. In our tests on realistic data, a recent Node.js version (20.0) with our parser is four to five times faster than the last version with the legacy URL parser.
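As a rough illustration of what WHATWG-style parsing produces, the sketch below uses the global `URL` class available in Node.js, which, per the abstract, is backed by the authors' C++ library in recent versions. The helper name `normalizeUrl` and the sample inputs are hypothetical and chosen only for this example; it shows the standard's observable behavior (lowercasing, default-port removal, dot-segment resolution), not the vectorized implementation itself.

```typescript
// Minimal sketch: parse and normalize a URL with the WHATWG-style global URL class.
// `normalizeUrl` is a hypothetical helper, not part of any library.
// It returns null when the input is not a valid absolute URL.
function normalizeUrl(input: string): string | null {
  try {
    return new URL(input).href;
  } catch {
    // The URL constructor throws a TypeError on invalid input.
    return null;
  }
}

// The scheme and host are lowercased, the default port is dropped,
// and the ".." path segment is resolved during parsing.
const parsed = new URL("HTTP://EXAMPLE.com:80/a/../b?x=1#frag");
console.log(parsed.href);
console.log(parsed.protocol, parsed.hostname, parsed.pathname);
```

Timing a loop of such calls over a realistic URL list is one simple way to compare Node.js versions, although the figures in the abstract come from the authors' own benchmarks.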


Categories


Programming Languages
Data Structures and Algorithms

Datasets


Introduced in the Paper:

Various URL Datasets