Text Compression

9 papers with code • 0 benchmarks • 0 datasets

Text compression is the task of representing text in as few bits as possible, for example by pairing a language model's next-token predictions with an entropy coder, or of shortening a text while preserving its essential content.

Most implemented papers

LLMZip: Lossless Text Compression using Large Language Models

vcskaushik/LLMzip 6 Jun 2023

We provide new estimates of an asymptotic upper bound on the entropy of English using the large language model LLaMA-7B as a predictor for the next token given a window of past tokens.
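As a rough illustration of the idea, the sketch below upper-bounds the entropy of a text by the average negative log-likelihood an autoregressive language model assigns to it. The paper uses LLaMA-7B; the freely downloadable gpt2 checkpoint here is only a stand-in, and in an actual lossless compressor these token probabilities would drive an arithmetic coder.

```python
# Minimal sketch: upper-bound entropy via an LM's average negative
# log-likelihood. "gpt2" is a stand-in for the LLaMA-7B model used in the paper.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def bits_per_character(text: str, model_name: str = "gpt2") -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is the mean cross-entropy per predicted token, in nats
        loss = model(ids, labels=ids).loss.item()

    n_predicted = ids.shape[1] - 1            # the first token has no prediction
    total_bits = loss * n_predicted / math.log(2)
    return total_bits / len(text)             # bits per character

print(bits_per_character("the quick brown fox jumps over the lazy dog"))
```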

Syntactically Informed Text Compression with Recurrent Neural Networks

davidcox143/rnn-text-compress 8 Aug 2016

We present a self-contained system for constructing natural language models for use in text compression.

Authorship Verification based on Compression-Models

8sukanya8/occav 1 Jun 2017

The only three key components of our method are a compression algorithm, a dissimilarity measure, and a threshold needed to accept or reject the authorship of the questioned document.
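A minimal sketch of that three-part recipe, with gzip as the compressor and a simple compression-based dissimilarity as a stand-in for the measures evaluated in the paper; the function names and threshold value are purely illustrative.

```python
# Hedged sketch of compression-based authorship verification:
# a compressor (gzip), a dissimilarity measure, and a decision threshold.
import gzip

def clen(s: str) -> int:
    # Length in bytes of the gzip-compressed string.
    return len(gzip.compress(s.encode("utf-8")))

def dissimilarity(known: str, questioned: str) -> float:
    # Compression-based dissimilarity (a stand-in, not the paper's exact measure):
    # close to 0.5 for near-identical texts, close to 1.0 for unrelated ones.
    return clen(known + " " + questioned) / (clen(known) + clen(questioned))

def same_author(known: str, questioned: str, threshold: float = 0.8) -> bool:
    # Accept authorship when the questioned document compresses well
    # together with the known-author text; the threshold is illustrative.
    return dissimilarity(known, questioned) < threshold
```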

A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models

Stonesjtu/Pytorch-NCE 20 Aug 2017

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary.
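For context, the toy sketch below shows the generic noise-contrastive estimation objective that sidesteps the full softmax: the target word is scored against k noise words drawn from a proposal distribution, turning training into binary classification. This is the standard NCE loss, not the paper's batched variant, and all names and values are illustrative.

```python
# Toy sketch of the NCE objective that avoids normalizing over the full
# vocabulary (generic NCE loss, not the paper's batched variant).
import numpy as np

def nce_loss(score_target, scores_noise, q_target, q_noise, k):
    """score_* are unnormalized model scores (logits); q_* are noise probabilities."""
    def log_sigmoid(x):
        return -np.logaddexp(0.0, -x)

    # Logit that the target word is a true (data) sample rather than noise.
    delta_data = score_target - np.log(k * q_target)
    # Logits that each noise word is a data sample (these should be low).
    delta_noise = scores_noise - np.log(k * q_noise)

    return -(log_sigmoid(delta_data) + log_sigmoid(-delta_noise).sum())

# Example: one target word and k = 3 noise words from a unigram proposal.
loss = nce_loss(score_target=2.1,
                scores_noise=np.array([0.3, -0.5, 1.0]),
                q_target=0.01,
                q_noise=np.array([0.05, 0.002, 0.07]),
                k=3)
print(loss)
```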

Data-efficient Neural Text Compression with Interactive Learning

UKPLab/NAACL2019-interactiveCompression NAACL 2019

Neural sequence-to-sequence models have been successfully applied to text compression.

Contextualized Semantic Distance between Highly Overlapped Texts

Stareru/NeighboringDistributionDivergence 4 Oct 2021

Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.

Gzip versus bag-of-words for text classification

flipz357/npc_gzip_exp 27 Jul 2023

The effectiveness of compression-based methods ('gzip') in text classification has recently attracted considerable attention.
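A hedged sketch of that 'gzip' baseline: a normalized compression distance between documents combined with a nearest-neighbour vote over labeled training texts. The function names and toy data are illustrative, not the repository's code.

```python
# Sketch of the compression-based classification baseline: normalized
# compression distance (NCD) plus a 1-nearest-neighbour vote.
import gzip

def ncd(a: str, b: str) -> float:
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(test_text: str, train_texts: list[str], train_labels: list[str]) -> str:
    # Return the label of the training document at the smallest compression distance.
    distances = [ncd(test_text, t) for t in train_texts]
    return train_labels[distances.index(min(distances))]

print(classify("the match ended with a late goal",
               ["stocks fell sharply on tuesday", "the striker scored twice"],
               ["finance", "sports"]))
```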

LLMs may Dominate Information Access: Neural Retrievers are Biased Towards LLM-Generated Texts

kid-22/llm4ir-bias 31 Oct 2023

We refer to this category of biases in neural retrieval models towards LLM-generated text as source bias.

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

microsoft/LLMLingua 19 Mar 2024

The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective.
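To make the critique concrete, here is a hedged sketch of the kind of entropy-based baseline the abstract refers to: each token is scored by its surprisal under a unidirectional language model, and the lowest-information tokens are dropped to reach a target compression rate. This is the baseline being criticized, not LLMLingua-2's method, and the model and parameter choices are assumptions.

```python
# Hedged sketch of entropy-based prompt compression: drop the tokens a
# unidirectional LM finds least surprising. Model and keep_ratio are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def entropy_prune(prompt: str, keep_ratio: float = 0.5, model_name: str = "gpt2") -> str:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(prompt, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    log_probs = torch.log_softmax(logits, dim=-1)

    # Surprisal of each token given only its left context (first token always kept).
    surprisal = [float("inf")]
    for i in range(1, len(ids)):
        surprisal.append(-log_probs[i - 1, ids[i]].item())

    # Keep the most informative tokens, preserving their original order.
    n_keep = max(1, int(keep_ratio * len(ids)))
    keep = sorted(sorted(range(len(ids)), key=lambda i: -surprisal[i])[:n_keep])
    return tok.decode(ids[keep])

print(entropy_prune("Please summarize the following meeting notes in three bullet points."))
```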