Text Compression

9 papers with code • 0 benchmarks • 0 datasets

Text compression is the task of representing text in as few bits as possible, for example by pairing a language model's next-token predictions with an entropy coder, or of shortening a text while preserving its essential content.

Most implemented papers

LLMZip: Lossless Text Compression using Large Language Models

vcskaushik/LLMzip 6 Jun 2023

We provide new estimates of an asymptotic upper bound on the entropy of English using the large language model LLaMA-7B as a predictor for the next token given a window of past tokens.
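As a rough illustration of the idea, the sketch below upper-bounds the entropy of a text by the average negative log-likelihood an autoregressive language model assigns to it. The paper uses LLaMA-7B; the freely downloadable gpt2 checkpoint here is only a stand-in, and in an actual lossless compressor these token probabilities would drive an arithmetic coder.

```python
# Minimal sketch: upper-bound entropy via an LM's average negative
# log-likelihood. "gpt2" is a stand-in for the LLaMA-7B model used in the paper.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def bits_per_character(text: str, model_name: str = "gpt2") -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is the mean cross-entropy per predicted token, in nats
        loss = model(ids, labels=ids).loss.item()

    n_predicted = ids.shape[1] - 1            # the first token has no prediction
    total_bits = loss * n_predicted / math.log(2)
    return total_bits / len(text)             # bits per character

print(bits_per_character("the quick brown fox jumps over the lazy dog"))
```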

Syntactically Informed Text Compression with Recurrent Neural Networks

davidcox143/rnn-text-compress 8 Aug 2016

We present a self-contained system for constructing natural language models for use in text compression.

Authorship Verification based on Compression-Models

8sukanya8/occav 1 Jun 2017

The only three key components of our method are a compression algorithm, a dissimilarity measure, and a threshold needed to accept or reject the authorship of the questioned document.
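A minimal sketch of that three-part recipe, with gzip as the compressor and a simple compression-based dissimilarity as a stand-in for the measures evaluated in the paper; the function names and threshold value are purely illustrative.

```python
# Hedged sketch of compression-based authorship verification:
# a compressor (gzip), a dissimilarity measure, and a decision threshold.
import gzip

def clen(s: str) -> int:
    # Length in bytes of the gzip-compressed string.
    return len(gzip.compress(s.encode("utf-8")))

def dissimilarity(known: str, questioned: str) -> float:
    # Compression-based dissimilarity (a stand-in, not the paper's exact measure):
    # close to 0.5 for near-identical texts, close to 1.0 for unrelated ones.
    return clen(known + " " + questioned) / (clen(known) + clen(questioned))

def same_author(known: str, questioned: str, threshold: float = 0.8) -> bool:
    # Accept authorship when the questioned document compresses well
    # together with the known-author text; the threshold is illustrative.
    return dissimilarity(known, questioned) < threshold
```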

A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models

Stonesjtu/Pytorch-NCE 20 Aug 2017

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary.
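For context, the toy sketch below shows the generic noise-contrastive estimation objective that sidesteps the full softmax: the target word is scored against k noise words drawn from a proposal distribution, turning training into binary classification. This is the standard NCE loss, not the paper's batched variant, and all names and values are illustrative.

```python
# Toy sketch of the NCE objective that avoids normalizing over the full
# vocabulary (generic NCE loss, not the paper's batched variant).
import numpy as np

def nce_loss(score_target, scores_noise, q_target, q_noise, k):
    """score_* are unnormalized model scores (logits); q_* are noise probabilities."""
    def log_sigmoid(x):
        return -np.logaddexp(0.0, -x)

    # Logit that the target word is a true (data) sample rather than noise.
    delta_data = score_target - np.log(k * q_target)
    # Logits that each noise word is a data sample (these should be low).
    delta_noise = scores_noise - np.log(k * q_noise)

    return -(log_sigmoid(delta_data) + log_sigmoid(-delta_noise).sum())

# Example: one target word and k = 3 noise words from a unigram proposal.
loss = nce_loss(score_target=2.1,
                scores_noise=np.array([0.3, -0.5, 1.0]),
                q_target=0.01,
                q_noise=np.array([0.05, 0.002, 0.07]),
                k=3)
print(loss)
```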

Data-efficient Neural Text Compression with Interactive Learning

UKPLab/NAACL2019-interactiveCompression NAACL 2019

Neural sequence-to-sequence models have been successfully applied to text compression.

Contextualized Semantic Distance between Highly Overlapped Texts

Stareru/NeighboringDistributionDivergence 4 Oct 2021

Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.

Gzip versus bag-of-words for text classification

flipz357/npc_gzip_exp 27 Jul 2023

The effectiveness of compression-based methods ('gzip') in text classification has recently attracted considerable attention.
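A hedged sketch of that 'gzip' baseline: a normalized compression distance between documents combined with a nearest-neighbour vote over labeled training texts. The function names and toy data are illustrative, not the repository's code.

```python
# Sketch of the compression-based classification baseline: normalized
# compression distance (NCD) plus a 1-nearest-neighbour vote.
import gzip

def ncd(a: str, b: str) -> float:
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(test_text: str, train_texts: list[str], train_labels: list[str]) -> str:
    # Return the label of the training document at the smallest compression distance.
    distances = [ncd(test_text, t) for t in train_texts]
    return train_labels[distances.index(min(distances))]

print(classify("the match ended with a late goal",
               ["stocks fell sharply on tuesday", "the striker scored twice"],
               ["finance", "sports"]))
```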

LLMs may Dominate Information Access: Neural Retrievers are Biased Towards LLM-Generated Texts

kid-22/llm4ir-bias 31 Oct 2023

We refer to this category of biases in neural retrieval models towards LLM-generated text as source bias.

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

microsoft/LLMLingua 19 Mar 2024

The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective.
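To make the critique concrete, here is a hedged sketch of the kind of entropy-based baseline the abstract refers to: each token is scored by its surprisal under a unidirectional language model, and the lowest-information tokens are dropped to reach a target compression rate. This is the baseline being criticized, not LLMLingua-2's method, and the model and parameter choices are assumptions.

```python
# Hedged sketch of entropy-based prompt compression: drop the tokens a
# unidirectional LM finds least surprising. Model and keep_ratio are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def entropy_prune(prompt: str, keep_ratio: float = 0.5, model_name: str = "gpt2") -> str:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(prompt, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    log_probs = torch.log_softmax(logits, dim=-1)

    # Surprisal of each token given only its left context (first token always kept).
    surprisal = [float("inf")]
    for i in range(1, len(ids)):
        surprisal.append(-log_probs[i - 1, ids[i]].item())

    # Keep the most informative tokens, preserving their original order.
    n_keep = max(1, int(keep_ratio * len(ids)))
    keep = sorted(sorted(range(len(ids)), key=lambda i: -surprisal[i])[:n_keep])
    return tok.decode(ids[keep])

print(entropy_prune("Please summarize the following meeting notes in three bullet points."))
```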