Sparse Factorization of Large Square Matrices

16 Sep 2021  ·  Ruslan Khalitov, Tong Yu, Lei Cheng, Zhirong Yang

Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in both memory and time, so an economical approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint becomes a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. The approximation needs only $N(\log N)^2$ non-zero numbers for an $N\times N$ full matrix. We present both non-parametric and parametric ways to find the factorization. In the former, we learn the factorizing matrices directly; in the latter, we train neural networks to map input data to the non-zero matrix entries. The sparse factorization method is tested on a variety of synthetic and real-world square matrices. The experimental results demonstrate that our method gives a better approximation when the approximated matrix is sparse and high-rank. Based on this finding, we use our parametric method as a scalable attention architecture that performs strongly on learning tasks for long sequential data and outperforms the Transformer and several of its variants.
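The non-parametric variant can be illustrated with a short optimization loop: fix a sparsity support for each of the roughly $\log_2 N$ factors and learn only the non-zero entries so that the product of the factors approximates the target matrix. The sketch below is a minimal, hypothetical PyTorch implementation; the chord-style support pattern, the number of factors, and all function names are assumptions for illustration and are not taken from the paper's code.

```python
import math
import torch

def chord_support(n):
    """One shared sparsity support (an assumed chord-like pattern):
    row i may be non-zero at columns i and (i + 2**k) % n for
    k = 0..log2(n)-1, so each factor holds about n*(1 + log2(n))
    non-zeros and the full product uses O(n (log n)^2) numbers."""
    rows, cols = [], []
    for i in range(n):
        rows.append(i); cols.append(i)
        for k in range(int(math.log2(n))):
            rows.append(i); cols.append((i + 2 ** k) % n)
    return torch.tensor(rows), torch.tensor(cols)

def sparse_factorize(A, steps=2000, lr=1e-2):
    """Non-parametric sketch: directly learn the non-zero entries of
    log2(n) sparse factors so that their product approximates A."""
    n = A.shape[0]                      # n is assumed to be a power of two
    rows, cols = chord_support(n)
    n_factors = int(math.log2(n))
    # one learnable value per allowed non-zero position, per factor
    params = [torch.nn.Parameter(0.01 * torch.randn(len(rows)))
              for _ in range(n_factors)]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        approx = torch.eye(n)
        for vals in params:
            W = torch.zeros(n, n)
            W[rows, cols] = vals        # scatter learnable values onto the support
            approx = W @ approx
        loss = ((approx - A) ** 2).sum()  # Frobenius reconstruction error
        loss.backward()
        opt.step()
    return params, loss.item()
```

For example, calling `sparse_factorize(torch.eye(64)[torch.randperm(64)])` fits the sparse factors to a random permutation matrix, a sparse and full-rank target of the kind where low-rank factorizations struggle. The parametric variant described in the abstract would instead have a neural network emit the `vals` tensors from the input data rather than storing them as free parameters.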


Results from the Paper


Task                  Dataset  Model  Metric      Value  Global Rank
Long-range modeling   LRA      PSF    ListOps     38.85  # 18
Long-range modeling   LRA      PSF    Text        77.32  # 15
Long-range modeling   LRA      PSF    Retrieval   76.51  # 17
Long-range modeling   LRA      PSF    Image       45.01  # 18
Long-range modeling   LRA      PSF    Pathfinder  80.49  # 15
Long-range modeling   LRA      PSF    Avg         63.64  # 17
