1 code implementation • 26 Mar 2024 • Leonidas Gee, Andrea Zugarini, Novi Quadrianto
To reduce the inference cost of large language models, model compression is increasingly used to create smaller scalable models.
1 code implementation • 15 Feb 2024 • Leonidas Gee, Andrea Zugarini, Leonardo Rigutini, Paolo Torroni
Real-world business applications require a trade-off between language model performance and size.
1 code implementation • 15 Feb 2024 • Leonidas Gee, Leonardo Rigutini, Marco Ernandes, Andrea Zugarini
Large Language Models have proven highly successful at modelling a variety of tasks.