Yandex Researchers Develop New Methods that Could Cut AI Deployment Costs by up to 8 Times

July 24, 2024

120

The Yandex Research team, in collaboration with researchers from IST Austria, NeuralMagic, and KAUST, have developed two innovative compression methods for large language models: Additive Quantization for Language Models (AQLM) and PV-Tuning. When combined, these methods allow for a reduction in model size by up to 8 times while preserving response quality by 95%. The methods aim to optimize resources and enhance efficiency in running large language models. The research article detailing this approach has been featured at the International Conference on Machine Learning (ICML), currently underway in Vienna, Austria.

Key features of AQLM and PV-Tuning

AQLM leverages additive quantization, traditionally used for information retrieval, for LLM compression. The resulting method preserves and even improves model accuracy under extreme compression, making it possible to deploy LLMs on everyday devices like home computers and smartphones. This results in a significant reduction in memory consumption.

PV-Tuning addresses errors that may arise during the model compression process. When combined, AQLM and PV-Tuning deliver optimal results — compact models capable of providing high-quality responses even on limited computing resources.

Method evaluation and recognition

The effectiveness of the methods was rigorously assessed using popular open source models such as LLama 2, Llama 3, Mistral, and others. Researchers compressed these large language models and evaluated answer quality against English-language benchmarks — WikiText2 and C4 — maintaining an impressive 95% answer quality as the models were compressed by 8 times.

Who can benefit from AQLM and PV-Tuning

The new methods offer substantial resource savings for companies involved in developing and deploying proprietary language models and open-source LLMs. For instance, the Llama 2 model with 13 billion parameters, post-compression, can now run on just 1 GPU instead of 4, reducing hardware costs by up to 8 times. This means that startups, individual researchers, and LLM enthusiasts can run advanced LLMs such as Llama on their everyday computers.

Exploring new LLM applications

AQLM and PV-Tuning make it possible to deploy models offline on devices with limited computing resources, enabling new use cases for smartphones, smart speakers, and more. With advanced LLMs integrated into them, users can use text and image generation, voice assistance, personalized recommendations, and even real-time language translation without needing an active internet connection.

Moreover, models compressed using the methods can operate up to 4 times faster, as they require fewer computations.

Implementation and access

Developers and researchers worldwide can already use AQLM and PV-Tuning, which are available on GitHub. Demo materials provided by the authors offer guidance for effectively training compressed LLMs for various applications. Additionally, developers can download popular open-source models that have already been compressed using the methods.

ICML highlight

A scientific article by Yandex Research on the AQLM compression method has been featured at ICML, one of the world’s most prestigious machine learning conferences. Co-authored with researchers from IST Austria and experts from AI startup Neural Magic, this work signifies a significant advancement in LLM compression technology.

Author

Supriya Rai

Yandex Researchers Develop New Methods that Could Cut AI Deployment Costs by up to 8 Times

Author

WebEngage Powers Kaspersky’s Shift Towards More Personalised B2B Customer Engagement

Analog Devices Completes Acquisition of Empower Semiconductor

Lenovo Redefines Enterprise AI Economics with Agentic AI and Inferencing Innovations

LEAVE A REPLY Cancel reply

Most Popular

Indian Companies Must Think Beyond Services and Build Global Deep-Tech Products: Joseph Sudheer Thumma, Magellanic Cloud

Hein Hellemons Named Chief Revenue Officer At New Relic

Acer, Intel and Infinity Learn Launch NEET Ready Laptop

Indian Organisations Experience Ten Times More AI-related Data Policy Violations: Netskope

Recent Comments

EDITOR PICKS

NetApp Partners with Google Cloud to Adopt AI-Driven Operations

Real Question Isn’t If AI Will Transform Us, It’s How We Shape that Transformation: Dr. Chandra Sekhar Pemmasani

Rahul Dayal Takes Charge as Chief Technology Officer at SBI Mutual Fund

POPULAR POSTS

Top Ten Google Gemini AI Photo Editing Prompts

Top 5 Google Gemini AI Photo Editing Prompts for Navratri

Blackberry Edge Stylus 2024: Price, Release Date, and Full Specifications

MOST COMMENTED

PDRL Launches BhuMeet: Aggregator Platform to Help Farmers Connect with Drone Service Providers

Rashi Peripherals Gets Top Value-Added Distributor of the Year Award

IDEMIA Collaborates on Post-Quantum Cryptography with IIT Hyderabad

Subscribe to our Newsletter

ABOUT US

FOLLOW US

CONTACT US