Large Language Models No Longer Require Powerful Servers

Scientists from Yandex, HSE University, MIT, KAUST, and ISTA have made a breakthrough in optimising LLMs. Yandex Research, in collaboration with leading science and technology universities, has developed a method for rapidly compressing large language models (LLMs) without compromising quality. Now, a smartphone or laptop is enough to work with LLMs—there's no need for expensive servers or high-powered GPUs.

This method enables faster testing and more efficient implementation of new neural network-based solutions, reducing both development time and costs. As a result, LLMs become accessible not only to large corporations but also to smaller companies, non-profit laboratories and institutes, and individual developers and researchers.

Previously, running a language model on a smartphone or laptop required first quantising it on an expensive server, a process that could take anywhere from a few hours to several weeks. Quantisation can now be performed directly on a smartphone or laptop in just a few minutes.

Challenges in implementing LLMs

The main obstacle to using LLMs is that they require considerable computational power. This applies to open-source models as well. For example, the popular DeepSeek-R1 is too large to run even on high-end servers built for AI and machine learning workloads, meaning that very few companies can effectively use LLMs, even if the model itself is publicly available.

The new method reduces a model's size while maintaining its quality, making it possible to run the model on more accessible devices. It can also compress very large models, such as DeepSeek-R1 with 671 billion parameters and Llama 4 Maverick with 400 billion parameters, which until now could only be quantised with basic methods at the cost of significant quality loss.
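To give a sense of scale, the storage needed for a model's weights is roughly its parameter count multiplied by the number of bits per weight. The Python sketch below makes this back-of-the-envelope estimate under stated assumptions: exactly the parameter counts given above, 16-bit weights before compression and 4-bit weights after, and no allowance for quantisation scales, activations or the KV cache. The 4-bit target and the helper name `weight_memory_gb` are illustrative choices, not figures or code from the HIGGS paper.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in decimal gigabytes (10**9 bytes).

    Hypothetical helper: ignores quantisation scales, activations and the KV cache.
    """
    return num_params * bits_per_weight / 8 / 10**9


# Parameter counts as stated in the article; 8B is the model size
# mentioned for the browser-based solution.
models = {
    "DeepSeek-R1": 671_000_000_000,
    "Llama 4 Maverick": 400_000_000_000,
    "8B model": 8_000_000_000,
}

for name, n_params in models.items():
    fp16 = weight_memory_gb(n_params, 16)  # assumed uncompressed precision
    int4 = weight_memory_gb(n_params, 4)   # assumed compressed precision
    print(f"{name}: {fp16:.1f} GB at 16 bits, {int4:.1f} GB at 4 bits")
```

Under these assumptions DeepSeek-R1 drops from about 1,342 GB to about 336 GB, so the largest models remain server-scale even after compression, while an 8-billion-parameter model shrinks to about 4 GB, small enough for many laptops.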

The new quantisation method opens up more opportunities to use LLMs across various fields, particularly in resource-limited sectors such as education and social services. Startups and independent developers can now use compressed models to create innovative products and services without costly hardware investments. Yandex is already applying the new method for prototyping, that is, building working versions of products and quickly validating ideas: compressed models take less time to test than the original versions.

Key details of the new method

The new quantisation method is named HIGGS (Hadamard Incoherence with Gaussian MSE-Optimal GridS). It enables the compression of neural networks without the need for additional data or computationally intensive parameter optimisation. This is especially useful in situations where there is not enough relevant data available to train the model. HIGGS strikes a balance between the quality, size, and complexity of the quantised models, making them suitable for use on a variety of devices.
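The name describes the general recipe: rotate the weights with a random Hadamard transform so that outliers are spread out and each group of values looks closer to Gaussian ("incoherence"), then round each value to a grid that minimises mean squared error (MSE) for a Gaussian. The Python sketch below illustrates that idea under simplifying assumptions of our own: a scalar 16-level (4-bit) Lloyd-Max grid, one scale per row, and a plain round-to-nearest absmax baseline for comparison (not NF4 or HQQ). All function names are hypothetical, and this is not the authors' released implementation, which may differ from it in many details.

```python
import math

import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester-Hadamard matrix of size n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / math.sqrt(n)


def rotate(W: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Random Hadamard transform of each row: y = H @ (signs * x)."""
    return (W * signs) @ hadamard(W.shape[1]).T


def unrotate(Y: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Inverse of rotate(): H is orthogonal and the signs are +/-1."""
    return (Y @ hadamard(Y.shape[1])) * signs


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_lloyd_max(k: int, iters: int = 200) -> np.ndarray:
    """k scalar levels that minimise MSE for a standard normal input (Lloyd-Max)."""
    levels = np.linspace(-2.0, 2.0, k)
    for _ in range(iters):
        # Decision boundaries sit halfway between neighbouring levels.
        edges = np.concatenate(([-np.inf], (levels[:-1] + levels[1:]) / 2, [np.inf]))
        # Move every level to the mean of N(0, 1) restricted to its cell.
        levels = np.array([(_pdf(a) - _pdf(b)) / (_cdf(b) - _cdf(a))
                           for a, b in zip(edges[:-1], edges[1:])])
    return levels


def quantize(W: np.ndarray, bits: int = 4, seed: int = 0):
    """Hypothetical HIGGS-style quantiser: rotate, normalise per row, snap to Gaussian grid."""
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=W.shape[1])
    Y = rotate(W, signs)
    # Normalise each rotated row by its RMS so it matches the unit-variance grid.
    scale = np.maximum(np.sqrt(np.mean(Y ** 2, axis=1, keepdims=True)), 1e-12)
    levels = gaussian_lloyd_max(2 ** bits)
    codes = np.abs((Y / scale)[..., None] - levels).argmin(axis=-1).astype(np.uint8)
    return codes, scale, signs, levels


def dequantize(codes, scale, signs, levels) -> np.ndarray:
    return unrotate(levels[codes] * scale, signs)


def rtn_absmax(W: np.ndarray, bits: int = 4) -> np.ndarray:
    """Baseline: uniform round-to-nearest on [-max|w|, max|w|] per row, no rotation."""
    grid = np.linspace(-1.0, 1.0, 2 ** bits)
    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True), 1e-12)
    codes = np.abs((W / scale)[..., None] - grid).argmin(axis=-1)
    return grid[codes] * scale


# Toy weights: Gaussian rows plus one large (hypothetical) outlier per row.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))
W[:, 0] = 20.0

mse_higgs = float(np.mean((W - dequantize(*quantize(W, bits=4))) ** 2))
mse_rtn = float(np.mean((W - rtn_absmax(W, bits=4)) ** 2))
print(f"rotation + Gaussian grid MSE: {mse_higgs:.4f}")
print(f"round-to-nearest absmax MSE:  {mse_rtn:.4f}")
```

No calibration data is used anywhere in the sketch, which mirrors the data-free property described above. Because the rotation is orthogonal, the error measured after undoing it equals the error in the rotated domain, and only the integer codes, one scale per row and the random signs (or the seed that generates them) need to be stored.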

The method has already been validated on the widely used Llama 3 and Qwen2.5 models. Experiments have shown that HIGGS outperforms all existing data-free quantisation methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantisation), in terms of both quality and model size.

Scientists from HSE University, the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA), and King Abdullah University of Science and Technology (KAUST, Saudi Arabia) contributed to the development of the method alongside Yandex Research.

The HIGGS method is already accessible to developers and researchers on Hugging Face and GitHub, with a research paper available on arXiv.

Response from the academic community and other methods

The paper describing the new method has been accepted for presentation at one of the largest AI conferences in the world, the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL). The conference will be held from April 29 to May 4, 2025, in Albuquerque, New Mexico, USA. Yandex will be among the attendees, along with other companies and universities such as Google, Microsoft Research, and Harvard University. The paper has already been cited by researchers from Red Hat AI (part of the American software company Red Hat), Peking University, the Hong Kong University of Science and Technology, Fudan University, and other institutions.

Previously, scientists from Yandex presented 12 studies focused on LLM quantisation. The company aims to make LLMs more efficient, less energy-intensive, and accessible to all developers and researchers. For example, the Yandex Research team has previously developed methods for compressing LLMs that reduce computational costs nearly eightfold without significantly compromising the quality of the neural network’s responses. The team has also developed a solution that runs a model with 8 billion parameters on a regular computer or smartphone through a browser interface, without the need for significant computing power.
