How much energy does AI actually consume?
24/9/2025
10 min read
Feature
There are serious concerns about the energy consumption of AI systems – in particular large language models (LLMs) such as ChatGPT. We know they use a lot of energy during training – the process by which a model’s millions or billions of parameters are tuned as it ingests and analyses data. Apparently, training OpenAI’s GPT-4 (with well over a trillion parameters) consumed over 50 GWh of electricity. Toby Clark asks, how much energy do they use to answer your queries?
This is the ‘inference stage’ – whether it involves a text-only query, generating images or video, or operating as an agent to interact with other sources of information – and most sources reckon it is the most significant in terms of energy use.
Public statements about energy consumption from the firms behind proprietary AI models – OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude, for instance – are rare. However, in June, OpenAI CEO Sam Altman blogged: ‘People are often curious about how much energy a ChatGPT query uses; the average query uses about 0.34 Wh.’ This was seized upon, with much comment and calculation – but no more detail emerged.
If you ask ChatGPT ‘How much energy is this query using?’, it comes up with a similar figure of 0.1–1 Wh. Grok, xAI’s open-source chatbot, says ‘a ballpark estimate for a single Grok query might be in the range of 0.02–0.05 Wh’ – relatively frugal.
Clearly independent figures are needed, particularly given the scale of LLM use. ChatGPT alone gets around one billion queries a day – going by Altman’s figures, that would amount to 124 GWh/y (plus 120,000 t/y of water for cooling). But the processing required per query varies widely, depending on the complexity of the question and the type of response: text, audio, image or video. Increasingly, queries will come from other AI or software agents rather than directly from human users – ‘agentic’ AI will increase demand by orders of magnitude. It is also likely to promote more complex, multi-stage tasks.
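That annual figure is straightforward to check – a quick back-of-envelope calculation:

```python
# Checking the arithmetic: Altman's 0.34 Wh per query multiplied by
# roughly one billion ChatGPT queries a day.
wh_per_query = 0.34
queries_per_day = 1_000_000_000

annual_gwh = wh_per_query * queries_per_day * 365 / 1e9  # Wh -> GWh
print(f"{annual_gwh:.0f} GWh/y")  # ~124 GWh/y
```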
Earlier this year, AI research firm Hugging Face launched the AI Energy Score project, ‘with a goal of bringing transparency to the energy consumption of AI models and empowering sustainable decisions across the industry’. This uses standardised tests to benchmark a model’s power consumption in different types of tasks: text generation, image generation, text classification, image classification, image captioning and summarisation.
Models are classified according to parameter size and the number of GPUs (graphics processing units) that they run on. AI calculations rely on huge quantities of matrix multiplication, which happens to be the same processing technique required to produce 3D graphics. There is a wide disparity between the best and worst in each class – some models use more than 100 times the energy of others – but it is also evident that some models are much more efficient at particular tasks.
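As a rough illustration of what benchmarking power consumption involves, the sketch below samples GPU power draw during a stand-in workload using NVIDIA’s NVML bindings (the pynvml library). It is a simplification; real harnesses such as AI Energy Score’s are far more careful than this.

```python
# A simplified sketch of measuring GPU energy for one task, assuming a
# single NVIDIA GPU and the pynvml library. The five-second loop is a
# stand-in for actually running a model task.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

samples = []
start = time.time()
while time.time() - start < 5.0:
    samples.append(pynvml.nvmlDeviceGetPowerUsage(handle))  # milliwatts
    time.sleep(0.1)
elapsed = time.time() - start

avg_watts = sum(samples) / len(samples) / 1000
energy_wh = avg_watts * elapsed / 3600
print(f"~{energy_wh:.3f} Wh for this task")
pynvml.nvmlShutdown()
```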
The larger proprietary models might well be more versatile, but Hugging Face’s Dr Sasha Luccioni has said that ‘closed AI model providers have yet to submit their in-production models for testing, or share this data in any other way… making objective comparisons impossible’.
There is some progress: Mistral AI has just released an environmental audit of its Large 2 model (123 billion parameters), working with France’s Agency for Ecological Transition. This estimates that asking its AI assistant Le Chat for a lengthy 400-token (roughly one-page) response generates 1.14 g CO2e. They don’t give a straightforward energy figure, but France’s low average CO2 output (27 g/kWh) would suggest that the query uses about 42 Wh – while using the US CO2e figure (368 g/kWh) the answer comes to 3.1 Wh.
A team from UCL recently performed experiments to see if energy use could be reduced (see Box 1). Their most complex query (a 400-word prompt and 400-word response) consumed 1.03 Wh, while summarisation and translation tasks used around 0.1 Wh, and a simple question used only 0.025 Wh. This was using a relatively small general-purpose LLM (the 8B version of Llama 3.1, with eight billion parameters).
The University of Michigan publishes the ML.Energy leaderboard, ranking open-source AI models on standard hardware. Its typical chat task uses around 0.01 Wh on a highly efficient model (Mistral 7B, with seven billion parameters). By way of comparison, its text-generation prompts took about 0.2 Wh on Mistral Large and 0.02 Wh on Llama 3.1 8B.
The bigger models consume proportionally more energy. When ML.Energy tested a model 50 times larger (Llama 3.1 405B, with 405 billion parameters) it used 0.93 Wh – that’s 59 times more energy. Smaller and more specialist models use far less energy than general-purpose models with more parameters. UNESCO has just published a report called Smarter, Smaller, Stronger: Resource-efficient generative AI & the future of digital transformation, which looks at these approaches and includes UCL’s practical testing (see Box 1).
Most mainstream AI firms produce smaller versions of their models. Zekun Wu of Holistic AI (who contributed to the UNESCO report) points out: ‘None of the major providers frame their offerings around low energy consumption per se – but many emphasise cost, speed and scalability, which are tightly correlated with energy/resource use.’ Examples include OpenAI’s GPT-4o mini and Google’s Gemini Nano.
Even GPU maker NVIDIA sees the point of specialist models. Wu notes that its research arm has just published a paper called Small language models are the future of agentic AI, which finds that ‘small, task-specialist models can be 10–30 times cheaper per inference’.
The UNESCO report says: ‘One promising approach is... mixture of experts (MoE) or multi-agent systems. Instead of relying on a single large model to do everything, these systems use many smaller, specialised models, and only activate the ones needed for a particular task. This modular and on-demand approach reduces wasted computation, improving energy efficiency while maintaining strong performance across diverse applications.’
The report refers to this as ‘scaling smarter, not just bigger’.
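A toy sketch of that routing idea, with hypothetical ‘expert’ functions standing in for small specialised models – this reflects the general principle, not any vendor’s actual architecture:

```python
# A toy illustration of 'activate only what you need': a router dispatches
# each task to a small specialist, leaving the others idle. The experts
# here are hypothetical stand-ins for small fine-tuned models.
def summarise(text: str) -> str:
    # Stand-in for a small summarisation model
    return text[:80] + "..."

def translate(text: str) -> str:
    # Stand-in for a small translation model
    return f"[translated] {text}"

EXPERTS = {"summarise": summarise, "translate": translate}

def route(task: str, text: str) -> str:
    # Only the expert needed for this task runs; no energy is spent
    # pushing every request through one giant general-purpose model.
    return EXPERTS[task](text)

print(route("summarise", "A long document about resource-efficient AI..."))
```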
So, users can help by optimising their queries (see Box 2), and hardware manufacturers can work on the efficiency of their systems, but the best way to optimise AI’s energy use is to choose the most appropriate model. Hugging Face’s Luccioni said: ‘Given the reluctance of closed AI model providers to participate in the AI Energy Score, we believe the most effective path forward is to apply pressure through their customers, especially enterprises.’
Box 1: Testing different approaches
The UNESCO report was written by a team including Professor Ivana Drobnjak of UCL and Hristijan Bosilkovski. The latter points out that as the industry moves towards ever-larger generalised AI models, ‘we keep scaling up hardware rather than designing more efficient software, or at least we design software that uses up more of the available hardware’.
So, the team looked at whether energy use could be reduced using straightforward techniques. The results were encouraging.
Much of UCL’s testing was done with Meta’s open-source Llama 3.1 8B model, across three tasks: summarisation, translation and question answering. The results were assessed against a standardised measure, so that both energy consumption and answer quality could be compared.
The three basic techniques tested were:
Quantisation
This reduces the precision of numbers used by the model – like rounding values to simplify arithmetic. Drobnjak says: ‘When numbers get multiplied or summed, using numbers without decimal points is much easier.’ Three types of quantisation were tested (a code sketch follows the list):
- Bits and bytes quantisation (BNBQ): using low precision throughout, potentially sacrificing accuracy on larger numbers. This achieved a 22% reduction in energy consumption.
- Generalised post-training quantisation (GPTQ): adjusting precision across the model, preserving essential information to maintain overall accuracy. This gave a 35% reduction in consumption.
- Activation aware quantisation (AWQ): prioritising high precision for critical components of the model, reducing precision for less important parts. This led to a 44% cut in energy consumption, while maintaining accuracy; it even outperformed the standard model on certain tasks.
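For illustration, the snippet below loads a model with 4-bit bits-and-bytes quantisation through the Hugging Face transformers library; the model name and settings are assumptions for the sketch, and GPTQ or AWQ would typically load their own pre-quantised checkpoints instead.

```python
# A minimal sketch of quantised loading with the Hugging Face transformers
# and bitsandbytes libraries; model name and settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights at 4-bit precision
    bnb_4bit_compute_dtype=torch.float16,  # do the arithmetic in half precision
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",             # gated on the Hugging Face Hub
    quantization_config=quant_config,
)
```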
Optimising user prompt and LLM response lengths
Energy consumption is directly impacted by the length of the user input, but even more by the length of the response.
Halving the length of a 400-word prompt reduced energy expenditure by 5%, while halving the length of a 400-word response cut energy consumption proportionally, by 54%. Halving it again to 100 words gave an overall reduction of 76%.
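In practice, capping the response is a one-line setting in most toolkits. A minimal sketch with the transformers library (the model name and the 128-token cap are illustrative):

```python
# Capping response length with transformers' generate(); fewer generated
# tokens means less computation per query.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

inputs = tokenizer("Summarise this report in two sentences.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # cap the response
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```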
Small language models (SLMs) versus large general-purpose models
Large, general-purpose LLMs can be replaced with smaller models fine-tuned for designated tasks. In targeted testing this approach used 15 to 50 times less energy while producing higher-quality outputs. For instance, a translation task went down from 0.112 Wh to just 0.003 Wh – with better accuracy.
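As a sketch of that substitution, the snippet below calls a compact dedicated translation model through the transformers pipeline; the model choice is illustrative, not the one UCL benchmarked.

```python
# Swapping a general-purpose LLM for a small task-specific model:
# a compact dedicated translation model via the transformers pipeline.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Smaller specialist models can use far less energy.")
print(result[0]["translation_text"])
```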
Box 2: Asking the right questions
How do you ensure you’re not wasting energy when querying an LLM? Edit your question and ask for a brief answer. Shorter questions use less energy, but shorter answers use a lot less: if you only need a 50-word answer, ask for one.
UCL’s Drobnjak points out that even simple suggestions such as saying ‘be brief’ or asking for answers to be presented as bullet points can reduce waste.
Mistral AI recommends ‘writing precise prompts and asking for short, grouped answers wherever possible’.
You could even ask the AI for a preliminary edit of your question; Drobnjak suggests that something like a ‘mini processor between the user and the model – just a little script’ could optimise the prompt and save considerable energy.
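A hypothetical version of that ‘little script’ is sketched below; the filler phrases, word cap and brevity nudge are made up for illustration, not UCL’s actual implementation.

```python
# A hypothetical 'mini processor' between user and model: it strips polite
# filler, trims long prompts and nudges the model to answer briefly.
FILLER = ("please", "could you", "kindly", "i was wondering if", "thank you")

def optimise_prompt(prompt: str, max_words: int = 50) -> str:
    text = prompt.lower()
    for phrase in FILLER:              # strip polite filler
        text = text.replace(phrase, "")
    words = text.split()[:max_words]   # trim overly long prompts
    return " ".join(words) + " Be brief."

print(optimise_prompt("Could you please explain how mixture-of-experts "
                      "models save energy? Thank you!"))
```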
And don’t be too polite: earlier this year, Hugging Face assessed how much energy a ‘Thank you’ demanded. They concluded that ‘a single polite reply to Llama 3 8B costs about 0.245 Wh’. Larger proprietary models are likely to use much more power, so ‘the aggregate daily energy cost of politeness could reach several MWh, equivalent to powering hundreds of homes’.
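To see how politeness adds up, here is a purely illustrative scaling of that figure; the ten-million-replies-a-day volume is an assumption for the sketch, not a measurement.

```python
# Illustrative scaling of Hugging Face's 0.245 Wh per polite reply;
# the daily volume below is assumed, not measured.
wh_per_polite_reply = 0.245
replies_per_day = 10_000_000

daily_mwh = wh_per_polite_reply * replies_per_day / 1e6  # Wh -> MWh
print(f"~{daily_mwh:.2f} MWh/day")  # ~2.45 MWh/day: 'several MWh' territory
```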
- Further reading: ‘Power hungry: How AI fuels data centre energy demand and calls for more sustainability’. AI is changing the game for data centres across the globe. However, its adoption is not simply about handling complex tasks faster; it also consumes far more power and creates a surge in greenhouse gas emissions. As a result, tech giants are under pressure to make their vast data centres more energy efficient and sustainable even as AI use grows rapidly.
- What are the risks associated with data centre growth? AI demands large amounts of data and power. And the demand is rising. But what does this mean in terms of energy consumption? Where are the new data centres likely to be located, and what will be the associated costs and risks? These and other issues were addressed in an Aurora Energy Research spring forum held in London in May 2025.