Sustainable AI
Explores the environmental impact of ML systems. Covers carbon footprint estimation, energy efficiency, and strategies for green AI.
- Estimate the carbon footprint of ML training using power consumption, duration, and grid carbon intensity
- Compare the environmental impact of training vs. inference across the full model lifecycle
- Evaluate energy efficiency techniques including architecture selection, mixed precision, and early stopping
- Implement carbon tracking in ML workflows using tools like CodeCarbon and carbontracker
- Design carbon-aware computing strategies that schedule workloads to minimize environmental impact
- Analyze data center sustainability metrics including PUE, WUE, and renewable energy percentages
- Apply lifecycle assessment principles to evaluate the total environmental impact of ML systems beyond carbon
01 The Environmental Impact of ML
The environmental impact of machine learning has grown dramatically as models have scaled in size and training compute. Training a single large language model can emit as much carbon as five cars over their entire lifetimes. As ML becomes pervasive, its collective environmental footprint has become a significant concern for the research community and society at large.
Strubell et al. estimated that training a large Transformer model with neural architecture search produced approximately 284 tonnes of CO2 -- equivalent to five times the lifetime emissions of an average American car, including its manufacturing. The paper also highlighted that the compute used in the largest AI training runs had grown roughly 300,000-fold between 2012 and 2018. This single paper galvanized the sustainability conversation in ML.
Embodied Carbon
The greenhouse gas emissions associated with manufacturing, transporting, and disposing of hardware (GPUs, servers, networking equipment), as distinct from operational emissions from electricity consumption during use. For modern ML accelerators, embodied carbon can represent 30-50% of total lifecycle emissions.
Primary Environmental Costs
The primary environmental costs of ML come from multiple sources that span the full hardware and software lifecycle. Understanding these costs is essential for prioritizing reduction efforts.
Figure: Carbon Footprint of Large-Scale ML Training
- Electricity for computation: Thousands of GPUs running for weeks or months consume megawatt-hours of power. GPT-4 training is estimated to have consumed 50-100 GWh of electricity.
- Cooling: Data centers require significant energy and water to keep hardware within operating temperatures. Cooling accounts for 30-40% of total data center energy in older facilities.
- Embodied carbon: Manufacturing GPUs and servers involves energy-intensive semiconductor fabrication, rare earth mineral extraction, and global shipping logistics.
- Water usage: Large data centers consume millions of gallons of water annually for evaporative cooling. A single Google data center can use 450,000 gallons of water per day.
- Electronic waste: GPU replacement cycles of 3-5 years generate growing volumes of e-waste containing toxic materials and precious metals.
The Scale of Training Costs
| Model | Year | Params | Est. Training Energy (MWh) | Est. CO2 (tonnes) | Est. Cost ($M) |
|---|---|---|---|---|---|
| BERT-Large | 2018 | 340M | ~12 | ~1.5 | ~0.01 |
| GPT-3 | 2020 | 175B | ~1,287 | ~552 | ~4.6 |
| PaLM | 2022 | 540B | ~3,400 | ~271* | ~8-12 |
| LLaMA 2 70B | 2023 | 70B | ~291 | ~31* | ~2 |
| GPT-4 (est.) | 2023 | ~1.8T MoE | ~51,000 | ~5,100 | ~63-78 |
| Llama 3 405B | 2024 | 405B | ~7,400 | ~590* | ~30-40 |
Table 18.1: Estimated training costs for major foundation models. *Lower CO2 despite high energy reflects cleaner grid sources. Figures from published papers and credible third-party estimates.
Meta made a deliberate transparency effort with OPT-175B (2022), publishing its training logbook and carbon footprint. The model consumed approximately 324 MWh and emitted ~75 tonnes of CO2. Meta noted that pre-training accounted for only a fraction of total compute spent -- failed runs, restarts due to hardware faults, and hyperparameter exploration roughly doubled the effective cost. This candor revealed how published numbers often undercount total impact.
Full Lifecycle Footprint
| Lifecycle Stage | Contribution | Often Overlooked? |
|---|---|---|
| Training | High per-run cost, but one-time for a given model | No -- most widely discussed |
| Inference | Can dominate total cost for widely deployed models | Yes -- often ignored in research |
| Experimentation | Hyperparameter search can multiply training cost 10-100x | Yes -- rarely reported |
| Hardware lifecycle | Manufacturing, shipping, and e-waste disposal | Yes -- embodied carbon frequently omitted |
| Data processing | Data cleaning, tokenization, deduplication at scale | Yes -- assumed negligible but grows with data |
Table 18.2: Components of ML's full lifecycle environmental impact.
For a model serving millions of users daily, inference energy can exceed training energy within days of deployment. Google reported that inference accounts for roughly 60% of their total ML energy consumption. By early 2025, ChatGPT alone was estimated to consume 564 MWh of electricity per day for inference -- equivalent to powering roughly 18,000 average US homes. Optimizing inference efficiency is therefore critical for deployed systems.
Awareness of ML's environmental impact has grown significantly, driven by seminal research and growing public attention. This awareness is leading to new practices, metrics, and tools for measuring and reducing the carbon footprint of ML systems.
You cannot reduce what you do not measure. Begin tracking the energy consumption and carbon emissions of your ML workloads today using tools like CodeCarbon or ML CO2 Impact. Even rough estimates reveal the relative costs of different approaches and guide optimization efforts.
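If you want the lowest-friction starting point, CodeCarbon also ships a decorator interface; the sketch below is a minimal example, where the project name and the body of the training function are placeholders you would replace with your own.

```python
# Lowest-friction carbon tracking: wrap a training entry point with
# CodeCarbon's decorator (project name and training body are placeholders).
from codecarbon import track_emissions

@track_emissions(project_name="example-experiment")
def train():
    # ... your training code ...
    ...

if __name__ == "__main__":
    train()  # Emissions are written to emissions.csv by default
```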
Carbon Footprint
The total greenhouse gas emissions caused by ML system development and operation, measured in equivalent tons of CO2.
Embodied Carbon
The carbon emissions associated with manufacturing, transporting, and disposing of hardware, distinct from operational emissions from electricity use.
02 Carbon Footprint Estimation
Estimating the carbon footprint of ML training requires combining hardware power consumption, training duration, and the carbon intensity of the electricity source. Multiple formulas and frameworks exist, but the core relationship is straightforward.
Carbon Intensity
The amount of CO2 emitted per unit of electricity generated, measured in grams of CO2 per kilowatt-hour (gCO2/kWh). This value varies dramatically by energy source: near zero for solar/wind, ~20 for hydro, ~400 for natural gas, and ~800-1000 gCO2/kWh for coal.
The Carbon Emissions Formula
CO_2 = E_{\text{consumed}} \times CI_{\text{grid}} \times PUE

where E_consumed is energy consumed in kWh, CI_grid is the carbon intensity of the local grid in gCO2/kWh, and PUE is the Power Usage Effectiveness of the data center (typically 1.1-1.6 for modern facilities).
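As a concrete illustration, the sketch below applies this formula to a hypothetical fine-tuning run; the GPU count, power draw, duration, grid intensity, and PUE are assumed values chosen for the example, not measurements of any real job.

```python
# Worked example of the carbon emissions formula (all inputs are
# illustrative assumptions, not measurements of any real training run).

def training_emissions_kg(
    num_gpus: int,
    avg_power_per_gpu_w: float,
    hours: float,
    grid_intensity_gco2_per_kwh: float,
    pue: float = 1.1,
) -> dict:
    """CO2 = E_consumed x CI_grid x PUE, with E in kWh and CI in gCO2/kWh."""
    energy_kwh = num_gpus * avg_power_per_gpu_w * hours / 1000  # Wh -> kWh
    co2_g = energy_kwh * grid_intensity_gco2_per_kwh * pue
    return {"energy_kwh": energy_kwh, "co2_kg": co2_g / 1000}

# Example: 8 GPUs drawing ~350 W each for 24 hours on a ~300 gCO2/kWh grid.
print(training_emissions_kg(8, 350, 24, 300))
# -> {'energy_kwh': 67.2, 'co2_kg': 22.176} with PUE 1.1
```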
Power Usage Effectiveness (PUE)
The ratio of total data center energy consumption to the energy consumed by IT equipment alone. A PUE of 1.0 means all energy goes to computing; a PUE of 1.5 means 50% additional energy is used for cooling, lighting, and other overhead. Modern hyperscale data centers achieve PUE of 1.1-1.2.
PUE = \frac{\text{Total Facility Energy}}{\text{IT Equipment Energy}}

Strubell et al. Training Cost Model
Strubell et al. (2019) proposed a widely-cited model for estimating training costs that accounts for GPU count, training time, GPU power draw, and data center overhead.
CO_2 = \frac{p \cdot t \cdot PUE \cdot CI}{1000}

where p is the average total power draw of the hardware in watts, t is the training time in hours, and CI is the grid carbon intensity in gCO2/kWh; dividing by 1000 converts watt-hours to kilowatt-hours.

Measuring Power Consumption
Power consumption can be measured directly using hardware monitoring tools or estimated from hardware specifications and utilization rates. Direct measurement is always preferred over estimation.
| GPU Model | TDP (Watts) | FP16 TFLOPS | Typical ML Power Draw | Perf/Watt (TFLOPS/W) |
|---|---|---|---|---|
| NVIDIA V100 (2017) | 300W | 125 | 250-280W | 0.42 |
| NVIDIA A100 (2020) | 400W | 312 | 300-350W | 0.78 |
| NVIDIA H100 SXM (2023) | 700W | 990 | 550-650W | 1.41 |
| NVIDIA B200 (2024) | 1000W | 2,250 | 700-900W | 2.25 |
| Google TPU v5e (2023) | ~200W | ~197 | ~170W | ~0.99 |
| Google TPU v5p (2024) | ~400W | ~459 | ~350W | ~1.15 |
| AMD MI300X (2024) | 750W | 1,307 | 550-700W | 1.74 |
Table 18.3: GPU/accelerator power consumption and efficiency comparison (2024-2025 data). While newer GPUs consume more absolute power, their performance-per-watt improves substantially each generation.
class="tok-comment"># Measuring GPU power with CodeCarbon
from codecarbon import EmissionsTracker
class="tok-comment"># Track emissions for an entire training run
tracker = EmissionsTracker(
project_name=class="tok-string">"bert-finetune",
measure_power_secs=class="tok-number">15, class="tok-comment"># Sample power every class="tok-number">15 seconds
tracking_mode=class="tok-string">"process", class="tok-comment"># Track only this process
log_level=class="tok-string">"warning",
save_to_file=True,
output_dir=class="tok-string">"./carbon_logs",
)
tracker.start()
class="tok-comment"># ... your training code ...
for epoch in range(num_epochs):
train_one_epoch(model, train_loader, optimizer)
val_loss = evaluate(model, val_loader)
print(class="tok-string">f"Epoch {epoch}: val_loss={val_loss:.4f}")
emissions_kg = tracker.stop()
print(class="tok-string">f"Training emissions: {emissions_kg:.4f} kg CO2eq")
print(class="tok-string">f"Energy consumed: {tracker.final_emissions_data.energy_consumed:.4f} kWh")
print(class="tok-string">f"Duration: {tracker.final_emissions_data.duration:.0f} seconds")class="tok-comment"># Direct GPU power measurement with nvidia-smi and carbontracker
import subprocess
import time
from carbontracker.tracker import CarbonTracker
class="tok-comment"># Option class="tok-number">1: Manual nvidia-smi power sampling
def get_gpu_power_watts():
class="tok-string">class="tok-string">""class="tok-string">"Read instantaneous GPU power draw using nvidia-smi."class="tok-string">""
result = subprocess.run(
[class="tok-string">"nvidia-smi", class="tok-string">"--query-gpu=power.draw",
class="tok-string">"--format=csv,noheader,nounits"],
capture_output=True, text=True,
)
powers = [float(p) for p in result.stdout.strip().split(class="tok-string">"\n")]
return sum(powers) class="tok-comment"># Total across all GPUs
def estimate_training_carbon(
training_fn,
carbon_intensity_gco2_kwh: float = class="tok-number">400,
pue: float = class="tok-number">1.1,
):
class="tok-string">class="tok-string">""class="tok-string">"Estimate carbon emissions from a training function."class="tok-string">""
power_samples = []
start = time.time()
class="tok-comment"># Sample power in a background thread in production;
class="tok-comment"># simplified here for clarity
training_fn()
elapsed_hours = (time.time() - start) / class="tok-number">3600
avg_power_w = sum(power_samples) / len(power_samples) if power_samples else class="tok-number">300
energy_kwh = (avg_power_w / class="tok-number">1000) * elapsed_hours * pue
carbon_kg = energy_kwh * carbon_intensity_gco2_kwh / class="tok-number">1000
return {class="tok-string">"energy_kwh": energy_kwh, class="tok-string">"carbon_kg": carbon_kg}
class="tok-comment"># Option class="tok-number">2: Use carbontracker (alternative to CodeCarbon)
tracker = CarbonTracker(epochs=num_epochs)
for epoch in range(num_epochs):
tracker.epoch_start()
train_one_epoch(model, train_loader, optimizer)
tracker.epoch_end()
tracker.stop() class="tok-comment"># Prints predicted total emissionsGeographic Variation in Carbon Intensity
Carbon intensity varies by more than 40x across regions, making location one of the highest-leverage decisions for reducing ML carbon emissions.
| Region / Cloud Zone | Primary Source | gCO2/kWh (2024) | Renewable % |
|---|---|---|---|
| Iceland | Geothermal + Hydro | ~15 | ~100% |
| Norway / Nordics | Hydroelectric | ~20 | ~98% |
| Quebec, Canada | Hydroelectric | ~20 | ~95% |
| France | Nuclear | ~55 | ~92% |
| Oregon, US (us-west) | Hydro + Wind | ~90 | ~75% |
| California, US | Mixed (solar, gas) | ~200 | ~55% |
| Virginia, US (us-east-1) | Mixed (gas, nuclear) | ~300 | ~35% |
| Germany | Mixed (coal, renewables) | ~350 | ~50% |
| Japan | Mixed (gas, coal) | ~450 | ~25% |
| Poland | Coal | ~650 | ~22% |
| India | Coal-dominant | ~700 | ~20% |
| South Africa | Coal | ~900 | ~10% |
Table 18.4: Carbon intensity by region with 2024 data. Training the same model in South Africa vs. Iceland produces ~60x more carbon emissions.
Carbon intensity fluctuates throughout the day as the generation mix changes. Solar-heavy grids are cleanest midday; wind-heavy grids vary with weather. Training the same model at noon vs. midnight in California can result in 2-3x different emissions. Services like ElectricityMaps.com and WattTime.org provide real-time carbon intensity data via API, enabling automated scheduling.
Cloud Provider Carbon Footprints
| Cloud Provider | Carbon-Free Energy % (2024) | Carbon Reporting Tool | Lowest-Carbon Regions |
|---|---|---|---|
| Google Cloud | 64% (global avg) | Carbon Footprint Dashboard | Finland, Iowa, Oregon |
| Microsoft Azure | ~100% matched (contracted) | Emissions Impact Dashboard | Sweden, Norway, Switzerland |
| AWS | ~100% matched (contracted) | Customer Carbon Footprint Tool | Canada, Oregon, Ireland |
| Oracle Cloud | ~50% (estimated) | Limited reporting | UK, Switzerland |
Table 18.5: Major cloud provider renewable energy commitments (2024). Note: "matched" means renewable energy certificates purchased to offset usage, not necessarily real-time clean energy consumption.
Cloud providers often report renewable energy percentages using annual matching with Renewable Energy Certificates (RECs) -- they buy enough certificates to "match" annual electricity use. However, a data center may still run on fossil fuels at night while matching with solar credits from daytime. Google's "24/7 carbon-free energy" initiative aims for real-time matching, which is a more stringent and meaningful metric.
Reporting Best Practices
- Report total energy consumption in kWh, including failed runs and hyperparameter search
- State the carbon intensity of the electricity used (gCO2/kWh) and its source
- Report resulting carbon emissions in kg CO2eq, including PUE overhead
- Include hardware type, count, and training duration for reproducibility
- Distinguish between market-based (with RECs) and location-based (actual grid) accounting
- Provide familiar reference comparisons (e.g., equivalent transatlantic flights) for non-technical audiences
Carbon Intensity
The amount of CO2 emitted per unit of electricity generated, measured in gCO2/kWh, which varies dramatically by energy source and location.
CodeCarbon
An open-source Python package that tracks the carbon emissions of computing by measuring electricity consumption and applying regional carbon intensity data.
03 Energy Efficiency Techniques
Energy-efficient ML starts with choosing the right model architecture for the task. Over-parameterized models waste energy on unnecessary computation. Architecture selection is the single highest-leverage decision for reducing energy consumption.
Energy Efficiency
The ratio of useful computation (model quality achieved) to energy consumed. In ML, this is often expressed as accuracy per watt-hour or quality per FLOP, enabling comparison across approaches.
Before optimizing training procedures or hardware, first ask whether a smaller or more efficient architecture can meet your quality requirements. EfficientNet achieves ImageNet accuracy comparable to much larger models at 8x fewer FLOPs. Knowledge-distilled models can match their teachers at a fraction of the inference cost. Mixture-of-Experts (MoE) models activate only a subset of parameters per input, achieving high capacity with lower per-sample energy.
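One way to make this comparison concrete is to rank candidate architectures by quality per unit energy rather than quality alone; the sketch below assumes you have already measured or estimated each candidate's accuracy and energy consumption, and the model names and numbers are placeholders.

```python
# Compare candidate models by quality per kWh (illustrative numbers only).

candidates = {
    # model_name: (validation_accuracy, estimated_energy_kwh)
    "distilled-small": (0.87, 40.0),
    "base": (0.89, 160.0),
    "large": (0.90, 900.0),
}

def quality_per_kwh(acc: float, energy_kwh: float) -> float:
    """Accuracy delivered per kWh consumed -- higher is better."""
    return acc / energy_kwh

for name, (acc, kwh) in candidates.items():
    print(f"{name:16s} acc={acc:.2f}  energy={kwh:7.1f} kWh  "
          f"quality/kWh={quality_per_kwh(acc, kwh):.5f}")
```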
Hardware Selection and Efficiency
Hardware selection and utilization directly impact energy efficiency. Each GPU generation typically offers 2-3x better energy efficiency for ML workloads over the previous generation, though absolute power consumption also increases.
\text{Energy Efficiency} = \frac{\text{Useful FLOPS}}{\text{Power (Watts)}} = \frac{\text{Throughput (samples/sec)} \times \text{FLOPs/sample}}{\text{Power Draw (W)}}

| Hardware | TDP (Watts) | FP16 TFLOPS | Efficiency (TFLOPS/W) | Year |
|---|---|---|---|---|
| NVIDIA V100 | 300W | 125 | 0.42 | 2017 |
| NVIDIA A100 SXM | 400W | 312 | 0.78 | 2020 |
| NVIDIA H100 SXM | 700W | 990 | 1.41 | 2023 |
| NVIDIA B200 | 1000W | 2,250 | 2.25 | 2024 |
| Google TPU v4 | ~275W | ~275 | ~1.00 | 2022 |
| Google TPU v5p | ~400W | ~459 | ~1.15 | 2024 |
| AMD MI300X | 750W | 1,307 | 1.74 | 2024 |
| Intel Gaudi 3 | ~600W | ~1,835 (BF16) | ~3.06 | 2024 |
Table 18.6: Accelerator generations show significant improvements in energy efficiency (TFLOPS per watt), though absolute power consumption also rises.
A GPU idling between batches or waiting for data still draws 60-80% of its peak power. Poor GPU utilization means you pay the energy cost without getting proportional computation. Tools like NVIDIA Nsight, PyTorch Profiler, and DCGM can identify utilization bottlenecks. Typical ML workloads achieve only 30-50% of peak FLOPS; closing this gap is a significant energy efficiency opportunity.
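A lightweight way to check whether you are paying for under-utilized GPUs is to poll utilization and power through NVML; the sketch below uses the `pynvml` bindings (installable as `nvidia-ml-py`), and the sampling count and interval are arbitrary illustrative choices.

```python
# Sample GPU utilization and power with NVML to spot under-utilized,
# energy-wasting training runs (requires the nvidia-ml-py package).
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

def snapshot():
    """Return (gpu_utilization_percent, power_watts) per device."""
    stats = []
    for h in handles:
        util = pynvml.nvmlDeviceGetUtilizationRates(h).gpu   # percent
        power = pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0   # mW -> W
        stats.append((util, power))
    return stats

# Poll a few times; sustained low utilization at high power draw is a
# signal to fix data loading, batch size, or parallelism.
for _ in range(5):
    for i, (util, power) in enumerate(snapshot()):
        print(f"GPU {i}: util={util:3d}%  power={power:6.1f} W")
    time.sleep(10)

pynvml.nvmlShutdown()
```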
Training Efficiency Techniques
- Mixed precision training: Use FP16/BF16 for most operations, reducing energy per computation while maintaining accuracy
- Learning rate scheduling: Cosine annealing and warm restarts help converge faster with less total computation
- Early stopping: Monitor validation loss and stop training when improvement plateaus, avoiding wasted energy
- Efficient hyperparameter search: Bayesian optimization explores the space 3-10x more efficiently than grid search
- Gradient accumulation: Simulate larger batches without proportionally more memory or energy
- Progressive resizing: Start training on smaller inputs and gradually increase resolution (common in vision)
- Checkpoint and resume: Save checkpoints frequently so hardware failures do not waste all prior computation
class="tok-comment"># Energy-aware training configuration example
import torch
from torch.cuda.amp import autocast, GradScaler
class="tok-comment"># Mixed precision training reduces energy by class="tok-number">1.5-2x on modern GPUs
scaler = GradScaler()
for epoch in range(num_epochs):
for batch in train_loader:
optimizer.zero_grad()
class="tok-comment"># BF16 mixed precision -- more FLOPS per watt
with autocast(dtype=torch.bfloat16):
outputs = model(batch[class="tok-string">"input_ids"], batch[class="tok-string">"attention_mask"])
loss = criterion(outputs, batch[class="tok-string">"labels"])
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
class="tok-comment"># Early stopping saves energy on unproductive epochs
val_loss = evaluate(model, val_loader)
if early_stopper.should_stop(val_loss):
print(class="tok-string">f"Early stopping at epoch {epoch} -- saving energy")
breakMixed precision training on an A100 GPU can achieve nearly 2x the throughput of FP32 training with negligible accuracy loss. For a 100-hour training run, this translates to ~50 hours of GPU time saved and proportional energy reduction. On H100 with FP8 support, the speedup can reach 3x over FP32 for supported operations.
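The loop above assumes an `early_stopper` helper that is not defined in the snippet; a minimal patience-based implementation might look like the following sketch, where the patience and tolerance values are arbitrary defaults.

```python
# Minimal patience-based early stopper (hypothetical helper assumed by
# the training loop above; thresholds are illustrative defaults).
class EarlyStopper:
    def __init__(self, patience: int = 3, min_delta: float = 1e-3):
        self.patience = patience        # epochs to wait without improvement
        self.min_delta = min_delta      # minimum improvement that counts
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

early_stopper = EarlyStopper(patience=3)
```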
Inference Efficiency
Inference energy efficiency is critical because inference accounts for the majority of total energy consumption for widely deployed models. A model serving millions of requests daily consumes far more cumulative energy than its one-time training cost.
For a model that took 1,000 GPU-hours to train but serves 10 million requests per day, inference energy can exceed the total training energy within the first week of deployment. By mid-2025, OpenAI processes an estimated 1 billion+ API calls per day. At scale, even small per-request efficiency gains translate to massive energy savings.
- Quantization: Reduce precision to INT8 or INT4 for 2-4x speedup with minimal quality loss (see the sketch after this list)
- Pruning: Remove unimportant weights to reduce computation by 50-90%
- Knowledge distillation: Train a smaller student model to mimic the larger teacher
- Speculative decoding: Use a small draft model to propose tokens, verified in parallel by the large model
- KV-cache optimization: Paged attention (vLLM) reduces memory waste and enables higher throughput
- Batching and caching: Maximize GPU utilization through continuous batching and semantic caching
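Of the techniques listed, post-training dynamic quantization is often the quickest to try. The sketch below uses PyTorch's built-in dynamic quantization, which stores the weights of linear layers in INT8 for CPU inference; the stand-in model and its layer sizes are assumptions for illustration, not part of any specific system.

```python
# Post-training dynamic quantization with PyTorch: weights of linear
# layers are stored in INT8, reducing memory traffic and inference energy.
import torch
import torch.nn as nn

model = nn.Sequential(            # stand-in for a real trained model
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
)
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model,
    {nn.Linear},                  # layer types to quantize
    dtype=torch.qint8,
)

# Rough FP32 size as a proxy for the memory (and energy) being saved
def param_bytes(m: nn.Module) -> int:
    return sum(p.numel() * p.element_size() for p in m.parameters())

print(f"FP32 parameter size: {param_bytes(model) / 1e6:.1f} MB")
print(f"Quantized model ready for CPU inference: {type(quantized).__name__}")
```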
Energy Efficiency
The ratio of useful computation (model quality achieved) to energy consumed, a key metric for sustainable ML systems.
PUE
Power Usage Effectiveness, the ratio of total facility energy to IT equipment energy in a data center, measuring cooling and infrastructure overhead.
04 Green AI and Sustainable Practices
The Green AI movement advocates for making efficiency a first-class research metric alongside accuracy. Traditional "Red AI" focuses solely on pushing accuracy higher regardless of computational cost, while Green AI seeks to achieve good accuracy efficiently.
Green AI
A research philosophy and movement that treats computational efficiency as a first-class evaluation metric alongside model accuracy, advocating that papers and projects report their computational costs and seek to minimize environmental impact. Coined by Schwartz et al. (2020).
Schwartz et al. (2020) coined the terms "Red AI" and "Green AI." Red AI pursues state-of-the-art accuracy through massive computation (e.g., scaling laws that demand ever-larger models). Green AI asks: can we achieve 95% of the quality at 10% of the cost? The answer is frequently yes. They proposed reporting FLOPs alongside accuracy in all papers, similar to how we report model size today.
Carbon-Aware Computing
Carbon-aware computing schedules ML workloads to run when and where the electricity grid is cleanest. By shifting training to times of high renewable energy generation or to low-carbon data center regions, organizations can reduce emissions by 30-50% without changing models or hardware.
Google has implemented carbon-intelligent computing that shifts flexible workloads (including ML training) to times and locations with cleaner energy. Their system uses 48-hour carbon intensity forecasts to schedule batch workloads. In 2023, Google reported that carbon-aware scheduling shifted 23% of compute to times with lower carbon intensity, avoiding significant emissions with zero impact on performance. This was possible because many ML training jobs have flexible deadlines.
class="tok-comment"># Carbon-aware job scheduler using ElectricityMaps API
import requests
from datetime import datetime, timedelta
ELECTRICITY_MAPS_TOKEN = class="tok-string">"your_api_token"
REGIONS = {
class="tok-string">"us-west-class="tok-number">2": class="tok-string">"US-NW-BPAT", class="tok-comment"># Oregon (BPA, mostly hydro)
class="tok-string">"us-east-class="tok-number">1": class="tok-string">"US-MIDA-PJM", class="tok-comment"># Virginia (PJM, mixed)
class="tok-string">"eu-west-class="tok-number">1": class="tok-string">"IE", class="tok-comment"># Ireland
class="tok-string">"eu-north-class="tok-number">1": class="tok-string">"SE", class="tok-comment"># Sweden (clean grid)
}
def get_carbon_intensity(zone: str) -> float:
class="tok-string">class="tok-string">""class="tok-string">"Fetch real-time carbon intensity for a grid zone."class="tok-string">""
resp = requests.get(
class="tok-string">f"https://api.electricitymap.org/v3/carbon-intensity/latest",
params={class="tok-string">"zone": zone},
headers={class="tok-string">"auth-token": ELECTRICITY_MAPS_TOKEN},
)
return resp.json()[class="tok-string">"carbonIntensity"] class="tok-comment"># gCO2/kWh
def select_greenest_region(regions: dict) -> tuple[str, float]:
class="tok-string">class="tok-string">""class="tok-string">"Find the cloud region with lowest carbon intensity right now."class="tok-string">""
best_region, best_ci = None, float(class="tok-string">"inf")
for cloud_region, grid_zone in regions.items():
ci = get_carbon_intensity(grid_zone)
print(class="tok-string">f" {cloud_region} ({grid_zone}): {ci:.0f} gCO2/kWh")
if ci < best_ci:
best_ci = ci
best_region = cloud_region
return best_region, best_ci
class="tok-comment"># Usage: select the greenest region before launching training
region, intensity = select_greenest_region(REGIONS)
print(class="tok-string">f"\nLaunch training in {region} ({intensity:.0f} gCO2/kWh)")Many ML training jobs have flexible deadlines. If your model does not need to be ready until Friday, you can schedule training to run during the cleanest grid windows over the week. Even shifting by a few hours can reduce carbon emissions by 20-40% on some grids. The Google Carbon Aware SDK and Green Software Foundation's carbon-aware SDK provide libraries for integrating this into your pipelines.
Reuse and Transfer Learning
Reuse through pre-trained models and transfer learning is one of the most effective sustainability strategies. Training a large foundation model once and sharing it for thousands of downstream tasks amortizes the training cost and eliminates redundant computation.
\text{Amortized Cost} = \frac{C_{\text{pretrain}}}{N_{\text{downstream}}} + C_{\text{finetune}}

where C_pretrain is the one-time pre-training cost, N_downstream is the number of downstream uses of the shared model, and C_finetune is the per-task fine-tuning cost.

The Hugging Face model hub hosts more than a million pre-trained models. When a researcher fine-tunes BERT for a new classification task (taking ~1 GPU-hour), they avoid re-running the original BERT pre-training (~1,000 GPU-hours). Across hundreds of thousands of users, model sharing prevents enormous amounts of redundant computation. Fine-tuning a pre-trained model uses roughly 0.1% of the energy of training from scratch.
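A quick calculation with the GPU-hour figures above shows how sharing drives the per-task cost toward the fine-tuning cost alone; the numbers are the illustrative ones from the text, not measured values.

```python
# Amortized cost of a shared pre-trained model (illustrative GPU-hours
# taken from the text above).
pretrain_gpu_hours = 1_000     # one-time pre-training cost
finetune_gpu_hours = 1         # per-task fine-tuning cost

for n_downstream in (1, 100, 100_000):
    amortized = pretrain_gpu_hours / n_downstream + finetune_gpu_hours
    print(f"{n_downstream:>7} downstream tasks -> "
          f"{amortized:,.2f} GPU-hours per task")
# With 100,000 downstream tasks the amortized cost is ~1.01 GPU-hours,
# i.e. essentially just the fine-tuning cost.
```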
Organizational Sustainability Practices
- Set carbon budgets for ML projects and track spending against them (see the sketch after this list)
- Report energy consumption and carbon emissions in papers and model cards
- Choose cloud regions with low-carbon electricity grids
- Invest in renewable energy procurement or power purchase agreements (PPAs)
- Prefer fine-tuning or adapting existing models (LoRA, QLoRA) over training from scratch
- Retire unused models and endpoints to avoid idle energy consumption
- Use spot/preemptible instances where possible to improve overall fleet utilization
- Set maximum training run budgets (FLOPs, GPU-hours, or cost) to prevent runaway experiments
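As an example of the first practice in the list, a simple carbon budget can be enforced in code by accumulating per-run emissions (for instance, the value returned by CodeCarbon's tracker) against a project-level cap. The class below is a hypothetical sketch, and the budget and run values are arbitrary.

```python
# Hypothetical carbon budget tracker: accumulate per-run emissions and
# fail when a project-level cap is exceeded.
class CarbonBudget:
    def __init__(self, budget_kg_co2: float):
        self.budget_kg_co2 = budget_kg_co2
        self.spent_kg_co2 = 0.0

    def record_run(self, run_name: str, emissions_kg: float) -> None:
        """Add one run's emissions (e.g., the value returned by
        EmissionsTracker.stop()) to the project total."""
        self.spent_kg_co2 += emissions_kg
        remaining = self.budget_kg_co2 - self.spent_kg_co2
        print(f"[carbon-budget] {run_name}: +{emissions_kg:.2f} kg, "
              f"{remaining:.2f} kg remaining")
        if remaining < 0:
            raise RuntimeError(
                f"Carbon budget of {self.budget_kg_co2} kg CO2eq exceeded"
            )

# Example with an arbitrary 500 kg CO2eq quarterly budget
budget = CarbonBudget(budget_kg_co2=500.0)
budget.record_run("baseline-experiment", 12.4)
budget.record_run("hyperparameter-sweep", 87.9)
```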
The EU AI Act (effective 2025-2026) includes provisions requiring reporting of energy consumption and environmental impact for high-risk AI systems. Similar regulations are being considered in other jurisdictions. Organizations that build measurement practices now will be ahead of upcoming compliance requirements.
Green AI
A research philosophy that prioritizes computational efficiency alongside accuracy, seeking high-quality results with minimal environmental impact.
Carbon-Aware Computing
Scheduling computational workloads to coincide with periods of low carbon intensity on the electrical grid, reducing emissions without changing the computation.
05 Measuring and Reducing Environmental Impact
Comprehensive environmental accounting for ML goes beyond carbon emissions to include water usage, electronic waste, and resource extraction for hardware manufacturing. A full picture of environmental impact requires considering all of these dimensions.
Lifecycle Assessment (LCA)
A systematic methodology for evaluating the total environmental impact of a product or system across its entire lifecycle, from raw material extraction and manufacturing through operational use to end-of-life disposal and recycling. ISO 14040/14044 provides the international standard framework for LCA.
Water Usage in ML
Data center water consumption is a rapidly growing concern, especially as AI workloads surge. Water is used in evaporative cooling systems, cooling towers, and chilled water loops. In water-stressed regions, this consumption directly competes with agricultural and municipal water needs.
| Environmental Dimension | Source in ML | Scale of Impact (2024-2025) |
|---|---|---|
| Carbon emissions | Electricity for training and inference | GPT-4 training: est. ~5,000 tonnes CO2 |
| Water usage | Data center cooling (evaporative and chilled water) | Google: 6.1B gallons (2023); Microsoft: 7.8B gallons (2023) |
| Electronic waste | GPU and server replacement cycles (3-5 years) | ~50M tonnes global e-waste/year; AI hardware share growing |
| Resource extraction | Rare earth minerals for chips and hardware | Cobalt, lithium, gallium, germanium -- geopolitically concentrated |
| Land use | Data center construction and solar/wind farms | Hyperscale data centers: 50-100 acres each; growing rapidly |
Table 18.7: Environmental dimensions of ML beyond carbon, with 2024-2025 scale estimates.
Microsoft's water consumption rose 34% from 2021 to 2022, largely attributed to AI workloads. Google consumed 6.1 billion gallons in 2023. Li et al. (2023) estimated that training GPT-3 alone consumed approximately 700,000 liters of freshwater for cooling. A single ChatGPT conversation of 20-50 queries may require the equivalent of a 500ml bottle of water. As water scarcity becomes a growing global concern, water usage in ML deserves far more attention.
| Data Center Metric | Industry Average | Best-in-Class (Hyperscale) | Frontier Target (2026+) |
|---|---|---|---|
| PUE | 1.55-1.80 | 1.08-1.12 | <1.05 |
| WUE (L/kWh) | 1.8-2.5 | 0.5-1.0 | <0.3 (air-cooled) |
| Carbon-free energy % | 30-50% | 80-95% | 100% 24/7 matched |
| Server utilization | 15-25% | 40-60% | >70% |
| Hardware lifecycle | 3-4 years | 5-6 years | 6-8 years with refurb |
Table 18.8: Data center sustainability metrics -- industry average vs. best-in-class hyperscale facilities. WUE = Water Usage Effectiveness.
Water Usage Effectiveness (WUE)
The ratio of annual water usage (in liters) to IT equipment energy (in kWh). Lower WUE indicates more water-efficient cooling. Air-cooled facilities can achieve near-zero WUE but may have higher PUE. The tradeoff between water and energy efficiency is a key design consideration.
class="tok-comment"># Comprehensive sustainability reporting for an ML project
from dataclasses import dataclass
from codecarbon import EmissionsTracker
class="tok-decorator">@dataclass
class SustainabilityReport:
class="tok-string">class="tok-string">""class="tok-string">"Full sustainability report for an ML training run."class="tok-string">""
model_name: str
gpu_type: str
gpu_count: int
training_hours: float
energy_kwh: float
carbon_kg: float
carbon_intensity_gco2kwh: float
pue: float
cloud_region: str
estimated_water_liters: float class="tok-comment"># Based on WUE
class="tok-decorator">@property
def car_mile_equivalent(self) -> float:
class="tok-string">class="tok-string">""class="tok-string">"CO2 equivalent in car miles (404g CO2/mile for avg US car)."class="tok-string">""
return self.carbon_kg * class="tok-number">1000 / class="tok-number">404
class="tok-decorator">@property
def household_day_equivalent(self) -> float:
class="tok-string">class="tok-string">""class="tok-string">"Energy equivalent in US household days (class="tok-number">30 kWh/day avg)."class="tok-string">""
return self.energy_kwh / class="tok-number">30
def to_model_card_section(self) -> str:
return (
class="tok-string">f"class="tok-comment">## Environmental Impact\n"
class="tok-string">f"- **Energy consumed**: {self.energy_kwh:.1f} kWh\n"
class="tok-string">f"- **Carbon emitted**: {self.carbon_kg:.1f} kg CO2eq\n"
class="tok-string">f"- **Cloud region**: {self.cloud_region}\n"
class="tok-string">f"- **Grid intensity**: {self.carbon_intensity_gco2kwh:.0f} "
class="tok-string">f"gCO2/kWh\n"
class="tok-string">f"- **Hardware**: {self.gpu_count}x {self.gpu_type} for "
class="tok-string">f"{self.training_hours:.1f} hours\n"
class="tok-string">f"- **Equivalent to**: {self.car_mile_equivalent:.0f} car "
class="tok-string">f"miles or {self.household_day_equivalent:.1f} US household "
class="tok-string">f"days of electricity\n"
class="tok-string">f"- **Estimated water**: {self.estimated_water_liters:.0f} "
class="tok-string">f"liters\n"
)
class="tok-comment"># Usage example
tracker = EmissionsTracker(project_name=class="tok-string">"my-model")
tracker.start()
class="tok-comment"># ... training ...
emissions = tracker.stop()
report = SustainabilityReport(
model_name=class="tok-string">"my-model-v1",
gpu_type=class="tok-string">"A100-80GB",
gpu_count=class="tok-number">8,
training_hours=class="tok-number">24.0,
energy_kwh=tracker.final_emissions_data.energy_consumed,
carbon_kg=emissions,
carbon_intensity_gco2kwh=class="tok-number">200,
pue=class="tok-number">1.1,
cloud_region=class="tok-string">"us-west-class="tok-number">2 (Oregon)",
estimated_water_liters=tracker.final_emissions_data.energy_consumed * class="tok-number">1.8,
)
print(report.to_model_card_section())The Rebound Effect
The rebound effect, also known as Jevons paradox, poses a fundamental challenge to efficiency improvements. When ML becomes more efficient, it tends to be deployed more widely, potentially increasing total environmental impact even as per-unit impact decreases.
When GPT-3 required millions of dollars to train, few organizations attempted it. As efficient training techniques, smaller models like Llama, and LoRA fine-tuning emerged, hundreds of thousands of organizations fine-tuned and deployed their own language models. The total compute spent on language model training and inference has increased by orders of magnitude even as per-model costs decreased. The IEA estimates that global data center electricity consumption could roughly double between 2022 and 2026, driven largely by AI. This pattern underscores the need for absolute emissions targets, not just efficiency improvements.
The Path Forward
The path toward sustainable ML requires coordinated progress on multiple fronts. No single intervention is sufficient, but the following strategies can collectively reduce ML's environmental footprint.
- More efficient algorithms and architectures that achieve target quality with less computation (e.g., MoE, sparse attention)
- Cleaner energy sources for computation through renewable energy procurement, PPAs, and 24/7 carbon-free matching
- Longer hardware lifecycles and better recycling to reduce embodied carbon and e-waste
- Better measurement and reporting tools that make environmental costs visible and comparable across organizations
- Cultural and incentive shifts that reward efficiency alongside accuracy in research and industry (e.g., efficiency tracks at conferences)
- Regulatory frameworks that require transparency in energy and carbon reporting for AI systems
- Investment in next-generation cooling technologies (liquid cooling, immersion cooling) that reduce both PUE and WUE
Start with what you can control today: (1) Measure your training and inference energy with CodeCarbon or carbontracker. (2) Choose efficient architectures and use pre-trained models when possible. (3) Select low-carbon cloud regions. (4) Use mixed precision and early stopping. (5) Report your environmental costs in model cards. (6) Set a carbon budget for your team. Collective action from individual practitioners creates the cultural shift the field needs.
Efficiency is doing things right; sustainability is doing the right things. In ML, we need both.
Lifecycle Assessment
A comprehensive evaluation of the total environmental impact of a product or system across its entire lifecycle, from manufacturing through use to disposal.
Rebound Effect
The phenomenon where efficiency improvements lead to increased usage that partially or fully offsets the environmental gains, also known as Jevons paradox.
Key Takeaways
1. Training large ML models has significant environmental impact, with a single training run potentially emitting thousands of tonnes of CO2.
2. Carbon footprint depends heavily on location and timing due to large variations in electricity grid carbon intensity -- up to 60x across regions.
3. Architecture selection is the highest-leverage decision for energy efficiency; efficient models can reduce energy by orders of magnitude.
4. Inference energy often dominates training energy for widely deployed models, making serving optimization critical.
5. Green AI advocates treating computational efficiency as a first-class research metric alongside accuracy.
6. Carbon-aware computing can reduce emissions by 20-50% by scheduling workloads when and where renewable energy is abundant.
7. Water consumption is a rapidly growing and often overlooked environmental cost of AI, with major providers consuming billions of gallons annually.
8. The rebound effect (Jevons paradox) means efficiency improvements alone are insufficient -- absolute emissions targets are needed.