ML Systems, Part 5: Safety & Trust, Chapter 18 (~25 min)

Sustainable AI

Explores the environmental impact of ML systems. Covers carbon footprint estimation, energy efficiency, and strategies for green AI.

carbon footprintenergy efficiencygreen AIsustainabilityenvironmental impact
Read in mlsysbook.ai

Learning Objectives
  • Estimate the carbon footprint of ML training using power consumption, duration, and grid carbon intensity
  • Compare the environmental impact of training vs. inference across the full model lifecycle
  • Evaluate energy efficiency techniques including architecture selection, mixed precision, and early stopping
  • Implement carbon tracking in ML workflows using tools like CodeCarbon and carbontracker
  • Design carbon-aware computing strategies that schedule workloads to minimize environmental impact
  • Analyze data center sustainability metrics including PUE, WUE, and renewable energy percentages
  • Apply lifecycle assessment principles to evaluate the total environmental impact of ML systems beyond carbon

01 The Environmental Impact of ML

The environmental impact of machine learning has grown dramatically as models have scaled in size and training compute. Training a single large language model can emit as much carbon as five cars over their entire lifetimes. As ML becomes pervasive, its collective environmental footprint has become a significant concern for the research community and society at large.

Strubell et al. (2019): A Wake-Up Call

Strubell et al. estimated that training a large Transformer model with neural architecture search produced approximately 284 tonnes of CO2 -- equivalent to five times the lifetime emissions of an average American car including manufacturing. The paper also found that the NLP community's total compute had increased by 300,000x from 2012 to 2019. This single paper galvanized the sustainability conversation in ML.

Definition

Embodied Carbon

The greenhouse gas emissions associated with manufacturing, transporting, and disposing of hardware (GPUs, servers, networking equipment), as distinct from operational emissions from electricity consumption during use. For modern ML accelerators, embodied carbon can represent 30-50% of total lifecycle emissions.

Primary Environmental Costs

The primary environmental costs of ML come from multiple sources that span the full hardware and software lifecycle. Understanding these costs is essential for prioritizing reduction efforts.

Figure 18.1: Carbon Footprint Breakdown of ML Systems. Estimated breakdown for GPT-3-scale training (~552 tonnes CO₂): training compute 45%, hardware manufacturing 20%, data center cooling 15%, data storage and transfer 10%, lifetime inference 10%.
  • Electricity for computation: Thousands of GPUs running for weeks or months consume megawatt-hours of power. GPT-4 training is estimated to have consumed 50-100 GWh of electricity.
  • Cooling: Data centers require significant energy and water to keep hardware within operating temperatures. Cooling accounts for 30-40% of total data center energy in older facilities.
  • Embodied carbon: Manufacturing GPUs and servers involves energy-intensive semiconductor fabrication, rare earth mineral extraction, and global shipping logistics.
  • Water usage: Large data centers consume millions of gallons of water annually for evaporative cooling. A single Google data center can use 450,000 gallons of water per day.
  • Electronic waste: GPU replacement cycles of 3-5 years generate growing volumes of e-waste containing toxic materials and precious metals.

The Scale of Training Costs

Table 18.1: Estimated training costs for major foundation models. *Lower CO2 despite high energy reflects cleaner grid sources. Figures from published papers and credible third-party estimates.

| Model | Year | Params | Est. Training Energy (MWh) | Est. CO2 (tonnes) | Est. Cost ($M) |
|---|---|---|---|---|---|
| BERT-Large | 2018 | 340M | ~12 | ~1.5 | ~0.01 |
| GPT-3 | 2020 | 175B | ~1,287 | ~552 | ~4.6 |
| PaLM | 2022 | 540B | ~3,400 | ~271* | ~8-12 |
| LLaMA 2 70B | 2023 | 70B | ~291 | ~31* | ~2 |
| GPT-4 (est.) | 2023 | ~1.8T MoE | ~51,000 | ~5,100 | ~63-78 |
| Llama 3 405B | 2024 | 405B | ~7,400 | ~590* | ~30-40 |

Meta's OPT-175B Carbon Report

Meta made a deliberate transparency effort with OPT-175B (2022), publishing its training logbook and carbon footprint. The model consumed approximately 324 MWh and emitted ~75 tonnes of CO2. Meta noted that pre-training accounted for only a fraction of total compute spent -- failed runs, restarts due to hardware faults, and hyperparameter exploration roughly doubled the effective cost. This candor revealed how published numbers often undercount total impact.

Full Lifecycle Footprint

Table 18.2: Components of ML's full lifecycle environmental impact.

| Lifecycle Stage | Contribution | Often Overlooked? |
|---|---|---|
| Training | High per-run cost, but one-time for a given model | No -- most widely discussed |
| Inference | Can dominate total cost for widely deployed models | Yes -- often ignored in research |
| Experimentation | Hyperparameter search can multiply training cost 10-100x | Yes -- rarely reported |
| Hardware lifecycle | Manufacturing, shipping, and e-waste disposal | Yes -- embodied carbon frequently omitted |
| Data processing | Data cleaning, tokenization, deduplication at scale | Yes -- assumed negligible but grows with data |

Inference Dominates at Scale

For a model serving millions of users daily, inference energy can exceed training energy within days of deployment. Google reported that inference accounts for roughly 60% of their total ML energy consumption. By early 2025, ChatGPT alone was estimated to consume 564 MWh of electricity per day for inference -- equivalent to powering roughly 18,000 average US homes. Optimizing inference efficiency is therefore critical for deployed systems.


Awareness of ML's environmental impact has grown significantly, driven by seminal research and growing public attention. This awareness is leading to new practices, metrics, and tools for measuring and reducing the carbon footprint of ML systems.

Start Measuring

You cannot reduce what you do not measure. Begin tracking the energy consumption and carbon emissions of your ML workloads today using tools like CodeCarbon or ML CO2 Impact. Even rough estimates reveal the relative costs of different approaches and guide optimization efforts.

Carbon Footprint

The total greenhouse gas emissions caused by ML system development and operation, measured in equivalent tons of CO2.

Embodied Carbon

The carbon emissions associated with manufacturing, transporting, and disposing of hardware, distinct from operational emissions from electricity use.

02 Carbon Footprint Estimation

Estimating the carbon footprint of ML training requires combining hardware power consumption, training duration, and the carbon intensity of the electricity source. Multiple formulas and frameworks exist, but the core relationship is straightforward.

Definition

Carbon Intensity

The amount of CO2 emitted per unit of electricity generated, measured in grams of CO2 per kilowatt-hour (gCO2/kWh). This value varies dramatically by energy source: near zero for solar/wind, ~20 for hydro, ~400 for natural gas, and ~800-1000 gCO2/kWh for coal.

The Carbon Emissions Formula

CO_2 = E_{\text{consumed}} \times CI_{\text{grid}} \times PUE

Where E_consumed is the IT-equipment energy in kWh, CI_grid is the carbon intensity of the local grid in gCO2/kWh, and PUE is the Power Usage Effectiveness of the data center (typically 1.1-1.6 for modern facilities). The product is in grams of CO2; divide by 1,000 for kilograms.
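A worked example helps sanity-check the formula; the numbers below are illustrative, not measurements from any real run.

```python
# Hypothetical run: 8 GPUs averaging 350 W each for 72 hours,
# on a 300 gCO2/kWh grid, in a facility with PUE 1.2.
def training_co2_kg(avg_power_w, num_gpus, hours, ci_g_per_kwh, pue):
    """CO2 = E_consumed x CI_grid x PUE, with E in kWh and the result in kg."""
    energy_kwh = (avg_power_w * num_gpus / 1000) * hours  # IT energy only
    return energy_kwh * pue * ci_g_per_kwh / 1000  # grams -> kilograms

co2 = training_co2_kg(350, 8, 72, 300, 1.2)
# 201.6 kWh of IT energy -> ~72.6 kg CO2 after PUE and grid intensity
```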

Definition

Power Usage Effectiveness (PUE)

The ratio of total data center energy consumption to the energy consumed by IT equipment alone. A PUE of 1.0 means all energy goes to computing; a PUE of 1.5 means 50% additional energy is used for cooling, lighting, and other overhead. Modern hyperscale data centers achieve PUE of 1.1-1.2.

PUE = \frac{\text{Total Facility Energy}}{\text{IT Equipment Energy}}

Strubell et al. Training Cost Model

Strubell et al. (2019) proposed a widely-cited model for estimating training costs that accounts for GPU count, training time, GPU power draw, and data center overhead.

CO_2 = \frac{p \cdot t \cdot PUE \cdot CI}{1000}

Where p is the average combined power draw of the training hardware in watts and t is the training time in hours; dividing by 1000 converts watt-hours to kilowatt-hours, and multiplying by PUE and the grid carbon intensity CI (gCO2/kWh) yields emissions in grams of CO2.

Measuring Power Consumption

Power consumption can be measured directly using hardware monitoring tools or estimated from hardware specifications and utilization rates. Direct measurement is always preferred over estimation.

Table 18.3: GPU/accelerator power consumption and efficiency comparison (2024-2025 data). While newer GPUs consume more absolute power, their performance-per-watt improves substantially each generation.

| GPU Model | TDP (Watts) | FP16 TFLOPS | Typical ML Power Draw | Perf/Watt (TFLOPS/W) |
|---|---|---|---|---|
| NVIDIA V100 (2017) | 300W | 125 | 250-280W | 0.42 |
| NVIDIA A100 (2020) | 400W | 312 | 300-350W | 0.78 |
| NVIDIA H100 SXM (2023) | 700W | 990 | 550-650W | 1.41 |
| NVIDIA B200 (2024) | 1000W | 2,250 | 700-900W | 2.25 |
| Google TPU v5e (2023) | ~200W | ~197 | ~170W | ~0.99 |
| Google TPU v5p (2024) | ~400W | ~459 | ~350W | ~1.15 |
| AMD MI300X (2024) | 750W | 1,307 | 550-700W | 1.74 |

python
# Measuring GPU power with CodeCarbon
from codecarbon import EmissionsTracker

# Track emissions for an entire training run
tracker = EmissionsTracker(
    project_name="bert-finetune",
    measure_power_secs=15,     # Sample power every 15 seconds
    tracking_mode="process",   # Track only this process
    log_level="warning",
    save_to_file=True,
    output_dir="./carbon_logs",
)
tracker.start()

# ... your training code ...
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader, optimizer)
    val_loss = evaluate(model, val_loader)
    print(f"Epoch {epoch}: val_loss={val_loss:.4f}")

emissions_kg = tracker.stop()
print(f"Training emissions: {emissions_kg:.4f} kg CO2eq")
print(f"Energy consumed: {tracker.final_emissions_data.energy_consumed:.4f} kWh")
print(f"Duration: {tracker.final_emissions_data.duration:.0f} seconds")
python
# Direct GPU power measurement with nvidia-smi and carbontracker
import subprocess
import time
from carbontracker.tracker import CarbonTracker

# Option 1: Manual nvidia-smi power sampling
def get_gpu_power_watts():
    """Read instantaneous GPU power draw using nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    powers = [float(p) for p in result.stdout.strip().split("\n")]
    return sum(powers)  # Total across all GPUs

def estimate_training_carbon(
    training_fn,
    carbon_intensity_gco2_kwh: float = 400,
    pue: float = 1.1,
):
    """Estimate carbon emissions from a training function."""
    power_samples = []
    start = time.time()

    # Sample power in a background thread in production;
    # simplified here for clarity
    training_fn()

    elapsed_hours = (time.time() - start) / 3600
    avg_power_w = sum(power_samples) / len(power_samples) if power_samples else 300
    energy_kwh = (avg_power_w / 1000) * elapsed_hours * pue
    carbon_kg = energy_kwh * carbon_intensity_gco2_kwh / 1000
    return {"energy_kwh": energy_kwh, "carbon_kg": carbon_kg}

# Option 2: Use carbontracker (alternative to CodeCarbon)
tracker = CarbonTracker(epochs=num_epochs)
for epoch in range(num_epochs):
    tracker.epoch_start()
    train_one_epoch(model, train_loader, optimizer)
    tracker.epoch_end()
tracker.stop()  # Prints predicted total emissions

Geographic Variation in Carbon Intensity

Carbon intensity varies by more than 40x across regions, making location one of the highest-leverage decisions for reducing ML carbon emissions.

Table 18.4: Carbon intensity by region with 2024 data. Training the same model in South Africa produces ~60x the carbon emissions of training it in Iceland.

| Region / Cloud Zone | Primary Source | gCO2/kWh (2024) | Renewable % |
|---|---|---|---|
| Iceland | Geothermal + Hydro | ~15 | ~100% |
| Norway / Nordics | Hydroelectric | ~20 | ~98% |
| Quebec, Canada | Hydroelectric | ~20 | ~95% |
| France | Nuclear | ~55 | ~92% |
| Oregon, US (us-west) | Hydro + Wind | ~90 | ~75% |
| California, US | Mixed (solar, gas) | ~200 | ~55% |
| Virginia, US (us-east-1) | Mixed (gas, nuclear) | ~300 | ~35% |
| Germany | Mixed (coal, renewables) | ~350 | ~50% |
| Japan | Mixed (gas, coal) | ~450 | ~25% |
| Poland | Coal | ~650 | ~22% |
| India | Coal-dominant | ~700 | ~20% |
| South Africa | Coal | ~900 | ~10% |

Time-of-Day Matters

Carbon intensity fluctuates throughout the day as the generation mix changes. Solar-heavy grids are cleanest midday; wind-heavy grids vary with weather. Training the same model at noon vs. midnight in California can result in 2-3x different emissions. Services like ElectricityMaps.com and WattTime.org provide real-time carbon intensity data via API, enabling automated scheduling.

Cloud Provider Carbon Footprints

Table 18.5: Major cloud provider renewable energy commitments (2024). Note: "matched" means renewable energy certificates purchased to offset usage, not necessarily real-time clean energy consumption.

| Cloud Provider | Carbon-Free Energy % (2024) | Carbon Reporting Tool | Lowest-Carbon Regions |
|---|---|---|---|
| Google Cloud | 64% (global avg) | Carbon Footprint Dashboard | Finland, Iowa, Oregon |
| Microsoft Azure | ~100% matched (contracted) | Emissions Impact Dashboard | Sweden, Norway, Switzerland |
| AWS | ~100% matched (contracted) | Customer Carbon Footprint Tool | Canada, Oregon, Ireland |
| Oracle Cloud | ~50% (estimated) | Limited reporting | UK, Switzerland |

Beware of Carbon Accounting Methods

Cloud providers often report renewable energy percentages using annual matching with Renewable Energy Certificates (RECs) -- they buy enough certificates to "match" annual electricity use. However, a data center may still run on fossil fuels at night while matching with solar credits from daytime. Google's "24/7 carbon-free energy" initiative aims for real-time matching, which is a more stringent and meaningful metric.

Reporting Best Practices

  1. Report total energy consumption in kWh, including failed runs and hyperparameter search
  2. State the carbon intensity of the electricity used (gCO2/kWh) and its source
  3. Report resulting carbon emissions in kg CO2eq, including PUE overhead
  4. Include hardware type, count, and training duration for reproducibility
  5. Distinguish between market-based (with RECs) and location-based (actual grid) accounting
  6. Provide familiar reference comparisons (e.g., equivalent transatlantic flights) for non-technical audiences

Carbon Intensity

The amount of CO2 emitted per unit of electricity generated, measured in gCO2/kWh, which varies dramatically by energy source and location.

CodeCarbon

An open-source Python package that tracks the carbon emissions of computing by measuring electricity consumption and applying regional carbon intensity data.

03 Energy Efficiency Techniques

Energy-efficient ML starts with choosing the right model architecture for the task. Over-parameterized models waste energy on unnecessary computation. Architecture selection is the single highest-leverage decision for reducing energy consumption.

Definition

Energy Efficiency

The ratio of useful computation (model quality achieved) to energy consumed. In ML, this is often expressed as accuracy per watt-hour or quality per FLOP, enabling comparison across approaches.

Architecture Is King

Before optimizing training procedures or hardware, first ask whether a smaller or more efficient architecture can meet your quality requirements. EfficientNet achieves ImageNet accuracy comparable to much larger models with roughly 8x fewer FLOPs. Knowledge-distilled models can match their teachers at a fraction of the inference cost. Mixture-of-Experts (MoE) models activate only a subset of parameters per input, achieving high capacity with lower per-sample energy.

Hardware Selection and Efficiency

Hardware selection and utilization directly impact energy efficiency. Each GPU generation typically offers 2-3x better energy efficiency for ML workloads over the previous generation, though absolute power consumption also increases.

\text{Energy Efficiency} = \frac{\text{Useful FLOPS}}{\text{Power (Watts)}} = \frac{\text{Throughput (samples/sec)} \times \text{FLOPs/sample}}{\text{Power Draw (W)}}
Table 18.6: Accelerator generations show significant improvements in energy efficiency (TFLOPS per watt), though absolute power consumption also rises.

| Hardware | TDP (Watts) | FP16 TFLOPS | Efficiency (TFLOPS/W) | Year |
|---|---|---|---|---|
| NVIDIA V100 | 300W | 125 | 0.42 | 2017 |
| NVIDIA A100 SXM | 400W | 312 | 0.78 | 2020 |
| NVIDIA H100 SXM | 700W | 990 | 1.41 | 2023 |
| NVIDIA B200 | 1000W | 2,250 | 2.25 | 2024 |
| Google TPU v4 | ~275W | ~275 | ~1.00 | 2022 |
| Google TPU v5p | ~400W | ~459 | ~1.15 | 2024 |
| AMD MI300X | 750W | 1,307 | 1.74 | 2024 |
| Intel Gaudi 3 | ~600W | ~1,835 (BF16) | ~3.06 | 2024 |

GPU Utilization Is Critical

A GPU idling between batches or waiting for data still draws 60-80% of its peak power. Poor GPU utilization means you pay the energy cost without getting proportional computation. Tools like NVIDIA Nsight, PyTorch Profiler, and DCGM can identify utilization bottlenecks. Typical ML workloads achieve only 30-50% of peak FLOPS; closing this gap is a significant energy efficiency opportunity.
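To make the utilization gap concrete, here is a rough model-FLOPs-utilization (MFU) estimate using the common ~6 x parameters FLOPs-per-token approximation for transformer training; the throughput and hardware figures below are illustrative assumptions, not measurements.

```python
def model_flops_utilization(tokens_per_sec, params, peak_tflops_per_gpu, num_gpus):
    """Fraction of peak FLOPS actually achieved, via the ~6*N FLOPs-per-token
    rule of thumb for a transformer forward + backward pass."""
    achieved_tflops = 6 * params * tokens_per_sec / 1e12
    return achieved_tflops / (peak_tflops_per_gpu * num_gpus)

# e.g. a 7B-parameter model training at 12,000 tokens/s on 8 GPUs
# rated at 312 peak FP16 TFLOPS each -> roughly 20% MFU
mfu = model_flops_utilization(12_000, 7e9, 312, 8)
```

An MFU in the 20-30% range is typical; profiling data loading, communication, and kernel launch overheads is how the remaining headroom gets recovered.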

Training Efficiency Techniques

  • Mixed precision training: Use FP16/BF16 for most operations, reducing energy per computation while maintaining accuracy
  • Learning rate scheduling: Cosine annealing and warm restarts help converge faster with less total computation
  • Early stopping: Monitor validation loss and stop training when improvement plateaus, avoiding wasted energy
  • Efficient hyperparameter search: Bayesian optimization explores the space 3-10x more efficiently than grid search
  • Gradient accumulation: Simulate larger batches without proportionally more memory or energy
  • Progressive resizing: Start training on smaller inputs and gradually increase resolution (common in vision)
  • Checkpoint and resume: Save checkpoints frequently so hardware failures do not waste all prior computation
python
# Energy-aware training configuration example
import torch
from torch.cuda.amp import autocast

# Mixed precision training reduces energy by 1.5-2x on modern GPUs.
# BF16 keeps FP32's dynamic range, so no GradScaler/loss scaling is
# needed (FP16 autocast would require torch.cuda.amp.GradScaler).
for epoch in range(num_epochs):
    for batch in train_loader:
        optimizer.zero_grad()

        # BF16 mixed precision -- more FLOPS per watt
        with autocast(dtype=torch.bfloat16):
            outputs = model(batch["input_ids"], batch["attention_mask"])
            loss = criterion(outputs, batch["labels"])

        loss.backward()
        optimizer.step()

    # Early stopping saves energy on unproductive epochs
    val_loss = evaluate(model, val_loader)
    if early_stopper.should_stop(val_loss):
        print(f"Early stopping at epoch {epoch} -- saving energy")
        break
Mixed Precision Savings

Mixed precision training on an A100 GPU can achieve nearly 2x the throughput of FP32 training with negligible accuracy loss. For a 100-hour training run, this translates to ~50 hours of GPU time saved and proportional energy reduction. On H100 with FP8 support, the speedup can reach 3x over FP32 for supported operations.

Inference Efficiency

Inference energy efficiency is critical because inference accounts for the majority of total energy consumption for widely deployed models. A model serving millions of requests daily consumes far more cumulative energy than its one-time training cost.

The Inference Multiplier

For a model that took 1,000 GPU-hours to train but serves 10 million requests per day, inference energy can exceed the total training energy within the first week of deployment. By mid-2025, OpenAI processes an estimated 1 billion+ API calls per day. At scale, even small per-request efficiency gains translate to massive energy savings.
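A back-of-the-envelope sketch of that breakeven point, with purely illustrative inputs (the function and numbers are hypothetical):

```python
def breakeven_days(train_gpu_hours, gpu_power_kw, requests_per_day, wh_per_request):
    """Days until cumulative inference energy matches training energy."""
    train_kwh = train_gpu_hours * gpu_power_kw
    daily_inference_kwh = requests_per_day * wh_per_request / 1000
    return train_kwh / daily_inference_kwh

# 1,000 GPU-hours at 0.3 kW vs. 10M requests/day at 0.01 Wh per request:
# training energy is overtaken within a few days of deployment
days = breakeven_days(1_000, 0.3, 10_000_000, 0.01)
```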

  • Quantization: Reduce precision to INT8 or INT4 for 2-4x speedup with minimal quality loss
  • Pruning: Remove unimportant weights to reduce computation by 50-90%
  • Knowledge distillation: Train a smaller student model to mimic the larger teacher
  • Speculative decoding: Use a small draft model to propose tokens, verified in parallel by the large model
  • KV-cache optimization: Paged attention (vLLM) reduces memory waste and enables higher throughput
  • Batching and caching: Maximize GPU utilization through continuous batching and semantic caching
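The quantization bullet above can be illustrated with a minimal symmetric INT8 scheme; this is a pedagogical sketch of the core idea, not how production libraries implement it.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= q * scale, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

q, s = quantize_int8([0.5, -1.27, 0.02])  # -> [50, -127, 2], scale 0.01
approx = dequantize(q, s)                 # close to the original weights
```

Storing `q` instead of 32-bit floats cuts memory traffic 4x, which is where most of the inference energy savings come from.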

Energy Efficiency

The ratio of useful computation (model quality achieved) to energy consumed, a key metric for sustainable ML systems.

PUE

Power Usage Effectiveness, the ratio of total facility energy to IT equipment energy in a data center, measuring cooling and infrastructure overhead.

04 Green AI and Sustainable Practices

The Green AI movement advocates for making efficiency a first-class research metric alongside accuracy. Traditional "Red AI" focuses solely on pushing accuracy higher regardless of computational cost, while Green AI seeks to achieve good accuracy efficiently.

Definition

Green AI

A research philosophy and movement that treats computational efficiency as a first-class evaluation metric alongside model accuracy, advocating that papers and projects report their computational costs and seek to minimize environmental impact. Coined by Schwartz et al. (2020).

Red AI vs. Green AI

Schwartz et al. (2020) coined the terms "Red AI" and "Green AI." Red AI pursues state-of-the-art accuracy through massive computation (e.g., scaling laws that demand ever-larger models). Green AI asks: can we achieve 95% of the quality at 10% of the cost? The answer is frequently yes. They proposed reporting FLOPs alongside accuracy in all papers, similar to how we report model size today.

Carbon-Aware Computing

Carbon-aware computing schedules ML workloads to run when and where the electricity grid is cleanest. By shifting training to times of high renewable energy generation or to low-carbon data center regions, organizations can reduce emissions by 30-50% without changing models or hardware.

Google's Carbon-Aware Scheduling

Google has implemented carbon-intelligent computing that shifts flexible workloads (including ML training) to times and locations with cleaner energy. Their system uses 48-hour carbon intensity forecasts to schedule batch workloads. In 2023, Google reported that carbon-aware scheduling shifted 23% of compute to times with lower carbon intensity, avoiding significant emissions with zero impact on performance. This was possible because many ML training jobs have flexible deadlines.

python
# Carbon-aware job scheduler using ElectricityMaps API
import requests

ELECTRICITY_MAPS_TOKEN = "your_api_token"
REGIONS = {
    "us-west-2": "US-NW-BPAT",    # Oregon (BPA, mostly hydro)
    "us-east-1": "US-MIDA-PJM",   # Virginia (PJM, mixed)
    "eu-west-1": "IE",            # Ireland
    "eu-north-1": "SE",           # Sweden (clean grid)
}

def get_carbon_intensity(zone: str) -> float:
    """Fetch real-time carbon intensity for a grid zone."""
    resp = requests.get(
        "https://api.electricitymap.org/v3/carbon-intensity/latest",
        params={"zone": zone},
        headers={"auth-token": ELECTRICITY_MAPS_TOKEN},
    )
    return resp.json()["carbonIntensity"]  # gCO2/kWh

def select_greenest_region(regions: dict) -> tuple[str, float]:
    """Find the cloud region with lowest carbon intensity right now."""
    best_region, best_ci = None, float("inf")
    for cloud_region, grid_zone in regions.items():
        ci = get_carbon_intensity(grid_zone)
        print(f"  {cloud_region} ({grid_zone}): {ci:.0f} gCO2/kWh")
        if ci < best_ci:
            best_ci = ci
            best_region = cloud_region
    return best_region, best_ci

# Usage: select the greenest region before launching training
region, intensity = select_greenest_region(REGIONS)
print(f"\nLaunch training in {region} ({intensity:.0f} gCO2/kWh)")
Leverage Flexibility

Many ML training jobs have flexible deadlines. If your model does not need to be ready until Friday, you can schedule training to run during the cleanest grid windows over the week. Even shifting by a few hours can reduce carbon emissions by 20-40% on some grids. The Google Carbon Aware SDK and Green Software Foundation's carbon-aware SDK provide libraries for integrating this into your pipelines.
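A minimal sketch of deadline-flexible scheduling: given an hourly carbon-intensity forecast (synthetic data below, not from a real API), pick the contiguous window with the lowest average intensity.

```python
def best_start_hour(forecast, duration_h):
    """forecast: list of (hour, gCO2_per_kwh) pairs. Return the start hour
    minimizing average intensity over a contiguous duration_h-hour run."""
    best_start, best_avg = None, float("inf")
    for i in range(len(forecast) - duration_h + 1):
        window = forecast[i:i + duration_h]
        avg = sum(ci for _, ci in window) / duration_h
        if avg < best_avg:
            best_start, best_avg = window[0][0], avg
    return best_start, best_avg

# Synthetic solar-heavy grid: intensity dips toward midday (hour 12)
forecast = [(h, 450 - 30 * min(h, 12) + 30 * max(0, h - 12)) for h in range(24)]
start, avg_ci = best_start_hour(forecast, 4)  # 4-hour job within a 24h window
```

The same windowing logic applies whether the forecast comes from ElectricityMaps, WattTime, or a provider's carbon-aware SDK.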

Reuse and Transfer Learning

Reuse through pre-trained models and transfer learning is one of the most effective sustainability strategies. Training a large foundation model once and sharing it for thousands of downstream tasks amortizes the training cost and eliminates redundant computation.

\text{Amortized Cost} = \frac{C_{\text{pretrain}}}{N_{\text{downstream}}} + C_{\text{finetune}}
The Hugging Face Effect

The Hugging Face model hub hosts over 800,000 pre-trained models (as of early 2026). When a researcher fine-tunes BERT for a new classification task (taking ~1 GPU-hour), they avoid re-running the original BERT pre-training (~1,000 GPU-hours). Across hundreds of thousands of users, model sharing prevents enormous amounts of redundant computation. Fine-tuning a pre-trained model uses roughly 0.1% of the energy of training from scratch.
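Plugging the BERT-style numbers above into the amortization formula (illustrative figures):

```python
def amortized_gpu_hours(pretrain_hours, n_downstream, finetune_hours):
    """Amortized cost = C_pretrain / N_downstream + C_finetune."""
    return pretrain_hours / n_downstream + finetune_hours

# ~1,000 GPU-hours of pre-training shared across 100,000 fine-tunes,
# each costing ~1 GPU-hour: pre-training adds almost nothing per task
cost = amortized_gpu_hours(1_000, 100_000, 1)
```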

Organizational Sustainability Practices

  • Set carbon budgets for ML projects and track spending against them
  • Report energy consumption and carbon emissions in papers and model cards
  • Choose cloud regions with low-carbon electricity grids
  • Invest in renewable energy procurement or power purchase agreements (PPAs)
  • Prefer fine-tuning or adapting existing models (LoRA, QLoRA) over training from scratch
  • Retire unused models and endpoints to avoid idle energy consumption
  • Use spot/preemptible instances where possible to improve overall fleet utilization
  • Set maximum training run budgets (FLOPs, GPU-hours, or cost) to prevent runaway experiments
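The last practice, capping run budgets, can be sketched as a simple guard; the class and its interface are hypothetical, not a real library API.

```python
class CarbonBudget:
    """Track cumulative emissions against a per-project cap."""
    def __init__(self, budget_kg):
        self.budget_kg = budget_kg
        self.spent_kg = 0.0

    def charge(self, kg):
        """Record emissions; return False once the budget is exhausted."""
        self.spent_kg += kg
        return self.spent_kg <= self.budget_kg

budget = CarbonBudget(budget_kg=50.0)
for step_emissions_kg in [12.0, 15.0, 18.0, 10.0]:
    if not budget.charge(step_emissions_kg):
        break  # halt the experiment rather than exceed the budget
```

Feeding the guard with per-epoch readings from a tracker like CodeCarbon turns a soft policy into an enforced limit.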
EU AI Act and Sustainability Reporting

The EU AI Act (effective 2025-2026) includes provisions requiring reporting of energy consumption and environmental impact for high-risk AI systems. Similar regulations are being considered in other jurisdictions. Organizations that build measurement practices now will be ahead of upcoming compliance requirements.

Green AI

A research philosophy that prioritizes computational efficiency alongside accuracy, seeking high-quality results with minimal environmental impact.

Carbon-Aware Computing

Scheduling computational workloads to coincide with periods of low carbon intensity on the electrical grid, reducing emissions without changing the computation.

05 Measuring and Reducing Environmental Impact

Comprehensive environmental accounting for ML goes beyond carbon emissions to include water usage, electronic waste, and resource extraction for hardware manufacturing. A full picture of environmental impact requires considering all of these dimensions.

Definition

Lifecycle Assessment (LCA)

A systematic methodology for evaluating the total environmental impact of a product or system across its entire lifecycle, from raw material extraction and manufacturing through operational use to end-of-life disposal and recycling. ISO 14040/14044 provides the international standard framework for LCA.

Water Usage in ML

Data center water consumption is a rapidly growing concern, especially as AI workloads surge. Water is used in evaporative cooling systems, cooling towers, and chilled water loops. In water-stressed regions, this consumption directly competes with agricultural and municipal water needs.

Table 18.7: Environmental dimensions of ML beyond carbon, with 2024-2025 scale estimates.

| Environmental Dimension | Source in ML | Scale of Impact (2024-2025) |
|---|---|---|
| Carbon emissions | Electricity for training and inference | GPT-4 training: est. ~5,000 tonnes CO2 |
| Water usage | Data center cooling (evaporative and chilled water) | Google: 6.1B gallons (2023); Microsoft: 7.8B gallons (2023) |
| Electronic waste | GPU and server replacement cycles (3-5 years) | ~50M tonnes global e-waste/year; AI hardware share growing |
| Resource extraction | Rare earth minerals for chips and hardware | Cobalt, lithium, gallium, germanium -- geopolitically concentrated |
| Land use | Data center construction and solar/wind farms | Hyperscale data centers: 50-100 acres each; growing rapidly |

Water Is Often Overlooked

Microsoft's water consumption rose 34% from 2021 to 2022, largely attributed to AI workloads. Google consumed 6.1 billion gallons in 2023. Li et al. (2023) estimated that training GPT-3 alone consumed approximately 700,000 liters of freshwater for cooling. A single ChatGPT conversation of 20-50 queries may require the equivalent of a 500ml bottle of water. As water scarcity becomes a growing global concern, water usage in ML deserves far more attention.

Table 18.8: Data center sustainability metrics -- industry average vs. best-in-class hyperscale facilities. WUE = Water Usage Effectiveness.

| Data Center Metric | Industry Average | Best-in-Class (Hyperscale) | Frontier Target (2026+) |
|---|---|---|---|
| PUE | 1.55-1.80 | 1.08-1.12 | <1.05 |
| WUE (L/kWh) | 1.8-2.5 | 0.5-1.0 | <0.3 (air-cooled) |
| Carbon-free energy % | 30-50% | 80-95% | 100% 24/7 matched |
| Server utilization | 15-25% | 40-60% | >70% |
| Hardware lifecycle | 3-4 years | 5-6 years | 6-8 years with refurb |

Definition

Water Usage Effectiveness (WUE)

The ratio of annual water usage (in liters) to IT equipment energy (in kWh). Lower WUE indicates more water-efficient cooling. Air-cooled facilities can achieve near-zero WUE but may have higher PUE. The tradeoff between water and energy efficiency is a key design consideration.
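Because WUE is defined per kWh of IT energy, water estimates follow directly from energy estimates. With an assumed facility WUE of 0.55 L/kWh, a GPT-3-scale run (~1,287 MWh) lands near the ~700,000 liter figure cited above:

```python
def cooling_water_liters(it_energy_kwh, wue_l_per_kwh):
    """WUE = water (L) / IT energy (kWh), so water = energy * WUE."""
    return it_energy_kwh * wue_l_per_kwh

water = cooling_water_liters(1_287 * 1_000, 0.55)  # ~707,850 liters
```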

python
class="tok-comment"># Comprehensive sustainability reporting for an ML project
from dataclasses import dataclass
from codecarbon import EmissionsTracker

class="tok-decorator">@dataclass
class SustainabilityReport:
    class="tok-string">class="tok-string">""class="tok-string">"Full sustainability report for an ML training run."class="tok-string">""
    model_name: str
    gpu_type: str
    gpu_count: int
    training_hours: float
    energy_kwh: float
    carbon_kg: float
    carbon_intensity_gco2kwh: float
    pue: float
    cloud_region: str
    estimated_water_liters: float  class="tok-comment"># Based on WUE

    class="tok-decorator">@property
    def car_mile_equivalent(self) -> float:
        class="tok-string">class="tok-string">""class="tok-string">"CO2 equivalent in car miles (404g CO2/mile for avg US car)."class="tok-string">""
        return self.carbon_kg * class="tok-number">1000 / class="tok-number">404

    class="tok-decorator">@property
    def household_day_equivalent(self) -> float:
        class="tok-string">class="tok-string">""class="tok-string">"Energy equivalent in US household days (class="tok-number">30 kWh/day avg)."class="tok-string">""
        return self.energy_kwh / class="tok-number">30

    def to_model_card_section(self) -> str:
        return (
            class="tok-string">f"class="tok-comment">## Environmental Impact\n"
            class="tok-string">f"- **Energy consumed**: {self.energy_kwh:.1f} kWh\n"
            class="tok-string">f"- **Carbon emitted**: {self.carbon_kg:.1f} kg CO2eq\n"
            class="tok-string">f"- **Cloud region**: {self.cloud_region}\n"
            class="tok-string">f"- **Grid intensity**: {self.carbon_intensity_gco2kwh:.0f} "
            class="tok-string">f"gCO2/kWh\n"
            class="tok-string">f"- **Hardware**: {self.gpu_count}x {self.gpu_type} for "
            class="tok-string">f"{self.training_hours:.1f} hours\n"
            class="tok-string">f"- **Equivalent to**: {self.car_mile_equivalent:.0f} car "
            class="tok-string">f"miles or {self.household_day_equivalent:.1f} US household "
            class="tok-string">f"days of electricity\n"
            class="tok-string">f"- **Estimated water**: {self.estimated_water_liters:.0f} "
            class="tok-string">f"liters\n"
        )

class="tok-comment"># Usage example
tracker = EmissionsTracker(project_name=class="tok-string">"my-model")
tracker.start()
class="tok-comment"># ... training ...
emissions = tracker.stop()

report = SustainabilityReport(
    model_name=class="tok-string">"my-model-v1",
    gpu_type=class="tok-string">"A100-80GB",
    gpu_count=class="tok-number">8,
    training_hours=class="tok-number">24.0,
    energy_kwh=tracker.final_emissions_data.energy_consumed,
    carbon_kg=emissions,
    carbon_intensity_gco2kwh=class="tok-number">200,
    pue=class="tok-number">1.1,
    cloud_region=class="tok-string">"us-west-class="tok-number">2 (Oregon)",
    estimated_water_liters=tracker.final_emissions_data.energy_consumed * class="tok-number">1.8,
)
print(report.to_model_card_section())

The Rebound Effect

The rebound effect, also known as Jevons paradox, poses a fundamental challenge to efficiency improvements. When ML becomes more efficient, it tends to be deployed more widely, potentially increasing total environmental impact even as per-unit impact decreases.

Jevons Paradox in ML

When GPT-3 required millions of dollars to train, few organizations attempted it. As efficient training techniques, smaller models like Llama, and LoRA fine-tuning emerged, hundreds of thousands of organizations fine-tuned and deployed their own language models. The total compute spent on language model training and inference has increased by orders of magnitude even as per-model costs decreased. The IEA estimates that global data center electricity consumption could double from 2022 to 2026, driven largely by AI. This pattern underscores the need for absolute emissions targets, not just efficiency improvements.
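The arithmetic of the rebound effect is simple but easy to underestimate. The toy calculation below uses assumed numbers purely for illustration: a 10x per-model efficiency gain combined with 1000x adoption growth still increases total energy by 100x.

```python
# Rebound effect: per-unit efficiency gain vs. adoption growth.
# All figures below are assumed for illustration, not measured data.
baseline_energy_per_model_kwh = 1_000_000  # assumed cost of one training run
baseline_models = 10                       # assumed orgs training at baseline

efficiency_gain = 10    # each model becomes 10x cheaper to train
adoption_growth = 1000  # 1000x more organizations now train models

before = baseline_energy_per_model_kwh * baseline_models
after = (baseline_energy_per_model_kwh / efficiency_gain) \
        * (baseline_models * adoption_growth)

# Total impact grows whenever adoption growth exceeds the efficiency gain.
print(f"Total energy changed by {after / before:.0f}x "
      f"despite a {efficiency_gain}x efficiency gain")
```

This is why the chapter argues for absolute emissions targets: a ratio of adoption growth to efficiency gain greater than 1 means total impact rises no matter how efficient each model becomes.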

The Path Forward

The path toward sustainable ML requires coordinated progress on multiple fronts. No single intervention is sufficient, but the following strategies can collectively reduce ML's environmental footprint.

  1. More efficient algorithms and architectures that achieve target quality with less computation (e.g., MoE, sparse attention)
  2. Cleaner energy sources for computation through renewable energy procurement, PPAs, and 24/7 carbon-free matching
  3. Longer hardware lifecycles and better recycling to reduce embodied carbon and e-waste
  4. Better measurement and reporting tools that make environmental costs visible and comparable across organizations
  5. Cultural and incentive shifts that reward efficiency alongside accuracy in research and industry (e.g., efficiency tracks at conferences)
  6. Regulatory frameworks that require transparency in energy and carbon reporting for AI systems
  7. Investment in next-generation cooling technologies (liquid cooling, immersion cooling) that reduce both PUE and WUE
Practical First Steps

Start with what you can control today: (1) Measure your training and inference energy with CodeCarbon or carbontracker. (2) Choose efficient architectures and use pre-trained models when possible. (3) Select low-carbon cloud regions. (4) Use mixed precision and early stopping. (5) Report your environmental costs in model cards. (6) Set a carbon budget for your team. Collective action from individual practitioners creates the cultural shift the field needs.
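Step 6, a team carbon budget, can start as a simple running tally checked before each training run. The sketch below is a minimal illustration, not a production tool; the class, its API, and the budget figure are all assumptions:

```python
class CarbonBudget:
    """Minimal team carbon budget: track emissions against a fixed cap."""

    def __init__(self, budget_kg_co2: float):
        self.budget_kg_co2 = budget_kg_co2
        self.spent_kg_co2 = 0.0

    def can_run(self, estimated_kg_co2: float) -> bool:
        """Check whether a planned run fits in the remaining budget."""
        return self.spent_kg_co2 + estimated_kg_co2 <= self.budget_kg_co2

    def record(self, measured_kg_co2: float) -> None:
        """Log measured emissions (e.g., from CodeCarbon) after a run."""
        self.spent_kg_co2 += measured_kg_co2

# Usage: gate each training run on the remaining budget.
budget = CarbonBudget(budget_kg_co2=500.0)  # assumed quarterly cap
if budget.can_run(estimated_kg_co2=42.0):
    # ... launch training, measure actual emissions with EmissionsTracker ...
    budget.record(measured_kg_co2=42.0)
print(f"{budget.budget_kg_co2 - budget.spent_kg_co2:.0f} kg CO2 remaining")
```

Pairing the pre-run estimate with measured values from a tracker keeps the tally honest and makes the budget a shared, visible constraint rather than an afterthought.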

Efficiency is doing things right; sustainability is doing the right things. In ML, we need both.

Adapted from Peter Drucker

Lifecycle Assessment

A comprehensive evaluation of the total environmental impact of a product or system across its entire lifecycle, from manufacturing through use to disposal.

Rebound Effect

The phenomenon where efficiency improvements lead to increased usage that partially or fully offsets the environmental gains, also known as Jevons paradox.

Key Takeaways

  1. Training large ML models has significant environmental impact, with a single training run potentially emitting thousands of tonnes of CO2.
  2. Carbon footprint depends heavily on location and timing due to large variations in electricity grid carbon intensity -- up to 60x across regions.
  3. Architecture selection is the highest-leverage decision for energy efficiency; efficient models can reduce energy by orders of magnitude.
  4. Inference energy often dominates training energy for widely deployed models, making serving optimization critical.
  5. Green AI advocates treating computational efficiency as a first-class research metric alongside accuracy.
  6. Carbon-aware computing can reduce emissions by 20-50% by scheduling workloads when and where renewable energy is abundant.
  7. Water consumption is a rapidly growing and often overlooked environmental cost of AI, with major providers consuming billions of gallons annually.
  8. The rebound effect (Jevons paradox) means efficiency improvements alone are insufficient -- absolute emissions targets are needed.
