Session 3: Data Infrastructure & Big Data - AI-Augmented Data Science for Climate Leadership

This session shifts from the well-curated OWID interface to a more realistic scenario: working with Exiobase—a massive, environmentally-extended input-output database tracking emissions through global supply chains. You'll experience how data infrastructure choices affect both human and automated workflows.

Background: Exiobase

Exiobase is a multi-regional, environmentally-extended input-output (MRIO) database tracking economic activity and environmental impacts across global supply chains. Unlike OWID's aggregated national totals, Exiobase provides sectoral resolution: emissions from electricity generation, transport, manufacturing, agriculture, and the rest of its 163 industries across 49 regions.

At a glance: 163 industries, 49 regions, 1995–2022 annual coverage
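
To put those dimensions in perspective, here is a rough back-of-the-envelope sketch. It takes the figures above at face value and assumes, purely for illustration, that each year's inter-industry table is stored densely as 64-bit floats. The real files are organized differently and also carry extension tables that this estimate ignores.

```python
# Figures taken from the session text.
INDUSTRIES = 163
REGIONS = 49
YEARS = range(1995, 2023)  # 1995 through 2022 inclusive

# Every (region, industry) pair is one row and one column of the table.
pairs = INDUSTRIES * REGIONS

# Inter-industry flows link every pair to every other pair.
cells_per_year = pairs ** 2

# Assumption: dense storage, 8 bytes per cell (float64).
bytes_per_year = cells_per_year * 8
total_gb = bytes_per_year * len(YEARS) / 1e9

print(f"{pairs:,} region-industry pairs")
print(f"{cells_per_year:,} cells per annual table")
print(f"about {total_gb:.1f} GB across {len(YEARS)} years (dense float64)")
```

Even under these simplifying assumptions, the table has tens of millions of cells per year, which is why format and access pattern matter so much in what follows.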

Key characteristics:

Structure: Input-output tables linking inter-industry flows with environmental extensions
Coverage: CO₂, CH₄, land use, water consumption, and more
Applications: Consumption-based emissions accounting, supply chain analysis, trade embodied emissions, sectoral decarbonization pathways

Methodological note: MRIO models make it possible to allocate emissions to final consumers rather than to production locations. A smartphone manufactured in China but consumed in the US has its production emissions attributed to the US in a consumption-based Exiobase account and to China in OWID's territorial totals.
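
The difference between the two accounting views can be sketched with a toy two-region model. Everything in it is invented for illustration: the coefficients, demand figures, and emission intensities are not Exiobase values, and the helper names (`leontief_inverse_2x2`, `matvec`) are hypothetical. It uses the standard Leontief relation x = (I - A)^-1 y and is kept dependency-free; a real table would call for a linear-algebra library.

```python
REGIONS = ["CN", "US"]

# A[i][j]: input from region i needed per unit of output in region j (made up).
A = [[0.2, 0.1],
     [0.0, 0.2]]

# Y[i][j]: final goods produced in region i and consumed in region j (made up).
Y = [[50.0, 30.0],
     [0.0, 100.0]]

# F[i]: emissions per unit of output in region i (made up).
F = [0.8, 0.3]


def leontief_inverse_2x2(a):
    """Return (I - A)^-1 for a 2x2 coefficient matrix, in closed form."""
    m00, m01 = 1 - a[0][0], -a[0][1]
    m10, m11 = -a[1][0], 1 - a[1][1]
    det = m00 * m11 - m01 * m10
    return [[m11 / det, -m01 / det],
            [-m10 / det, m00 / det]]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


L = leontief_inverse_2x2(A)

# Territorial view: emissions counted where production takes place.
total_demand = [sum(row) for row in Y]
output = matvec(L, total_demand)
territorial = {r: F[i] * output[i] for i, r in enumerate(REGIONS)}

# Consumption view: emissions from all output, anywhere, that is needed
# to satisfy each region's final demand.
consumption = {}
for j, region in enumerate(REGIONS):
    demand_j = [Y[i][j] for i in range(len(REGIONS))]
    output_for_j = matvec(L, demand_j)
    consumption[region] = sum(F[i] * output_for_j[i] for i in range(len(REGIONS)))

print("territorial:", territorial)
print("consumption:", consumption)
```

Both views sum to the same global total. Only the attribution changes: the importing region's consumption-based figure exceeds its territorial one, and the exporting region's falls below it.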

What You'll Explore

The session notebook (in your module template repository) guides you through:

Access friction: Attempt to load Exiobase from its original Zenodo archive—experience the barriers to automated workflows
Cloud-optimized formats: Load the same data from GeoParquet—observe the difference in agent performance
Schema exploration: Navigate the complex structure (163 industries × 49 regions × multiple environmental pressures)
Sectoral analysis: Identify top emission-intensive industries globally and by country
Cross-dataset validation: Compare Exiobase totals with OWID—investigate methodological discrepancies
Synthesis analysis: Independent investigation integrating multiple data sources
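
The sectoral analysis step might look roughly like the sketch below. The row layout, industry labels, and numbers are hypothetical stand-ins, not the actual Exiobase schema or data, and `top_intensive` is an illustrative helper rather than anything the notebook defines. In practice a dataframe library would do the same grouping.

```python
from collections import defaultdict

# Hypothetical long-format rows:
# (region, industry, emissions, economic output), all values made up.
ROWS = [
    ("DE", "Electricity by coal", 200.0, 20.0),
    ("DE", "Road transport", 90.0, 60.0),
    ("DE", "Cement", 60.0, 5.0),
    ("CN", "Electricity by coal", 4000.0, 300.0),
    ("CN", "Road transport", 500.0, 400.0),
    ("CN", "Cement", 800.0, 100.0),
]


def top_intensive(rows, n=2, region=None):
    """Rank industries by emissions per unit of output.

    With region=None the ranking is global; otherwise it is limited
    to the given region code.
    """
    emissions = defaultdict(float)
    output = defaultdict(float)
    for reg, industry, emitted, produced in rows:
        if region is None or reg == region:
            emissions[industry] += emitted
            output[industry] += produced
    # Intensity is total emissions over total output, not a mean of ratios.
    intensity = {k: emissions[k] / output[k] for k in emissions if output[k] > 0}
    return sorted(intensity.items(), key=lambda kv: kv[1], reverse=True)[:n]


print("global:", top_intensive(ROWS))
print("DE:    ", top_intensive(ROWS, region="DE"))
```

In this toy data the global and single-country rankings differ, which is the kind of result the exercise asks you to interpret rather than just compute.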

Learning Objectives

Evaluate how data format and access patterns constrain automated workflows
Compare schema complexity across datasets with different design goals
Integrate multi-source data requiring methodological reconciliation
Assess when coding agents can operate autonomously versus when domain expertise must guide analysis

Key Insight

Data engineering can eliminate access barriers but not conceptual complexity.

Moving from ZIP archives to cloud-optimized Parquet removes technical friction. Understanding MRIO methodology, industry classifications, and why two authoritative sources disagree still requires domain knowledge that no format change can provide.
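
A minimal sketch of a cross-dataset check makes the point concrete. The country totals are invented, the 10% tolerance is an arbitrary assumption, and `compare_totals` is a hypothetical helper. The code can flag where two sources diverge, but it has no way to say whether the gap reflects an error or a difference in accounting method.

```python
def compare_totals(source_a, source_b, tolerance=0.10):
    """Return {country: (relative_difference, flagged)} for shared countries.

    The relative difference is measured against source_b.
    """
    report = {}
    for country in sorted(set(source_a) & set(source_b)):
        rel = (source_a[country] - source_b[country]) / source_b[country]
        report[country] = (rel, abs(rel) > tolerance)
    return report


# Made-up national totals in arbitrary units, for illustration only.
exiobase_totals = {"CN": 8800.0, "US": 5600.0, "DE": 800.0}
owid_totals = {"CN": 10000.0, "US": 5000.0, "DE": 780.0}

report = compare_totals(exiobase_totals, owid_totals)
flagged = [country for country, (_, is_flagged) in report.items() if is_flagged]

for country, (rel, is_flagged) in report.items():
    marker = "CHECK" if is_flagged else "ok"
    print(f"{country}: {rel:+.1%} {marker}")
```

Deciding what a flagged gap means, for example consumption-based versus territorial attribution, is the step that needs domain expertise.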
