This is a new course in a rapidly evolving field. The syllabus will surely change as we adapt to emerging tools and techniques.

Module 1: The AI-Data Analyst

Structured Data • Sessions 1-4

Goal: Build an interactive emissions dashboard in 4 sessions

Week 1 Session 1: IDE Setup and AI-Assisted Coding

Concept: Setting up VS Code with AI coding assistants (Continue.dev, GitHub Copilot) and exploring the Google Antigravity IDE. Understanding how to work with AI-augmented development environments.

Tech: VS Code, Copilot, Python Virtual Environments

In-Class Lab: Hello Climate. Download a raw CSV of Global Carbon Budget data. Use natural language prompts to load it, clean headers, and output a basic trend line.
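For orientation, here is a minimal sketch of the kind of script a prompt like "load this CSV, clean the headers, and plot a trend line" might produce. The file name and column names are hypothetical stand-ins; the Global Carbon Budget CSV you download will differ.

```python
# Minimal sketch (pandas + matplotlib); file and column names below are
# hypothetical placeholders for the actual Global Carbon Budget CSV.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("global_carbon_budget.csv")            # hypothetical filename
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

ax = df.plot(x="year", y="fossil_emissions", legend=False)  # assumed columns
ax.set_ylabel("GtC per year")
ax.set_title("Global fossil CO2 emissions")
plt.show()
```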


Week 1 Session 2: Working with AI Coding Agents

Concept: Understanding how large language models function as coding agents. Distinguishing between chat interfaces and agents with local code execution capabilities. Using natural language to specify data operations.

Tech: VS Code with GitHub Copilot, Ibis Framework, Altair

In-Class Lab: Emissions Analysis. Work with the OWID CO2 dataset (~50,000 rows) to explore filtering, aggregation, visualization, and independent analysis—all through AI-assisted coding.
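A minimal sketch of the Ibis-plus-Altair pattern the lab uses, assuming the published OWID column names (country, year, co2); the actual code will come from your own prompts.

```python
import ibis
import altair as alt

con = ibis.duckdb.connect()                  # in-memory DuckDB backend
t = con.read_csv("owid-co2-data.csv")

# note: OWID includes aggregate rows (e.g., "World") you may want to filter out
top = (
    t.filter(t.year == 2020)
     .group_by("country")
     .aggregate(co2=t.co2.sum())
     .order_by(ibis.desc("co2"))
     .limit(10)
     .to_pandas()
)

chart = alt.Chart(top).mark_bar().encode(
    x="co2:Q",
    y=alt.Y("country:N", sort="-x"),
)
chart.save("top_emitters.html")              # or render inline in a notebook
```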


Week 2 Session 3: Data Infrastructure & Big Data

Concept: Working with complex, large-scale environmental databases. Understanding how data format and access patterns constrain automated workflows. Multi-source data integration and methodological reconciliation.

Tech: Cloud-optimized GeoParquet, DuckDB, Multi-Regional Input-Output models

In-Class Lab: Exiobase Analysis. Work with the Exiobase 3 database, an environmentally extended input-output model covering 163 industries × 49 regions × multiple environmental pressures. Compare results with OWID to understand methodological differences (territorial vs. consumption-based accounting).
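A minimal sketch of the DuckDB-over-Parquet access pattern, with a placeholder URL and hypothetical column names standing in for however the course's Exiobase extract is published:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")   # enable reading remote files

df = con.execute("""
    SELECT region, SUM(co2_kg) / 1e12 AS co2_gt         -- hypothetical columns
    FROM read_parquet('https://example.org/exiobase3.parquet')  -- placeholder URL
    GROUP BY region
    ORDER BY co2_gt DESC
""").df()
print(df.head())
```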


Week 2 Session 4: Publishing & Sharing Results

Concept: Communicating results effectively. Moving from exploratory analysis to shareable outputs. Understanding when different formats serve different audiences.

Tech: Streamlit dashboards, PDF generation, GitHub Pages

In-Class Lab: Publication Studio. Take your Module 1 analysis and produce multiple output formats: an interactive Streamlit dashboard for exploration, a polished PDF report for stakeholders, and a simple webpage summarizing key findings.
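As a starting point, a minimal Streamlit scaffold for the dashboard piece, assuming you exported your Module 1 results to a CSV with country, year, and co2 columns (all placeholders):

```python
# streamlit_app.py -- run with: streamlit run streamlit_app.py
import pandas as pd
import streamlit as st

st.title("Emissions Dashboard")

df = pd.read_csv("emissions_results.csv")        # hypothetical Module 1 export
country = st.selectbox("Country", sorted(df["country"].unique()))
subset = df[df["country"] == country]

st.line_chart(subset, x="year", y="co2")
st.download_button("Download CSV", subset.to_csv(index=False), "emissions.csv")
```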

Module 2: Spatial Data & Environmental Justice

Mapping Data Center Impacts • Sessions 5-7

Goal: Map and analyze the environmental and ecological impacts of data center expansion through an environmental justice lens

Week 3 Session 5: Mapping Data Center Locations

Concept: Introduction to geospatial analysis. Understanding Coordinate Reference Systems (CRS) and why they matter for accurate spatial analysis. Building on DuckDB skills from Module 1 to work with spatial data.

Tech: DuckDB Spatial extension, Python anymap library (MapLibre wrapper), Point geometries

In-Class Lab: Data Center Atlas. Load a dataset of data center locations across the US. Use DuckDB Spatial to transform coordinates between coordinate reference systems and create an interactive MapLibre visualization showing the distribution of data centers. Explore patterns in their geographic clustering.
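A minimal sketch of the reprojection step in DuckDB Spatial, assuming a CSV of lon/lat points in WGS84 (EPSG:4326) reprojected to a US equal-area CRS (EPSG:5070); the file and columns are hypothetical, and the anymap/MapLibre rendering step is left to the lab itself.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

pts = con.execute("""
    SELECT name,
           ST_Transform(ST_Point(lon, lat), 'EPSG:4326', 'EPSG:5070') AS geom
    FROM read_csv_auto('data_centers.csv')     -- hypothetical file and columns
    -- note: check your DuckDB Spatial version's axis-order handling for EPSG:4326
""").df()
print(pts.head())
```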


Week 3 Session 6: Environmental Justice & Vector Data

Concept: Spatial joins and overlay analysis. Using census and demographic data to examine environmental justice dimensions of infrastructure siting. Who lives near data centers, and which communities bear the environmental burden?

Tech: DuckDB Spatial joins, Census vector data (shapefiles/GeoJSON), demographic analysis

In-Class Lab: Data Center Environmental Justice Audit. Perform spatial joins between data center locations and census tract boundaries. Analyze demographic characteristics (income, race, education) of communities within buffer zones around data centers. Identify patterns of environmental injustice in data center siting decisions and visualize findings on an interactive map.
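The core buffer-and-join pattern looks roughly like the sketch below. File names and demographic columns are hypothetical, and the geometries are assumed to be pre-projected to a CRS measured in meters so the 5 km buffer is meaningful.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

audit = con.execute("""
    SELECT t.geoid, t.median_income, t.pct_minority    -- hypothetical columns
    FROM ST_Read('tracts_projected.geojson') AS t
    JOIN ST_Read('data_centers_projected.geojson') AS d
      ON ST_Intersects(t.geom, ST_Buffer(d.geom, 5000))   -- 5 km buffer
""").df()
print(audit.describe())
```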


Week 4 Session 7: Biodiversity Impacts & Raster Analysis

Concept: Working with large-scale raster data for ecological analysis. Understanding the biodiversity impacts of data center expansion on local ecosystems. Extracting values from continuous spatial data layers.

Tech: Rasterio, DuckDB Spatial with raster operations, species richness datasets, cloud-optimized GeoTIFFs

In-Class Lab: Data Centers & Biodiversity Hotspots. Analyze the intersection of data center locations with biodiversity data layers (e.g., species richness rasters from NatureServe). Extract raster values at data center locations to assess ecological sensitivity. Identify data centers located in biodiversity hotspots and quantify potential habitat impacts. Create visualizations showing the ecological footprint of digital infrastructure expansion.
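A minimal rasterio sketch of the extraction step, assuming a species-richness GeoTIFF and data center coordinates already expressed in the raster's CRS (both placeholders):

```python
import rasterio

coords = [(-77.45, 39.02), (-121.98, 37.24)]      # hypothetical (x, y) points

with rasterio.open("species_richness.tif") as src:    # hypothetical file
    values = [v[0] for v in src.sample(coords)]       # band 1 value per point

for (x, y), richness in zip(coords, values):
    print(f"({x:.2f}, {y:.2f}) -> species richness {richness}")
```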

Module 3: Working with LLMs & Unstructured Data

LLM APIs & Document Intelligence • Sessions 8-10

Goal: Extract structured insights from unstructured corporate sustainability documents using modern LLM workflows

Week 4 Session 8: Introduction to LLM APIs

Concept: Moving beyond IDE chat assistants to programmatic LLM use. Understanding how to work with LLMs through APIs for reproducible, automated workflows. Introduction to open-source models through OpenRouter.

Tech: LangChain, OpenRouter (accessing open models like gpt-oss, OLMo, Nemotron), OpenAI structured outputs (JSON mode)

In-Class Lab: Your First LLM Pipeline. Build a simple Python script that uses LangChain to send prompts to different open-source models via OpenRouter. Compare responses across models. Experiment with OpenAI's structured output feature to extract specific fields (company name, emission target, baseline year) from a sample text passage about corporate climate commitments.
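A minimal sketch of the pipeline shape, assuming an OPENROUTER_API_KEY environment variable and a placeholder model id; structured-output support varies by model, so treat this as a shape rather than a recipe.

```python
import os
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class Commitment(BaseModel):
    company: str
    emission_target: str
    baseline_year: int

llm = ChatOpenAI(
    model="openai/gpt-oss-120b",                 # placeholder OpenRouter model id
    base_url="https://openrouter.ai/api/v1",     # route requests via OpenRouter
    api_key=os.environ["OPENROUTER_API_KEY"],
)

result = llm.with_structured_output(Commitment).invoke(
    "Acme Corp pledges to cut emissions 50% by 2030 from a 2019 baseline."
)
print(result)   # Commitment(company='Acme Corp', ...)
```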


Week 5 Session 9: Structured Data Extraction from PDFs

Concept: AI-era document parsing. Extracting structured information from messy, unstructured corporate documents without traditional web scraping tools. Understanding the types of sustainability and energy disclosures that climate professionals encounter: CDP reports, GRI disclosures, corporate sustainability reports, utility rate filings.

Tech: LangChain document loaders, OpenAI structured outputs with Pydantic schemas, PDF parsing libraries

In-Class Lab: Sustainability Report Parser. Students work with real public documents (e.g., Apple's Environmental Progress Report, Microsoft Sustainability Report, or utility Integrated Resource Plans). Build a pipeline that loads PDFs, chunks them intelligently, and uses LLMs with structured output schemas to extract specific data fields: renewable energy percentages, Scope 1/2/3 emissions, energy consumption metrics, water usage, and waste diversion rates. Output results as clean JSON or CSV for further analysis.
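The skeleton of such a pipeline might look like the sketch below, with a hypothetical report file, an assumed model name, and a deliberately small schema; real reports need smarter chunking than grabbing the first pages.

```python
from pydantic import BaseModel
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import ChatOpenAI

class Disclosure(BaseModel):
    renewable_energy_pct: float | None = None
    scope1_tco2e: float | None = None
    scope2_tco2e: float | None = None

pages = PyPDFLoader("sustainability_report.pdf").load()   # hypothetical file
text = "\n".join(p.page_content for p in pages[:10])      # naive chunking

llm = ChatOpenAI(model="gpt-4o-mini")                     # assumed model
result = llm.with_structured_output(Disclosure).invoke(
    f"Extract these sustainability metrics if present:\n\n{text}"
)
print(result.model_dump())
```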


Week 5 Session 10: Model Context Protocol & Advanced Document Analysis

Concept: Introduction to Model Context Protocol (MCP) as a modern approach to giving LLMs access to external data sources and tools. Understanding how MCP servers provide structured interfaces for document processing, database access, and other external capabilities without traditional RAG embeddings.

Tech: Model Context Protocol, MCP servers for PDF/document processing, LangChain integration with MCP

In-Class Lab: Multi-Document ESG Analysis. Use MCP-based tools to analyze multiple sustainability documents simultaneously. Build a workflow that compares climate commitments across several Fortune 500 companies, identifying gaps, inconsistencies, and best practices. Students explore how MCP simplifies complex document workflows compared to traditional embedding-based RAG approaches.
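For a feel of the protocol, here is a minimal client sketch using the official MCP Python SDK; the server command, tool name, and arguments are all hypothetical stand-ins for whatever document server the lab provides.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server = StdioServerParameters(command="document-server")   # hypothetical
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])   # discover server capabilities
            result = await session.call_tool(
                "read_pdf", {"path": "apple_epr.pdf"}   # hypothetical tool + arg
            )
            print(result.content)

asyncio.run(main())
```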

Module 4: The Capstone Studio

Build Your MVP • Sessions 11-14

Goal: Build a deployable Minimum Viable Product (MVP)

Week 6 Session 11: Project Scoping & Architecture

Activity: Collaborative design session. Teams define their project goals, identify data sources, and sketch their technical approach using AI as a design partner.

Focus: Feasibility and impact. Does the project leverage techniques from across the course? Will it provide actionable insights for climate decision-makers?


Week 6 Session 12: In-Class Development Sprint

Activity: Focused development time. Instructors provide technical guidance and help teams overcome implementation challenges.

Focus: Building core functionality. Whether the core is data pipelines, spatial analysis workflows, document extraction systems, or interactive visualizations, teams make substantial progress on their MVP.


Week 7 Session 13: Refinement & User Experience

Activity: Peer testing and feedback. Teams experience each other's projects and provide constructive feedback on usability and impact.

Focus: User experience and communication. Is the tool intuitive? Are insights clearly communicated? Does the project effectively tell its climate story?


Week 7 Session 14: Demo Day

Format: Lightning presentations showcasing live projects.

Evaluation: Does the project demonstrate technical sophistication? Does it address a real climate challenge? Could it influence decision-making in the real world?

Course Philosophy

This course takes a non-traditional approach. We won't master Python syntax, tidy data principles, Codd's third normal form, the mechanics of filters and joins, or the grammar of graphics—the traditional vocabulary of data science. For instructors and students familiar with conventional data science curricula, this will feel different.

We believe this is the right choice for our audience. Rather than building foundational programming skills from scratch, we focus on what climate professionals can accomplish today with modern AI-augmented tools. This is an authentic experience: these are the tools being used to solve real problems right now.

We acknowledge the risks. AI tools can produce incorrect results, and working at a higher level of abstraction can obscure understanding. But data science has always carried these risks—that's why software developers write unit tests and validation checks. Like any technology, AI coding assistants can be used well or poorly. Our goal is to teach you to use them well.

Learning Principles

Our course design reflects core principles of how people learn effectively:

  • Authentic Context: Real climate challenges, real data, real tools
  • Social Learning: Collaborative projects, peer feedback, team problem-solving
  • Active Practice: Hands-on labs every session, learning by doing
  • Prior Knowledge: Building on your domain expertise in climate and sustainability