This session explores how large language models function as coding agents within integrated development environments. You'll work with the OWID CO2 emissions dataset—a well-curated, accessible dataset ideal for learning AI-assisted data analysis workflows.


Part 0: Repository Setup

Step 1: Accept the GitHub Classroom Assignment

Your instructor will email you a secure GitHub Classroom link for Module 1. Click the link to create your personal repository. GitHub Classroom automatically creates a private copy of the module template repository under your account.

New to GitHub Classroom? See the GitHub Classroom documentation for more details.

Step 2: Clone to VS Code

Open VS Code and clone your new repository:

  1. Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
  2. Type "Git: Clone" and select it
  3. Paste your repository URL from GitHub
  4. Choose a local folder and open the cloned repository

If prompted to sign in to GitHub, VS Code will open a browser for device authentication. Follow the prompts to authorize VS Code.

Step 3: Set Up the Python Environment

Open Copilot Chat in VS Code and give it this instruction:

"Set up the python environment"

Your coding agent will follow the instructions in .vscode/copilot-instructions.md to create a Conda environment with the correct dependencies (ibis-framework, altair, pandas, jupyter).

Step 4: Restart and Select Kernel
  1. After environment setup completes, restart VS Code to ensure the new environment is detected
  2. Open the session notebook: notebooks/session2.ipynb
  3. Click "Select Kernel" in the top-right of the notebook
  4. Choose Python Environments → csol208 (the Conda environment you created)

You're now ready to work through the session exercises with AI assistance.
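
If you'd like to confirm the setup before starting, a quick sanity check in the first notebook cell is enough. The sketch below only verifies that the packages from Step 3 import (note that the ibis-framework package imports as ibis) and prints their versions:

# Optional sanity check: run in the first notebook cell after selecting the kernel.
# The ibis-framework package imports as "ibis".
import ibis
import altair
import pandas as pd

print("ibis", ibis.__version__)
print("altair", altair.__version__)
print("pandas", pd.__version__)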


The Data: OWID CO2 Dataset

We're using the Our World in Data CO2 and Greenhouse Gas Emissions dataset — one of the most comprehensive open datasets on global emissions.

  • 50,000+ data points
  • 254 countries & regions
  • Years covered: 1750–2024

Key columns we'll use:

  • country, year — identifiers
  • co2 — total emissions (million tonnes)
  • co2_per_capita — emissions per person
  • coal_co2, oil_co2, gas_co2 — by fuel type

See the full OWID dataset documentation for the complete list of columns.
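
To make these columns concrete, here is a minimal loading sketch in pandas. The file path is a placeholder, so point it at wherever your repository stores the CSV; the session notebook itself may load the data through Ibis instead.

import pandas as pd

# Placeholder path: adjust to where the OWID CSV lives in your repository.
co2 = pd.read_csv("data/owid-co2-data.csv")

# The key columns listed above
cols = ["country", "year", "co2", "co2_per_capita", "coal_co2", "oil_co2", "gas_co2"]
print(co2[cols].head())

# Basic shape and temporal coverage
print(co2.shape)
print(co2["year"].min(), "to", co2["year"].max())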


What You'll Explore

The session notebook guides you through progressively complex tasks (two of them are sketched in code below):

  • Loading and exploration: Data dimensions, temporal coverage, key variables
  • Filtering and ranking: Identifying top emitters, excluding aggregate regions
  • Time series visualization: Emissions trajectories for major countries
  • Compositional analysis: Emissions breakdown by fuel source
  • Groupby operations: Continental aggregations, percentage change calculations
  • Independent analysis: Choose your own analytical question
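
As one example, the filtering-and-ranking task might translate into something like the pandas sketch below (continuing from the loading sketch above; the notebook may express it in Ibis instead). The list of aggregate labels is illustrative rather than exhaustive, since OWID includes many aggregate rows such as continents and income groups.

# Sketch: top 10 emitters in the most recent year, excluding aggregate rows.
aggregates = ["World", "Asia", "Africa", "Europe", "North America",
              "South America", "Oceania", "European Union (27)"]  # illustrative, not exhaustive

latest_year = co2["year"].max()
latest = co2[co2["year"] == latest_year]
top10 = (
    latest[~latest["country"].isin(aggregates)]
    .nlargest(10, "co2")[["country", "co2"]]
)
print(top10)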

Throughout, you'll observe which of these tasks coding agents handle autonomously and where they need guidance—and reflect on the boundary between automated execution and human judgment.
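
Knowing roughly what a correct answer looks like makes that judgment easier. For the time-series task, for instance, a plausible solution is an Altair line chart along the lines of the sketch below (again building on the loading sketch above; the chosen countries and styling are only illustrative).

import altair as alt

# Illustrative subset of major emitters; swap in the countries you care about.
majors = co2[co2["country"].isin(["China", "United States", "India", "Germany"])]

chart = (
    alt.Chart(majors)
    .mark_line()
    .encode(
        x="year:Q",         # year on the x-axis
        y="co2:Q",          # total emissions, million tonnes
        color="country:N",  # one line per country
    )
)
chart  # in a notebook, the last expression renders the chart inline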


Learning Objectives

  1. Distinguish between chat-based LLM interfaces and coding agents with local execution capabilities
  2. Develop fluency in specifying data operations through natural language
  3. Evaluate the boundary between autonomous agent tasks and those requiring human guidance
  4. Assess when programmatic approaches offer advantages over manual analysis

What to Commit

Before leaving, push your work:

git add .
git commit -m "Session 2: AI coding agents exercise"
git push

Your completed notebook should include:

  • All code cells executed with output
  • Completed reflection responses on agent performance
  • Your independent analysis with supporting visualization

Key Takeaways

  1. Execution context matters — The same LLM behaves differently in a browser chat versus an IDE with code execution
  2. Natural language drives analysis — You can specify complex data operations without memorizing syntax
  3. Verification remains essential — AI accelerates exploration but doesn't eliminate the need for human judgment