This session explores how large language models function as coding agents within integrated development environments. You'll work with the OWID CO2 emissions dataset—a well-curated, accessible dataset ideal for learning AI-assisted data analysis workflows.


Part 0: Repository Setup

Step 1: Accept the GitHub Classroom Assignment

Your instructor will email you a secure GitHub Classroom link for Module 1. Click the link to create your personal repository. GitHub Classroom automatically creates a private copy of the module template repository under your account.

New to GitHub Classroom? See the GitHub Classroom documentation for more details.

Step 2: Clone to VS Code

Open VS Code and clone your new repository:

  1. Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
  2. Type "Git: Clone" and select it
  3. Paste your repository URL from GitHub
  4. Choose a local folder and open the cloned repository

If prompted to sign in to GitHub, VS Code will open a browser for device authentication. Follow the prompts to authorize VS Code.

Step 3: Set Up the Python Environment

Open Copilot Chat in VS Code and give it this instruction:

"Set up the python environment"

Your coding agent will follow the instructions in .vscode/copilot-instructions.md to create a Conda environment with the correct dependencies (ibis-framework, altair, pandas, jupyter).

Step 4: Restart and Select Kernel
  1. After environment setup completes, restart VS Code to ensure the new environment is detected
  2. Open the session notebook: notebooks/session2.ipynb
  3. Click "Select Kernel" in the top-right of the notebook
  4. Choose Python Environments → csol208 (the Conda environment you created)

You're now ready to work through the session exercises with AI assistance.
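
If you'd like to confirm the setup before starting, a quick sanity check in the first notebook cell is enough. The sketch below only verifies that the packages from Step 3 import (note that the ibis-framework package imports as ibis) and prints their versions:

# Optional sanity check: run in the first notebook cell after selecting the kernel.
# The ibis-framework package imports as "ibis".
import ibis
import altair
import pandas as pd

print("ibis", ibis.__version__)
print("altair", altair.__version__)
print("pandas", pd.__version__)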


The Data: OWID CO2 Dataset

We're using the Our World in Data CO2 and Greenhouse Gas Emissions dataset — one of the most comprehensive open datasets on global emissions.

  • 50,000+ data points
  • 254 countries & regions
  • Years covered: 1750–2024

Key columns we'll use:

  • country, year — identifiers
  • co2 — total emissions (million tonnes)
  • co2_per_capita — emissions per person
  • coal_co2, oil_co2, gas_co2 — by fuel type

See the full OWID dataset documentation for the complete list of columns.
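
To make these columns concrete, here is a minimal loading sketch in pandas. The file path is a placeholder, so point it at wherever your repository stores the CSV; the session notebook itself may load the data through Ibis instead.

import pandas as pd

# Placeholder path: adjust to where the OWID CSV lives in your repository.
co2 = pd.read_csv("data/owid-co2-data.csv")

# The key columns listed above
cols = ["country", "year", "co2", "co2_per_capita", "coal_co2", "oil_co2", "gas_co2"]
print(co2[cols].head())

# Basic shape and temporal coverage
print(co2.shape)
print(co2["year"].min(), "to", co2["year"].max())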


What You'll Explore

The session notebook guides you through progressively complex tasks (two of them are sketched in code below):

  • Loading and exploration: Data dimensions, temporal coverage, key variables
  • Filtering and ranking: Identifying top emitters, excluding aggregate regions
  • Time series visualization: Emissions trajectories for major countries
  • Compositional analysis: Emissions breakdown by fuel source
  • Groupby operations: Continental aggregations, percentage change calculations
  • Independent analysis: Choose your own analytical question
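
As one example, the filtering-and-ranking task might translate into something like the pandas sketch below (continuing from the loading sketch above; the notebook may express it in Ibis instead). The list of aggregate labels is illustrative rather than exhaustive, since OWID includes many aggregate rows such as continents and income groups.

# Sketch: top 10 emitters in the most recent year, excluding aggregate rows.
aggregates = ["World", "Asia", "Africa", "Europe", "North America",
              "South America", "Oceania", "European Union (27)"]  # illustrative, not exhaustive

latest_year = co2["year"].max()
latest = co2[co2["year"] == latest_year]
top10 = (
    latest[~latest["country"].isin(aggregates)]
    .nlargest(10, "co2")[["country", "co2"]]
)
print(top10)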

Throughout, you'll observe which of these tasks coding agents handle autonomously and where they need guidance—and reflect on the boundary between automated execution and human judgment.
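
Knowing roughly what a correct answer looks like makes that judgment easier. For the time-series task, for instance, a plausible solution is an Altair line chart along the lines of the sketch below (again building on the loading sketch above; the chosen countries and styling are only illustrative).

import altair as alt

# Illustrative subset of major emitters; swap in the countries you care about.
majors = co2[co2["country"].isin(["China", "United States", "India", "Germany"])]

chart = (
    alt.Chart(majors)
    .mark_line()
    .encode(
        x="year:Q",         # year on the x-axis
        y="co2:Q",          # total emissions, million tonnes
        color="country:N",  # one line per country
    )
)
chart  # in a notebook, the last expression renders the chart inline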


Learning Objectives

  1. Distinguish between chat-based LLM interfaces and coding agents with local execution capabilities
  2. Develop fluency in specifying data operations through natural language
  3. Evaluate the boundary between autonomous agent tasks and those requiring human guidance
  4. Assess when programmatic approaches offer advantages over manual analysis

What to Commit

Before leaving, push your work:

git add .
git commit -m "Session 2: AI coding agents exercise"
git push

Your completed notebook should include:

  • All code cells executed with output
  • Completed reflection responses on agent performance
  • Your independent analysis with supporting visualization

Key Takeaways

  1. Execution context matters — The same LLM behaves differently in a browser chat versus an IDE with code execution
  2. Natural language drives analysis — You can specify complex data operations without memorizing syntax
  3. Verification remains essential — AI accelerates exploration but doesn't eliminate the need for human judgment