From small data to structured analysis with LLM assistance
This session explores how large language models function as coding agents within integrated development environments. You'll work with the OWID CO2 emissions dataset—a well-curated, accessible dataset ideal for learning AI-assisted data analysis workflows.
Your instructor will email you a secure GitHub Classroom link for Module 1. Click the link to create your personal repository. GitHub Classroom automatically creates a private copy of the module template repository under your account.
New to GitHub Classroom? See the GitHub Classroom documentation for more details.
Open VS Code and clone your new repository:
1. Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on Mac) to open the Command Palette, run **Git: Clone**, and select your Module 1 repository.
2. If prompted to sign in to GitHub, VS Code will open a browser for device authentication. Follow the prompts to authorize VS Code.
Open Copilot Chat in VS Code and give it this instruction:
"Set up the Python environment"
Your coding agent will follow the instructions in `.vscode/copilot-instructions.md` to create a Conda environment with the correct dependencies (ibis-framework, altair, pandas, jupyter).
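Once the agent finishes, you can sanity-check the environment yourself. The sketch below is illustrative, not part of the module; it assumes the dependency list above, and notes that the PyPI package `ibis-framework` imports as `ibis`:

```python
import importlib.util

def check_env(packages):
    """Return a dict mapping each package name to whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Dependencies listed in .vscode/copilot-instructions.md.
status = check_env(["ibis", "altair", "pandas", "jupyter"])
for pkg, ok in status.items():
    print(f"{pkg}: {'ok' if ok else 'missing'}")
```

If anything reports `missing`, ask the agent to re-run the environment setup before continuing.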
Open `notebooks/session2.ipynb`. You're now ready to work through the session exercises with AI assistance.
We're using the Our World in Data CO2 and Greenhouse Gas Emissions dataset — one of the most comprehensive open datasets on global emissions.
| Data points | Countries & regions | Years covered |
|---|---|---|
| 50,000+ | 254 | 1750–2024 |
Key columns we'll use:
- `country`, `year` — identifiers
- `co2` — total emissions (million tonnes)
- `co2_per_capita` — emissions per person
- `coal_co2`, `oil_co2`, `gas_co2` — emissions by fuel type

The session notebook guides you through progressively complex tasks.
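To get a feel for those columns before the exercises, here is a small pandas sketch. The rows are synthetic stand-ins (made-up values, not real OWID figures); it just illustrates the kind of per-country, by-fuel aggregation the notebook builds toward:

```python
import pandas as pd

# Tiny synthetic stand-in for the OWID CO2 dataset: same key columns,
# invented values, just to show the shape of the analysis.
df = pd.DataFrame({
    "country": ["Germany", "Germany", "India", "India"],
    "year": [2020, 2021, 2020, 2021],
    "co2": [644.3, 678.8, 2421.4, 2709.7],
    "co2_per_capita": [7.7, 8.1, 1.7, 1.9],
    "coal_co2": [149.3, 169.5, 1680.0, 1802.0],
    "oil_co2": [202.2, 205.4, 585.0, 622.0],
    "gas_co2": [162.0, 170.0, 130.0, 140.0],
})

# Keep each country's most recent year, then compute fuel shares of total CO2.
latest = df.loc[df.groupby("country")["year"].idxmax()].copy()
for fuel in ["coal_co2", "oil_co2", "gas_co2"]:
    latest[f"{fuel}_share"] = latest[fuel] / latest["co2"]

print(latest[["country", "year", "coal_co2_share", "oil_co2_share", "gas_co2_share"]])
```

In the session itself you will ask the coding agent to write queries like this against the full dataset (with ibis or pandas) rather than typing them by hand.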
Throughout, you'll observe which tasks coding agents handle autonomously and where they need guidance, and reflect on the boundary between automated execution and human judgment.
Before leaving, push your work:
```shell
git add .
git commit -m "Session 2: AI coding agents exercise"
git push
```
Your completed notebook should include: