Homework Assignment: Data Formats and Access

Overview

This assignment focuses on accessing and exploring different types of climate data. You will work with several data formats and sources: JSON from web APIs, files downloaded with Pooch, and NetCDF metadata served over OPeNDAP. Code templates are provided; your job is to fill in the gaps and extend the analysis.


Setup: Install Required Packages

Run this cell first:

# You may need to install these
# pip install pooch requests

import requests
import json

Part 1: Working with JSON Data

JSON (JavaScript Object Notation) is commonly used for web APIs. Many climate data services provide JSON outputs.

Provided Code:

import json
import requests
# Here's a sample JSON structure similar to what APIs return
sample_json = '''
{
  "station": "USC00305800",
  "name": "New York Central Park",
  "location": {
    "latitude": 40.7789,
    "longitude": -73.9692
  },
  "observations": [
    {"date": "2023-01-01", "temperature": 32, "precipitation": 0.0},
    {"date": "2023-01-02", "temperature": 28, "precipitation": 0.5},
    {"date": "2023-01-03", "temperature": 35, "precipitation": 0.0},
    {"date": "2023-01-04", "temperature": 38, "precipitation": 0.2},
    {"date": "2023-01-05", "temperature": 41, "precipitation": 0.0}
  ]
}
'''

# Parse the JSON
data = json.loads(sample_json)

# Access nested data
print("Station:", data['station'])
print("Location:", data['location'])
print("First observation:", data['observations'][0])
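
`json.loads` returns ordinary Python dictionaries and lists, so nested values are reached by chaining keys and list indexes. The sketch below uses a separate, made-up record (the station ID `DEMO0001` and all of its values are illustrative assumptions, not real data) to show chained access, `dict.get` with a default for a field that may be absent, and `json.dumps` for turning the structure back into text.

```python
import json

# Hypothetical record for illustration only; not from a real station.
demo_json = (
    '{"station": "DEMO0001", "location": {"latitude": 10.5}, '
    '"observations": [{"date": "2000-01-01", "wind": 3.2}]}'
)

record = json.loads(demo_json)  # JSON text -> Python dict

# Chain keys and list indexes to reach nested values
latitude = record["location"]["latitude"]
first_date = record["observations"][0]["date"]

# .get() returns a default instead of raising KeyError for a missing key
elevation = record["location"].get("elevation", "not reported")

print(latitude, first_date, elevation)

# json.dumps converts the Python structure back into JSON text
round_trip = json.dumps(record, indent=2)
print(round_trip)
```

Real API responses often omit fields for some stations or dates, which is where `.get` with a default tends to be safer than direct indexing.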

Your Tasks:

# 1. Extract and print all dates and temperatures (8 points)
print("Date, Temperature")
for obs in data['observations']:
    # YOUR CODE HERE: print date and temperature for each observation
    pass

# 2. Calculate average temperature (8 points)
total_temp = 0
count = 0
# YOUR CODE HERE: calculate average

avg_temp = 0  # Replace this
print(f"Average temperature: {avg_temp}°F")

# 3. Find days with precipitation (9 points)
print("\nDays with precipitation:")
# YOUR CODE HERE

Now try with a real API:

# Use a real weather API (you may need to sign up for a free API key)
# Example APIs: OpenWeatherMap, NOAA, Weather.gov
# YOUR CODE HERE 

Part 2: Downloading Files with Python

Pooch is a Python library that downloads data files and caches them locally, so repeated runs reuse the cached copy instead of downloading the file again.

Provided Code:

import os  # needed for os.path.exists below
import pooch

# Set up Pooch to download a file
# This example downloads a small air quality dataset
file_path = pooch.retrieve(
    url="https://github.com/pandas-dev/pandas/raw/main/doc/data/air_quality_no2.csv",
    known_hash=None
)

print("File downloaded to:", file_path)
print("File exists:", os.path.exists(file_path))
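
Passing `known_hash=None` skips integrity checking. Pooch also accepts a checksum string such as `"sha256:<hex digest>"` for `known_hash` and then verifies the download against it. The sketch below uses only the standard library to compute such a digest; the helper name `sha256_of_file` and the temporary demo file are illustrative assumptions, not part of Pooch.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file so the sketch runs without a download
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"city,no2\nDemo,1.0\n")
    demo_path = tmp.name

demo_hash = sha256_of_file(demo_path)
print("known_hash value:", "sha256:" + demo_hash)
os.remove(demo_path)  # clean up the temporary demo file
```

After a first download, one option is to run `sha256_of_file(file_path)` and paste the result into `known_hash`, so later runs detect a changed or corrupted file.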

Your Tasks:

import os

# 1. Verify the file was downloaded (5 points)
# Check the file size
file_size = os.path.getsize(file_path)
print(f"File size: {file_size} bytes")

# YOUR CODE HERE: open the file and count how many lines it has
line_count = 0 

print(f"Number of lines: {line_count}")

# 2. Download another file (10 points)
# Find a climate dataset online using the sources we talked about in lecture
# Download it using Pooch

# YOUR CODE HERE:
# my_url = "..."
# my_file = pooch.retrieve(url=my_url, known_hash=None)  # hash optional for first try
# Print info about your downloaded file

# 3. Create a data inventory (5 points)
# List all the files you've downloaded in this assignment
print("\nData Inventory:")
print("1. meteorites.csv - NASA meteorite landings")
print("2. air_quality_no2.csv - Air quality NO2 measurements")
# YOUR CODE HERE: add your file from task 2

Part 3: Understanding NetCDF Metadata

NetCDF is a common format for climate data. Servers that expose NetCDF datasets through OPeNDAP let us examine a dataset's metadata with plain HTTP requests, without loading the full dataset.

Provided Code:

import requests

# OPeNDAP provides metadata in different formats
# We'll get basic info about a climate dataset

base_url = "http://iridl.ldeo.columbia.edu/expert/SOURCES/.NOAA/.NCEP/.CPC/.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/.Monthly/.RETRO/.rain/dods"

# Get DDS (Dataset Descriptor Structure) - describes the structure
dds_url = base_url + ".dds"
response = requests.get(dds_url, timeout=30)
response.raise_for_status()  # Stop early if the server returned an error

print("Dataset Structure:")
print(response.text[:500])  # Print first 500 characters
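
A DDS response is plain text that lists each variable with its type and dimensions. As a rough sketch of how that text could be read programmatically, the example below parses a made-up DDS string with regular expressions. The names `sst`, `time`, `lat`, `lon` and their sizes are illustrative assumptions, not this dataset's output.

```python
import re

# Hypothetical DDS text for illustration; the real response will differ.
sample_dds = """Dataset {
    Float32 time[time = 12];
    Float32 lat[lat = 90];
    Float32 lon[lon = 180];
    Grid {
      Array:
        Float32 sst[time = 12][lat = 90][lon = 180];
      Maps:
        Float32 time[time = 12];
        Float32 lat[lat = 90];
        Float32 lon[lon = 180];
    } sst;
} demo;
"""

# Match declarations such as "Float32 sst[time = 12][lat = 90];"
decl_pattern = re.compile(r"(\w+)\s+(\w+)((?:\[\w+\s*=\s*\d+\])+);")
# Match each "[name = size]" pair inside a declaration
dim_pattern = re.compile(r"\[(\w+)\s*=\s*(\d+)\]")

arrays = {}
for dtype, name, dims in decl_pattern.findall(sample_dds):
    arrays[name] = {d: int(n) for d, n in dim_pattern.findall(dims)}

for name, dims in arrays.items():
    print(name, dims)
```

Arrays with a single dimension named after themselves are usually coordinate axes; an array with several dimensions is usually a data variable.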

Your Tasks:

# 1. Identify dimensions and variables (5 points)
# Look at the DDS output above and answer:
# - What are the dimension names?
# - What is the main variable name?
# - Write your answers in a markdown cell

# 2. Get data attributes (5 points)
# DAS (Dataset Attribute Structure) contains metadata
das_url = base_url + ".das"
# YOUR CODE HERE: make a request to das_url and print first 1000 characters

# 3. Document what you learned (5 points)
# In a markdown cell, write:
# - What does this dataset contain?
# - What time period does it cover?
# - What geographic region does it cover?
# - What are the units of the main variable?
# Find this info in the DAS output
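
A DAS response groups attributes by variable, one `type name value;` entry per line. The sketch below shows one possible way to collect those entries into a dictionary. It runs on a made-up DAS string (the variables `time` and `sst` and every attribute value are illustrative assumptions) and does not handle nested attribute containers.

```python
import re

# Hypothetical DAS text for illustration; the real response will differ.
sample_das = """Attributes {
    time {
        String units "months since 1960-01-01";
    }
    sst {
        String long_name "Sea Surface Temperature";
        String units "degree_Celsius";
        Float32 missing_value -999.0;
    }
}
"""

attributes = {}
current_var = None
for line in sample_das.splitlines():
    line = line.strip()
    block_start = re.fullmatch(r"(\w+)\s*\{", line)
    if block_start and block_start.group(1) != "Attributes":
        # A line like "sst {" opens the attribute block for one variable
        current_var = block_start.group(1)
        attributes[current_var] = {}
        continue
    attr = re.fullmatch(r"(\w+)\s+(\w+)\s+(.+);", line)
    if attr and current_var:
        dtype, key, value = attr.groups()
        # Values are kept as strings; surrounding quotes are removed
        attributes[current_var][key] = value.strip('"')

print(attributes["sst"]["units"])
```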