Homework Assignment: Data Formats and Access
Overview
This assignment focuses on accessing and exploring different types of climate data. You will learn to work with various data formats and sources. Code templates are provided; your job is to fill in the gaps and extend the analysis.
Setup: Install Required Packages
Run this cell first:
# You may need to install these
# pip install pooch requests
import requests
import json
Part 1: Working with JSON Data
JSON (JavaScript Object Notation) is commonly used for web APIs. Many climate data services provide JSON outputs.
Provided Code:
import json
import requests
# Here's a sample JSON structure similar to what APIs return
sample_json = '''
{
    "station": "USC00305800",
    "name": "New York Central Park",
    "location": {
        "latitude": 40.7789,
        "longitude": -73.9692
    },
    "observations": [
        {"date": "2023-01-01", "temperature": 32, "precipitation": 0.0},
        {"date": "2023-01-02", "temperature": 28, "precipitation": 0.5},
        {"date": "2023-01-03", "temperature": 35, "precipitation": 0.0},
        {"date": "2023-01-04", "temperature": 38, "precipitation": 0.2},
        {"date": "2023-01-05", "temperature": 41, "precipitation": 0.0}
    ]
}
'''
# Parse the JSON
data = json.loads(sample_json)
# Access nested data
print("Station:", data['station'])
print("Location:", data['location'])
print("First observation:", data['observations'][0])
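As a side note on the parsing step above, here is a minimal sketch of two habits that tend to help with API responses: pretty-printing with `json.dumps(..., indent=2)` and using `dict.get` for keys that may be missing. The `record` dictionary, the `TEST0001` station ID, and the `wind_speed` key are made up for illustration and are not part of the assignment data.

```python
import json

# Hypothetical record shaped like one station entry (not assignment data)
record_text = '{"station": "TEST0001", "location": {"latitude": 10.5, "longitude": -20.25}}'
record = json.loads(record_text)

# json.dumps with indent makes nested structures easier to read
pretty = json.dumps(record, indent=2, sort_keys=True)
print(pretty)

# Chained indexing reaches nested values
latitude = record["location"]["latitude"]
print("Latitude:", latitude)

# dict.get returns a default instead of raising KeyError for a missing key
wind_speed = record.get("wind_speed", None)
print("Wind speed:", wind_speed)
```

Using `get` with a default is one option among several; for keys that must be present, plain indexing that raises `KeyError` may be the better choice because it surfaces bad data early.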
Your Tasks:
# 1. Extract and print all dates and temperatures (8 points)
print("Date, Temperature")
for obs in data['observations']:
    # YOUR CODE HERE: print date and temperature for each observation
    pass
# 2. Calculate average temperature (8 points)
total_temp = 0
count = 0
# YOUR CODE HERE: calculate average
avg_temp = 0 # Replace this
print(f"Average temperature: {avg_temp}°F")
# 3. Find days with precipitation (9 points)
print("\nDays with precipitation:")
# YOUR CODE HERE
Now try with a real API:
# Use a real weather API (you may need to sign up for a free API key)
# Example APIs: OpenWeatherMap, NOAA, Weather.gov
# YOUR CODE HERE
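One possible starting point, sketched under assumptions: the Weather.gov API is understood to expose a `points` endpoint that returns a JSON document whose `properties.forecast` field holds the URL of the forecast for that location, and to expect a descriptive `User-Agent` header. Check the current API documentation before relying on either detail. The helper names (`build_points_url`, `fetch_json`, `forecast_url_from_points`) and the contact string are hypothetical.

```python
import requests


def build_points_url(latitude, longitude):
    """Build the (assumed) api.weather.gov 'points' URL for a coordinate pair."""
    return f"https://api.weather.gov/points/{latitude:.4f},{longitude:.4f}"


def fetch_json(url, timeout=30):
    """GET a URL and return the parsed JSON body, raising on HTTP errors."""
    # Placeholder contact string; replace it with your own details
    headers = {"User-Agent": "climate-homework (student@example.edu)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.json()


def forecast_url_from_points(points_payload):
    """Pull the forecast URL out of a 'points' response (assumed layout)."""
    return points_payload["properties"]["forecast"]
```

With a network connection, `fetch_json(build_points_url(40.7789, -73.9692))` would request the entry for the coordinates used in Part 1, and passing the result to `forecast_url_from_points` would give the next URL to fetch. Keeping the URL building and the parsing in separate functions makes them easy to check without going online.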
Part 2: Downloading Files with Python
Pooch is a Python tool for downloading and caching data files.
Provided Code:
import os
import pooch
# Set up Pooch to download a file
# This example downloads a small air quality dataset
file_path = pooch.retrieve(
    url="https://github.com/pandas-dev/pandas/raw/main/doc/data/air_quality_no2.csv",
    known_hash=None
)
print("File downloaded to:", file_path)
print("File exists:", os.path.exists(file_path))
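Passing `known_hash=None` skips the integrity check. The sketch below shows what such a check involves, using only the standard library: it computes the SHA-256 digest of a file and formats it as `"sha256:<hexdigest>"`, which is the form Pooch is understood to accept for `known_hash`. The temporary file and its contents are stand-ins made up for illustration, not the real air quality data.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in content (hypothetical, not the downloaded dataset)
demo_content = b"datetime,station,no2\n2019-05-07 02:00:00,example,23.0\n"

# Write the stand-in content to a temporary file
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(demo_content)
    demo_path = tmp.name

demo_hash = sha256_of_file(demo_path)
known_hash = f"sha256:{demo_hash}"
print(known_hash)

# Clean up the temporary file
os.remove(demo_path)
```

Running `sha256_of_file(file_path)` on a file you have already downloaded and trust would give a value to pass as `known_hash` on later downloads, so a changed or corrupted file is detected.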
Your Tasks:
import os
# 1. Verify the file was downloaded (5 points)
# Check the file size
file_size = os.path.getsize(file_path)
print(f"File size: {file_size} bytes")
# YOUR CODE HERE: open the file and count how many lines it has
line_count = 0
print(f"Number of lines: {line_count}")
# 2. Download another file (10 points)
# Find a climate dataset online using the sources we talked about in lecture
# Download it using Pooch
# YOUR CODE HERE:
# my_url = "..."
# my_file = pooch.retrieve(url=my_url, known_hash=None) # hash optional for first try
# Print info about your downloaded file
# 3. Create a data inventory (5 points)
# List all the files you've downloaded in this assignment
print("\nData Inventory:")
print("1. meteorites.csv - NASA meteorite landings")
print("2. air_quality_no2.csv - Air quality NO2 measurements")
# YOUR CODE HERE: add your file from task 2
Part 3: Understanding NetCDF Metadata
NetCDF is a common format for climate data. Even without loading the full dataset, we can examine its metadata using HTTP requests.
Provided Code:
import requests
# OPeNDAP provides metadata in different formats
# We'll get basic info about a climate dataset
base_url = "http://iridl.ldeo.columbia.edu/expert/SOURCES/.NOAA/.NCEP/.CPC/.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/.Monthly/.RETRO/.rain/dods"
# Get DDS (Dataset Descriptor Structure) - describes the structure
dds_url = base_url + ".dds"
response = requests.get(dds_url, timeout=30)
response.raise_for_status()  # Stop early if the server returned an error
print("Dataset Structure:")
print(response.text[:500]) # Print first 500 characters
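To make the DDS layout easier to read, here is a minimal sketch that pulls variable names and dimensions out of DDS-style text with regular expressions. The `sample_dds` string is hypothetical and is NOT the output for the dataset above; its variable names and sizes are invented. Real servers often wrap declarations in `Grid { Array: ... Maps: ... }` blocks, which this simple pattern does not interpret.

```python
import re

# Hypothetical DDS text for illustration only (not the dataset above)
sample_dds = """Dataset {
    Float32 lat[lat = 3];
    Float32 lon[lon = 4];
    Float32 time[time = 2];
    Float32 temp[time = 2][lat = 3][lon = 4];
} example;
"""

# Matches one "[name = size]" dimension entry
dimension_pattern = re.compile(r"\[(\w+)\s*=\s*(\d+)\]")
# Matches "Type name[dim = n]...;" array declarations, one per line
declaration_pattern = re.compile(
    r"^\s*(\w+)\s+(\w+)((?:\[[^\]]+\])+);", re.MULTILINE
)

# Map each variable name to its list of (dimension name, size) pairs
variables = {}
for type_name, var_name, dims_text in declaration_pattern.findall(sample_dds):
    variables[var_name] = [
        (name, int(size)) for name, size in dimension_pattern.findall(dims_text)
    ]

for var_name, dims in variables.items():
    print(var_name, dims)
```

In this made-up sample, variables with a single dimension named after themselves act as coordinates, while `temp` is the data variable defined over all three.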
Your Tasks:
# 1. Identify dimensions and variables (5 points)
# Look at the DDS output above and answer:
# - What are the dimension names?
# - What is the main variable name?
# - Write your answers in a markdown cell
# 2. Get data attributes (5 points)
# DAS (Dataset Attribute Structure) contains metadata
das_url = base_url + ".das"
# YOUR CODE HERE: make a request to das_url and print first 1000 characters
# 3. Document what you learned (5 points)
# In a markdown cell, write:
# - What does this dataset contain?
# - What time period does it cover?
# - What geographic region does it cover?
# - What are the units of the main variable?
# Find this info in the DAS output