Assignment: Pandas Groupby with Hurricane Data
Import NumPy, pandas, and Matplotlib, and set the display options.
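A minimal sketch of this setup step. The specific options and values (all columns shown, at most 20 rows printed, a 10 x 5 inch default figure) are illustrative choices, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt

# Illustrative display settings: show every column, cap printed rows at 20
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 20)

# Illustrative default figure size (width, height) in inches
plt.rcParams["figure.figsize"] = (10, 5)
```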
Use the following code to load a CSV file of the NOAA IBTrACS hurricane dataset:
```python
import pandas as pd

url = 'https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.ALL.list.v04r00.csv'
# Keep the first 12 columns and skip the units row (row 1).
# keep_default_na=False stops pandas from reading the basin code 'NA' as missing.
df = pd.read_csv(url, parse_dates=['ISO_TIME'], usecols=range(12),
                 skiprows=[1], na_values=[' ', 'NOT_NAMED'],
                 keep_default_na=False, dtype={'NAME': str})
df.tail()
```
| | SID | SEASON | NUMBER | BASIN | SUBBASIN | NAME | ISO_TIME | NATURE | LAT | LON | WMO_WIND | WMO_PRES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 716160 | 2024147N19089 | 2024 | 27 | NI | BB | REMAL | 2024-05-27 06:00:00 | NR | 23.0325 | 89.3509 | NaN | NaN |
| 716161 | 2024147N19089 | 2024 | 27 | NI | BB | REMAL | 2024-05-27 09:00:00 | NR | 23.3337 | 89.6178 | NaN | NaN |
| 716162 | 2024147N19089 | 2024 | 27 | NI | BB | REMAL | 2024-05-27 12:00:00 | NR | 23.6263 | 89.8799 | NaN | NaN |
| 716163 | 2024147N19089 | 2024 | 27 | NI | BB | REMAL | 2024-05-27 15:00:00 | NR | 23.9143 | 90.1400 | NaN | NaN |
| 716164 | 2024147N19089 | 2024 | 27 | NI | BB | REMAL | 2024-05-27 18:00:00 | NR | 24.2000 | 90.4000 | NaN | NaN |
Basin key: NI - North Indian, SI - South Indian, WP - Western Pacific, SP - South Pacific, EP - Eastern Pacific, NA - North Atlantic, SA - South Atlantic.
How many rows does this dataset have?
How many North Atlantic hurricanes (distinct SID values) are in this dataset?
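A minimal sketch of both counts on a small hypothetical DataFrame (the toy SIDs are made up; with the real data, apply the same two lines to the `df` loaded above). It assumes one hurricane corresponds to one distinct `SID`, and counts a storm as North Atlantic if any of its rows has `BASIN == "NA"`.

```python
import pandas as pd

# Hypothetical toy rows using the IBTrACS column names
df = pd.DataFrame({
    "SID": ["S1", "S1", "S2", "S3", "S3", "S3"],
    "BASIN": ["NA", "NA", "EP", "NA", "NA", "NA"],
})

n_rows = len(df)  # equivalently df.shape[0]

# Each storm spans many rows, so count distinct SIDs rather than rows.
# Matching the string "NA" works only because keep_default_na=False
# kept pandas from converting it to a missing value when reading the CSV.
n_na_storms = df.loc[df["BASIN"] == "NA", "SID"].nunique()
print(n_rows, n_na_storms)  # 6 2
```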
1) Get the unique values of the BASIN, SUBBASIN, and NATURE columns
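A sketch using `Series.unique()` on hypothetical toy rows; the codes are IBTrACS-style values chosen for illustration only.

```python
import pandas as pd

# Hypothetical toy rows
df = pd.DataFrame({
    "BASIN": ["NA", "NA", "EP", "WP"],
    "SUBBASIN": ["GM", "MM", "MM", "MM"],
    "NATURE": ["TS", "ET", "TS", "NR"],
})

# unique() returns the distinct values in order of first appearance
uniques = {col: df[col].unique() for col in ["BASIN", "SUBBASIN", "NATURE"]}
for col, values in uniques.items():
    print(col, values)
```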
2) Rename the WMO_WIND and WMO_PRES columns to WIND and PRES
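A sketch with `DataFrame.rename` on a one-row hypothetical frame. Note that `rename` returns a new DataFrame, so the result has to be reassigned.

```python
import pandas as pd

# Hypothetical one-row frame with the original column names
df = pd.DataFrame({"SID": ["S1"], "WMO_WIND": [100.0], "WMO_PRES": [950.0]})

# Map old names to new names; reassign to keep the change
df = df.rename(columns={"WMO_WIND": "WIND", "WMO_PRES": "PRES"})
print(df.columns.tolist())  # ['SID', 'WIND', 'PRES']
```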
3) Get the 10 rows with the largest WIND values
You will notice that some names are repeated, because each hurricane is recorded at many points in time.
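A sketch with `DataFrame.nlargest` on hypothetical toy observations (storm names invented). The toy frame is too small for n=10, so it uses n=3, which is enough to show one storm filling several of the top rows.

```python
import pandas as pd

# Hypothetical toy observations: several rows per storm
df = pd.DataFrame({
    "SID": ["S1", "S1", "S1", "S2", "S2", "S3"],
    "NAME": ["ALPHA", "ALPHA", "ALPHA", "BRAVO", "BRAVO", "CHARLIE"],
    "WIND": [150.0, 160.0, 155.0, 140.0, 145.0, 90.0],
})

# Sort rows by WIND (descending) and keep the top n; use 10 on the real data
top_rows = df.nlargest(3, "WIND")
print(top_rows)  # all three rows belong to ALPHA
```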
4) Group the data on SID and get the 10 hurricanes with the highest maximum WIND
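A sketch on the same kind of hypothetical toy rows: collapse each storm to its peak wind with a named aggregation, then take the largest. It assumes `NAME` does not change within a storm, so taking the first (non-null) name per group is enough.

```python
import pandas as pd

# Hypothetical toy observations: several rows per storm
df = pd.DataFrame({
    "SID": ["S1", "S1", "S1", "S2", "S2", "S3"],
    "NAME": ["ALPHA", "ALPHA", "ALPHA", "BRAVO", "BRAVO", "CHARLIE"],
    "WIND": [150.0, 160.0, 155.0, 140.0, 145.0, 90.0],
})

# One row per storm: its name and its maximum wind
storms = df.groupby("SID").agg(NAME=("NAME", "first"), WIND=("WIND", "max"))

# Top storms by peak wind; use 10 on the real data
top_storms = storms.nlargest(2, "WIND")
print(top_storms)
```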
5) Make a bar chart of the maximum wind speed of the 20 hurricanes with the strongest winds
Use the hurricane name on the x-axis.
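A sketch building on the per-storm aggregation from step 4, again with hypothetical toy rows. Setting `NAME` as the index makes pandas use the names as bar labels. IBTrACS winds are in knots; in the real data, names are reused across seasons, so identical labels can appear on the axis.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical toy observations
df = pd.DataFrame({
    "SID": ["S1", "S1", "S2", "S2", "S3"],
    "NAME": ["ALPHA", "ALPHA", "BRAVO", "BRAVO", "CHARLIE"],
    "WIND": [150.0, 160.0, 140.0, 145.0, 90.0],
})

storms = df.groupby("SID").agg(NAME=("NAME", "first"), WIND=("WIND", "max"))
top = storms.nlargest(20, "WIND")  # the toy frame has only 3 storms

# Index by NAME so the x-axis tick labels are storm names
ax = top.set_index("NAME")["WIND"].plot.bar()
ax.set_ylabel("Maximum wind (kt)")
```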
6) Plot the count of all data points by basin as a bar chart
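A sketch with `value_counts`, which counts rows (observations, not storms) per basin, largest first. The toy rows are hypothetical.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical toy rows
df = pd.DataFrame({
    "SID": ["S1", "S1", "S2", "S3", "S3", "S3"],
    "BASIN": ["NA", "NA", "EP", "WP", "WP", "WP"],
})

# Number of rows in each basin, sorted in descending order
counts = df["BASIN"].value_counts()
ax = counts.plot.bar()
ax.set_ylabel("Number of data points")
```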
7) Plot the count of unique hurricanes (distinct SID values) by basin as a bar chart.
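A sketch that differs from step 6 only in counting distinct `SID`s per basin with `nunique`. The toy rows are hypothetical. A storm whose track crosses basins is counted once in each basin it enters.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical toy rows: S3 and S4 are two separate WP storms
df = pd.DataFrame({
    "SID": ["S1", "S1", "S2", "S3", "S3", "S4"],
    "BASIN": ["NA", "NA", "EP", "WP", "WP", "WP"],
})

# Distinct storms per basin (groupby sorts basins alphabetically)
storm_counts = df.groupby("BASIN")["SID"].nunique()
ax = storm_counts.sort_values(ascending=False).plot.bar()
ax.set_ylabel("Number of hurricanes")
```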
8) Make a hexbin plot of the data point locations using latitude (LAT) and longitude (LON)
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.hexbin.html
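A sketch of `DataFrame.plot.hexbin` using random positions in place of the real `LAT`/`LON` columns; the random data and `gridsize=30` are illustrative assumptions. Putting longitude on x and latitude on y makes the result read like a map.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical random positions standing in for the real LAT/LON columns
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "LAT": rng.uniform(-40, 40, size=500),
    "LON": rng.uniform(-180, 180, size=500),
})

# Each hexagon's color shows how many points fall inside it
ax = df.plot.hexbin(x="LON", y="LAT", gridsize=30)
```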
9) Find Hurricane Katrina (from 2005) and plot its track as a scatter plot
First, find the SID of this hurricane.
Next, get this hurricane’s group and plot its position as a scatter plot. Use wind speed to color the points.
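A sketch of the lookup on hypothetical toy rows: the SIDs `X1`/`X2`, positions, and winds are invented, and the real Katrina SID should come from running the same filter on the loaded `df` (after the step 2 rename). Filtering on `SEASON` as well as `NAME` matters because storm names are reused, so on the real data it is worth checking that exactly one SID comes back.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical toy rows: two different storms share the same name
df = pd.DataFrame({
    "SID": ["X1", "X1", "X1", "X2", "X2"],
    "SEASON": [2005, 2005, 2005, 1999, 1999],
    "NAME": ["KATRINA"] * 5,
    "LAT": [23.0, 25.0, 29.0, 10.0, 11.0],
    "LON": [-75.0, -82.0, -89.0, -80.0, -81.0],
    "WIND": [30.0, 100.0, 110.0, 25.0, 30.0],
})

# Match on name and season, then take the storm's SID
sids = df.loc[(df["NAME"] == "KATRINA") & (df["SEASON"] == 2005), "SID"].unique()
sid = sids[0]

# get_group returns every row belonging to that storm
track = df.groupby("SID").get_group(sid)
ax = track.plot.scatter(x="LON", y="LAT", c="WIND", cmap="viridis")
```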
10) Make time (ISO_TIME) the index of your DataFrame
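A sketch with `set_index` on hypothetical toy timestamps. Because `ISO_TIME` was parsed with `parse_dates`, the new index is a `DatetimeIndex`, which `resample` in the next step requires.

```python
import pandas as pd

# Hypothetical toy rows with already-parsed timestamps
df = pd.DataFrame({
    "SID": ["S1", "S1", "S2"],
    "ISO_TIME": pd.to_datetime(
        ["2005-08-24 00:00", "2005-08-24 06:00", "2006-07-01 12:00"]
    ),
})

# set_index returns a new frame and moves ISO_TIME out of the columns
df = df.set_index("ISO_TIME")
print(type(df.index).__name__)  # DatetimeIndex
```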
11) Plot the count of all data points per year as a time series
Use resample: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
Which years stand out as having anomalous hurricane activity?
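A sketch of yearly counts with `resample`, assuming the time index from step 10; the toy timestamps are hypothetical. `"YS"` bins by calendar year (labeled at January 1), and `size()` counts rows per bin, so a year with no data shows up as 0 rather than disappearing.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical toy rows indexed by time; 2002 has no observations
idx = pd.to_datetime(["2001-06-01", "2001-07-01", "2001-08-01", "2003-09-01"])
df = pd.DataFrame({"SID": ["S1", "S1", "S2", "S3"]}, index=idx)

# Rows per calendar year, including empty years
per_year = df.resample("YS").size()
ax = per_year.plot()
ax.set_ylabel("Data points per year")
```

On the real data, `per_year.sort_values()` lists the quietest and busiest years, which helps when judging which years look anomalous.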