{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "98d9d0d8",
   "metadata": {
    "tags": [
     "papermill-error-cell-tag"
    ]
   },
   "source": [
    "<span style=\"color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;\">An Exception was encountered at '<a href=\"#papermill-error-cell\">In [1]</a>'.</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43986fc8",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": 0.009847,
     "end_time": "2025-11-25T01:42:46.437553",
     "exception": false,
     "start_time": "2025-11-25T01:42:46.427706",
     "status": "completed"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# ROF global monthly, annual, seasonal flows analysis <a id='top'></a>\n",
    "\n",
    "Use the following datasets\n",
    "\n",
    "1. reach-D19 gauge link ascii\n",
    "2. D19 flow site geopackage\n",
    "3. D19 discharge netCDF\n",
    "4. history netCD including river discharge\n",
    "\n",
    "[1. Setupt](#setup)\n",
    "\n",
    "[2. Loading data](#load_data)\n",
    "\n",
    "- Read monthly history files from archive. \n",
    "- Reference data: monthly discharge estimates at 922 big river mouths from Dai et al. 2019 data (D19)\n",
    "\n",
    "[3. Large 24 river analysis](#24_large_rivers)\n",
    "\n",
    "- Plotting time seriese (annual, seasonal cycle).\n",
    "- if D19 referece data is available, scatter plots at Large 24 selected rivers against D19 referece data\n",
    "\n",
    "[4. Large 50 rivers analysis](#50_large_rivers)\n",
    "\n",
    "- Annual flow summary table at large 50 selected rivers.\n",
    "- if D19 referece data is available, Scatter plot of annual flow against D19 reference data.\n",
    "\n",
    "[5. 922 rivers analysis](#922_rivers)\n",
    "\n",
    "- run only if reference flow is available\n",
    "- error statistics (%bias, rmse, correlation) at all 922 river sites.\n",
    "- plot error statistic on the global map\n",
    "- plot boxplots including case(s) for each error metric"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aae1c9e1",
   "metadata": {
    "tags": [
     "papermill-error-cell-tag"
    ]
   },
   "source": [
    "<span id=\"papermill-error-cell\" style=\"color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;\">Execution using papermill encountered an exception here and stopped:</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "11b508f0",
   "metadata": {
    "editable": true,
    "execution": {
     "iopub.execute_input": "2025-11-25T01:42:46.457945Z",
     "iopub.status.busy": "2025-11-25T01:42:46.457698Z",
     "iopub.status.idle": "2025-11-25T01:42:56.298505Z",
     "shell.execute_reply": "2025-11-25T01:42:56.297705Z"
    },
    "papermill": {
     "duration": 9.852122,
     "end_time": "2025-11-25T01:42:56.299370",
     "exception": true,
     "start_time": "2025-11-25T01:42:46.447248",
     "status": "failed"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'cupid_utils'",
     "output_type": "error",
     "traceback": [
      "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
      "\u001b[31mModuleNotFoundError\u001b[39m                       Traceback (most recent call last)",
      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 16\u001b[39m\n\u001b[32m     13\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mcartopy\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mcrs\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mccrs\u001b[39;00m\n\u001b[32m     14\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mcartopy\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mfeature\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mcfeature\u001b[39;00m\n\u001b[32m---> \u001b[39m\u001b[32m16\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mcupid_utils\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mcu\u001b[39;00m  \u001b[38;5;66;03m# go buffs!\u001b[39;00m\n\u001b[32m     17\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mcupid_utils\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mrof\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m colors\n\u001b[32m     18\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mcupid_utils\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mrof\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m metrics \u001b[38;5;28;01mas\u001b[39;00m metric\n",
      "\u001b[31mModuleNotFoundError\u001b[39m: No module named 'cupid_utils'"
     ]
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import os, sys\n",
    "import glob\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib as mpl\n",
    "from matplotlib.ticker import FormatStrFormatter\n",
    "from sklearn.linear_model import LinearRegression\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import geopandas as gpd\n",
    "import xarray as xr\n",
    "import cartopy.crs as ccrs\n",
    "import cartopy.feature as cfeature\n",
    "\n",
    "import cupid_utils as cu  # go buffs!\n",
    "from cupid_utils.rof import colors\n",
    "from cupid_utils.rof import metrics as metric\n",
    "from cupid_utils.rof import load_yaml\n",
    "from cupid_utils.rof import no_time_variable\n",
    "\n",
    "rivers_50m = cfeature.NaturalEarthFeature(\"physical\", \"rivers_lake_centerlines\", \"50m\")\n",
    "land = cfeature.LAND\n",
    "\n",
    "print(\"\\nThe Python version: %s.%s.%s\" % sys.version_info[:3])\n",
    "print(xr.__name__, xr.__version__)\n",
    "print(pd.__name__, pd.__version__)\n",
    "print(gpd.__name__, gpd.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49dc57e4",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "-------------------------\n",
    "## 1. Setup <a id='setup'></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0b97ce6",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "parameters",
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Parameter Defaults\n",
    "# parameters are set in CUPiD's config.yml file\n",
    "# when running interactively, manually set the parameters below\n",
    "\n",
    "# global parameters\n",
    "CESM_output_dir = \"\"  # e.g., \"/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing\"\n",
    "case_name = None  # case name: e.g., \"b.e30_beta02.BLT1850.ne30_t232.104\"\n",
    "base_case_name = None  # base case name: e.g., \"b.e23_alpha17f.BLT1850.ne30_t232.092\"\n",
    "base_case_output_dir = None\n",
    "start_date = \"\"  # simulation starting date: e.g., \"0001-01-01\"\n",
    "end_date = \"\"  # simulation ending date: \"0100-01-01\"\n",
    "base_start_date = \"\"  # base simulation starting date: \"0001-01-01\"\n",
    "base_end_date = \"\"  # base simulation ending date: \"0100-01-01\"\n",
    "serial = True  # use dask LocalCluster\n",
    "lc_kwargs = {}\n",
    "product = \"./\"\n",
    "\n",
    "# rof parameters\n",
    "hist_str = \"\"  # Used for hist file tag\n",
    "analysis_name = \"\"  # Used for Figure png names\n",
    "climo_nyears = 10  # number of years to compute the climatology\n",
    "grid_name = \"f09_f09_mosart\"  # ROF grid name used in case\n",
    "base_grid_name = (\n",
    "    grid_name  # spcify ROF grid name for base_case in config.yml if different than case\n",
    ")\n",
    "figureSave = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31125a1d",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": [
     "injected-parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# Parameters\n",
    "case_name = \"b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.232\"\n",
    "base_case_name = \"b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.228\"\n",
    "case_nickname = \"BLT1850_232\"\n",
    "base_case_nickname = \"BLT1850_228\"\n",
    "CESM_output_dir = \"/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing\"\n",
    "start_date = \"0001-01-01\"\n",
    "end_date = \"0021-01-01\"\n",
    "climo_start_year = 11\n",
    "climo_end_year = 21\n",
    "base_start_date = \"0001-01-01\"\n",
    "base_end_date = \"0045-01-01\"\n",
    "base_climo_start_year = 36\n",
    "base_climo_end_year = 45\n",
    "obs_data_dir = (\n",
    "    \"/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CUPiD_obs_data\"\n",
    ")\n",
    "ts_dir = None\n",
    "lc_kwargs = {\"threads_per_worker\": 1}\n",
    "serial = False\n",
    "hist_str = \"h0a\"\n",
    "analysis_name = \"\"\n",
    "grid_name = \"f09_f09_mosart\"\n",
    "climo_nyears = 10\n",
    "figureSave = False\n",
    "subset_kwargs = {}\n",
    "product = \"/glade/work/richling/cupid-test-2/CUPiD/examples/key_metrics/computed_notebooks//rof/global_discharge_gauge_compare_obs.ipynb\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93613811",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "if base_case_output_dir is None:\n",
    "    base_case_output_dir = CESM_output_dir\n",
    "\n",
    "out_dir = os.path.join(os.path.dirname(product), \"rof_nb_output\")\n",
    "os.makedirs(out_dir, exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b8233de",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# ROF additional setup\n",
    "setup = load_yaml(os.path.join(cu.__path__[0], \"..\", \"setup\", \"setup.yaml\"))\n",
    "\n",
    "ancillary_dir = setup[\n",
    "    \"ancillary_dir\"\n",
    "]  # ancillary directory including ROF domain, river network data etc.\n",
    "ref_flow_dir = setup[\"ref_flow_dir\"]  # including observed or reference flow data\n",
    "case_meta = setup[\"case_meta\"]  # Case metadata\n",
    "reach_gpkg = setup[\"reach_gpkg\"]  # river reach geopackage meta\n",
    "\n",
    "if not analysis_name:\n",
    "    analysis_name = case_name\n",
    "if not base_grid_name:\n",
    "    base_grid_name = grid_name\n",
    "\n",
    "case_dic = {\n",
    "    case_name: {\n",
    "        \"grid\": grid_name,\n",
    "        \"sim_period\": slice(f\"{start_date}\", f\"{end_date}\"),\n",
    "        \"climo_nyrs\": min(climo_nyears, int(end_date[:4]) - int(start_date[:4]) + 1),\n",
    "        \"output_dir\": CESM_output_dir,\n",
    "    }\n",
    "}\n",
    "if base_case_name is not None:\n",
    "    case_dic[base_case_name] = {\n",
    "        \"grid\": base_grid_name,\n",
    "        \"sim_period\": slice(f\"{base_start_date}\", f\"{base_end_date}\"),\n",
    "        \"climo_nyrs\": min(\n",
    "            climo_nyears, int(base_end_date[:4]) - int(base_start_date[:4]) + 1\n",
    "        ),\n",
    "        \"output_dir\": base_case_output_dir,\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c0a5980",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### Dasks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ea2783f",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# When running interactively, cupid_run should be set to False\n",
    "cupid_run = True\n",
    "\n",
    "client = None\n",
    "if cupid_run:\n",
    "    # Spin up cluster (if running in parallel)\n",
    "    if not serial:\n",
    "        from dask.distributed import Client, LocalCluster\n",
    "\n",
    "        cluster = LocalCluster(**lc_kwargs)\n",
    "        client = Client(cluster)\n",
    "else:\n",
    "    if not serial:\n",
    "        from dask.distributed import Client\n",
    "        from dask_jobqueue import PBSCluster\n",
    "\n",
    "        cluster = PBSCluster(\n",
    "            cores=1,\n",
    "            processes=1,\n",
    "            memory=\"50GB\",\n",
    "            queue=\"casper\",\n",
    "            walltime=\"01:00:00\",\n",
    "        )\n",
    "        cluster.scale(jobs=10)\n",
    "        client = Client(cluster)\n",
    "client"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9207c029",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "## 2. Loading data <a id='load_data'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9bef2706",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 2.1. Monthly/annual flow netCDFs\n",
    "- month_data (xr dataset) - read from archive\n",
    "- year_data (xr dataset) - computed from monthly\n",
    "- seas_data (xr dataset) - computed from monthly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32584621",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "reachID = {}\n",
    "month_data = {}\n",
    "year_data = {}\n",
    "seas_data = {}\n",
    "for case, meta in case_dic.items():\n",
    "    in_dire = os.path.join(meta[\"output_dir\"], case, \"rof/hist\")\n",
    "    model = case_meta[meta[\"grid\"]][\"model\"]\n",
    "    domain = case_meta[meta[\"grid\"]][\"domain_nc\"]\n",
    "    var_list = case_meta[meta[\"grid\"]][\"vars_read\"]\n",
    "\n",
    "    def preprocess(ds):\n",
    "        return ds[var_list]\n",
    "\n",
    "    year_list = [\n",
    "        \"{:04d}\".format(yr)\n",
    "        for yr in np.arange(\n",
    "            int(meta[\"sim_period\"].start[0:4]), int(meta[\"sim_period\"].stop[0:4]) + 1\n",
    "        )\n",
    "    ]\n",
    "\n",
    "    nc_list = []\n",
    "    for nc_path in sorted(glob.glob(f\"{in_dire}/{case}.{model}.{hist_str}.????-*.nc\")):\n",
    "        for yr in year_list:\n",
    "            if yr in os.path.basename(nc_path):\n",
    "                nc_list.append(nc_path)\n",
    "\n",
    "    # load data\n",
    "    ds = xr.open_mfdataset(\n",
    "        nc_list,\n",
    "        data_vars=\"minimal\",\n",
    "        parallel=True,\n",
    "        preprocess=preprocess,\n",
    "    ).sel(time=meta[\"sim_period\"])\n",
    "\n",
    "    # monthly\n",
    "    month_data[case] = ds.isel(time=slice(-meta[\"climo_nyrs\"] * 12, None)).load()\n",
    "    # annual\n",
    "    year_data[case] = (\n",
    "        ds.isel(time=slice(-meta[\"climo_nyrs\"] * 12, None))\n",
    "        .resample(time=\"YS\")\n",
    "        .mean(dim=\"time\")\n",
    "        .load()\n",
    "    )\n",
    "    # seasonal (compute here instead of reading for conisistent analysis period)\n",
    "    seas_data[case] = (\n",
    "        ds.isel(time=slice(-meta[\"climo_nyrs\"] * 12, None))\n",
    "        .groupby(\"time.month\")\n",
    "        .mean(\"time\")\n",
    "        .load()\n",
    "    )\n",
    "    vars_no_time = no_time_variable(month_data[case])\n",
    "    if vars_no_time:\n",
    "        seas_data[case][vars_no_time] = seas_data[case][vars_no_time].isel(\n",
    "            month=0, drop=True\n",
    "        )\n",
    "    mon_time = month_data[case].time.values\n",
    "    if domain == \"None\":\n",
    "        reachID[case] = month_data[case][\"reachID\"].values\n",
    "    else:\n",
    "        reachID[case] = xr.open_dataset(f\"{ancillary_dir}/{domain}\")[\"reachID\"].values\n",
    "    print(f\"Finished loading {case}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc8ed6f9",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 2.2 Large river ID and name ascii\n",
    "- big_river_50: dictionary {_site_id_:_river name_}\n",
    "- big_river_24: dictionary {_site_id_:_river name_}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab79440d",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(\n",
    "    os.path.join(cu.__path__[0], \"..\", \"setup\", \"large_river_50.txt\"),\n",
    "    index_col=\"river_name\",\n",
    ")\n",
    "big_river_50 = {key: values[\"site_id\"] for key, values in df.iterrows()}\n",
    "big_river_24 = {\n",
    "    key: values[\"site_id\"] for ix, (key, values) in enumerate(df.iterrows()) if ix < 24\n",
    "}  # The first 24 is used for plots"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8f4ccce",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 2.3. reach-D19 gauge link csv\n",
    "- gauge_reach_lnk (dataframe)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d5d0885",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "gauge_reach_lnk = {}\n",
    "for case, meta in case_dic.items():\n",
    "    gauge_reach_lnk[case] = pd.read_csv(\n",
    "        \"%s/D09/D09_925.%s.asc\" % (ref_flow_dir, case_meta[meta[\"grid\"]][\"network\"])\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1eb889af",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 2.4 D19 flow site shapefile\n",
    "- gauge_shp (dataframe)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ca46aa3",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "gauge_shp = gpd.read_file(\n",
    "    os.path.join(ref_flow_dir, \"D09\", \"geospatial\", \"D09_925.gpkg\")\n",
    ")\n",
    "gauge_shp = gauge_shp[gauge_shp[\"id\"] != 9999999]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbe6a395",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 2.5 D19 discharge data\n",
    "- ds_q_obs_mon (xr datasets)\n",
    "- ds_q_obs_yr (xr datasets)\n",
    "- dr_q_obs_seasonal (xr datasets)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3952b872",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "# read monthly data\n",
    "ds_q = xr.open_dataset(\n",
    "    \"%s/D09/coastal-stns-Vol-monthly.updated-May2019.mod.nc\" % (ref_flow_dir),\n",
    "    decode_times=False,\n",
    ")\n",
    "ds_q[\"time\"] = xr.cftime_range(\n",
    "    start=\"1900-01-01\", end=\"2018-12-01\", freq=\"MS\", calendar=\"standard\"\n",
    ")\n",
    "\n",
    "# monthly - if time_period is outside observation period, use the entire obs period\n",
    "obs_available = True\n",
    "if ds_q[\"time\"].sel(time=slice(str(mon_time[0]), str(mon_time[-1]))).values.size == 0:\n",
    "    obs_available = False\n",
    "    ds_q_obs_mon = ds_q[\"FLOW\"]\n",
    "else:\n",
    "    ds_q_obs_mon = ds_q[\"FLOW\"].sel(time=slice(str(mon_time[0]), str(mon_time[-1])))\n",
    "# compute annual flow from monthly\n",
    "ds_q_obs_yr = ds_q_obs_mon.resample(time=\"YE\").mean(dim=\"time\")\n",
    "# compute annual cycle at monthly scale\n",
    "dr_q_obs_seasonal = ds_q_obs_mon.groupby(\"time.month\").mean(\"time\")\n",
    "# annual mean, max, and min flow during time_period\n",
    "ds_q_obs_yr_min = ds_q_obs_yr.min(\"time\")\n",
    "ds_q_obs_yr_max = ds_q_obs_yr.max(\"time\")\n",
    "ds_q_obs_yr_mean = ds_q_obs_yr.mean(\"time\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7203a502",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 2.6 Get indices in observation and simulation for gauge name (processing)\n",
    "- gauge_plot (dictionary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aca1ad68",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "gauge_plot = {}\n",
    "for gname, site_id in big_river_50.items():\n",
    "    gauge_plot[gname] = {}\n",
    "    for case in case_dic.keys():\n",
    "        gauge_ix = [\n",
    "            i for i, gid in enumerate(ds_q.id.values) if gid == site_id\n",
    "        ]  # go through obs Dataset and get index matching to river (gauge) name\n",
    "        gauge_id = ds_q.id.values[gauge_ix][0]  ## guage ID\n",
    "        seg_id = (\n",
    "            gauge_reach_lnk[case]\n",
    "            .loc[gauge_reach_lnk[case][\"gauge_id\"] == gauge_id][\"route_id\"]\n",
    "            .values\n",
    "        )  # matching reach ID in river network\n",
    "        seg_ix = np.argwhere(\n",
    "            reachID[case] == seg_id\n",
    "        )  # matching reach index in river network\n",
    "        if len(seg_ix) == 0:\n",
    "            seg_ix = -999\n",
    "        else:\n",
    "            seg_ix = seg_ix[0]\n",
    "        gauge_plot[gname][case] = [gauge_ix, seg_ix, seg_id]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38963b81",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "------\n",
    "## 3. Analysis for 24 large rivers <a id='24_large_rivers'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "813c86e0",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 3.1 Annual flow series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ead9053",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "fig, axes = plt.subplots(8, 3, figsize=(7.25, 11.5))\n",
    "plt.subplots_adjust(\n",
    "    top=0.975, bottom=0.065, right=0.98, left=0.10, hspace=0.225, wspace=0.250\n",
    ")  # create some space below the plots by increasing the bottom-value\n",
    "\n",
    "for ix, river_name in enumerate(big_river_24.keys()):\n",
    "    row = ix // 3\n",
    "    col = ix % 3\n",
    "    for case, meta in case_dic.items():\n",
    "        net_idx = gauge_plot[river_name][case][1]\n",
    "        gaug_idx = gauge_plot[river_name][case][0]\n",
    "\n",
    "        q_name = case_meta[meta[\"grid\"]][\"flow_name\"]\n",
    "\n",
    "        if len(net_idx) == 1:\n",
    "            year_data[case][q_name][:, net_idx].plot(\n",
    "                ax=axes[row, col], linestyle=\"-\", lw=0.75, label=case\n",
    "            )\n",
    "        elif len(net_idx) == 2:  # means 2d grid\n",
    "            year_data[case][q_name][:, net_idx[0], net_idx[1]].plot(\n",
    "                ax=axes[row, col], linestyle=\"-\", lw=0.75, label=case\n",
    "            )\n",
    "    if obs_available:\n",
    "        ds_q_obs_yr.loc[:, gaug_idx].plot(\n",
    "            ax=axes[row, col],\n",
    "            linestyle=\"None\",\n",
    "            marker=\"o\",\n",
    "            markersize=3,\n",
    "            c=\"k\",\n",
    "            label=\"D19\",\n",
    "        )\n",
    "    else:\n",
    "        qob_mean = ds_q_obs_yr_mean.loc[gaug_idx]\n",
    "        qob_max = ds_q_obs_yr_max.loc[gaug_idx]\n",
    "        qob_min = ds_q_obs_yr_min.loc[gaug_idx]\n",
    "        axes[row, col].axhline(\n",
    "            y=qob_mean, color=\"k\", linestyle=\"--\", lw=0.5, label=\"D19 mean (1900-2018)\"\n",
    "        )\n",
    "        axes[row, col].axhspan(\n",
    "            qob_min[0],\n",
    "            qob_max[0],\n",
    "            color=\"gray\",\n",
    "            alpha=0.1,\n",
    "            label=\"D19 min-max (1900-2018)\",\n",
    "        )\n",
    "\n",
    "    axes[row, col].set_title(\"%d %s\" % (ix + 1, river_name), fontsize=8)\n",
    "\n",
    "    axes[row, col].set_xlabel(\"\")\n",
    "    if row < 7:\n",
    "        axes[row, col].set_xticklabels(\"\")\n",
    "    if col == 0:\n",
    "        axes[row, col].set_ylabel(\"Annual flow [m$^3$/s]\", fontsize=8)\n",
    "    else:\n",
    "        axes[row, col].set_ylabel(\"\")\n",
    "    axes[row, col].tick_params(\"both\", labelsize=\"xx-small\")\n",
    "\n",
    "# Legend- make space below the plot-raise bottom. there will be an label below the second last (bottom middle) ax, thanks to the bbox_to_anchor=(x, y) with a negative y-value.\n",
    "axes.flatten()[-2].legend(\n",
    "    loc=\"upper center\",\n",
    "    bbox_to_anchor=(0.125, -0.35, 0.75, 0.1),\n",
    "    ncol=5,\n",
    "    fontsize=\"x-small\",\n",
    ")\n",
    "\n",
    "if figureSave:\n",
    "    plt.savefig(f\"{out_dir}/NB1_Fig1_big_river_annual_{analysis_name}.png\", dpi=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e495343",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 3.2. Annual cycle at monthly step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cc2df46",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "fig, axes = plt.subplots(8, 3, figsize=(7.25, 11.5))\n",
    "plt.subplots_adjust(\n",
    "    top=0.975, bottom=0.065, right=0.98, left=0.10, hspace=0.225, wspace=0.250\n",
    ")  # create some space below the plots by increasing the bottom-value\n",
    "\n",
    "for ix, river_name in enumerate(big_river_24.keys()):\n",
    "    row = ix // 3\n",
    "    col = ix % 3\n",
    "    for case, meta in case_dic.items():\n",
    "        net_idx = gauge_plot[river_name][case][1]\n",
    "        gaug_idx = gauge_plot[river_name][case][0]\n",
    "\n",
    "        q_name = case_meta[meta[\"grid\"]][\"flow_name\"]\n",
    "\n",
    "        if len(net_idx) == 1:  # means vector\n",
    "            seas_data[case][q_name][:, net_idx].plot(\n",
    "                ax=axes[row, col], linestyle=\"-\", lw=0.75, label=case\n",
    "            )\n",
    "        elif len(net_idx) == 2:  # means 2d grid\n",
    "            seas_data[case][q_name][:, net_idx[0], net_idx[1]].plot(\n",
    "                ax=axes[row, col], linestyle=\"-\", lw=1.0, label=case\n",
    "            )\n",
    "\n",
    "    if obs_available:\n",
    "        dr_q_obs_seasonal.loc[:, gaug_idx].plot(\n",
    "            ax=axes[row, col],\n",
    "            linestyle=\":\",\n",
    "            lw=0.5,\n",
    "            marker=\"o\",\n",
    "            markersize=2,\n",
    "            c=\"k\",\n",
    "            label=\"D19\",\n",
    "        )\n",
    "\n",
    "    axes[row, col].set_title(\"%d %s\" % (ix + 1, river_name), fontsize=8)\n",
    "    axes[row, col].set_xlabel(\"\")\n",
    "    if row < 7:\n",
    "        axes[row, col].set_xticklabels(\"\")\n",
    "    if col == 0:\n",
    "        axes[row, col].set_ylabel(\"Mon. flow [m$^3$/s]\", fontsize=8)\n",
    "    else:\n",
    "        axes[row, col].set_ylabel(\"\")\n",
    "    axes[row, col].tick_params(\"both\", labelsize=\"xx-small\")\n",
    "\n",
    "# Legend- make space below the plot-raise bottom. there will be an label below the second last (bottom middle) ax, thanks to the bbox_to_anchor=(x, y) with a negative y-value.\n",
    "axes.flatten()[-2].legend(\n",
    "    loc=\"upper center\",\n",
    "    bbox_to_anchor=(0.10, -0.30, 0.75, 0.1),\n",
    "    ncol=5,\n",
    "    fontsize=\"x-small\",\n",
    ")\n",
    "\n",
    "if figureSave:\n",
    "    plt.savefig(f\"{out_dir}/NB1_Fig2_big_river_season_{analysis_name}.png\", dpi=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dc6be1a",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 3.3. scatter plots of monthly flow - obs vs sim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9541a4a6",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "if obs_available:\n",
    "    # Monthly flow scatter plot\n",
    "    fig, axes = plt.subplots(8, 3, figsize=(7.50, 15.00))\n",
    "    plt.subplots_adjust(\n",
    "        top=0.995, bottom=0.075, right=0.995, left=0.1, wspace=0.25, hspace=0.25\n",
    "    )\n",
    "\n",
    "    for ix, river_name in enumerate(big_river_24.keys()):\n",
    "        row = ix // 3\n",
    "        col = ix % 3\n",
    "        axes[row, col].yaxis.set_major_formatter(FormatStrFormatter(\"%.0f\"))\n",
    "\n",
    "        for jx, (case, meta) in enumerate(case_dic.items()):\n",
    "\n",
    "            net_idx = gauge_plot[river_name][case][1]\n",
    "            gaug_idx = gauge_plot[river_name][case][0]\n",
    "\n",
    "            q_name = case_meta[meta[\"grid\"]][\"flow_name\"]\n",
    "\n",
    "            if len(net_idx) == 1:  # means vector\n",
    "                ds_sim = month_data[case][q_name][:, net_idx]\n",
    "            elif len(net_idx) == 2:  # means 2d grid\n",
    "                ds_sim = month_data[case][q_name][:, net_idx[0], net_idx[1]].squeeze()\n",
    "\n",
    "            ds_obs = ds_q_obs_mon.sel(time=meta[\"time_period\"]).loc[:, gaug_idx]\n",
    "\n",
    "            axes[row, col].plot(\n",
    "                ds_obs, ds_sim, \"o\", label=case, markersize=4.0, alpha=0.4\n",
    "            )\n",
    "            if jx == 0:\n",
    "                max_val = np.max(ds_obs)\n",
    "                min_val = np.min(ds_obs)\n",
    "            else:\n",
    "                max_val = np.max([np.max(ds_sim), max_val])\n",
    "                min_val = np.min([np.min(ds_sim), min_val])\n",
    "\n",
    "        axes[row, col].plot(\n",
    "            [min_val * 0.98, max_val * 1.02],\n",
    "            [min_val * 0.98, max_val * 1.02],\n",
    "            \":k\",\n",
    "            linewidth=0.5,\n",
    "        )\n",
    "\n",
    "        axes[row, col].annotate(\n",
    "            \"%d %s\" % (ix + 1, river_name),\n",
    "            xy=(0.05, 0.875),\n",
    "            xycoords=\"axes fraction\",\n",
    "            fontsize=8,\n",
    "            bbox=dict(facecolor=\"white\", edgecolor=\"None\", alpha=0.8),\n",
    "        )\n",
    "        if row == 7 and col == 1:\n",
    "            axes[row, col].set_xlabel(\n",
    "                \"Monthly reference discharge [m$^3$/s]\", fontsize=9\n",
    "            )\n",
    "        else:\n",
    "            axes[row, col].set_xlabel(\"\")\n",
    "\n",
    "        if col == 0 and row == 5:\n",
    "            axes[row, col].set_ylabel(\n",
    "                \"Monthly simulated discharge [m$^3$/s]\", fontsize=10\n",
    "            )\n",
    "        else:\n",
    "            axes[row, col].set_ylabel(\"\")\n",
    "        axes[row, col].tick_params(\"both\", labelsize=\"xx-small\")\n",
    "\n",
    "    axes.flatten()[-2].legend(\n",
    "        loc=\"upper center\",\n",
    "        bbox_to_anchor=(0.10, -0.40, 0.75, 0.1),\n",
    "        ncol=5,\n",
    "        fontsize=\"x-small\",\n",
    "    )\n",
    "\n",
    "    if figureSave:\n",
    "        plt.savefig(\n",
    "            f\"{out_dir}/NB1_Fig3_big_river_month_scatter_{analysis_name}.png\", dpi=200\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99f5ba40",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 3.4. scatter plots of annual flow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5ee5533",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "if obs_available:\n",
    "    # annual flow scatter plot\n",
    "    fig, axes = plt.subplots(8, 3, figsize=(7.50, 15.00))\n",
    "    plt.subplots_adjust(\n",
    "        top=0.995, bottom=0.075, right=0.995, left=0.1, wspace=0.25, hspace=0.25\n",
    "    )\n",
    "\n",
    "    for ix, river_name in enumerate(big_river_24.keys()):\n",
    "        row = ix // 3\n",
    "        col = ix % 3\n",
    "        axes[row, col].yaxis.set_major_formatter(FormatStrFormatter(\"%.0f\"))\n",
    "\n",
    "        for jx, (case, meta) in enumerate(case_dic.items()):\n",
    "            net_idx = gauge_plot[river_name][case][1]\n",
    "            gaug_idx = gauge_plot[river_name][case][0]\n",
    "\n",
    "            q_name = case_meta[meta[\"grid\"]][\"flow_name\"]\n",
    "\n",
    "            if len(net_idx) == 1:  # means vector\n",
    "                ds_sim = year_data[case][q_name][:, net_idx]\n",
    "            elif len(net_idx) == 2:  # means 2d grid\n",
    "                ds_sim = year_data[case][q_name][:, net_idx[0], net_idx[1]]\n",
    "\n",
    "            ds_obs = ds_q_obs_yr.sel(time=meta[\"time_period\"]).loc[:, gaug_idx]\n",
    "\n",
    "            axes[row, col].plot(\n",
    "                ds_obs, ds_sim, \"o\", label=case, markersize=4.0, alpha=0.4\n",
    "            )\n",
    "            if jx == 0:\n",
    "                max_val = np.max(ds_obs)\n",
    "                min_val = np.min(ds_obs)\n",
    "            else:\n",
    "                max_val = np.max([np.max(ds_sim), max_val])\n",
    "                min_val = np.min([np.min(ds_sim), min_val])\n",
    "\n",
    "        axes[row, col].plot(\n",
    "            [min_val * 0.98, max_val * 1.02],\n",
    "            [min_val * 0.98, max_val * 1.02],\n",
    "            \":k\",\n",
    "            linewidth=0.5,\n",
    "        )\n",
    "\n",
    "        axes[row, col].annotate(\n",
    "            \"%d %s\" % (ix + 1, river_name),\n",
    "            xy=(0.05, 0.875),\n",
    "            xycoords=\"axes fraction\",\n",
    "            fontsize=8,\n",
    "            bbox=dict(facecolor=\"white\", edgecolor=\"None\", alpha=0.8),\n",
    "        )\n",
    "        if row == 7 and col == 1:\n",
    "            axes[row, col].set_xlabel(\n",
    "                \"Monthly reference discharge [m$^3$/s]\", fontsize=9\n",
    "            )\n",
    "        else:\n",
    "            axes[row, col].set_xlabel(\"\")\n",
    "\n",
    "        if col == 0 and row == 5:\n",
    "            axes[row, col].set_ylabel(\n",
    "                \"Monthly simulated discharge [m$^3$/s]\", fontsize=10\n",
    "            )\n",
    "        else:\n",
    "            axes[row, col].set_ylabel(\"\")\n",
    "        axes[row, col].tick_params(\"both\", labelsize=\"xx-small\")\n",
    "\n",
    "    axes.flatten()[-2].legend(\n",
    "        loc=\"upper center\",\n",
    "        bbox_to_anchor=(0.10, -0.40, 0.75, 0.1),\n",
    "        ncol=5,\n",
    "        fontsize=\"x-small\",\n",
    "    )\n",
    "\n",
    "    if figureSave:\n",
    "        plt.savefig(\n",
    "            f\"{out_dir}/NB1_Fig4_big_river_annual_scatter_{analysis_name}.png\", dpi=200\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2eac551",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "## 4. Anaysis for Large 50 rivers <a id='50_large_rivers'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a08e80ec",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 4.1 Summary tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eeec623e",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "table_file = os.path.join(out_dir, f\"large50rivers_{analysis_name}.txt\")\n",
    "with open(table_file, \"w\") as f:\n",
    "    # some meta\n",
    "    f.write(\"# obs: Dai et al.(2019)\\n\")\n",
    "    f.write(\"# vol: km^3/yr\\n\")\n",
    "    f.write(\"# area: km^2\\n\")\n",
    "\n",
    "    # headers\n",
    "    f.write(\"No,          river_name,\")\n",
    "    f.write(\"{0: >15}_vol,\".format(\"obs\"))\n",
    "    for jx, (case, meta) in enumerate(case_dic.items()):\n",
    "        f.write(\"{0: >15}_vol,\".format(meta[\"grid\"]))\n",
    "    f.write(\"{0: >15}_area\".format(\"obs\"))\n",
    "    for jx, (case, meta) in enumerate(case_dic.items()):\n",
    "        f.write(\",\")\n",
    "        f.write(\"{0: >15}_area\".format(meta[\"grid\"]))\n",
    "    f.write(\"\\n\")\n",
    "\n",
    "    # data for each river\n",
    "    for ix, river_name in enumerate(big_river_50.keys()):\n",
    "\n",
    "        f.write(\"%02d,\" % (ix + 1))\n",
    "        f.write(\"{0: >20}\".format(river_name))\n",
    "\n",
    "        for jx, (case, meta) in enumerate(case_dic.items()):\n",
    "            f.write(\",\")\n",
    "            net_idx = gauge_plot[river_name][case][1]\n",
    "            gaug_idx = gauge_plot[river_name][case][0]\n",
    "\n",
    "            q_name = case_meta[meta[\"grid\"]][\"flow_name\"]\n",
    "\n",
    "            if len(net_idx) == 1:  # means vector\n",
    "                qsim = (\n",
    "                    np.nanmean(year_data[case][q_name][:, net_idx].values)\n",
    "                    * 60\n",
    "                    * 60\n",
    "                    * 24\n",
    "                    * 365\n",
    "                    / 10**9\n",
    "                )\n",
    "            elif len(net_idx) == 2:  # means 2d grid\n",
    "                qsim = (\n",
    "                    np.nanmean(\n",
    "                        year_data[case][q_name][:, net_idx[0], net_idx[1]].values\n",
    "                    )\n",
    "                    * 60\n",
    "                    * 60\n",
    "                    * 24\n",
    "                    * 365\n",
    "                    / 10**9\n",
    "                )\n",
    "\n",
    "            if jx == 0:\n",
    "                qobs = (\n",
    "                    ds_q_obs_yr_mean.loc[gaug_idx].values[0]\n",
    "                    * 60\n",
    "                    * 60\n",
    "                    * 24\n",
    "                    * 365\n",
    "                    / 10**9\n",
    "                )\n",
    "                f.write(\"{:19.1f},\".format(qobs))\n",
    "\n",
    "            f.write(\"{:19.1f}\".format(qsim))\n",
    "\n",
    "        for jx, (case, _) in enumerate(case_dic.items()):\n",
    "            f.write(\",\")\n",
    "            gaug_idx = gauge_plot[river_name][case][0]\n",
    "\n",
    "            # just get gauge_id from qs_q for now\n",
    "            gauge_id = ds_q[\"id\"].loc[gaug_idx].values[0]\n",
    "            network_area = gauge_reach_lnk[case][\n",
    "                gauge_reach_lnk[case][\"gauge_id\"] == gauge_id\n",
    "            ][\"route_area\"].values[0]\n",
    "\n",
    "            if jx == 0:\n",
    "                area = ds_q[\"area_stn\"].loc[gaug_idx].values[0]\n",
    "                f.write(\"{:20.0f},\".format(area))\n",
    "\n",
    "            f.write(\"{:20.0f}\".format(network_area))\n",
    "\n",
    "        f.write(\"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe4d9dfe",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 4.2. scatter plot of annual mean flow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "279b5eaf",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "if obs_available:\n",
    "\n",
    "    df_yr_vol = pd.read_csv(\n",
    "        table_file,\n",
    "        skiprows=3,\n",
    "        index_col=[\"No\"],\n",
    "        skipinitialspace=True,\n",
    "    )\n",
    "\n",
    "    fig, axes = plt.subplots(1, figsize=(5.50, 5.50))\n",
    "    regressor = LinearRegression()\n",
    "    bias_text = \"\"\n",
    "\n",
    "    for jx, (case, meta) in enumerate(case_dic.items()):\n",
    "        # compute linear regression\n",
    "        df_reg = df_yr_vol[[\"obs_vol\", f\"{meta['grid']}_vol\"]].dropna()\n",
    "        regressor.fit(df_reg[[\"obs_vol\"]], df_reg[f\"{meta['grid']}_vol\"])\n",
    "        y_pred = regressor.predict(df_reg[[\"obs_vol\"]])\n",
    "\n",
    "        # compute bias over 50 sites\n",
    "        diff = (df_yr_vol[f\"{meta['grid']}_vol\"] - df_yr_vol[\"obs_vol\"]).mean(\n",
    "            axis=0, skipna=True\n",
    "        )\n",
    "\n",
    "        df_yr_vol.plot(\n",
    "            ax=axes,\n",
    "            kind=\"scatter\",\n",
    "            x=\"obs_vol\",\n",
    "            y=f\"{meta['grid']}_vol\",\n",
    "            s=30,\n",
    "            alpha=0.6,\n",
    "            label=meta[\"grid\"],\n",
    "        )\n",
    "        # plt.plot(df_reg['obs_vol'], y_pred, color=color)\n",
    "        bias_text = bias_text + f\"\\n{meta['grid']}: {diff:.1f} \"\n",
    "\n",
    "    plt.text(\n",
    "        0.65, 0.30, \"bias [km3/yr]\", transform=axes.transAxes, verticalalignment=\"top\"\n",
    "    )\n",
    "    plt.text(\n",
    "        0.65, 0.30, f\"{bias_text} \", transform=axes.transAxes, verticalalignment=\"top\"\n",
    "    )\n",
    "\n",
    "    plt.axline((0, 0), slope=1, linestyle=\"--\", color=\"black\")\n",
    "    axes.set_xscale(\"log\")\n",
    "    axes.set_yscale(\"log\")\n",
    "    axes.set_xlabel(\"reference flow\")\n",
    "    axes.set_ylabel(\"Simulated flow\")\n",
    "    axes.set_title(\"River Flow at stations [km^3/yr]\")\n",
    "\n",
    "    if figureSave:\n",
    "        plt.savefig(\n",
    "            f\"{out_dir}/NB1_Fig5_50big_river_annual_flow_scatter_{analysis_name}.png\",\n",
    "            dpi=200,\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe5b4834",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "------\n",
    "## 5. Anaysis for all 922 sites <a id='922_rivers'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8785758",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 5.1 Compute metris at all the sites (no plots nor tables)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "147e11f7",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "if obs_available:\n",
    "    # Merge gauge_reach lnk (dataframe) into gauge shapefile\n",
    "    gauge_shp1 = {}\n",
    "    for case, df in gauge_reach_lnk.items():\n",
    "        # df = df.loc[(df['flag'] == 0)]\n",
    "        df1 = df.drop(columns=[\"riv_name\"])\n",
    "        gauge_shp1[case] = pd.merge(\n",
    "            gauge_shp, df1, how=\"inner\", left_on=\"id\", right_on=\"gauge_id\"\n",
    "        )\n",
    "\n",
    "    # compute %bias, correlation, and RMSE at each site based on monthly\n",
    "    mon_pbias = {}\n",
    "    mon_corr = {}\n",
    "    mon_rmse = {}\n",
    "\n",
    "    for case, meta in case_dic.items():\n",
    "        bias = np.full(len(gauge_shp1[case]), np.nan, dtype=float)\n",
    "        corr = np.full(len(gauge_shp1[case]), np.nan, dtype=float)\n",
    "        rmse = np.full(len(gauge_shp1[case]), np.nan, dtype=float)\n",
    "\n",
    "        q_name = case_meta[meta[\"grid\"]][\"flow_name\"]\n",
    "\n",
    "        for ix, row in gauge_shp1[case].iterrows():\n",
    "            q_obs = np.full(\n",
    "                meta[\"climo_nyrs\"] * 12, np.nan, dtype=float\n",
    "            )  # dummy q_sim that will be replaced by actual data if exist\n",
    "            q_sim = np.full(\n",
    "                meta[\"climo_nyrs\"] * 12, np.nan, dtype=float\n",
    "            )  # dummy q_sim that will be replaced by actual data if exist\n",
    "\n",
    "            route_id = row[\"route_id\"]\n",
    "            gauge_id = row[\"gauge_id\"]\n",
    "\n",
    "            gauge_ix = np.argwhere(ds_q.id.values == gauge_id)\n",
    "            if len(gauge_ix) == 1:\n",
    "                gauge_ix = gauge_ix[0]\n",
    "                q_obs = ds_q_obs_mon[:, gauge_ix].squeeze().values\n",
    "\n",
    "            route_ix = np.argwhere(reachID[case] == route_id)\n",
    "            if (\n",
    "                len(route_ix) == 1\n",
    "            ):  # meaning there is flow site in network and simulation exist at this site\n",
    "                route_ix = route_ix[0]\n",
    "                if len(route_ix) == 1:  # means vector\n",
    "                    q_sim = month_data[case][q_name][:, route_ix].squeeze().values\n",
    "                elif len(route_ix) == 2:  # means 2d grid\n",
    "                    q_sim = (\n",
    "                        month_data[case][q_name][:, route_ix[0], route_ix[1]]\n",
    "                        .squeeze()\n",
    "                        .values\n",
    "                    )\n",
    "\n",
    "                # compute %bias, correlation, RMSE\n",
    "                bias[ix] = metric.pbias(q_sim, q_obs)\n",
    "                corr[ix] = metric.corr(q_sim, q_obs)\n",
    "                rmse[ix] = metric.rmse(qsim, qobs)\n",
    "\n",
    "        mon_pbias[case] = bias\n",
    "        mon_corr[case] = corr\n",
    "        mon_rmse[case] = rmse\n",
    "\n",
    "        gauge_shp1[case][f\"bias_{meta['grid']}\"] = bias\n",
    "        gauge_shp1[case][f\"corr_{meta['grid']}\"] = corr\n",
    "        gauge_shp1[case][f\"rmse_{meta['grid']}\"] = rmse"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0e0bfc5",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 5.2. Spatial metric map "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0163ac0",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "if obs_available:\n",
    "    # some local plot setups\n",
    "    cbar_kwrgs = {\n",
    "        \"bias\": {\n",
    "            \"shrink\": 0.9,\n",
    "            \"pad\": 0.02,\n",
    "            \"orientation\": \"horizontal\",\n",
    "            \"extend\": \"both\",\n",
    "        },\n",
    "        \"corr\": {\n",
    "            \"shrink\": 0.9,\n",
    "            \"pad\": 0.02,\n",
    "            \"orientation\": \"horizontal\",\n",
    "            \"extend\": \"min\",\n",
    "        },\n",
    "        \"rmse\": {\n",
    "            \"shrink\": 0.9,\n",
    "            \"pad\": 0.02,\n",
    "            \"orientation\": \"horizontal\",\n",
    "            \"extend\": \"max\",\n",
    "        },\n",
    "    }\n",
    "\n",
    "    meta = {\n",
    "        \"bias\": {\n",
    "            \"name\": \"%bias\",\n",
    "            \"vmin\": -100,\n",
    "            \"vmax\": 100,\n",
    "            \"cm\": colors.cmap_RedGrayBlue,\n",
    "        },\n",
    "        \"corr\": {\"name\": \"correlation\", \"vmin\": 0.2, \"vmax\": 1, \"cm\": colors.cmap_corr},\n",
    "        \"rmse\": {\"name\": \"RMSE\", \"vmin\": 0, \"vmax\": 1000, \"cm\": mpl.cm.turbo},\n",
    "    }\n",
    "\n",
    "    for error_metric in [\"rmse\", \"bias\", \"corr\"]:\n",
    "        for case, meta in case_dic.items():\n",
    "            fig, ax = plt.subplots(\n",
    "                1, figsize=(7.5, 4.0), subplot_kw={\"projection\": ccrs.PlateCarree()}\n",
    "            )\n",
    "\n",
    "            ax.add_feature(\n",
    "                rivers_50m, facecolor=\"None\", edgecolor=\"b\", lw=0.5, alpha=0.3\n",
    "            )\n",
    "            ax.add_feature(land, facecolor=\"white\", edgecolor=\"grey\")\n",
    "\n",
    "            gauge_shp1[case].plot(\n",
    "                ax=ax,\n",
    "                column=f\"{error_metric}_{meta['grid']}\",\n",
    "                markersize=10,\n",
    "                cmap=meta[error_metric][\"cm\"],\n",
    "                vmin=meta[error_metric][\"vmin\"],\n",
    "                vmax=meta[error_metric][\"vmax\"],\n",
    "            )\n",
    "\n",
    "            ax.set_title(\"%s %s\" % (case, meta[error_metric][\"name\"]))\n",
    "\n",
    "            points = ax.collections[-1]\n",
    "            plt.colorbar(points, ax=ax, **cbar_kwrgs[error_metric])\n",
    "\n",
    "            plt.tight_layout()\n",
    "\n",
    "            if figureSave:\n",
    "                plt.savefig(\n",
    "                    f\"{out_dir}/NB1_Fig6_{error_metric}_{case}_map.png\", dpi=200\n",
    "                )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d09df7cc",
   "metadata": {
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "tags": []
   },
   "source": [
    "### 5.4 Boxplots of Error metrics (RMSE, %bias, and correlation coefficient)\n",
    "Boxplot distribution is based on metrics sampled at 922 sites.\n",
    "\n",
    "The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). \n",
    "The whiskers extend from the edges of box to show the range of the data. \n",
    "By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. \n",
    "Outliers are plotted as separate dots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "512ff9cd",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "if obs_available:\n",
    "    boxprops = {\"linestyle\": \"-\", \"linewidth\": 1.5, \"color\": \"blue\"}\n",
    "    medianprops = {\"linestyle\": \"-\", \"linewidth\": 1.5, \"color\": \"red\"}\n",
    "\n",
    "    for error_metric in [\"rmse\", \"bias\", \"corr\"]:\n",
    "        column_stat = []\n",
    "        gauge_shp_all_case = gauge_shp.copy(deep=True)\n",
    "        for case, meta in case_dic.items():\n",
    "            gauge_shp_all_case = gauge_shp_all_case.merge(\n",
    "                gauge_shp1[case][[\"id\", f\"{error_metric}_{meta['grid']}\"]],\n",
    "                left_on=\"id\",\n",
    "                right_on=\"id\",\n",
    "            )\n",
    "            column_stat.append(f\"{error_metric}_{meta['grid']}\")\n",
    "        fig, ax = plt.subplots(1, figsize=(6.5, 4))\n",
    "        gauge_shp_all_case.boxplot(\n",
    "            ax=ax,\n",
    "            column=column_stat,\n",
    "            boxprops=boxprops,\n",
    "            medianprops=medianprops,\n",
    "            sym=\".\",\n",
    "        )\n",
    "\n",
    "        xticklabels = [label[len(error_metric) + 1 :] for label in column_stat]\n",
    "        ax.set_xticklabels(xticklabels)\n",
    "\n",
    "        if error_metric == \"rmse\":\n",
    "            ax.set_ylim([0, 1000])\n",
    "            ax.set_title(\"RMSE [m3/s]\")\n",
    "        elif error_metric == \"bias\":\n",
    "            ax.set_ylim([-150, 250])\n",
    "            ax.set_title(\"%bias [%]\")\n",
    "        elif error_metric == \"corr\":\n",
    "            ax.set_ylim([-0.2, 1])\n",
    "            ax.set_title(\"correlation\")\n",
    "\n",
    "        if figureSave:\n",
    "            plt.savefig(f\"{out_dir}/NB1_Fig7_{error_metric}_boxplot.png\", dpi=150)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50c45b9c",
   "metadata": {
    "editable": true,
    "papermill": {
     "duration": null,
     "end_time": null,
     "exception": null,
     "start_time": null,
     "status": "pending"
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "if client and not cupid_run:\n",
    "    client.shutdown()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cupid-analysis",
   "language": "python",
   "name": "cupid-analysis"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "papermill": {
   "duration": 11.5344,
   "end_time": "2025-11-25T01:42:57.037106",
   "exception": true,
   "input_path": "/glade/derecho/scratch/richling/tmp/tmp6czgsu6m.ipynb",
   "output_path": "/glade/work/richling/cupid-test-2/CUPiD/examples/key_metrics/computed_notebooks/rof/global_discharge_gauge_compare_obs.ipynb",
   "parameters": {
    "CESM_output_dir": "/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing",
    "analysis_name": "",
    "base_case_name": "b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.228",
    "base_case_nickname": "BLT1850_228",
    "base_climo_end_year": 45,
    "base_climo_start_year": 36,
    "base_end_date": "0045-01-01",
    "base_start_date": "0001-01-01",
    "case_name": "b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.232",
    "case_nickname": "BLT1850_232",
    "climo_end_year": 21,
    "climo_nyears": 10,
    "climo_start_year": 11,
    "end_date": "0021-01-01",
    "figureSave": false,
    "grid_name": "f09_f09_mosart",
    "hist_str": "h0a",
    "lc_kwargs": {
     "threads_per_worker": 1
    },
    "obs_data_dir": "/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CUPiD_obs_data",
    "product": "/glade/work/richling/cupid-test-2/CUPiD/examples/key_metrics/computed_notebooks//rof/global_discharge_gauge_compare_obs.ipynb",
    "serial": false,
    "start_date": "0001-01-01",
    "subset_kwargs": {},
    "ts_dir": null
   },
   "start_time": "2025-11-25T01:42:45.502706"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}