Visualizing Ice Core Datasets in Python Using a Jupyter Notebook Template#

Authors#

Author1 = {“name”: “L. Markowsky”, “affiliation”: “Kennesaw State University”, “email”: “lmarkowsky@gmail.com”}
Author2 = {“name”: “Jessica Scheick”, “affiliation”: “University of New Hampshire”, “email”: “jbscheick@gmail.com”, “orcid”: “0000-0002-3421-4459”}

1 Visualizing Ice Core Datasets in Python Using a Jupyter Notebook Template
2 Setup
- 2.1 Library import
3 Source data
- 3.1 NEEM ion concentration data in 10 yr resolution
- 3.2 Aerosol source concentrations and atmospheric lifetime in 10 yr resolution
4 Data processing and analysis
5 References

Purpose#

Chemical constituents measured in ice cores are often visualized in a standard format, with figures produced by proprietary or closed tools. Such tools present an extra step that requires a context switch in the process of analyzing and visualizing results. Python and its many machine learning, numerical computation, and visualization libraries together with Jupyter notebooks offer a unified alternative that permits researchers to analyze and visualize ice core datasets in a single, highly-integrated ecosystem. This Jupyter notebook uses exclusively open-source Python libraries (specifically NumPy, Matplotlib, and Pandas) to create a readily reproducible, publication quality figure. We demonstrate the utility of the notebook as a template for ice core researchers engaging in open science by recreating multiple figures published by Schupbach et al. in Nature Communications, 16 April 2018. This notebook provides a step-by-step guide to systematically recreate two figures using Schupbach’s original data.

Technical contributions#

This notebook presents an annotated, step-by-step template to create publication-quality figures presenting ice core chemistry data using open-source Python libraries.
Two figures from an article by Schupbach et al. that appeared in Nature Communications, 16 April 2018, are reproduced to demonstrate how to use Python to visualize ice core data.
Part 1: First, as closely as possible, we will reproduce Figure 2 as published in Nature using the datasets provided by the authors.
Part 2: Then, we will use Part 1 as a model to reproduce Figure 1 (subplots a-e only).

Methodology#

A typical visualization of ice core data is shown, step-by-step, with explanations. Briefly, the steps in the process are:

Prepare the data and initialize a figure with four empty axes.
Move the y-axis labels and tick marks of every other plot to the right-hand side.
Remove extraneous spines (lines).
Draw the tick marks inside the plot area.
Plot the data in a light shade of each selected color.
Use a log scale for the y-axis of slected plots.
Plot a running average of the data using a darker shade of each selected color.
Add horizontal dashed lines to show the early Holocene average.
Plot lines in the final subplot, and create a legend to label them.
Draw vertical gridlines common to all plots in the figure.
Adjust the axes (remove horizontal space, tweak tick settings) to match the published figure.
Draw a depth scale along the top of the figure.
Remove extraneous spines (lines) created by step 12.
Plot gray bands in subplots a-c.
Adjust the limits of the y-axes to match the published figure.
Add annotations to the figure.
Adjust the figure size and the font size of the labels on the x-axes, y-axes, and legend.
Save the figure to a file.

The code used to produce the facimile of Figure 2 (Part 1) is then used as a template to produce a facimile of Figure 1 (subplots a-e) (Part 2).

Results#

Part 1#

Figure 2 (as published in Nature Communications [4])	The resulting facimile of Figure 2

Part 2#

Figure 1 (as published in Nature Communications [4])	The template deleveloped in Part 1 was adapted to produce the following facimile of Figure 1 (subplots a-e only)

Keywords#

keywords=[“ice core”, “visualization”, “climate”, “climate change”, “reproducible”]

Citation#

Markowsky, L., and Scheick, J. “Visualizing Ice Core Datasets in Python Using a Jupyter Notebook Template,” 2022.

Future work#

The authors’ future work based on this notebook will be to modularize the code and then to:

Create a Jupyter book, with the introductory material (subsections 4.1 and 4.2) expanded and separated into a separate notebook. This Jupyter book will be designed for use by novice Python programmers and especially crafted for use in a classroom setting.
Develop and release a Python package, designed for use by ice core scientists wishing to use a more flexible, or simply a Python-based, tool to create this well-known type of visualization of chemical constituents in ice core datasets.

Acknowledgements#

The authors thank the organizers of the NSF 2019 Arctic Data Workshop, held at the University of Maine, where a preliminary version of this notebook was presented. The authors also thank the reviewers for their helpful remarks.

Copyright and License#

Setup#

Library import#

# Data manipulation
import numpy as np
import pandas as pd

# Visualizations
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns

Source data#

All files (paper in PDF format, datasets, figures, and supplementary information) were downloaded from www.nature.com on 30 March 2019. [4] The original datasets were provided in the form of two Excel files, which were then saved as CSV text files using LibreOffice Calc. In this notebook, the data are imported and prepared from both the Excel and CSV text files in subsections 4.1.4 (Exploring data with Pandas) and 4.2.1 (Step 1: Prepare the data and initialize the figure with four empty axes).

Descriptions of the two datasets, reproduced in the following two subsections, are included at the top of Schupbach et al.’s original Excel files.

References for the two datasets:

Schüpbach et al., Greenland records of aerosol source and atmospheric lifetime changes from the Eemian to the Holocene, Nature Communications, 2018. https://www.nature.com/articles/s41467-018-03924-3#Sec11
Rasmussen et al., A first chronology for the North Greenland Eemian Ice Drilling (NEEM) ice core. Climate of the Past 9, 2713–2730, doi:10.5194/cp-9-2713-2013 (2013).

NEEM ion concentration data in 10 yr resolution#

Column	Description of Data
A	NEEM depth in m below the surface of top of 10 yr interval
B	NEEM depth in m below the surface of bottom of 10 yr interval
C	NEEM depth in m below the surface of middle of 10 yr interval
D	GICC05 age of the top of the 10 yr interval applied to the NEEM ice core (Rasmussen et al., 2013) in years before present (where present is defined as 1950).
E	GICC05 age of the bottom of the 10 yr interval applied to the NEEM ice core (Rasmussen et al., 2013) in years before present (where present is defined as 1950).
F	GICC05 age of the middle of the 10 yr interval applied to the NEEM ice core (Rasmussen et al., 2013) in years before present (where present is defined as 1950).
G	mean NH4+ ice concentration of each 10 yr interval in ng/g (missing values indicated by NaN)
H	mean NO3- ice concentration of each 10 yr interval in ng/g (missing values indicated by NaN)
I	mean Na+ ice concentration of each 10 yr interval in ng/g (missing values indicated by NaN)
J	mean Ca2+ ice concentration of each 10 yr interval in ng/g (missing values indicated by NaN)
K	meanSO42- ice concentration of each 10 yr interval in ng/g (missing values indicated by NaN)

Aerosol source concentrations and atmospheric lifetime in 10 yr resolution#

Column	Description of Data
A	NEEM depth in m below the surface of top of 10 yr interval
B	NEEM depth in m below the surface of bottom of 10 yr interval
C	NEEM depth in m below the surface of middle of 10 yr interval
D	GICC05 age of the top of the 10 yr interval applied to the NEEM ice core (Rasmussen et al., 2013) in years before present (where present is defined as 1950).
E	GICC05 age of the bottom of the 10 yr interval applied to the NEEM ice core (Rasmussen et al., 2013) in years before present (where present is defined as 1950).
F	GICC05 age of the middle of the 10 yr interval applied to the NEEM ice core (Rasmussen et al., 2013) in years before present (where present is defined as 1950).
G	mean of NH4+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
H	mean + 1 sigma of NH4+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
I	mean - 1 sigma of NH4+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
J	atmospheric residence time of NH4+ for each 10 yr interval in days
K	mean of Na+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
L	mean + 1 sigma of Na+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
M	mean - 1 sigma of Na+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
N	atmospheric residence time of Na+ for each 10 yr interval in days
O	mean of Ca2+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
P	mean + 1 sigma of Ca2+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
Q	mean - 1 sigma of Ca2+ aerosol concentration at the source for each 10 yr interval in ng/m3 (missing values indicated by NaN)
R	atmospheric residence time of Ca2+ for each 10 yr interval in days

Data Processing and Analysis#

A brief introduction to the Python data science ecosystem#

Why Python?#

Why spend the time to learn a new language when < R/Excel/MATLAB > gets the job done?

This is a frequently asked question, even among software developers, as it is often easier to use a language or tool that you already know well. Sometimes, however, a new language or tool can offer benefits not available or not easy to achieve otherwise.

Some of Python’s benefits#

The Python ecosystem, and in particular, the Python data science stack, has much to offer:

The Python data science stack consists of mature, well-developed and well-documented libraries for analyzing and visualizing data.
The Python data science stack is free, open-source, and cross-platform.
Although Python, together with its libraries, is an example of a very high level language, well-written programs in the Python ecosystem can have very good performance.
Programming in Python yields clean, readable, highly maintainable and reusable source code.
Python’s large, friendly user community makes it easy to find answers to common problems online.
Python has become the most widely used programming language in data science and machine learning.

An overview of the Python data science stack (with links to resources)#

In addition to Python itself, the tools and libraries often used by data scientists include Jupyter Notebooks, IPython, NumPy, Pandas, Matplotlib, Seaborn, SciPy, Scikit-Learn, and Statsmodels.

Python is an easy-to-learn, very high level language with seemingly limitless libraries for every possible purpose. To get started with Python:
- Python Software Foundation. The Python Tutorial \(-\) Python 3.10.4 Documentation, 2001-2022. https://docs.python.org/3/tutorial/index.html
- VanderPlas, Jake. A Whirlwind Tour of Python, August 2016. https://jakevdp.github.io/WhirlwindTourOfPython/ (book in html) and https://github.com/jakevdp/WhirlwindTourOfPython (full text in Jupyter notebooks)
Jupyter Notebooks and IPython provide interactive computing and software development environments that encourage an execute-explore workflow as an alternative to the edit-compile-run workflow that long dominated computing. Jupyter notebooks, in particular, are widely used in the data science community to share code, text, and output. Kernels are available to support a variety of languages, including Python, R, C++, Scilab, and MATLAB. The name Jupyter refers to the programming languages Julia, Python and R, and is also a reference to Galileo’s notes on his observations of Jupiter and four of its moons. Resources for Jupyter and IPython are abundant, including:
- Jupyter/IPython notebook quick start guide: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html
- Jupyter notebook user documentation: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html
- VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data, Chapter 1: IPython: Beyond Normal Python, November 2016. https://jakevdp.github.io/PythonDataScienceHandbook/01.00-ipython-beyond-normal-python.html (in html) and https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.00-IPython-Beyond-Normal-Python.ipynb (as a Jupyter notebook)
- A curated collection of notable Jupyter/IPython notebooks: https://github.com/jupyter/jupyter/wiki
- An example of a paper presented as a notebook: https://nbviewer.jupyter.org/github/cossatot/lanf_earthquake_likelihood/blob/master/notebooks/lanf_manuscript_notebook.ipynb
NumPy, a contraction of Numerical Python, provides an efficient multi-dimensional array object called an ndarray. NumPy features fast, elementwise array operations, functions to read/write array-based datasets from/to disk, linear algebra operations, and an API to allow program access to ndarray objects from C/C++ code. To get started with NumPy:
- NumPy Developers. NumPy: the absolute basics for beginners, 2008-2022. https://numpy.org/doc/stable/user/absolute_beginners.html
- VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data, Chapter 2: Introduction to NumPy, November 2016. https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html (in html) and https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb (as a Jupyter notebook)
Pandas, a contraction of Panel Data (an econometrics term for multidimensional datasets), is the heart of much data analysis today. Pandas is centered around two data structures: (1) DataFrame: a tabular, column-oriented data structure; and (2) Series: a one-dimensional labeled array object. Pandas provides data preparation and cleaning, data manipulation capabilities similar to SQL, aggregation, subset selection, and sophisticated indexing. To get started with Pandas:
- Pandas Getting Started Tutorials: https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/index.html
- 10 Minutes to Pandas: https://pandas.pydata.org/docs/user_guide/10min.html
- Pandas Cookbook: http://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html
- VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data, Chapter 3: Data Manipulation with Pandas, November 2016. https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html (in html) and https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.00-Introduction-to-Pandas.ipynb (as a Jupyter notebook)
Matplotlib and Seaborn are two widely-used Python plotting libraries. Both produce publication quality plots. Seaborn is built on Matplotlib and may be easier to use. Matplotlib, however, provides lower-level control and may be preferable when a specific rendering is required. To get started:
- Matplotlib Tutorial: https://matplotlib.org/stable/tutorials/index.html
- Matplotlib Examples: https://matplotlib.org/stable/gallery/index.html
- VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data, Chapter 4: Visualization with Matplotlib, November 2016. https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html (in html) and https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/04.00-Introduction-To-Matplotlib.ipynb (as a Jupyter notebook)
- Seaborn User Guide and Tutorial: https://seaborn.pydata.org/tutorial.html
- Seaborn Example Gallery: https://seaborn.pydata.org/examples/index.html
SciPy, a collection of scientific computing packages, includes scipy.spatial for spatial data structures and algorithms and scipy.stats for statistics. To get started with SciPy:
- SciPy User Guide: https://docs.scipy.org/doc/scipy/tutorial/index.html#user-guide
Scikit-Learn and Statsmodels are used for machine learning and statistical analysis. Scikit-Learn, a machine learning toolkit, provides functions for classification, regression, clustering, dimensionality reduction, and model selection. The website for Statsmodels, a statistical analysis package, shows numerous examples, including: linear regression models, robust regression, and time series analysis. To get started:
- VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data, Chapter 5: Machine Learning, November 2016. https://jakevdp.github.io/PythonDataScienceHandbook/05.00-machine-learning.html (in html) and https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.00-Machine-Learning.ipynb (as a Jupyter notebook)
- Scikit-Learn’s home page: https://scikit-learn.org/stable/
- Statsmodels: Getting Started: https://www.statsmodels.org/stable/gettingstarted.html
- Statsmodels Examples: https://www.statsmodels.org/stable/examples/index.html

A first look at NumPy, Pandas, and Matplotlib#

Exploring a namespace#

We have already imported NumPy and Pandas using standard naming conventions, in part duplicated here for convenience:

import numpy as np
import pandas as pd

Once a library has been imported, tab completion can be used in a Jupyter notebook code cell to see everything that’s included in a particular library. To explore the NumPy namespace, uncomment the line in the next code cell, and then enter a tab at the end of the line to view a dropdown box of all constants and functions available to the user. (The line in the following cell is commented out simply to enable the notebook to run without errors.)

For example, in the NumPy namespace, enter a \(<\)tab\(>\) after np. to see the dropdown menu. To rapidly look for all names in the namespace beginning with the letter ‘p’, type ‘p’. The constant ‘pi’ can then be selected, and as usual, Ctrl-Enter will execute the cell, showing the value of \(\pi\), a global constant in the NumPy namespace.

#np.

As an example of a function, enter a \(<\)tab\(>\) after np., type ‘cor’ to show everything in the namespace beginning with ‘cor’, select ‘correlate’ from the dropdown menu, and finally Ctrl-Enter to execute the cell to see the value of correlate, a function implemented in NumPy.

#np.

Accessing built-in help#

Appending ? or ?? to a function name returns built-in help:

? shows the Python docstring
?? shows the source code (which includes the docstring at the top)

The docstring includes an explanation of each parameter and the return value, if any, and may also examples of the function’s use, as seen by executing the cells below.

np.correlate?

np.correlate??

Three fundamental NumPy and Pandas data structures: ndarray, Series, and DataFrame#

The basic data type in NumPy is the n-dimensional array, or ndarray. The values in the array are all of the same type.

np.ndarray?

The basic data types in Pandas are Series and DataFrame. The Series object is like an ndarray, only with an index (which could be names or other identifying information about the array).

pd.Series?

The DataFrame holds a table of values, where each column is a Series. While each column has a particular type, different columns may have different types. Like indices, columns can be named to make it easier to reference certain ones.

pd.DataFrame?

Exploring an ice core dataset with Pandas#

Calling the function pd.read_csv(<filename>) reads a CSV text file into a DataFrame.

In the following cell, sep=',' is superfluous (because a comma is the default separator), but the same function can read tab-separated values by specifying sep='\t'.

The skiprows= keyword lets Pandas know to not read in the first x rows of the file. This is necessary when those first rows contain metadata or other non-tabulated data. Here we have manually determined how many rows need to be skipped and entered those values.

# Read data from two csv files into two DataFrames.
supp_data_1 = pd.read_csv('data/supplementary-data-1.csv', sep=',', skiprows=18)
supp_data_2 = pd.read_csv('data/supplementary-data-2.csv', sep=',', skiprows=27,
                          na_values=['NaN', '#VALUE!'])

In Python, missing data is often indicated by the None, or in the case of missing numerical data, by NaN, which stands for Not a Number. In the following (abbreviated) table of values, NaN is used for the missing data, which appeared as both NaN and #VALUE! entries in Schupbach et al.’s original datasets.

supp_data_1

	depth_top (m)	depth_bott (m)	depth_mid (m)	ageGICC05ext_top (yr BP)	ageGICC05ext_bott (yr BP)	ageGICC05_mid (yr BP)	NH4 (ng/g)	NO3 (ng/g)	Na (ng/g)	Ca (ng/g)	SO4 (ng/g)
0	1135.6400	1136.6575	1136.1488	7185.0000	7194.9119	7189.9559	2.26	70.69	19.69	8.58	NaN
1	1136.6575	1137.6295	1137.1435	7194.9119	7205.0488	7199.9803	0.74	50.73	15.51	2.95	NaN
2	1137.6295	1138.5920	1138.1108	7205.0488	7215.0036	7210.0262	2.36	64.50	13.24	4.98	NaN
3	1138.5920	1139.5085	1139.0503	7215.0036	7225.0018	7220.0027	3.51	81.57	11.41	4.71	NaN
4	1139.5085	1140.4435	1139.9760	7225.0018	7235.1555	7230.0787	3.05	76.25	9.96	5.06	NaN
...	...	...	...	...	...	...	...	...	...	...	...
11441	2431.6630	2431.7880	2431.7255	128575.0400	128585.0400	128580.0400	13.69	85.04	9.82	11.09	102.5
11442	2431.7880	2431.9130	2431.8505	128585.0400	128595.0400	128590.0400	15.02	86.84	9.86	13.13	104.5
11443	2431.9130	2432.0460	2431.9795	128595.0400	128605.1227	128600.0814	12.10	82.00	10.12	16.29	139.8
11444	2432.0460	2432.1535	2432.0998	128605.1227	128612.5026	128608.8127	14.65	86.12	12.36	28.48	197.3
11445	2432.1535	2432.2610	2432.2073	128612.5026	128619.8825	128616.1926	14.34	85.44	13.36	17.05	NaN

11446 rows × 11 columns

The data were initially downloaded as Excel files. We could alternatively read the Excel file directly into a Pandas dataframe without first converting the Excel file to an intermediate CSV text file. Notice the slightly different keyword inputs required for this function.

supp_data_1_xlsx = pd.read_excel('data/supplementary-data-1.xlsx',
                                 sheet_name='Blatt1', skiprows=18)
supp_data_1_xlsx

	depth_top (m)	depth_bott (m)	depth_mid (m)	ageGICC05ext_top (yr BP)	ageGICC05ext_bott (yr BP)	ageGICC05_mid (yr BP)	NH4 (ng/g)	NO3 (ng/g)	Na (ng/g)	Ca (ng/g)	SO4 (ng/g)
0	1135.6400	1136.6575	1136.14875	7185.000000	7194.911859	7189.955929	2.256877	70.691648	19.686037	8.580064	NaN
1	1136.6575	1137.6295	1137.14350	7194.911859	7205.048775	7199.980317	0.744581	50.732062	15.510608	2.945996	NaN
2	1137.6295	1138.5920	1138.11075	7205.048775	7215.003636	7210.026206	2.363056	64.503053	13.243701	4.984971	NaN
3	1138.5920	1139.5085	1139.05025	7215.003636	7225.001818	7220.002727	3.507831	81.572921	11.413816	4.707585	NaN
4	1139.5085	1140.4435	1139.97600	7225.001818	7235.155487	7230.078653	3.045829	76.253548	9.964454	5.062247	NaN
...	...	...	...	...	...	...	...	...	...	...	...
11441	2431.6630	2431.7880	2431.72550	128575.040000	128585.040000	128580.040000	13.694443	85.040346	9.820952	11.091903	102.5
11442	2431.7880	2431.9130	2431.85050	128585.040000	128595.040000	128590.040000	15.020880	86.841639	9.855394	13.127113	104.5
11443	2431.9130	2432.0460	2431.97950	128595.040000	128605.122732	128600.081366	12.096210	81.999379	10.119022	16.292825	139.8
11444	2432.0460	2432.1535	2432.09975	128605.122732	128612.502632	128608.812682	14.645366	86.115549	12.358430	28.478098	197.3
11445	2432.1535	2432.2610	2432.20725	128612.502632	128619.882531	128616.192581	14.344681	85.439942	13.355068	17.046181	NaN

11446 rows × 11 columns

We can gather information about our data using some of Panda’s methods.

supp_data_1.shape

(11446, 11)

supp_data_1.columns

Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4 (ng/g)', 'NO3 (ng/g)', 'Na (ng/g)',
       'Ca (ng/g)', 'SO4 (ng/g)'],
      dtype='object')

supp_data_2.shape

(11446, 18)

supp_data_2.columns

Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4source (ng/m3)',
       'NH4source+1sigma (ng/m3)', 'NH4source-1sigma (ng/m3)',
       'NH4 atm. residence time (days)', 'Nasource (ng/m3)',
       'Nasource+1sigma (ng/m3)', 'Nasource-1sigma (ng/m3)',
       'Na atm. residence time (days)', 'Casource (ng/m3)',
       'Casource+1sigma (ng/m3)', 'Casource-1sigma (ng/m3)',
       'Ca atm. residence time (days)'],
      dtype='object')

Selecting a subset of columns produces a view into the DataFrame. This can make large DataFrames easier to work with. For efficiency, nothing is copied.

selectSuppDataCols = supp_data_1[['depth_mid (m)', 'ageGICC05_mid (yr BP)',
                                  'NH4 (ng/g)', 'NO3 (ng/g)', 'Na (ng/g)',
                                  'Ca (ng/g)', 'SO4 (ng/g)']]

A powerful tool in Pandas is called grouping. Grouping allows us to bin data and calculate statistics about each group. Here we reduce the number of rows by grouping samples taken in 1 meter segments and then find the mean of each group.

groupedSelectSuppData = selectSuppDataCols.round({'depth_mid (m)':0}).groupby('depth_mid (m)').mean()
groupedSelectSuppData

	ageGICC05_mid (yr BP)	NH4 (ng/g)	NO3 (ng/g)	Na (ng/g)	Ca (ng/g)	SO4 (ng/g)
depth_mid (m)
1136.0	7189.955900	2.260000	70.690000	19.690000	8.580000	NaN
1137.0	7199.980300	0.740000	50.730000	15.510000	2.950000	NaN
1138.0	7210.026200	2.360000	64.500000	13.240000	4.980000	NaN
1139.0	7220.002700	3.510000	81.570000	11.410000	4.710000	NaN
1140.0	7230.078700	3.050000	76.250000	9.960000	5.060000	NaN
...	...	...	...	...	...	...
2428.0	128255.012940	16.572000	81.759000	11.107000	12.325000	104.300000
2429.0	128349.993456	15.378889	82.525556	9.871111	14.343333	104.888889
2430.0	128440.014856	13.934444	85.722222	10.133333	12.745556	111.388889
2431.0	128525.016150	13.946250	84.508750	11.307500	16.842500	137.425000
2432.0	128594.201117	14.113333	85.388333	10.751667	16.296667	133.080000

1139 rows × 6 columns

The describe method computes some statistics for each DataFrame column.

groupedSelectSuppData.describe()

	ageGICC05_mid (yr BP)	NH4 (ng/g)	NO3 (ng/g)	Na (ng/g)	Ca (ng/g)	SO4 (ng/g)
count	1139.000000	1129.000000	1123.000000	1117.000000	1128.000000	990.000000
mean	43672.378440	8.094032	86.185192	39.381287	119.149032	161.776225
std	35066.820306	5.331508	15.274404	31.692534	143.952161	122.421050
min	7189.955900	0.703333	49.870000	3.779677	2.260000	28.164286
25%	11815.005350	4.464286	73.480769	12.490000	11.128782	66.751786
50%	34095.004542	6.727059	85.856923	26.979000	48.780625	101.917476
75%	68242.486354	10.226250	98.014063	67.589000	207.234659	249.153795
max	128594.201117	44.740000	133.250000	131.764545	787.039167	662.541667

A brief look at Seaborn#

Although this notebook uses Matplotlib to create visualizations, for some applications, Matplotlib’s low-level control may be more complex than is required. Seaborn, like Matplotlib, produces publication quality plots but is easier to use.

We have already imported Seaborn using the standard naming convention, duplicated here for convenience:

import seaborn as sns

As before, to explore the Seaborn namespace, uncomment the line in the next code cell, and then enter a tab at the end of the line to view a dropdown box of all constants and functions available to the user.

#sns.

Numerous common visualizations using Seaborn can be constructed in minutes by modifying examples found in the tutorial and gallery. The links below are duplicated from subsection 4.1.3 for convenience:

Seaborn User Guide and Tutorial: https://seaborn.pydata.org/tutorial.html
Seaborn Example Gallery: https://seaborn.pydata.org/examples/index.html

For example, we can use Seaborn to quickly plot pairwise relationships in the ice core dataset (as selected and grouped in subsection 4.2.4) using the single Python statement:

sns.pairplot(groupedSelectSuppData[['NH4 (ng/g)', 'NO3 (ng/g)', 'Na (ng/g)',
                                  'Ca (ng/g)', 'SO4 (ng/g)']].dropna())

<seaborn.axisgrid.PairGrid at 0x7f109068cf10>

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_53_1.png

Getting started with Matplotlib#

In this notebook, we will use Matplotlib to programmatically create a specialized figure with subplots and customized axes. By convention, Matplotlib’s pyplot and ticker APIs are imported using the statements:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

These namespaces can be explored by removing the # and entering a tab at the end of the lines:

#plt.

#ticker.

In Matplotlib, a figure contains subplots arranged in a grid, as shown in this example from Chapter 9 of Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Second Edition, by Wes McKinney (O’Reilly, September 2018) [2]. Unless otherwise specified, data are plotted on the most recently created subplot, as in the line

plt.plot(np.random.randn(50).cumsum(), 'k--')

which plots points on the third subplot, ax3, just created in the preceding line. To select a subplot on which to plot data, use the name of the subplot, as shown in the final two lines:

ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))

fig = plt.figure()
ax1 = fig.add_subplot(2,2,1)
ax2 = fig.add_subplot(2,2,2)
ax3 = fig.add_subplot(2,2,3)
plt.plot(np.random.randn(50).cumsum(), 'k--')
ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))

<matplotlib.collections.PathCollection at 0x7f108a478c70>

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_59_1.png

Part 1: Recreate Figure 2, step-by-step, with intermediate results#

In this section, we will create a reasonable facimile of Figure 2, as published in Nature Communications, one step at a time, showing the resulting figure after each step.

Figure 2 (as published in Nature Communications [4])	The resulting facimile of Figure 2

Step 1: Prepare the data and initialize the figure with four empty axes#

Figure 2 uses only Supplentary Data 2, which we’ve already read into a DataFrame in subsection 4.2.4.

supp_data_2

	depth_top (m)	depth_bott (m)	depth_mid (m)	ageGICC05ext_top (yr BP)	ageGICC05ext_bott (yr BP)	ageGICC05_mid (yr BP)	NH4source (ng/m3)	NH4source+1sigma (ng/m3)	NH4source-1sigma (ng/m3)	NH4 atm. residence time (days)	Nasource (ng/m3)	Nasource+1sigma (ng/m3)	Nasource-1sigma (ng/m3)	Na atm. residence time (days)	Casource (ng/m3)	Casource+1sigma (ng/m3)	Casource-1sigma (ng/m3)	Ca atm. residence time (days)
0	1135.6400	1136.6575	1136.1488	7185.0000	7194.9119	7189.9559	35.548	66.914	4.1827	4.76	232.92	582.70	-116.86	1.59	33.85	82.68	-14.98	7.12
1	1136.6575	1137.6295	1137.1435	7194.9119	7205.0488	7199.9803	11.800	22.243	1.3565	4.74	185.62	465.66	-94.42	1.58	11.70	28.64	-5.25	7.10
2	1137.6295	1138.5920	1138.1108	7205.0488	7215.0036	7210.0262	37.856	71.542	4.1694	4.70	161.75	407.77	-84.26	1.57	20.02	49.24	-9.20	7.06
3	1138.5920	1139.5085	1139.0503	7215.0036	7225.0018	7220.0027	56.530	106.980	6.0802	4.68	140.98	356.32	-74.36	1.56	19.02	46.90	-8.86	7.04
4	1139.5085	1140.4435	1139.9760	7225.0018	7235.1555	7230.0787	49.152	93.035	5.2689	4.67	123.39	311.95	-65.16	1.56	20.48	50.53	-9.57	7.04
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
11441	2431.6630	2431.7880	2431.7255	128575.0400	128585.0400	128580.0400	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11442	2431.7880	2431.9130	2431.8505	128585.0400	128595.0400	128590.0400	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11443	2431.9130	2432.0460	2431.9795	128595.0400	128605.1227	128600.0814	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11444	2432.0460	2432.1535	2432.0998	128605.1227	128612.5026	128608.8127	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11445	2432.1535	2432.2610	2432.2073	128612.5026	128619.8825	128616.1926	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

11446 rows × 18 columns

Listing the column names shows that Supplentary Data 2 lacks a column for age in kyr, so first create a new column and add it to the DataFrame.

supp_data_2.columns

Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4source (ng/m3)',
       'NH4source+1sigma (ng/m3)', 'NH4source-1sigma (ng/m3)',
       'NH4 atm. residence time (days)', 'Nasource (ng/m3)',
       'Nasource+1sigma (ng/m3)', 'Nasource-1sigma (ng/m3)',
       'Na atm. residence time (days)', 'Casource (ng/m3)',
       'Casource+1sigma (ng/m3)', 'Casource-1sigma (ng/m3)',
       'Ca atm. residence time (days)'],
      dtype='object')

# Add column for age in kyr to the DataFrame.
supp_data_2['age_kyr'] = supp_data_2['ageGICC05_mid (yr BP)'] / 1000

Now, listing the columns and viewing the dataset confirm that the new column for kyr has been added.

supp_data_2.columns

Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4source (ng/m3)',
       'NH4source+1sigma (ng/m3)', 'NH4source-1sigma (ng/m3)',
       'NH4 atm. residence time (days)', 'Nasource (ng/m3)',
       'Nasource+1sigma (ng/m3)', 'Nasource-1sigma (ng/m3)',
       'Na atm. residence time (days)', 'Casource (ng/m3)',
       'Casource+1sigma (ng/m3)', 'Casource-1sigma (ng/m3)',
       'Ca atm. residence time (days)', 'age_kyr'],
      dtype='object')

supp_data_2

	depth_top (m)	depth_bott (m)	depth_mid (m)	ageGICC05ext_top (yr BP)	ageGICC05ext_bott (yr BP)	ageGICC05_mid (yr BP)	NH4source (ng/m3)	NH4source+1sigma (ng/m3)	NH4source-1sigma (ng/m3)	NH4 atm. residence time (days)	Nasource (ng/m3)	Nasource+1sigma (ng/m3)	Nasource-1sigma (ng/m3)	Na atm. residence time (days)	Casource (ng/m3)	Casource+1sigma (ng/m3)	Casource-1sigma (ng/m3)	Ca atm. residence time (days)	age_kyr
0	1135.6400	1136.6575	1136.1488	7185.0000	7194.9119	7189.9559	35.548	66.914	4.1827	4.76	232.92	582.70	-116.86	1.59	33.85	82.68	-14.98	7.12	7.189956
1	1136.6575	1137.6295	1137.1435	7194.9119	7205.0488	7199.9803	11.800	22.243	1.3565	4.74	185.62	465.66	-94.42	1.58	11.70	28.64	-5.25	7.10	7.199980
2	1137.6295	1138.5920	1138.1108	7205.0488	7215.0036	7210.0262	37.856	71.542	4.1694	4.70	161.75	407.77	-84.26	1.57	20.02	49.24	-9.20	7.06	7.210026
3	1138.5920	1139.5085	1139.0503	7215.0036	7225.0018	7220.0027	56.530	106.980	6.0802	4.68	140.98	356.32	-74.36	1.56	19.02	46.90	-8.86	7.04	7.220003
4	1139.5085	1140.4435	1139.9760	7225.0018	7235.1555	7230.0787	49.152	93.035	5.2689	4.67	123.39	311.95	-65.16	1.56	20.48	50.53	-9.57	7.04	7.230079
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
11441	2431.6630	2431.7880	2431.7255	128575.0400	128585.0400	128580.0400	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	128.580040
11442	2431.7880	2431.9130	2431.8505	128585.0400	128595.0400	128590.0400	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	128.590040
11443	2431.9130	2432.0460	2431.9795	128595.0400	128605.1227	128600.0814	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	128.600081
11444	2432.0460	2432.1535	2432.0998	128605.1227	128612.5026	128608.8127	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	128.608813
11445	2432.1535	2432.2610	2432.2073	128612.5026	128619.8825	128616.1926	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	128.616193

11446 rows × 19 columns

Now that the data has been prepared, create a new figure with four subplots. Matplotlib supports a subset of latex, so we can use familiar latex format to label the y-axes. For more, see the Matplotlib documentation: https://matplotlib.org/stable/tutorials/text/mathtext.html

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_70_1.png

Step 2: Position the y-axis labels and tick marks#

The 2nd and 4th plots should have y-axis labels and tick marks on the right using the Python code:

for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

Note that in this and each of the following steps, the newly added code is highlighted by ‘##########’ both above and below.

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

###############################################################################
# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_72_1.png

Step 3: Remove unnecessary spines#

Next, we remove lines (spines) that don’t appear in Figure 2 as published in Nature by adding:

for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

###############################################################################    
# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
###############################################################################

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_74_0.png

Step 4: Configure the tick marks#

To draw the tick marks inside the plot area, we adjust the tick_params by setting the direction to ‘in’:

for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

###############################################################################
# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')
###############################################################################

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_76_1.png

Step 5: Plot data in subplots a-c#

Plot the data in subplots a-c using a lighter shade (by setting alpha=0.4) of each selected color:

supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')
    
# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

###############################################################################
# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_78_1.png

Step 6: Use a log scale for selected subplots#

Next, we use a log scale for the y-axis of subplots a-c, but we don’t use exponential format for the y-axis labels:

for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')
    
###############################################################################
# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())
###############################################################################

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_80_1.png

Step 7: Plot a running average#

Calculate and plot the smoothed function values using a darker line.

To do this, first we use min_periods to remove extraneous “mean” values when the actual data is “NaN”:

supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

Then, the smoothed function value is represented by plotting the 21 point running means of the 10 year data, as described in the caption for Figure 2, on page 4 of Schupbach et al. [4].

supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()
supp_data_2_rolling_mean

	depth_top (m)	depth_bott (m)	depth_mid (m)	ageGICC05ext_top (yr BP)	ageGICC05ext_bott (yr BP)	ageGICC05_mid (yr BP)	NH4source (ng/m3)	NH4source+1sigma (ng/m3)	NH4source-1sigma (ng/m3)	NH4 atm. residence time (days)	Nasource (ng/m3)	Nasource+1sigma (ng/m3)	Nasource-1sigma (ng/m3)	Na atm. residence time (days)	Casource (ng/m3)	Casource+1sigma (ng/m3)	Casource-1sigma (ng/m3)	Ca atm. residence time (days)	age_kyr
0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
11441	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11442	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11443	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11444	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11445	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

11446 rows × 19 columns

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

###############################################################################
# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_83_1.png

Step 8: Draw horizonal dashed lines#

Next, we plot the dashed lines indicating the early Holocene average, using dates selected based on the title of Figure 3, on page 6 of Schupbach et al. [4].

Before modifying the code, we first explore how to find the mean of each column of interest.

early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_data_2.mean()

NH4source (ng/m3)    127.979238
Nasource (ng/m3)      88.577290
Casource (ng/m3)      27.611878
dtype: float64

Then, based on the exploration above, we modify the code by adding:

early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)
###############################################################################
# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_87_1.png

Step 9: Plot data in subplot d#

The fourth subplot is a bit different from the first three. Rather than a single value plotted with a running average, subplot d will have three values, plotted as colored lines together with a legend to identify them. Although the results of this cell produce a legend that is obscured by the plotted lines, we will fix this soon.

supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

###############################################################################
# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_90_1.png

Step 10: Draw vertical gridlines#

The first step in drawing the vertical gridlines consists of two for loops:

for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

###############################################################################
# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_92_1.png

Step 11: Adjust axes to match the published figure#

The vertical gridlines produced by the previous cell are disconnected, the gridlines don’t show up on the left and right edges, and the dark tick marks need to be adjusted to match the figure. To make these changes, we will:

Remove the space between the subplots:

fig.subplots_adjust(hspace=0)

By commenting out (and in subsequent cells, removing) the following for loop, show the rightmost gridline in subplots a and c (axes 0 and 2).

#for i in [0, 2]:
#    axes[i].spines['right'].set_visible(False)

Set x-axis ticks (default ticks were drawn every 20 kyr):

NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

Show x-axis tick marks only the bottommost plot:

for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
###############################################################################
fig.subplots_adjust(hspace=0)
###############################################################################

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
###############################################################################
#for i in [0, 2]:
#    axes[i].spines['right'].set_visible(False)
###############################################################################
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

###########################################################################
# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)
###########################################################################

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_94_0.png

Step 12: Draw a depth scale along top of figure#

Draw an x-axis using an alternate scale (depth) along top of figure. First, explore how to plot the depth to correspond to the date axis using a lambda function.

supp_data_2.loc[supp_data_1['depth_mid (m)'] >= 1700, 'ageGICC05_mid (yr BP)']

    33490.0091
    33500.0082
    33509.9918
    33519.9707
    33529.9916
            ...     
  128580.0400
  128590.0400
  128600.0814
  128608.8127
  128616.1926
Name: ageGICC05_mid (yr BP), Length: 8815, dtype: float64

supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'ageGICC05_mid (yr BP)'].min()

33490.0091

supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 2400, 'age_kyr'].min()

123.3099824

neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_1['depth_mid (m)'] >= d, 'age_kyr'].min()

ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top_xticks

[7.1899559,
4499691,
48005,
4900091,
43005,
0400041,
6099891,
52002499999999,
96999240000001,
3099824]

kyr_to_depth = lambda y: supp_data_2.loc[supp_data_2['age_kyr'] <= y, 'depth_mid (m)'].max()
print(kyr_to_depth(130))

2432.2073

Then, using the exploration above (repeated here as it will appear in the code):

ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()

Draw an x-axis using an alternate scale (depth) along top of figure:

ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

###########################################################################
# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')
###########################################################################

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_102_0.png

Step 13: Remove extraneous spines#

Remove the right and bottom spines created by the new x-axis by setting their visibility to False:

ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
###########################################################################
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)
###########################################################################

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_104_0.png

Step 14: Plot gray uncertainty bands#

In subplots 2a, 2b, and 2c, plot light gray uncertainty bands:

x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

###############################################################################
# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')
###############################################################################

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_106_0.png

Step 15: Tweak the y-axis settings#

Tweak the y-axis settings to correspond to the published figure:

axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

###############################################################################    
# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)
###############################################################################

(0, 25)

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_108_1.png

Step 16: Annotate the figure#

In this step, we annotate the figure. First, add ‘a’, ‘b’, ‘c’, and ‘d’ to the subplots using the code:

axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

Then, add the notations ‘Hol’, ‘LGM’, ‘Warm interstadials’, ‘Cold interstadials’, and ‘Eem’ to subplot c:

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

###############################################################################
# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)
###############################################################################

Text(120, 80, 'Eem')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_110_1.png

Step 17. Adjust the figure size and the font size of the labels on the x-axes, y-axes, and legend#

The figure size and the default font size, including all labels of all of the axes and in the legend, can be adjusted by the two lines:

fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18

Note that when adjusting the font size, Matplotlib may alter the number of tick marks displayed. For example, in the preceding line, setting the default font size to 20 (instead of 18) causes many of the desired tick marks on the y-axes to disappear.

The default label size can be overridden. Here, the labels in the legend are adjusted in order to create a better fit in the upper left corner of subplot d:

axes[3].legend(loc='upper left', frameon=False, fontsize=14)

# Figure 2

# Create a new figure with four subplots.
###############################################################################
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes, y-axes, and the legend.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18
###############################################################################
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')

###############################################################################
# Subplot 2d legend. The default label size is overridden in order to create 
# a better fit in the upper left corner of subplot d.
axes[3].legend(loc='upper left', frameon=False, fontsize=14)
###############################################################################

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)

Text(120, 80, 'Eem')

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_112_1.png

Step 18. Save the figure to a file#

And finally, we save the resulting figure to a file. The file formats available are backend-dependent. On one of the author’s systems, the available file formats were:

{'ps':   'Postscript',
 'eps':  'Encapsulated Postscript',
 'pdf':  'Portable Document Format',
 'pgf':  'PGF code for LaTeX',
 'png':  'Portable Network Graphics',
 'raw':  'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg':  'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics'}

To see a list of the file formats available on your system, execute the following cell.

plt.gcf().canvas.get_supported_filetypes()

{'ps': 'Postscript',
 'eps': 'Encapsulated Postscript',
 'pdf': 'Portable Document Format',
 'pgf': 'PGF code for LaTeX',
 'png': 'Portable Network Graphics',
 'raw': 'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg': 'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics'}

<Figure size 432x288 with 0 Axes>

The file format to be used is specified by the extension on the filename given as the fname parameter to savefig. Here, we request png format and also trim the blank space from the margins:

fig.savefig(fname='OurFig2.png', bbox_inches='tight', pad_inches=0)

Numerous options for savefig are available, including one to set the desired resolution in dots per inch (dpi). For a full list of available options, see the Matplotlib documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html

# Figure 2

# Create a new figure with four subplots.
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes, y-axes, and the legend.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')

# Subplot 2d legend. The default label size is overridden in order to create 
# a better fit in the upper left corner of subplot d.
axes[3].legend(loc='upper left', frameon=False, fontsize=14)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)

###############################################################################
# Save the final result to a file.
fig.savefig(fname='OurFig2.png', bbox_inches='tight', pad_inches=0)
###############################################################################

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_116_0.png

Final complete script (with our final output)#

For completeness, reading the data from the csv file and adding a column for the age in kyr have been added to the top of the code.

# Figure 2

# Read data from a csv file into a DataFrame.
supp_data_2 = pd.read_csv('data/supplementary-data-2.csv', sep=',',
                          skiprows=27, na_values=['NaN', '#VALUE!'])

# Add column for age in kyr to the DataFrame.
supp_data_2['age_kyr'] = supp_data_2['ageGICC05_mid (yr BP)'] / 1000

# Create a new figure with four subplots.
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes, y-axes, and the legend.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')

# Subplot 2d legend. The default label size is overridden in order to create 
# a better fit in the upper left corner of subplot d.
axes[3].legend(loc='upper left', frameon=False, fontsize=14)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)

# Save the final result to a file.
fig.savefig(fname='OurFig2.png', bbox_inches='tight', pad_inches=0)

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_118_0.png

Published figure (for comparison)#

Compare our result above with the published figure below:

Part 2: Recreate Figure 1 (subplots a-e) using the preceding code as a template#

Using the previous code as a template, we can quickly produce a figure similar to Figure 1. Because subplots f and g use data from another source, the facimile only contains subplots a-e. The data used to produce these visualizations are found in Supplementary Data 1.

The original Figure 1, and then our Python code and its output:

# Figure 1 (subplots a-e)

# Read data from a csv file into a DataFrame.
supp_data_1 = pd.read_csv('data/supplementary-data-1.csv', sep=',',
                          skiprows=18)

# Add column for age in kyr to the DataFrame.
supp_data_1['age_kyr'] = supp_data_1['ageGICC05_mid (yr BP)'] / 1000

# Create a new figure with five subplots.
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes and y-axes.
fig, axes = plt.subplots(5, 1, sharex=True, figsize=(15,15))
plt.rcParams['font.size'] = 16
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_1.loc[supp_data_1['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_1.loc[supp_data_1['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(5):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-e, but don't use exponential format
# for y-axis labels.
for i in range(5):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 1 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in range(4):
    axes[i].spines['bottom'].set_visible(False)
for i in range(1, 5):
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 1a-e.
# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_1.plot(ax=axes[0], x='age_kyr', y='NH4 (ng/g)',
                 color='green', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[1], x='age_kyr', y='NO3 (ng/g)',
                 color='orange', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[2], x='age_kyr', y='Na (ng/g)',
                 color='blue', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[3], x='age_kyr', y='Ca (ng/g)',
                 color='red', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[4], x='age_kyr', y='SO4 (ng/g)',
                 color='purple', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_1_smoothed = supp_data_1.rolling(21, min_periods=21, center=True)
supp_data_1_rolling_mean = supp_data_1_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_1_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4 (ng/g)',
                              color='green', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[1], x='age_kyr', y='NO3 (ng/g)',
                              color='brown', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Na (ng/g)',
                              color='blue', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[3], x='age_kyr', y='Ca (ng/g)',
                              color='red', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[4], x='age_kyr', y='SO4 (ng/g)',
                              color='purple', legend=False)

# Draw vertical gridlines.
for i in range(5):
    axes[i].xaxis.grid(which='major')
for i in [0, 2, 4]:
    axes[i].spines['right'].set_color('lightgrey')
    
# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $g^{-1}$)')
axes[1].set_ylabel(r'$NO_3^-$ (ng $g^{-1}$)')
axes[2].set_ylabel(r'$Na^+$ (ng $g^{-1}$)')
axes[3].set_ylabel(r'$Ca^{2+}$ (ng $g^{-1}$)')
axes[4].set_ylabel(r'$SO_4^{2-}$ (ng $g^{-1}$)')

axes[4].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[4].set_xlim(0, 130)
axes[4].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[4].xaxis.set_minor_locator(ticker.LinearLocator(2 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(4):
    axes[i].tick_params(axis='x', which='both', bottom=False)
    
# Tweak y-axis limits to correspond to published figure.
axes[0].set_ylim(0.1, 100.0)
axes[1].set_ylim(40, 400)
axes[2].set_ylim(1, 1000)
axes[3].set_ylim(1, 1000)
axes[4].set_ylim(10, 10000)

# Tweak y-axis settings for subplot b.
axes[1].set_yticks([50, 100, 200])
axes[1].get_yaxis().set_major_formatter(ticker.ScalarFormatter())
axes[1].get_yaxis().set_minor_formatter(ticker.NullFormatter())
    
# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[4].annotate('e', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[3].annotate('Hol', xy=(5, 2), xycoords='data', fontsize=16)
axes[3].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[3].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[3].annotate('Warm interstadials', xy=(29, 8), xycoords='data', fontsize=16)
axes[3].annotate('Eem', xy=(120, 2), xycoords='data', fontsize=16)

# Save the final result to a file.
fig.savefig(fname='OurFig1.png', bbox_inches='tight', pad_inches=0)

../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_121_0.png

References#

Free Software Foundation. The GNU General Public License v3.0, accessed 4 April 2022. https://www.gnu.org/licenses/gpl-3.0.en.html

2. McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Second Edition, O’Reilly Media, September 2018. https://wesmckinney.com/pages/book.html (author’s page) and https://github.com/wesm/pydata-book (datasets and Jupyter notebooks)

3. Python Software Foundation. Python 3.10.4 Documentation, 2001-2022. https://docs.python.org/3/

4. Schüpbach, S., et al. “Greenland records of aerosol source and atmospheric lifetime changes from the Eemian to the Holocene,” Nature Communications, volume 9, Article number: 1476 (2018), last accessed 15 April 2022. https://www.nature.com/articles/s41467-018-03924-3

5. VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data, November 2016. https://jakevdp.github.io/PythonDataScienceHandbook/ (book in html) and https://github.com/jakevdp/PythonDataScienceHandbook (full text in Jupyter notebooks)

6. VanderPlas, Jake. A Whirlwind Tour of Python, August 2016. https://jakevdp.github.io/WhirlwindTourOfPython/ (free Jupyter notebooks and pdf)

EarthCube 2022 Call for Notebooks

Visualizing Ice Core Datasets in Python Using a Jupyter Notebook Template

Contents