Visualizing Ice Core Datasets in Python Using a Jupyter Notebook Template#

Authors#

  • Author1 = {“name”: “L. Markowsky”, “affiliation”: “Kennesaw State University”, “email”: “lmarkowsky@gmail.com”}

  • Author2 = {“name”: “Jessica Scheick”, “affiliation”: “University of New Hampshire”, “email”: “jbscheick@gmail.com”, “orcid”: “0000-0002-3421-4459”}

Table of Contents

Purpose#

Chemical constituents measured in ice cores are often visualized in a standard format, with figures produced by proprietary or closed tools. Such tools present an extra step that requires a context switch in the process of analyzing and visualizing results. Python and its many machine learning, numerical computation, and visualization libraries together with Jupyter notebooks offer a unified alternative that permits researchers to analyze and visualize ice core datasets in a single, highly-integrated ecosystem. This Jupyter notebook uses exclusively open-source Python libraries (specifically NumPy, Matplotlib, and Pandas) to create a readily reproducible, publication quality figure. We demonstrate the utility of the notebook as a template for ice core researchers engaging in open science by recreating multiple figures published by Schupbach et al. in Nature Communications, 16 April 2018. This notebook provides a step-by-step guide to systematically recreate two figures using Schupbach’s original data.

Technical contributions#

  • This notebook presents an annotated, step-by-step template to create publication-quality figures presenting ice core chemistry data using open-source Python libraries.

  • Two figures from an article by Schupbach et al. that appeared in Nature Communications, 16 April 2018, are reproduced to demonstrate how to use Python to visualize ice core data.

  • Part 1: First, as closely as possible, we will reproduce Figure 2 as published in Nature using the datasets provided by the authors.

  • Part 2: Then, we will use Part 1 as a model to reproduce Figure 1 (subplots a-e only).

Methodology#

A typical visualization of ice core data is shown, step-by-step, with explanations. Briefly, the steps in the process are:

  1. Prepare the data and initialize a figure with four empty axes.

  2. Move the y-axis labels and tick marks of every other plot to the right-hand side.

  3. Remove extraneous spines (lines).

  4. Draw the tick marks inside the plot area.

  5. Plot the data in a light shade of each selected color.

  6. Use a log scale for the y-axis of slected plots.

  7. Plot a running average of the data using a darker shade of each selected color.

  8. Add horizontal dashed lines to show the early Holocene average.

  9. Plot lines in the final subplot, and create a legend to label them.

  10. Draw vertical gridlines common to all plots in the figure.

  11. Adjust the axes (remove horizontal space, tweak tick settings) to match the published figure.

  12. Draw a depth scale along the top of the figure.

  13. Remove extraneous spines (lines) created by step 12.

  14. Plot gray bands in subplots a-c.

  15. Adjust the limits of the y-axes to match the published figure.

  16. Add annotations to the figure.

  17. Adjust the figure size and the font size of the labels on the x-axes, y-axes, and legend.

  18. Save the figure to a file.

The code used to produce the facimile of Figure 2 (Part 1) is then used as a template to produce a facimile of Figure 1 (subplots a-e) (Part 2).

Results#

Part 1#

Figure 2 (as published in Nature Communications [4])

The resulting facimile of Figure 2

Fig2.jpg

OurFig2.png

Part 2#

Figure 1 (as published in Nature Communications [4])

The template deleveloped in Part 1 was adapted to produce the following facimile of Figure 1 (subplots a-e only)

Fig1.jpg

OurFig1.png

Keywords#

keywords=[“ice core”, “visualization”, “climate”, “climate change”, “reproducible”]

Citation#

Markowsky, L., and Scheick, J. “Visualizing Ice Core Datasets in Python Using a Jupyter Notebook Template,” 2022.

Future work#

The authors’ future work based on this notebook will be to modularize the code and then to:

  • Create a Jupyter book, with the introductory material (subsections 4.1 and 4.2) expanded and separated into a separate notebook. This Jupyter book will be designed for use by novice Python programmers and especially crafted for use in a classroom setting.

  • Develop and release a Python package, designed for use by ice core scientists wishing to use a more flexible, or simply a Python-based, tool to create this well-known type of visualization of chemical constituents in ice core datasets.

Acknowledgements#

The authors thank the organizers of the NSF 2019 Arctic Data Workshop, held at the University of Maine, where a preliminary version of this notebook was presented. The authors also thank the reviewers for their helpful remarks.

Setup#

Library import#

# Data manipulation
import numpy as np
import pandas as pd

# Visualizations
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns

Source data#

All files (paper in PDF format, datasets, figures, and supplementary information) were downloaded from www.nature.com on 30 March 2019. [4] The original datasets were provided in the form of two Excel files, which were then saved as CSV text files using LibreOffice Calc. In this notebook, the data are imported and prepared from both the Excel and CSV text files in subsections 4.1.4 (Exploring data with Pandas) and 4.2.1 (Step 1: Prepare the data and initialize the figure with four empty axes).

Descriptions of the two datasets, reproduced in the following two subsections, are included at the top of Schupbach et al.’s original Excel files.

References for the two datasets:

  • Schüpbach et al., Greenland records of aerosol source and atmospheric lifetime changes from the Eemian to the Holocene, Nature Communications, 2018. https://www.nature.com/articles/s41467-018-03924-3#Sec11

  • Rasmussen et al., A first chronology for the North Greenland Eemian Ice Drilling (NEEM) ice core. Climate of the Past 9, 2713–2730, doi:10.5194/cp-9-2713-2013 (2013).

NEEM ion concentration data in 10 yr resolution#

Column

Description of Data

A

NEEM depth in m below the surface of top of 10 yr interval

B

NEEM depth in m below the surface of bottom of 10 yr interval

C

NEEM depth in m below the surface of middle of 10 yr interval

D

GICC05 age of the top of the 10 yr interval applied to the NEEM ice core
(Rasmussen et al., 2013) in years before present (where present is defined as 1950).

E

GICC05 age of the bottom of the 10 yr interval applied to the NEEM ice core
(Rasmussen et al., 2013) in years before present (where present is defined as 1950).

F

GICC05 age of the middle of the 10 yr interval applied to the NEEM ice core
(Rasmussen et al., 2013) in years before present (where present is defined as 1950).

G

mean NH4+ ice concentration of each 10 yr interval in ng/g
(missing values indicated by NaN)

H

mean NO3- ice concentration of each 10 yr interval in ng/g
(missing values indicated by NaN)

I

mean Na+ ice concentration of each 10 yr interval in ng/g
(missing values indicated by NaN)

J

mean Ca2+ ice concentration of each 10 yr interval in ng/g
(missing values indicated by NaN)

K

meanSO42- ice concentration of each 10 yr interval in ng/g
(missing values indicated by NaN)

Aerosol source concentrations and atmospheric lifetime in 10 yr resolution#

Column

Description of Data

A

NEEM depth in m below the surface of top of 10 yr interval

B

NEEM depth in m below the surface of bottom of 10 yr interval

C

NEEM depth in m below the surface of middle of 10 yr interval

D

GICC05 age of the top of the 10 yr interval applied to the NEEM ice core
(Rasmussen et al., 2013) in years before present (where present is defined as 1950).

E

GICC05 age of the bottom of the 10 yr interval applied to the NEEM ice core
(Rasmussen et al., 2013) in years before present (where present is defined as 1950).

F

GICC05 age of the middle of the 10 yr interval applied to the NEEM ice core
(Rasmussen et al., 2013) in years before present (where present is defined as 1950).

G

mean of NH4+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

H

mean + 1 sigma of NH4+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

I

mean - 1 sigma of NH4+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

J

atmospheric residence time of NH4+ for each 10 yr interval in days

K

mean of Na+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

L

mean + 1 sigma of Na+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

M

mean - 1 sigma of Na+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

N

atmospheric residence time of Na+ for each 10 yr interval in days

O

mean of Ca2+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

P

mean + 1 sigma of Ca2+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

Q

mean - 1 sigma of Ca2+ aerosol concentration at the source for each 10 yr interval in ng/m3
(missing values indicated by NaN)

R

atmospheric residence time of Ca2+ for each 10 yr interval in days

Data Processing and Analysis#

A brief introduction to the Python data science ecosystem#

Why Python?#

Why spend the time to learn a new language when < R/Excel/MATLAB > gets the job done?

This is a frequently asked question, even among software developers, as it is often easier to use a language or tool that you already know well. Sometimes, however, a new language or tool can offer benefits not available or not easy to achieve otherwise.

Some of Python’s benefits#

The Python ecosystem, and in particular, the Python data science stack, has much to offer:

  • The Python data science stack consists of mature, well-developed and well-documented libraries for analyzing and visualizing data.

  • The Python data science stack is free, open-source, and cross-platform.

  • Although Python, together with its libraries, is an example of a very high level language, well-written programs in the Python ecosystem can have very good performance.

  • Programming in Python yields clean, readable, highly maintainable and reusable source code.

  • Python’s large, friendly user community makes it easy to find answers to common problems online.

  • Python has become the most widely used programming language in data science and machine learning.


A first look at NumPy, Pandas, and Matplotlib#

Exploring a namespace#

We have already imported NumPy and Pandas using standard naming conventions, in part duplicated here for convenience:

import numpy as np
import pandas as pd

Once a library has been imported, tab completion can be used in a Jupyter notebook code cell to see everything that’s included in a particular library. To explore the NumPy namespace, uncomment the line in the next code cell, and then enter a tab at the end of the line to view a dropdown box of all constants and functions available to the user. (The line in the following cell is commented out simply to enable the notebook to run without errors.)

For example, in the NumPy namespace, enter a \(<\)tab\(>\) after np. to see the dropdown menu. To rapidly look for all names in the namespace beginning with the letter ‘p’, type ‘p’. The constant ‘pi’ can then be selected, and as usual, Ctrl-Enter will execute the cell, showing the value of \(\pi\), a global constant in the NumPy namespace.

#np.

As an example of a function, enter a \(<\)tab\(>\) after np., type ‘cor’ to show everything in the namespace beginning with ‘cor’, select ‘correlate’ from the dropdown menu, and finally Ctrl-Enter to execute the cell to see the value of correlate, a function implemented in NumPy.

#np.

Accessing built-in help#

Appending ? or ?? to a function name returns built-in help:

  • ? shows the Python docstring

  • ?? shows the source code (which includes the docstring at the top)

The docstring includes an explanation of each parameter and the return value, if any, and may also examples of the function’s use, as seen by executing the cells below.

np.correlate?
np.correlate??

Three fundamental NumPy and Pandas data structures: ndarray, Series, and DataFrame#

The basic data type in NumPy is the n-dimensional array, or ndarray. The values in the array are all of the same type.

np.ndarray?

The basic data types in Pandas are Series and DataFrame. The Series object is like an ndarray, only with an index (which could be names or other identifying information about the array).

pd.Series?

The DataFrame holds a table of values, where each column is a Series. While each column has a particular type, different columns may have different types. Like indices, columns can be named to make it easier to reference certain ones.

pd.DataFrame?

Exploring an ice core dataset with Pandas#

Calling the function pd.read_csv(<filename>) reads a CSV text file into a DataFrame.

In the following cell, sep=',' is superfluous (because a comma is the default separator), but the same function can read tab-separated values by specifying sep='\t'.

The skiprows= keyword lets Pandas know to not read in the first x rows of the file. This is necessary when those first rows contain metadata or other non-tabulated data. Here we have manually determined how many rows need to be skipped and entered those values.

# Read data from two csv files into two DataFrames.
supp_data_1 = pd.read_csv('data/supplementary-data-1.csv', sep=',', skiprows=18)
supp_data_2 = pd.read_csv('data/supplementary-data-2.csv', sep=',', skiprows=27,
                          na_values=['NaN', '#VALUE!'])

In Python, missing data is often indicated by the None, or in the case of missing numerical data, by NaN, which stands for Not a Number. In the following (abbreviated) table of values, NaN is used for the missing data, which appeared as both NaN and #VALUE! entries in Schupbach et al.’s original datasets.

supp_data_1
depth_top (m) depth_bott (m) depth_mid (m) ageGICC05ext_top (yr BP) ageGICC05ext_bott (yr BP) ageGICC05_mid (yr BP) NH4 (ng/g) NO3 (ng/g) Na (ng/g) Ca (ng/g) SO4 (ng/g)
0 1135.6400 1136.6575 1136.1488 7185.0000 7194.9119 7189.9559 2.26 70.69 19.69 8.58 NaN
1 1136.6575 1137.6295 1137.1435 7194.9119 7205.0488 7199.9803 0.74 50.73 15.51 2.95 NaN
2 1137.6295 1138.5920 1138.1108 7205.0488 7215.0036 7210.0262 2.36 64.50 13.24 4.98 NaN
3 1138.5920 1139.5085 1139.0503 7215.0036 7225.0018 7220.0027 3.51 81.57 11.41 4.71 NaN
4 1139.5085 1140.4435 1139.9760 7225.0018 7235.1555 7230.0787 3.05 76.25 9.96 5.06 NaN
... ... ... ... ... ... ... ... ... ... ... ...
11441 2431.6630 2431.7880 2431.7255 128575.0400 128585.0400 128580.0400 13.69 85.04 9.82 11.09 102.5
11442 2431.7880 2431.9130 2431.8505 128585.0400 128595.0400 128590.0400 15.02 86.84 9.86 13.13 104.5
11443 2431.9130 2432.0460 2431.9795 128595.0400 128605.1227 128600.0814 12.10 82.00 10.12 16.29 139.8
11444 2432.0460 2432.1535 2432.0998 128605.1227 128612.5026 128608.8127 14.65 86.12 12.36 28.48 197.3
11445 2432.1535 2432.2610 2432.2073 128612.5026 128619.8825 128616.1926 14.34 85.44 13.36 17.05 NaN

11446 rows × 11 columns

The data were initially downloaded as Excel files. We could alternatively read the Excel file directly into a Pandas dataframe without first converting the Excel file to an intermediate CSV text file. Notice the slightly different keyword inputs required for this function.

supp_data_1_xlsx = pd.read_excel('data/supplementary-data-1.xlsx',
                                 sheet_name='Blatt1', skiprows=18)
supp_data_1_xlsx
depth_top (m) depth_bott (m) depth_mid (m) ageGICC05ext_top (yr BP) ageGICC05ext_bott (yr BP) ageGICC05_mid (yr BP) NH4 (ng/g) NO3 (ng/g) Na (ng/g) Ca (ng/g) SO4 (ng/g)
0 1135.6400 1136.6575 1136.14875 7185.000000 7194.911859 7189.955929 2.256877 70.691648 19.686037 8.580064 NaN
1 1136.6575 1137.6295 1137.14350 7194.911859 7205.048775 7199.980317 0.744581 50.732062 15.510608 2.945996 NaN
2 1137.6295 1138.5920 1138.11075 7205.048775 7215.003636 7210.026206 2.363056 64.503053 13.243701 4.984971 NaN
3 1138.5920 1139.5085 1139.05025 7215.003636 7225.001818 7220.002727 3.507831 81.572921 11.413816 4.707585 NaN
4 1139.5085 1140.4435 1139.97600 7225.001818 7235.155487 7230.078653 3.045829 76.253548 9.964454 5.062247 NaN
... ... ... ... ... ... ... ... ... ... ... ...
11441 2431.6630 2431.7880 2431.72550 128575.040000 128585.040000 128580.040000 13.694443 85.040346 9.820952 11.091903 102.5
11442 2431.7880 2431.9130 2431.85050 128585.040000 128595.040000 128590.040000 15.020880 86.841639 9.855394 13.127113 104.5
11443 2431.9130 2432.0460 2431.97950 128595.040000 128605.122732 128600.081366 12.096210 81.999379 10.119022 16.292825 139.8
11444 2432.0460 2432.1535 2432.09975 128605.122732 128612.502632 128608.812682 14.645366 86.115549 12.358430 28.478098 197.3
11445 2432.1535 2432.2610 2432.20725 128612.502632 128619.882531 128616.192581 14.344681 85.439942 13.355068 17.046181 NaN

11446 rows × 11 columns

We can gather information about our data using some of Panda’s methods.

supp_data_1.shape
(11446, 11)
supp_data_1.columns
Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4 (ng/g)', 'NO3 (ng/g)', 'Na (ng/g)',
       'Ca (ng/g)', 'SO4 (ng/g)'],
      dtype='object')
supp_data_2.shape
(11446, 18)
supp_data_2.columns
Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4source (ng/m3)',
       'NH4source+1sigma (ng/m3)', 'NH4source-1sigma (ng/m3)',
       'NH4 atm. residence time (days)', 'Nasource (ng/m3)',
       'Nasource+1sigma (ng/m3)', 'Nasource-1sigma (ng/m3)',
       'Na atm. residence time (days)', 'Casource (ng/m3)',
       'Casource+1sigma (ng/m3)', 'Casource-1sigma (ng/m3)',
       'Ca atm. residence time (days)'],
      dtype='object')

Selecting a subset of columns produces a view into the DataFrame. This can make large DataFrames easier to work with. For efficiency, nothing is copied.

selectSuppDataCols = supp_data_1[['depth_mid (m)', 'ageGICC05_mid (yr BP)',
                                  'NH4 (ng/g)', 'NO3 (ng/g)', 'Na (ng/g)',
                                  'Ca (ng/g)', 'SO4 (ng/g)']]

A powerful tool in Pandas is called grouping. Grouping allows us to bin data and calculate statistics about each group. Here we reduce the number of rows by grouping samples taken in 1 meter segments and then find the mean of each group.

groupedSelectSuppData = selectSuppDataCols.round({'depth_mid (m)':0}).groupby('depth_mid (m)').mean()
groupedSelectSuppData
ageGICC05_mid (yr BP) NH4 (ng/g) NO3 (ng/g) Na (ng/g) Ca (ng/g) SO4 (ng/g)
depth_mid (m)
1136.0 7189.955900 2.260000 70.690000 19.690000 8.580000 NaN
1137.0 7199.980300 0.740000 50.730000 15.510000 2.950000 NaN
1138.0 7210.026200 2.360000 64.500000 13.240000 4.980000 NaN
1139.0 7220.002700 3.510000 81.570000 11.410000 4.710000 NaN
1140.0 7230.078700 3.050000 76.250000 9.960000 5.060000 NaN
... ... ... ... ... ... ...
2428.0 128255.012940 16.572000 81.759000 11.107000 12.325000 104.300000
2429.0 128349.993456 15.378889 82.525556 9.871111 14.343333 104.888889
2430.0 128440.014856 13.934444 85.722222 10.133333 12.745556 111.388889
2431.0 128525.016150 13.946250 84.508750 11.307500 16.842500 137.425000
2432.0 128594.201117 14.113333 85.388333 10.751667 16.296667 133.080000

1139 rows × 6 columns

The describe method computes some statistics for each DataFrame column.

groupedSelectSuppData.describe()
ageGICC05_mid (yr BP) NH4 (ng/g) NO3 (ng/g) Na (ng/g) Ca (ng/g) SO4 (ng/g)
count 1139.000000 1129.000000 1123.000000 1117.000000 1128.000000 990.000000
mean 43672.378440 8.094032 86.185192 39.381287 119.149032 161.776225
std 35066.820306 5.331508 15.274404 31.692534 143.952161 122.421050
min 7189.955900 0.703333 49.870000 3.779677 2.260000 28.164286
25% 11815.005350 4.464286 73.480769 12.490000 11.128782 66.751786
50% 34095.004542 6.727059 85.856923 26.979000 48.780625 101.917476
75% 68242.486354 10.226250 98.014063 67.589000 207.234659 249.153795
max 128594.201117 44.740000 133.250000 131.764545 787.039167 662.541667

A brief look at Seaborn#

Although this notebook uses Matplotlib to create visualizations, for some applications, Matplotlib’s low-level control may be more complex than is required. Seaborn, like Matplotlib, produces publication quality plots but is easier to use.

We have already imported Seaborn using the standard naming convention, duplicated here for convenience:

import seaborn as sns

As before, to explore the Seaborn namespace, uncomment the line in the next code cell, and then enter a tab at the end of the line to view a dropdown box of all constants and functions available to the user.

#sns.

Numerous common visualizations using Seaborn can be constructed in minutes by modifying examples found in the tutorial and gallery. The links below are duplicated from subsection 4.1.3 for convenience:

For example, we can use Seaborn to quickly plot pairwise relationships in the ice core dataset (as selected and grouped in subsection 4.2.4) using the single Python statement:

sns.pairplot(groupedSelectSuppData[['NH4 (ng/g)', 'NO3 (ng/g)', 'Na (ng/g)',
                                  'Ca (ng/g)', 'SO4 (ng/g)']].dropna())
<seaborn.axisgrid.PairGrid at 0x7f109068cf10>
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_53_1.png

Getting started with Matplotlib#

In this notebook, we will use Matplotlib to programmatically create a specialized figure with subplots and customized axes. By convention, Matplotlib’s pyplot and ticker APIs are imported using the statements:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

These namespaces can be explored by removing the # and entering a tab at the end of the lines:

#plt.
#ticker.

In Matplotlib, a figure contains subplots arranged in a grid, as shown in this example from Chapter 9 of Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Second Edition, by Wes McKinney (O’Reilly, September 2018) [2]. Unless otherwise specified, data are plotted on the most recently created subplot, as in the line

plt.plot(np.random.randn(50).cumsum(), 'k--')

which plots points on the third subplot, ax3, just created in the preceding line. To select a subplot on which to plot data, use the name of the subplot, as shown in the final two lines:

ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
fig = plt.figure()
ax1 = fig.add_subplot(2,2,1)
ax2 = fig.add_subplot(2,2,2)
ax3 = fig.add_subplot(2,2,3)
plt.plot(np.random.randn(50).cumsum(), 'k--')
ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
<matplotlib.collections.PathCollection at 0x7f108a478c70>
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_59_1.png

Part 1: Recreate Figure 2, step-by-step, with intermediate results#

In this section, we will create a reasonable facimile of Figure 2, as published in Nature Communications, one step at a time, showing the resulting figure after each step.

Figure 2 (as published in Nature Communications [4])

The resulting facimile of Figure 2

Fig2.jpg

OurFig2.png

Step 1: Prepare the data and initialize the figure with four empty axes#

Figure 2 uses only Supplentary Data 2, which we’ve already read into a DataFrame in subsection 4.2.4.

supp_data_2
depth_top (m) depth_bott (m) depth_mid (m) ageGICC05ext_top (yr BP) ageGICC05ext_bott (yr BP) ageGICC05_mid (yr BP) NH4source (ng/m3) NH4source+1sigma (ng/m3) NH4source-1sigma (ng/m3) NH4 atm. residence time (days) Nasource (ng/m3) Nasource+1sigma (ng/m3) Nasource-1sigma (ng/m3) Na atm. residence time (days) Casource (ng/m3) Casource+1sigma (ng/m3) Casource-1sigma (ng/m3) Ca atm. residence time (days)
0 1135.6400 1136.6575 1136.1488 7185.0000 7194.9119 7189.9559 35.548 66.914 4.1827 4.76 232.92 582.70 -116.86 1.59 33.85 82.68 -14.98 7.12
1 1136.6575 1137.6295 1137.1435 7194.9119 7205.0488 7199.9803 11.800 22.243 1.3565 4.74 185.62 465.66 -94.42 1.58 11.70 28.64 -5.25 7.10
2 1137.6295 1138.5920 1138.1108 7205.0488 7215.0036 7210.0262 37.856 71.542 4.1694 4.70 161.75 407.77 -84.26 1.57 20.02 49.24 -9.20 7.06
3 1138.5920 1139.5085 1139.0503 7215.0036 7225.0018 7220.0027 56.530 106.980 6.0802 4.68 140.98 356.32 -74.36 1.56 19.02 46.90 -8.86 7.04
4 1139.5085 1140.4435 1139.9760 7225.0018 7235.1555 7230.0787 49.152 93.035 5.2689 4.67 123.39 311.95 -65.16 1.56 20.48 50.53 -9.57 7.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11441 2431.6630 2431.7880 2431.7255 128575.0400 128585.0400 128580.0400 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11442 2431.7880 2431.9130 2431.8505 128585.0400 128595.0400 128590.0400 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11443 2431.9130 2432.0460 2431.9795 128595.0400 128605.1227 128600.0814 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11444 2432.0460 2432.1535 2432.0998 128605.1227 128612.5026 128608.8127 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11445 2432.1535 2432.2610 2432.2073 128612.5026 128619.8825 128616.1926 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

11446 rows × 18 columns

Listing the column names shows that Supplentary Data 2 lacks a column for age in kyr, so first create a new column and add it to the DataFrame.

supp_data_2.columns
Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4source (ng/m3)',
       'NH4source+1sigma (ng/m3)', 'NH4source-1sigma (ng/m3)',
       'NH4 atm. residence time (days)', 'Nasource (ng/m3)',
       'Nasource+1sigma (ng/m3)', 'Nasource-1sigma (ng/m3)',
       'Na atm. residence time (days)', 'Casource (ng/m3)',
       'Casource+1sigma (ng/m3)', 'Casource-1sigma (ng/m3)',
       'Ca atm. residence time (days)'],
      dtype='object')
# Add column for age in kyr to the DataFrame.
supp_data_2['age_kyr'] = supp_data_2['ageGICC05_mid (yr BP)'] / 1000

Now, listing the columns and viewing the dataset confirm that the new column for kyr has been added.

supp_data_2.columns
Index(['depth_top (m)', 'depth_bott (m)', 'depth_mid (m)',
       'ageGICC05ext_top (yr BP)', 'ageGICC05ext_bott (yr BP)',
       'ageGICC05_mid (yr BP)', 'NH4source (ng/m3)',
       'NH4source+1sigma (ng/m3)', 'NH4source-1sigma (ng/m3)',
       'NH4 atm. residence time (days)', 'Nasource (ng/m3)',
       'Nasource+1sigma (ng/m3)', 'Nasource-1sigma (ng/m3)',
       'Na atm. residence time (days)', 'Casource (ng/m3)',
       'Casource+1sigma (ng/m3)', 'Casource-1sigma (ng/m3)',
       'Ca atm. residence time (days)', 'age_kyr'],
      dtype='object')
supp_data_2
depth_top (m) depth_bott (m) depth_mid (m) ageGICC05ext_top (yr BP) ageGICC05ext_bott (yr BP) ageGICC05_mid (yr BP) NH4source (ng/m3) NH4source+1sigma (ng/m3) NH4source-1sigma (ng/m3) NH4 atm. residence time (days) Nasource (ng/m3) Nasource+1sigma (ng/m3) Nasource-1sigma (ng/m3) Na atm. residence time (days) Casource (ng/m3) Casource+1sigma (ng/m3) Casource-1sigma (ng/m3) Ca atm. residence time (days) age_kyr
0 1135.6400 1136.6575 1136.1488 7185.0000 7194.9119 7189.9559 35.548 66.914 4.1827 4.76 232.92 582.70 -116.86 1.59 33.85 82.68 -14.98 7.12 7.189956
1 1136.6575 1137.6295 1137.1435 7194.9119 7205.0488 7199.9803 11.800 22.243 1.3565 4.74 185.62 465.66 -94.42 1.58 11.70 28.64 -5.25 7.10 7.199980
2 1137.6295 1138.5920 1138.1108 7205.0488 7215.0036 7210.0262 37.856 71.542 4.1694 4.70 161.75 407.77 -84.26 1.57 20.02 49.24 -9.20 7.06 7.210026
3 1138.5920 1139.5085 1139.0503 7215.0036 7225.0018 7220.0027 56.530 106.980 6.0802 4.68 140.98 356.32 -74.36 1.56 19.02 46.90 -8.86 7.04 7.220003
4 1139.5085 1140.4435 1139.9760 7225.0018 7235.1555 7230.0787 49.152 93.035 5.2689 4.67 123.39 311.95 -65.16 1.56 20.48 50.53 -9.57 7.04 7.230079
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11441 2431.6630 2431.7880 2431.7255 128575.0400 128585.0400 128580.0400 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 128.580040
11442 2431.7880 2431.9130 2431.8505 128585.0400 128595.0400 128590.0400 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 128.590040
11443 2431.9130 2432.0460 2431.9795 128595.0400 128605.1227 128600.0814 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 128.600081
11444 2432.0460 2432.1535 2432.0998 128605.1227 128612.5026 128608.8127 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 128.608813
11445 2432.1535 2432.2610 2432.2073 128612.5026 128619.8825 128616.1926 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 128.616193

11446 rows × 19 columns

Now that the data has been prepared, create a new figure with four subplots. Matplotlib supports a subset of latex, so we can use familiar latex format to label the y-axes. For more, see the Matplotlib documentation: https://matplotlib.org/stable/tutorials/text/mathtext.html

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_70_1.png

Step 2: Position the y-axis labels and tick marks#

The 2nd and 4th plots should have y-axis labels and tick marks on the right using the Python code:

for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

Note that in this and each of the following steps, the newly added code is highlighted by ‘##########’ both above and below.

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

###############################################################################
# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_72_1.png

Step 3: Remove unnecessary spines#

Next, we remove lines (spines) that don’t appear in Figure 2 as published in Nature by adding:

for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

###############################################################################    
# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
###############################################################################
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_74_0.png

Step 4: Configure the tick marks#

To draw the tick marks inside the plot area, we adjust the tick_params by setting the direction to ‘in’:

for i in range(4):
    axes[i].tick_params(which='both', direction='in')
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

###############################################################################
# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')
###############################################################################

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_76_1.png

Step 5: Plot data in subplots a-c#

Plot the data in subplots a-c using a lighter shade (by setting alpha=0.4) of each selected color:

supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')
    
# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

###############################################################################
# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_78_1.png

Step 6: Use a log scale for selected subplots#

Next, we use a log scale for the y-axis of subplots a-c, but we don’t use exponential format for the y-axis labels:

for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')
    
###############################################################################
# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())
###############################################################################

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_80_1.png

Step 7: Plot a running average#

Calculate and plot the smoothed function values using a darker line.

To do this, first we use min_periods to remove extraneous “mean” values when the actual data is “NaN”:

supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

Then, the smoothed function value is represented by plotting the 21 point running means of the 10 year data, as described in the caption for Figure 2, on page 4 of Schupbach et al. [4].

supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()
supp_data_2_rolling_mean
depth_top (m) depth_bott (m) depth_mid (m) ageGICC05ext_top (yr BP) ageGICC05ext_bott (yr BP) ageGICC05_mid (yr BP) NH4source (ng/m3) NH4source+1sigma (ng/m3) NH4source-1sigma (ng/m3) NH4 atm. residence time (days) Nasource (ng/m3) Nasource+1sigma (ng/m3) Nasource-1sigma (ng/m3) Na atm. residence time (days) Casource (ng/m3) Casource+1sigma (ng/m3) Casource-1sigma (ng/m3) Ca atm. residence time (days) age_kyr
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11441 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11442 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11443 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11444 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11445 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

11446 rows × 19 columns

# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

###############################################################################
# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_83_1.png

Step 8: Draw horizonal dashed lines#

Next, we plot the dashed lines indicating the early Holocene average, using dates selected based on the title of Figure 3, on page 6 of Schupbach et al. [4].

Before modifying the code, we first explore how to find the mean of each column of interest.

early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_data_2.mean()
NH4source (ng/m3)    127.979238
Nasource (ng/m3)      88.577290
Casource (ng/m3)      27.611878
dtype: float64

Then, based on the exploration above, we modify the code by adding:

early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)
###############################################################################
# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_87_1.png

Step 9: Plot data in subplot d#

The fourth subplot is a bit different from the first three. Rather than a single value plotted with a running average, subplot d will have three values, plotted as colored lines together with a legend to identify them. Although the results of this cell produce a legend that is obscured by the plotted lines, we will fix this soon.

supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

###############################################################################
# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_90_1.png

Step 10: Draw vertical gridlines#

The first step in drawing the vertical gridlines consists of two for loops:

for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [0, 2]:
    axes[i].spines['right'].set_visible(False)
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

###############################################################################
# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')
###############################################################################

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')
Text(0.5, 0, 'Age (kyr BP GICC05modelext-NEEM-1)')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_92_1.png

Step 11: Adjust axes to match the published figure#

The vertical gridlines produced by the previous cell are disconnected, the gridlines don’t show up on the left and right edges, and the dark tick marks need to be adjusted to match the figure. To make these changes, we will:

  • Remove the space between the subplots:

fig.subplots_adjust(hspace=0)
  • By commenting out (and in subsequent cells, removing) the following for loop, show the rightmost gridline in subplots a and c (axes 0 and 2).

#for i in [0, 2]:
#    axes[i].spines['right'].set_visible(False)
  • Set x-axis ticks (default ticks were drawn every 20 kyr):

NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))
  • Show x-axis tick marks only the bottommost plot:

for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
###############################################################################
fig.subplots_adjust(hspace=0)
###############################################################################

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
###############################################################################
#for i in [0, 2]:
#    axes[i].spines['right'].set_visible(False)
###############################################################################
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

###########################################################################
# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)
###########################################################################
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_94_0.png

Step 12: Draw a depth scale along top of figure#

Draw an x-axis using an alternate scale (depth) along top of figure. First, explore how to plot the depth to correspond to the date axis using a lambda function.

supp_data_2.loc[supp_data_1['depth_mid (m)'] >= 1700, 'ageGICC05_mid (yr BP)']
2630      33490.0091
2631      33500.0082
2632      33509.9918
2633      33519.9707
2634      33529.9916
            ...     
11441    128580.0400
11442    128590.0400
11443    128600.0814
11444    128608.8127
11445    128616.1926
Name: ageGICC05_mid (yr BP), Length: 8815, dtype: float64
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'ageGICC05_mid (yr BP)'].min()
33490.0091
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 2400, 'age_kyr'].min()
123.3099824
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_1['depth_mid (m)'] >= d, 'age_kyr'].min()

ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top_xticks
[7.1899559,
 15.4499691,
 23.48005,
 33.4900091,
 42.43005,
 53.0400041,
 70.6099891,
 86.52002499999999,
 107.96999240000001,
 123.3099824]
kyr_to_depth = lambda y: supp_data_2.loc[supp_data_2['age_kyr'] <= y, 'depth_mid (m)'].max()
print(kyr_to_depth(130))
2432.2073

Then, using the exploration above (repeated here as it will appear in the code):

ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()

Draw an x-axis using an alternate scale (depth) along top of figure:

ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

###########################################################################
# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')
###########################################################################

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_102_0.png

Step 13: Remove extraneous spines#

Remove the right and bottom spines created by the new x-axis by setting their visibility to False:

ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
###########################################################################
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)
###########################################################################

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_104_0.png

Step 14: Plot gray uncertainty bands#

In subplots 2a, 2b, and 2c, plot light gray uncertainty bands:

x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

###############################################################################
# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')
###############################################################################

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_106_0.png

Step 15: Tweak the y-axis settings#

Tweak the y-axis settings to correspond to the published figure:

axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

###############################################################################    
# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)
###############################################################################
(0, 25)
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_108_1.png

Step 16: Annotate the figure#

In this step, we annotate the figure. First, add ‘a’, ‘b’, ‘c’, and ‘d’ to the subplots using the code:

axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

Then, add the notations ‘Hol’, ‘LGM’, ‘Warm interstadials’, ‘Cold interstadials’, and ‘Eem’ to subplot c:

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)
# Figure 2
# Create a new figure with four subplots.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(15,10))
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')
axes[3].legend(loc='upper left', frameon=False)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

###############################################################################
# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)
###############################################################################
Text(120, 80, 'Eem')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_110_1.png

Step 17. Adjust the figure size and the font size of the labels on the x-axes, y-axes, and legend#

The figure size and the default font size, including all labels of all of the axes and in the legend, can be adjusted by the two lines:

fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18

Note that when adjusting the font size, Matplotlib may alter the number of tick marks displayed. For example, in the preceding line, setting the default font size to 20 (instead of 18) causes many of the desired tick marks on the y-axes to disappear.

The default label size can be overridden. Here, the labels in the legend are adjusted in order to create a better fit in the upper left corner of subplot d:

axes[3].legend(loc='upper left', frameon=False, fontsize=14)
# Figure 2

# Create a new figure with four subplots.
###############################################################################
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes, y-axes, and the legend.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18
###############################################################################
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')

###############################################################################
# Subplot 2d legend. The default label size is overridden in order to create 
# a better fit in the upper left corner of subplot d.
axes[3].legend(loc='upper left', frameon=False, fontsize=14)
###############################################################################

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)
Text(120, 80, 'Eem')
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_112_1.png

Step 18. Save the figure to a file#

And finally, we save the resulting figure to a file. The file formats available are backend-dependent. On one of the author’s systems, the available file formats were:

{'ps':   'Postscript',
 'eps':  'Encapsulated Postscript',
 'pdf':  'Portable Document Format',
 'pgf':  'PGF code for LaTeX',
 'png':  'Portable Network Graphics',
 'raw':  'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg':  'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics'}

To see a list of the file formats available on your system, execute the following cell.

plt.gcf().canvas.get_supported_filetypes()
{'ps': 'Postscript',
 'eps': 'Encapsulated Postscript',
 'pdf': 'Portable Document Format',
 'pgf': 'PGF code for LaTeX',
 'png': 'Portable Network Graphics',
 'raw': 'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg': 'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics'}
<Figure size 432x288 with 0 Axes>

The file format to be used is specified by the extension on the filename given as the fname parameter to savefig. Here, we request png format and also trim the blank space from the margins:

fig.savefig(fname='OurFig2.png', bbox_inches='tight', pad_inches=0)

Numerous options for savefig are available, including one to set the desired resolution in dots per inch (dpi). For a full list of available options, see the Matplotlib documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html

# Figure 2

# Create a new figure with four subplots.
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes, y-axes, and the legend.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')

# Subplot 2d legend. The default label size is overridden in order to create 
# a better fit in the upper left corner of subplot d.
axes[3].legend(loc='upper left', frameon=False, fontsize=14)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)

###############################################################################
# Save the final result to a file.
fig.savefig(fname='OurFig2.png', bbox_inches='tight', pad_inches=0)
###############################################################################
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_116_0.png

Final complete script (with our final output)#

For completeness, reading the data from the csv file and adding a column for the age in kyr have been added to the top of the code.

# Figure 2

# Read data from a csv file into a DataFrame.
supp_data_2 = pd.read_csv('data/supplementary-data-2.csv', sep=',',
                          skiprows=27, na_values=['NaN', '#VALUE!'])

# Add column for age in kyr to the DataFrame.
supp_data_2['age_kyr'] = supp_data_2['ageGICC05_mid (yr BP)'] / 1000

# Create a new figure with four subplots.
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes, y-axes, and the legend.
fig, axes = plt.subplots(4, 1, sharex=True, figsize=(16,14))
plt.rcParams['font.size'] = 18
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_2.loc[supp_data_2['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_2.loc[supp_data_2['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(4):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-c, but don't use exponential format
# for y-axis labels.
for i in range(3):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 2 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in [0, 1, 2]:
    axes[i].spines['bottom'].set_visible(False)
for i in [1, 2, 3]:
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 2a, 2b, and 2c.
# Plot gray bands first.
x = supp_data_2['age_kyr']
axes[0].fill_between(x, supp_data_2['NH4source+1sigma (ng/m3)'],
                     supp_data_2['NH4source-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[1].fill_between(x, supp_data_2['Nasource+1sigma (ng/m3)'],
                     supp_data_2['Nasource-1sigma (ng/m3)'],
                     facecolor='lightgray')
axes[2].fill_between(x, supp_data_2['Casource+1sigma (ng/m3)'],
                     supp_data_2['Casource-1sigma (ng/m3)'],
                     facecolor='lightgray')

# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_2.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                 color='green', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                 color='blue', alpha=0.4, legend=False)
supp_data_2.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                 color='red', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_2_smoothed = supp_data_2.rolling(21, min_periods=21, center=True)
supp_data_2_rolling_mean = supp_data_2_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_2_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4source (ng/m3)',
                              color='green', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[1], x='age_kyr', y='Nasource (ng/m3)',
                              color='blue', legend=False)
supp_data_2_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Casource (ng/m3)',
                              color='red', legend=False)

# Plot the dashed lines indicating the early Holocene average.
# Dates selected based on the title of Figure 3.
early_hol_data_2 = (
    supp_data_2[(supp_data_2.age_kyr >= 7.6) & (supp_data_2.age_kyr <= 9.8)]
               [['NH4source (ng/m3)', 'Nasource (ng/m3)', 'Casource (ng/m3)']])
early_hol_mean = early_hol_data_2.mean()
axes[0].axhline(y=early_hol_mean['NH4source (ng/m3)'],
                color='black', linestyle='--')
axes[1].axhline(y=early_hol_mean['Nasource (ng/m3)'],
                color='black', linestyle='--')
axes[2].axhline(y=early_hol_mean['Casource (ng/m3)'],
                color='black', linestyle='--')

# Subplot 2d.
supp_data_2.plot(ax=axes[3], x='age_kyr', y='NH4 atm. residence time (days)',
                 color='green', label='${NH_4}^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Na atm. residence time (days)',
                 color='blue', label='$Na^+$')
supp_data_2.plot(ax=axes[3], x='age_kyr', y='Ca atm. residence time (days)',
                 color='red', label='$Ca^{2+}$')

# Subplot 2d legend. The default label size is overridden in order to create 
# a better fit in the upper left corner of subplot d.
axes[3].legend(loc='upper left', frameon=False, fontsize=14)

# Draw vertical gridlines.
for i in range(4):
    axes[i].xaxis.grid(which='major')
for i in [0, 2]:
    axes[i].spines['right'].set_color('lightgrey')

# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $m^{-3}$)')
axes[1].set_ylabel(r'$Na^+$ (ng $m^{-3}$)', )
axes[2].set_ylabel(r'$Ca^{2+}$ (ng $m^{-3}$)')
axes[3].set_ylabel('atm. res. time (days)')

axes[3].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[3].set_xlim(0, 130)
axes[3].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[3].xaxis.set_minor_locator(ticker.LinearLocator(10 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(3):
    axes[i].tick_params(axis='x', which='both', bottom=False)

# Tweak y-axis settings to correspond to published figure.
axes[0].set_ylim(1, 3000)
axes[1].set_ylim(10, 1000)
axes[2].set_ylim(10, 10000)
axes[3].set_ylim(0, 25)

# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[2].annotate('Hol', xy=(5, 80), xycoords='data', fontsize=16)
axes[2].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[2].annotate('Warm interstadials', xy=(29, 12), xycoords='data', fontsize=16)
axes[2].annotate('Eem', xy=(120, 80), xycoords='data', fontsize=16)

# Save the final result to a file.
fig.savefig(fname='OurFig2.png', bbox_inches='tight', pad_inches=0)
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_118_0.png

Published figure (for comparison)#

Compare our result above with the published figure below: Fig2.jpg


Part 2: Recreate Figure 1 (subplots a-e) using the preceding code as a template#

Using the previous code as a template, we can quickly produce a figure similar to Figure 1. Because subplots f and g use data from another source, the facimile only contains subplots a-e. The data used to produce these visualizations are found in Supplementary Data 1.

The original Figure 1, and then our Python code and its output:

Fig1.jpg

# Figure 1 (subplots a-e)

# Read data from a csv file into a DataFrame.
supp_data_1 = pd.read_csv('data/supplementary-data-1.csv', sep=',',
                          skiprows=18)

# Add column for age in kyr to the DataFrame.
supp_data_1['age_kyr'] = supp_data_1['ageGICC05_mid (yr BP)'] / 1000

# Create a new figure with five subplots.
# Adjust the figure size and the default font size for the figure, which 
# includes the labels on all of the x-axes and y-axes.
fig, axes = plt.subplots(5, 1, sharex=True, figsize=(15,15))
plt.rcParams['font.size'] = 16
fig.subplots_adjust(hspace=0)

# Draw an x-axis using an alternate scale (depth) along top of figure.
ax_top = axes[0].twiny()
ax_top.set_xlabel('NEEM depth (m)')
ax_top.xaxis.set_label_position('top')
ax_top.tick_params(top=True)
neem_depth_labels = [1000, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
                     2200, 2400]
supp_data_1.loc[supp_data_1['depth_mid (m)'] >= 1700, 'age_kyr'].min()
depth_to_kyr = lambda d: supp_data_1.loc[supp_data_1['depth_mid (m)'] >= d,
                                         'age_kyr'].min()
ax_top.set_xlim(0, 130)
ax_top_xticks = [depth_to_kyr(d) for d in neem_depth_labels]
ax_top.set_xticks(ax_top_xticks)
ax_top.set_xticklabels(neem_depth_labels, rotation=90)
ax_top.tick_params(direction='in')

# Draw all tick marks inside the plot area.
for i in range(5):
    axes[i].tick_params(which='both', direction='in')

# Use log scale for y-axis in subplots a-e, but don't use exponential format
# for y-axis labels.
for i in range(5):
    axes[i].set_yscale('log')
    axes[i].yaxis.set_major_formatter(ticker.ScalarFormatter())

# The 2nd and 4th plots have y-axis labels and tick marks on the right.
for i in [1, 3]:
    axes[i].yaxis.set_label_position('right')
    axes[i].tick_params(which='both', left=False, right=True,
                        labelleft=False, labelright=True)

# Remove lines that don't appear in Figure 1 as published in Nature.
for i in [1, 3]:
    axes[i].spines['left'].set_visible(False)
for i in range(4):
    axes[i].spines['bottom'].set_visible(False)
for i in range(1, 5):
    axes[i].spines['top'].set_visible(False)
ax_top.spines['right'].set_visible(False)
ax_top.spines['bottom'].set_visible(False)

# Subplots 1a-e.
# Plot data using a lighter shade (by setting alpha=0.4).
supp_data_1.plot(ax=axes[0], x='age_kyr', y='NH4 (ng/g)',
                 color='green', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[1], x='age_kyr', y='NO3 (ng/g)',
                 color='orange', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[2], x='age_kyr', y='Na (ng/g)',
                 color='blue', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[3], x='age_kyr', y='Ca (ng/g)',
                 color='red', alpha=0.4, legend=False)
supp_data_1.plot(ax=axes[4], x='age_kyr', y='SO4 (ng/g)',
                 color='purple', alpha=0.4, legend=False)

# Calculate and plot the smoothed function values using a darker line.
# Use min_periods to remove extraneous "mean" values when the actual data
# is "NaN".
supp_data_1_smoothed = supp_data_1.rolling(21, min_periods=21, center=True)
supp_data_1_rolling_mean = supp_data_1_smoothed.mean()

# Plot the 21 point running means of the 10 year data.
supp_data_1_rolling_mean.plot(ax=axes[0], x='age_kyr', y='NH4 (ng/g)',
                              color='green', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[1], x='age_kyr', y='NO3 (ng/g)',
                              color='brown', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[2], x='age_kyr', y='Na (ng/g)',
                              color='blue', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[3], x='age_kyr', y='Ca (ng/g)',
                              color='red', legend=False)
supp_data_1_rolling_mean.plot(ax=axes[4], x='age_kyr', y='SO4 (ng/g)',
                              color='purple', legend=False)

# Draw vertical gridlines.
for i in range(5):
    axes[i].xaxis.grid(which='major')
for i in [0, 2, 4]:
    axes[i].spines['right'].set_color('lightgrey')
    
# Set labels directly on the axes objects. Matplotlib supports a subset of latex.
# See https://matplotlib.org/stable/tutorials/text/mathtext.html for documentation.
axes[0].set_ylabel(r'$NH_4^+$ (ng $g^{-1}$)')
axes[1].set_ylabel(r'$NO_3^-$ (ng $g^{-1}$)')
axes[2].set_ylabel(r'$Na^+$ (ng $g^{-1}$)')
axes[3].set_ylabel(r'$Ca^{2+}$ (ng $g^{-1}$)')
axes[4].set_ylabel(r'$SO_4^{2-}$ (ng $g^{-1}$)')

axes[4].set_xlabel('Age (kyr BP GICC05modelext-NEEM-1)')

# Set x-axis ticks (default ticks were drawn every 20 kyr).
NUM_XTICKS = 14
axes[4].set_xlim(0, 130)
axes[4].xaxis.set_major_locator(ticker.LinearLocator(NUM_XTICKS))
axes[4].xaxis.set_minor_locator(ticker.LinearLocator(2 * NUM_XTICKS - 1))

# Show x-axis tick marks only on the bottommost plot.
for i in range(4):
    axes[i].tick_params(axis='x', which='both', bottom=False)
    
# Tweak y-axis limits to correspond to published figure.
axes[0].set_ylim(0.1, 100.0)
axes[1].set_ylim(40, 400)
axes[2].set_ylim(1, 1000)
axes[3].set_ylim(1, 1000)
axes[4].set_ylim(10, 10000)

# Tweak y-axis settings for subplot b.
axes[1].set_yticks([50, 100, 200])
axes[1].get_yaxis().set_major_formatter(ticker.ScalarFormatter())
axes[1].get_yaxis().set_minor_formatter(ticker.NullFormatter())
    
# Annotate the figure.
axes[0].annotate('a', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[1].annotate('b', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[2].annotate('c', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[3].annotate('d', xy=(0.95, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')
axes[4].annotate('e', xy=(0.05, 0.85), xycoords='axes fraction',
                 fontsize=20, fontweight='bold')

axes[3].annotate('Hol', xy=(5, 2), xycoords='data', fontsize=16)
axes[3].annotate('LGM', xy=(20, 1000), xycoords='data', fontsize=16)
axes[3].annotate('Cold stadials', xy=(34, 1000), xycoords='data', fontsize=16)
axes[3].annotate('Warm interstadials', xy=(29, 8), xycoords='data', fontsize=16)
axes[3].annotate('Eem', xy=(120, 2), xycoords='data', fontsize=16)

# Save the final result to a file.
fig.savefig(fname='OurFig1.png', bbox_inches='tight', pad_inches=0)
../../_images/LM_01_Visualizing_Ice_Core_Datasets_in_Python_Using_a_Jupyter_Notebook_Template_121_0.png

References#

  1. Free Software Foundation. The GNU General Public License v3.0, accessed 4 April 2022. https://www.gnu.org/licenses/gpl-3.0.en.html

2. McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Second Edition, O’Reilly Media, September 2018. https://wesmckinney.com/pages/book.html (author’s page) and https://github.com/wesm/pydata-book (datasets and Jupyter notebooks)

3. Python Software Foundation. Python 3.10.4 Documentation, 2001-2022. https://docs.python.org/3/

4. Schüpbach, S., et al. “Greenland records of aerosol source and atmospheric lifetime changes from the Eemian to the Holocene,” Nature Communications, volume 9, Article number: 1476 (2018), last accessed 15 April 2022. https://www.nature.com/articles/s41467-018-03924-3

5. VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data, November 2016. https://jakevdp.github.io/PythonDataScienceHandbook/ (book in html) and https://github.com/jakevdp/PythonDataScienceHandbook (full text in Jupyter notebooks)

6. VanderPlas, Jake. A Whirlwind Tour of Python, August 2016. https://jakevdp.github.io/WhirlwindTourOfPython/ (free Jupyter notebooks and pdf)