Author(s)#

Yasser Abduallah, Jason T. L. Wang, and Haimin Wang

Purpose#

Solar flare prediction plays an important role in understanding and forecasting space weather. The main goal of the Helioseismic and Magnetic Imager (HMI), one of the instruments on NASA’s Solar Dynamics Observatory (SDO), is to study the origin of solar variability and characterize the Sun’s magnetic activity. HMI provides continuous, high-cadence, full-disk observations of the solar vector magnetic field, which can support reliable prediction; yet solar flare prediction efforts utilizing these data are still limited.

In this notebook we provide an overview of the FlareML system to demonstrate how to predict solar flares using machine learning (ML) and SDO/HMI vector magnetic data products (SHARP parameters).

Technical Contributions#

  • We provide the community with a new tool to predict solar flares.

Methodology#

Here we present a flare prediction system, named FlareML, for predicting solar flares using machine learning (ML) based on HMI’s vector magnetic data products. Specifically, we construct training data by utilizing the physical parameters provided by the Space-weather HMI Active Region Patches (SHARP) and categorize solar flares into four classes, namely B, C, M, and X, according to the X-ray flare catalogs prepared by the National Centers for Environmental Information (NCEI). Thus, the solar flare prediction problem at hand is essentially a multi-class (i.e., four-class) classification problem.

The FlareML system employs four machine learning methods to tackle this multi-class prediction problem: (i) ensemble (ENS), (ii) random forests (RF), (iii) multilayer perceptrons (MLP), and (iv) extreme learning machines (ELM). ENS works by taking the majority vote of the results obtained from RF, MLP, and ELM.

This notebook leverages the Python machine learning and visualization packages matplotlib, numpy, scikit-learn, sklearn-extensions, and pandas. It describes how to use the FlareML tool to predict the solar flare classes B, C, M, and X. The models are trained and tested on sample data sets, and the flare predictions and their accuracies are shown in bar plots. FlareML is the backend of an online machine-learning-as-a-service system accessible at: https://nature.njit.edu/spacesoft/DeepSun/.
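The majority vote used by ENS can be illustrated with a minimal sketch. This is not FlareML's own code: the function name majority_vote and the tie-breaking rule (fall back to the RF label when all three models disagree) are assumptions made only for this example.

```python
from collections import Counter

def majority_vote(rf_pred, mlp_pred, elm_pred):
    """Combine three per-sample label lists by majority vote.

    Tie-breaking is an assumption: when all three models disagree,
    the first model's label (RF here) is kept.
    """
    combined = []
    for labels in zip(rf_pred, mlp_pred, elm_pred):
        label, count = Counter(labels).most_common(1)[0]
        combined.append(label if count > 1 else labels[0])
    return combined

# Hypothetical predictions from the three underlying models.
rf_pred = ["B", "C", "M", "X"]
mlp_pred = ["B", "M", "M", "C"]
elm_pred = ["C", "M", "X", "B"]
print(majority_vote(rf_pred, mlp_pred, elm_pred))  # ['B', 'M', 'M', 'X']
```

With three voters and four classes, a three-way disagreement is possible, so some tie-breaking rule is needed; the rule FlareML actually uses may differ from the one sketched here.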

Notes:

  • Some models used in FlareML are not deterministic because their training involves randomness. Therefore, these models may not make the same predictions after retraining.
  • Detailed information about the parameters used for each model can be found in our published paper: https://iopscience.iop.org/article/10.1088/1674-4527/21/7/160
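To illustrate the first note, the sketch below uses plain scikit-learn (not the FlareML API) to show that fixing random_state makes a random forest reproducible across retraining. The synthetic data, the seed values, and the helper fit_and_predict are assumptions for this example; whether FlareML exposes a seed is not stated in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 200 samples, 13 features, 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = rng.integers(0, 4, size=200)

def fit_and_predict(seed):
    """Train a fresh forest with the given seed and predict held-out rows."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X[:150], y[:150])
    return model.predict(X[150:])

# Retraining with the same seed reproduces the same predictions.
same_seed = np.array_equal(fit_and_predict(42), fit_and_predict(42))
print("Identical predictions with the same seed:", same_seed)
```

Without a fixed seed, two training runs can produce different trees and therefore different predictions, which is the behavior the note describes.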

Funding#

This work was supported by U.S. NSF grants AGS-1927578 and AGS-1954737.

Keywords#

keywords=[“Flare”, “Prediction”, “Machine”, “Learning”, “SHARP”]

Citation#

To cite this notebook: Yasser Abduallah, Jason T. L. Wang, & Haimin Wang. Predicting Solar Flares with Machine Learning, available at: https://github.com/ccsc-tools/FlareML/blob/main/YA_01_PredictingSolarFlareswithMachineLearning.ipynb.

Acknowledgements#

We thank the team of SDO/HMI for producing vector magnetic data products. The flare catalogs were prepared by and made available through NOAA NCEI.

Setup#

Installation on Local Machine
Running this notebook on a local machine requires Python 3.8.x with the following packages and versions:

| Library | Version | Description |
|---|---|---|
| matplotlib | 3.4.2 | Graphics and visualization |
| numpy | 1.19.5 | Array manipulation |
| scikit-learn | 0.24.2 | Machine learning |
| sklearn-extensions | 0.0.2 | Extension for scikit-learn |
| pandas | 1.2.4 | Data loading and manipulation |

You can install the packages with pip, the Python package manager, as follows:

pip install matplotlib==3.4.2 numpy==1.19.5 scikit-learn==0.24.2 sklearn-extensions==0.0.2 pandas==1.2.4
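After installation, the pinned versions can be checked from Python. The following is a small sketch using the standard-library importlib.metadata module (available from Python 3.8); the helper name check_versions is hypothetical and not part of FlareML.

```python
from importlib import metadata

# Versions pinned in the pip command above.
REQUIRED = {
    "matplotlib": "3.4.2",
    "numpy": "1.19.5",
    "scikit-learn": "0.24.2",
    "sklearn-extensions": "0.0.2",
    "pandas": "1.2.4",
}

def check_versions(required):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

for name, (expected, installed) in check_versions(REQUIRED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

If nothing is printed, the environment matches the pinned versions.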

Library Import
The following libraries need to be imported.

import warnings
warnings.filterwarnings('ignore')
# Data manipulation
import pandas as pd
import numpy as np

# Training the models
# The following libraries are used to train the algorithms: Random Forest, MLP, and ELM.
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn_extensions.extreme_learning_machines.elm import GenELMClassifier
from sklearn_extensions.extreme_learning_machines.random_layer import RBFRandomLayer, MLPRandomLayer

# Visualizations
import matplotlib.pyplot as plt
from flareml_utils import plot_custom_result

# Running the training, testing and prediction.
from flareml_train import train_model
from flareml_test import test_model

Data Processing and Analysis#

We created and stored 845 data samples in our database, accessible at https://nature.njit.edu/spacesoft/Flare-Predict/, where each data sample contains values of 13 physical parameters or features. The two digits following a class label (B, C, M, X) are ignored when performing flare prediction. The time point of a data sample is the beginning (00:00:01, early morning) of the start date of a flare, and the label of the data sample is the class to which the flare belongs. These labeled data samples are used to train the FlareML system.
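The label simplification described above (keeping only the class letter and ignoring the digits that follow it) can be sketched as below. The label format "M1.5" and the helper name flare_class are assumptions for illustration; the FlareML code may handle labels differently.

```python
VALID_CLASSES = {"B", "C", "M", "X"}

def flare_class(label):
    """Return the class letter of a label such as 'M1.5', or None if unsupported."""
    letter = label.strip().upper()[:1]
    return letter if letter in VALID_CLASSES else None

# Hypothetical catalog labels; 'A9.9' is outside the four classes used here.
labels = ["B4.2", "C1.0", "M1.5", "X2.3", "A9.9"]
print([flare_class(label) for label in labels])  # ['B', 'C', 'M', 'X', None]
```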

For this notebook, we use sample data sets for training and testing.

Binder#

This notebook is Binder-enabled and can be run on mybinder.org.

Please note that starting Binder might take some time to create and start the image.

FlareML Workflow and Results#

Data Preparation and Loading#

The data folder includes two sub-directories: train_data and test_data.

  • The train_data includes a CSV training data file that is used to train the model.

  • The test_data includes a CSV test data file that is used to predict the included flares.

The files are loaded and used during the testing and training process.
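As a rough illustration of what loading such a CSV file could look like with pandas, the sketch below reads a tiny in-memory table. The column names (flare_class, USFLUX, TOTUSJH) and the values are assumptions chosen for the example; the actual FlareML files contain 13 features, and their layout may differ.

```python
import io
import pandas as pd

# Hypothetical two-feature excerpt; the real files hold 13 features and
# their exact column names may differ.
sample_csv = io.StringIO(
    "flare_class,USFLUX,TOTUSJH\n"
    "B,1.2e21,350.0\n"
    "M,4.8e22,2900.0\n"
    "C,9.1e21,980.0\n"
)

df = pd.read_csv(sample_csv)
features = df.drop(columns=["flare_class"])  # model inputs
labels = df["flare_class"]                   # class to predict

print(df.shape)
print(labels.value_counts().sort_index().to_dict())
```

For a file on disk, the io.StringIO object would be replaced by a path such as the test file used in the next section.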

Predicting with Pretrained Models#

FlareML ships with default, pretrained models, so you can make predictions without training a model of your own. Setting modelid to default_model uses all of the pretrained algorithms.

from flareml_test import test_model
args =  {'test_data_file': 'data/test_data/flaringar_simple_random_40.csv', 
         'modelid': 'default_model'}
result = test_model(args)
Starting testing with a model with id: default_model testing data file: data/test_data/flaringar_simple_random_40.csv
Loading data set...
Done loading data...
Formatting and mapping the flares classes..
Prediction is in progress, please wait until it is done...
Finished the prediction task..

Plotting the Pretrained Models Results

from flareml_utils import plot_result
plot_result(result)
[Figure: bar plot of the prediction results for the pretrained models]

ENS Model Training and Testing#

You may train the model with your own data or train the model with the default data.

ENS Model Training with Default Data
Here, we show how to train the model with default data. To train the model with your own data:

  1. First upload your file to the data directory (in the file list on the left-hand side).

  2. Edit the args variable in the following code and update the path to the training file:
    'train_data_file': 'data/train_data/flaringar_training_sample.csv'
    Replace the value 'data/train_data/flaringar_training_sample.csv' with the path to your new file.

print('Loading the train_model function...')
from flareml_train import train_model
args = {
    'train_data_file': 'data/train_data/flaringar_training_sample.csv',
    'algorithm': 'ENS',
    'modelid': 'custom_model_id',
}
train_model(args)
Loading the train_model function...
Starting training with a model with id: custom_model_id training data file: data/train_data/flaringar_training_sample.csv
Loading data set...
Training is in progress, please wait until it is done...
Training started at: 2022-05-05 21:01:24
Finished 1/3 training..
Finished 2/3 training..
Finished 3/3 training..
Training finished at: 2022-05-05 21:01:42
Training total time: 0.3 Minute(s)

Finished training the ENS model, you may use the flareml_test.py program to make prediction.
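When training on your own file, it can help to validate the arguments before calling train_model. The helper below is a hypothetical convenience, not part of FlareML; it only builds the args dictionary with the three keys shown above. The file name my_training_file.csv and the model id my_model are placeholders.

```python
from pathlib import Path

# The four algorithm codes used in this notebook.
SUPPORTED_ALGORITHMS = {"ENS", "RF", "MLP", "ELM"}

def build_train_args(train_data_file, algorithm="ENS",
                     modelid="custom_model_id", must_exist=True):
    """Build the args dict for train_model after basic validation.

    This helper is hypothetical and not part of FlareML.
    """
    algorithm = algorithm.upper()
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unsupported algorithm: {algorithm}")
    if must_exist and not Path(train_data_file).is_file():
        raise FileNotFoundError(f"Training file not found: {train_data_file}")
    return {
        "train_data_file": str(train_data_file),
        "algorithm": algorithm,
        "modelid": modelid,
    }

# must_exist=False lets this example run without the file being present.
args = build_train_args("data/train_data/my_training_file.csv",
                        algorithm="rf", modelid="my_model", must_exist=False)
print(args)
```

The resulting dictionary can then be passed to train_model(args); the same algorithm and modelid values should be reused when testing.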

Predicting with Your ENS Model
To make predictions on the test data with the model you trained above, make sure the modelid value in the args variable in the following code is exactly the one used in training, for example 'custom_model_id'.

from flareml_test import test_model
args =  {'test_data_file': 'data/test_data/flaringar_simple_random_40.csv', 
         'algorithm': 'ENS', 
         'modelid': 'custom_model_id'}
custom_result = test_model(args)
Starting testing with a model with id: custom_model_id testing data file: data/test_data/flaringar_simple_random_40.csv
Loading data set...
Done loading data...
Formatting and mapping the flares classes..
Prediction is in progress, please wait until it is done...
Finished the prediction task..

Plotting the ENS Results
The prediction result can be plotted by passing the custom_result variable to the function plot_custom_result, as shown in the following example. The plot shows the accuracy (TSS value) that your model achieves for each flare class.

from flareml_utils import plot_custom_result
plot_custom_result(custom_result)
[Figure: bar plot of the ENS model's results for each flare class]
Note that the output of ENS is the majority vote of the three underlying models (RF, MLP and ELM), and the accuracy of ENS is calculated based on its output.
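For readers unfamiliar with the TSS value shown in the plots, the sketch below computes the true skill statistic for each class in a one-vs-rest fashion, using the standard definition TSS = TP/(TP+FN) - FP/(FP+TN). The exact computation inside FlareML is not shown in this notebook, so this is an illustration under that assumption; the function name tss_per_class and the sample labels are hypothetical.

```python
import numpy as np

def tss_per_class(y_true, y_pred, classes=("B", "C", "M", "X")):
    """One-vs-rest TSS = TP/(TP+FN) - FP/(FP+TN) for each class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = {}
    for cls in classes:
        tp = np.sum((y_true == cls) & (y_pred == cls))
        fn = np.sum((y_true == cls) & (y_pred != cls))
        fp = np.sum((y_true != cls) & (y_pred == cls))
        tn = np.sum((y_true != cls) & (y_pred != cls))
        if tp + fn == 0 or fp + tn == 0:
            scores[cls] = float("nan")  # undefined without both outcomes
        else:
            scores[cls] = float(tp / (tp + fn) - fp / (fp + tn))
    return scores

# Hypothetical true and predicted classes for eight samples.
y_true = ["B", "B", "C", "C", "M", "M", "X", "X"]
y_pred = ["B", "C", "C", "C", "M", "X", "X", "X"]
print(tss_per_class(y_true, y_pred))
```

TSS ranges from -1 to 1; in this example B and M score 0.5 (one of two events missed, no false alarms), while C and X score about 0.83 (all events caught, one false alarm each).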

RF Model Training and Testing#

RF Model Training with Default Data

print('Loading the train_model function...')
from flareml_train import train_model
args = {
    'train_data_file': 'data/train_data/flaringar_training_sample.csv',
    'algorithm': 'RF',
    'modelid': 'custom_model_id',
}
train_model(args)
Loading the train_model function...
Starting training with a model with id: custom_model_id training data file: data/train_data/flaringar_training_sample.csv
Loading data set...
Training is in progress, please wait until it is done...
Training started at: 2022-05-05 21:01:45
Training finished at: 2022-05-05 21:01:47
Training total time: 0.04 Minute(s)

Finished training the RF model, you may use the flareml_test.py program to make prediction.

Predicting with Your RF Model

from flareml_test import test_model
args =  {'test_data_file': 'data/test_data/flaringar_simple_random_40.csv', 
         'algorithm': 'RF', 
         'modelid': 'custom_model_id'}
custom_result = test_model(args)
Starting testing with a model with id: custom_model_id testing data file: data/test_data/flaringar_simple_random_40.csv
Loading data set...
Done loading data...
Formatting and mapping the flares classes..
Prediction is in progress, please wait until it is done...
Finished the prediction task..

Plotting the RF Results

from flareml_utils import plot_custom_result
plot_custom_result(custom_result)
[Figure: bar plot of the RF model's results for each flare class]

MLP Model Training and Testing#

MLP Model Training with Default Data

print('Loading the train_model function...')
from flareml_train import train_model
args = {
    'train_data_file': 'data/train_data/flaringar_training_sample.csv',
    'algorithm': 'MLP',
    'modelid': 'custom_model_id',
}
train_model(args)
Loading the train_model function...
Starting training with a model with id: custom_model_id training data file: data/train_data/flaringar_training_sample.csv
Loading data set...
Training is in progress, please wait until it is done...
Training started at: 2022-05-05 21:01:49
Training finished at: 2022-05-05 21:02:02
Training total time: 0.22 Minute(s)

Finished training the MLP model, you may use the flareml_test.py program to make prediction.
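As background for the MLP option, the following sketch shows how a multilayer perceptron can be combined with feature standardization in plain scikit-learn. It is not the FlareML configuration: the synthetic data, the hidden layer size, and the other hyperparameters are assumptions for illustration (the parameters actually used are described in the published paper).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 4 classes, 13 features,
# each class scattered around its own random center.
rng = np.random.default_rng(1)
centers = rng.normal(scale=3.0, size=(4, 13))
y = rng.integers(0, 4, size=400)
X = centers[y] + rng.normal(size=(400, 13))

# Standardize the features, then train a small MLP.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(X[:300], y[:300])
accuracy = model.score(X[300:], y[300:])
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

Standardizing first matters for MLPs because features on very different scales can slow or destabilize gradient-based training.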

Predicting with Your MLP Model

from flareml_test import test_model
args =  {'test_data_file': 'data/test_data/flaringar_simple_random_40.csv', 
         'algorithm': 'MLP', 
         'modelid': 'custom_model_id'}
custom_result = test_model(args)
Starting testing with a model with id: custom_model_id testing data file: data/test_data/flaringar_simple_random_40.csv
Loading data set...
Done loading data...
Formatting and mapping the flares classes..
Prediction is in progress, please wait until it is done...
Finished the prediction task..

Plotting the MLP Results

from flareml_utils import plot_custom_result
plot_custom_result(custom_result)
[Figure: bar plot of the MLP model's results for each flare class]

ELM Model Training and Testing#

ELM Model Training with Default Data

print('Loading the train_model function...')
from flareml_train import train_model
args = {
    'train_data_file': 'data/train_data/flaringar_training_sample.csv',
    'algorithm': 'ELM',
    'modelid': 'custom_model_id',
}
train_model(args)
Loading the train_model function...
Starting training with a model with id: custom_model_id training data file: data/train_data/flaringar_training_sample.csv
Loading data set...
Training is in progress, please wait until it is done...
Training started at: 2022-05-05 21:02:04
Training finished at: 2022-05-05 21:02:05
Training total time: 0.01 Minute(s)

Finished training the ELM model, you may use the flareml_test.py program to make prediction.
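As background for the ELM option, the sketch below implements the basic idea of an extreme learning machine in NumPy: the hidden-layer weights are drawn at random and left untrained, and only the output weights are fitted by least squares. FlareML itself relies on the sklearn-extensions package; the class TinyELM, its hyperparameters, and the synthetic data are assumptions made only for this illustration.

```python
import numpy as np

class TinyELM:
    """Minimal extreme learning machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Fixed random projection followed by a tanh nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Hidden weights are drawn once and never trained.
        self.W = self.rng.normal(scale=1.0 / np.sqrt(n_features),
                                 size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # One-hot targets; output weights solve a linear least-squares problem.
        targets = (y[:, None] == self.classes_[None, :]).astype(float)
        self.beta, *_ = np.linalg.lstsq(self._hidden(X), targets, rcond=None)
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]

# Synthetic stand-in data (assumption): 4 classes, 13 features.
rng = np.random.default_rng(2)
centers = rng.normal(scale=3.0, size=(4, 13))
y = rng.integers(0, 4, size=400)
X = centers[y] + rng.normal(size=(400, 13))

# Standardize with statistics from the training split only.
mean, std = X[:300].mean(axis=0), X[:300].std(axis=0)
X = (X - mean) / std

elm = TinyELM().fit(X[:300], y[:300])
accuracy = float(np.mean(elm.predict(X[300:]) == y[300:]))
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because fitting reduces to a single least-squares solve, training is fast, which is consistent with the short ELM training time reported in the log above.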

Predicting with Your ELM Model

from flareml_test import test_model
args =  {'test_data_file': 'data/test_data/flaringar_simple_random_40.csv', 
         'algorithm': 'ELM', 
         'modelid': 'custom_model_id'}
custom_result = test_model(args)
Starting testing with a model with id: custom_model_id testing data file: data/test_data/flaringar_simple_random_40.csv
Loading data set...
Done loading data...
Formatting and mapping the flares classes..
Prediction is in progress, please wait until it is done...
Finished the prediction task..

Plotting the ELM Results

from flareml_utils import plot_custom_result
plot_custom_result(custom_result)
[Figure: bar plot of the ELM model's results for each flare class]

Timing#

Please note that the execution time in mybinder varies based on the availability of resources. The average time to run the notebook is 10-15 minutes, but it could be more.

Conclusions#

We present a machine-learning-based system (FlareML) for solar flare prediction. FlareML employs three existing machine learning algorithms, namely random forests (RF), multilayer perceptrons (MLP), and extreme learning machines (ELM), together with an ensemble algorithm (ENS) that combines the three. Our experimental results demonstrated the good performance of the ensemble algorithm and its superiority over the three existing machine learning algorithms. In the current work we focus on data samples composed of SHARP physical parameters. We collected 845 data samples belonging to four flare classes (B, C, M, and X) across 472 active regions. In addition, the Helioseismic and Magnetic Imager (HMI) aboard the Solar Dynamics Observatory (SDO) produces continuous full-disk observations (solar images). In future work we plan to incorporate these HMI images into our FlareML framework and extend our previously developed deep learning techniques to directly process the images for solar flare prediction.

References#

  1. DeepSun: Machine-Learning-as-a-Service for Solar Flare Prediction
    Yasser Abduallah, Jason T. L. Wang and Haimin Wang
    https://iopscience.iop.org/article/10.1088/1674-4527/21/7/160
  2. Predicting Solar Flares Using SDO/HMI Vector Magnetic Data Products and the Random Forest Algorithm
    Chang Liu, Na Deng, Jason T. L. Wang and Haimin Wang
    https://iopscience.iop.org/article/10.3847/1538-4357/aa789b
  3. Artificial Neural Networks: An Introduction to ANN Theory and Practice
    P. J. Braspenning, F. Thuijsman, A. J. M. M. Weijters
    https://link.springer.com/book/10.1007/BFb0027019
  4. Enhanced Random Search Based Incremental Extreme Learning Machine
    Guang-Bin Huang and Lei Chen
    https://www.sciencedirect.com/science/article/abs/pii/S0925231207003633?via%3Dihub
  5. Predicting Solar Energetic Particles Using SDO/HMI Vector Magnetic Data Products and a Bidirectional LSTM Network
    Yasser Abduallah, Vania K. Jordanova, Hao Liu, Qin Li, Jason T. L. Wang and Haimin Wang
    https://iopscience.iop.org/article/10.3847/1538-4365/ac5f56