2023年10月15日日曜日

Empirical Dynamic Modeling with rEDM: (2) Near-future forecasting (Simplex projection)

20231015_Blogger0021

*This is an English version of my previous post (first translated by ChatGPT, checked and edited manually, and updated some information if necessary).

Benefits of state space reconstruction

In the previous post, we talked about a way to reconstruct the behavior of a system of interest (= the original dynamics) from one variable. This technique is called State Space Reconstruction (SSR) using time-delayed embedding. The “shape” or dynamics obtained through SSR contains useful information such as “information about variables that were not measured” and “information about the driving mechanism of the system”. By using tools of “Empirical Dynamic Modeling” (EDM), we can utilize this information and perform various analyses.

The main applications of EDM include predicting the near future (Sugihara & May 1990; Sugihara 1994; Ye & Sugihara 2016), quantifying dynamic characteristics (Hsieh et al. 2005), detecting causal relationships between variables (Sugihara et al. 2012; Osada et al. 2023), quantifying the strength of interactions between variables (Deyle et al. 2016; Chang et al. 2021), and quantifying the stability of dynamics (Ushio et al. 2018).

I will now explain the analysis using EDM with a model time series. Here, we will use the R language and rEDM package for EDM (Ye et al. 2015; Ye et al. 2018).

Before we begin, let me provide you with some resources where you can find information about EDM:

Making near-future forecasting (Simplex projection)

From here on, we will use R and rEDM to conduct analysis while explaining the methods of EDM. First, let’s talk about near-future forecasting.

Reference: There are several ways to explain Simplex projection. The explanation in this post is relatively general, but it may be easier to understand if you think about it in the original time series (where the horizontal axis is time and the vertical axis is the target variable) rather than in the state space. If you’re interested in such an explanation, please take a look at Simplex projection walkthrough after reading this post.

1. rEDM and preparation of demo data

First, we need to install rEDM and prepare time series data for the demo. Although the rEDM package includes demo data, we will create our own data to explicitly show the model this time. (The environment used was R version 3.6.1 and OS is macOS Mojave. To be updated…)

Note: Please be aware that the output format of the results is different between v0.7.3 and earlier versions and v0.7.5 and later versions. Please use v0.7.5. Also, if you use newer version (>= v1.0) of rEDM, the output format will different. I will check and update the codes if necessary.

# Install library
install.packages("rEDM")

# To install a legacy version
#install.packages("remotes")
#remotes::install_github("ha0ye/rEDM")

# Load rEDM
library(rEDM)
packageVersion("rEDM") # v1.14.3, 2023.10.13

# Prepare two-species demo data
d0 <- as.data.frame(matrix(rep(NA,3*1000),ncol=3))
colnames(d0) <- c("time","x","y")
d0[,1] <- 1:1000
d0[1,2:3] <- c(0.1, 0.1) # Initial values
for(i in 1:999){ # Sugihara et al. (2012) Fig.1
        d0[i+1,2] <- d0[i,2]*(3.8 - 3.80*d0[i,2] - 0.02*d0[i,3])
        d0[i+1,3] <- d0[i,3]*(3.5 - 0.10*d0[i,2] - 3.50*d0[i,3])
}

# Trim the first 100 time-step
d <- d0[100:1000,]

# Visualize
plot(d$time, d$x, type = "l", xlim = c(100,150),
     las = 1, xlab = "Time", ylab = "Value", col = "royalblue")
lines(d$time, d$y, col = "red3") # X = Blue, Y = Red

enter image description here

The model data used here is a chaotic dynamics generated based on a nonlinear logistic equation, where XX and YY interact with each other.

Note: Normally, in EDM, all time series are standardized (converted to have mean 0 and variance 1) before analysis. In EDM, we sometimes calculate the Euclidean distance between points during analysis. If we don’t standardize, the distance between two points will depend on the units (g, kg, etc.) and mean values of the variables, and the standardization is to prevent the bias. However, in order to facilitate the intuitive understanding when visualizing the data and the fact that the model time series used here has no errors, we will proceed with the analysis without standardizing. When analyzing your own data, I recommend that you standardize it using functions such as scale(). In this case, you can use d$x <- as.numeric(scale(d$x)); d$y <- as.numeric(scale(d$y)).

2. Near-future forecasting by simplex projection and estimating the optimal embedding dimension

What is “simplex projection”?

Simplex projection is a method for predicting the near future of a point of interest based on the behavior of neighboring points around the data point (Sugihara & May 1990). I will explain the algorithm using the above model data.

Generating delayed time series

Here, we will focus on YY. First, we will reconstruct the dynamics of YY using “time delay embedding”.

# Generating delayed time series
y_embed <- make_block(d, columns = "y", max_lag = 2)

You can use make_block() function to generate the delayed time series. It is easy to create it manually just by taking the lag, but we will use the function of rEDM here. The output of head(y_embed) is as follows.

> head(y_embed) # Check data
     y(t-0)    y(t-1)
1   0.4485858       NaN
2   0.8468057 0.4485858
3   0.3758589 0.8468057
4   0.8115287 0.3758589
5   0.4771027 0.8115287
6   0.8369673 0.4771027

Next, we will visualize what kind of dynamics can be illustrated when embedded in two dimensions.

plot(y_embed$`y(t-1)`, y_embed$`y(t-0)`, col = rgb(0,0,0,0.5),
     xlab = "Y(t-1)", ylab = "Y(t)", las = 1)

We can see something like a “trajectory.” We will apply simplex projection to this trajectory and try to predict one time-step into the future.

Identifying nearest neighbors

In simplex projection, we first set a point of interest to predict the near future within the reconstructed dynamics. For example, if we are interested in one time-step future of the point at t=240t=240, we set ttarget=240t_{target}​=240.

t_target <- 240

If we represent the delayed time series (i.e., embedding of YY) as Y(t)=[Y(t),Y(t1)]\boldsymbol{Y}(t) =[Y(t), Y(t-1)], then Y(ttaregt)\boldsymbol{Y}(t_{taregt}) will be our target time point.

> y_embed[t_target,]
       y(t-0)    y(t-1)
240   0.8293677 0.4683887

Within the reconstructed dynamics, the red point below corresponds to ttarget=240t_{target} = 240.

plot(y_embed$`y(t-1)`, y_embed$`y(t-0)`, col = rgb(0,0,0,0.5),
     xlab = "Y(t-1)", ylab = "Y(t)", las = 1)
points(y_embed[t_target,2:1], col = "black", bg = "red3", pch = 21, cex = 2)

Below is the enlarged version of the same plot.

plot(y_embed$`y(t-1)`, y_embed$`y(t-0)`, col = rgb(0,0,0,0.5),
     xlab = "Y(t-1)", ylab = "Y(t)", las = 1,
     xlim = c(y_embed[t_target,2] - 0.005, y_embed[t_target,2] + 0.005),
     ylim = c(y_embed[t_target,1] - 0.005, y_embed[t_target,1] + 0.005))
points(y_embed[t_target,2:1], col = "black", bg = "red3", pch = 21, cex = 0.9)

The nearest neighbors are calculated by computing the Euclidean distance from the target point. By default, simplex projection selects (embedding dimension + 1) nearest neighbors. Since we are embedding with E=2E=2, we will search for three nearest neighbors.

# Calculate the distance from t = 240 to other points
distance <- sqrt(rowSums((y_embed[,1:2] - c(y_embed[t_target, 1:2]))^2))
nn_index <- order(distance)[2:4] # Choose three points other than the target point
> nn_index
[1] 628 564 684

It seems that the three nearest neighbors are 628,564,684628, 564, 684. Let’s call them t1=628,t2=564,t3=684t_1 = 628, t_2 = 564, t_3 = 684. We will highlight the positions of Y(t1),Y(t2),Y(t3)\boldsymbol{Y}(t_1), \boldsymbol{Y}(t_2), \boldsymbol{Y}(t_3) with blue dots.

plot(y_embed$`y(t-1)`, y_embed$`y(t-0)`, col = rgb(0,0,0,0.5),
     xlab = "Y(t-1)", ylab = "Y(t)", las = 1,
     xlim = c(y_embed[t_target,2] - 0.005, y_embed[t_target,2] + 0.005),
     ylim = c(y_embed[t_target,1] - 0.005, y_embed[t_target,1] + 0.005))
points(y_embed[t_target,2:1], col = "black", bg = "red3", pch = 21, cex = 0.9)
points(y_embed[nn_index,2:1], col = "black", bg = "royalblue", pch = 21, cex = 0.9)

Near-future forecasting

Now, based on the behavior of Y(t1),Y(t2),Y(t3)\boldsymbol{Y}(t_1), \boldsymbol{Y}(t_2), \boldsymbol{Y}(t_3), we will perform a near-future prediction of Y(ttarget)\boldsymbol{Y}(t_{target}) . First, let’s examine the behavior of the blue dots (i.e., nearest neighbors).

# Investigating the behabior of nearest neighbors
y_embed[nn_index,]
       y(t-0)    y(t-1)
628   0.8294636  0.4680146
564   0.8288434  0.4697511
684   0.8310493  0.4683390

> # One time-step future
> y_embed[nn_index + 1,]
       y(t-0)    y(t-1)
629   0.4665281  0.8294636
565   0.4725765  0.8288434
685   0.4548780  0.8310493

y_embed[nn_index + 1,] is a one time-step future of our target point. Let’s visualize the one time-step future. They moved to bottom right.

plot(y_embed$`y(t-1)`, y_embed$`y(t-0)`, col = rgb(0,0,0,0.5),
     xlab = "Y(t-1)", ylab = "Y(t)", las = 1)
points(y_embed[t_target,2:1], col = "black", bg = "red3", pch = 21, cex = 1.2)
points(y_embed[nn_index+1,2:1], col = "black", bg = "lightblue", pch = 24, cex = 1.2)

Y(t1),Y(t2),Y(t3)\boldsymbol{Y}(t_1), \boldsymbol{Y}(t_2), \boldsymbol{Y}(t_3) have moved to the bottom right (the light-blue triangle points; Y(t1+1),Y(t2+1),Y(t3+1)\boldsymbol{Y}(t_1+1), \boldsymbol{Y}(t_2+1), \boldsymbol{Y}(t_3+1)) in the next time step. Here, if the dynamics are well reconstructed and the trajectory is delineated accurately, we can predict Y(ttarget+1)\boldsymbol{Y}(t_{target}+1) by utilizing the fact that the behavior of Y(ttarget)\boldsymbol{Y}(t_{target}) should be similar to that of Y(t1),Y(t2),Y(t3)\boldsymbol{Y}(t_1), \boldsymbol{Y}(t_2), \boldsymbol{Y}(t_3) on the reconstructed dynamics. In calculations, we predict Y(ttarget+1)\boldsymbol{Y}(t_{target}+1) by weighting the behavior of the nearest neighbor points according to the distance to the target point.

# Get the shortest distance
min_distance <- distance[nn_index][1]

# Calculate weights adjusted by the shortest distance
weights <- exp(-distance[nn_index]/min_distance)

# Calculate the total weight
total_weight <- sum(weights)

# Calculate the weighted average
pred <- (weights %*% as.matrix(y_embed[nn_index + 1,2:1])) / total_weight

The above R code predicts Y(ttarget+1)\boldsymbol{Y}(t_{target}+1) as follows. Y^(ttarget+1)\boldsymbol{\hat{Y}}(t_{target}+1) represents the predicted value of Y(ttarget+1)\boldsymbol{Y}(t_{target}+1), wtotalw_{total} ​ represents total_weight, and w1w_1, w2w_2, w3w_3 represent the weights stored in weights, respectively.


Y^(ttarget+1)={w1×Y(t1+1)+w2×Y(t2+1)+w3×Y(t3+1)}wtotal\boldsymbol{\hat{Y}}(t_{target}+1) = \frac{ \{ w_1 \times \boldsymbol{Y}(t_1+1) + w_2 \times \boldsymbol{Y}(t_2+1) + w_3 \times \boldsymbol{Y}(t_3+1)\} }{w_{total}}

The prediction result is shown below, and you can see that the prediction is very close to the observed value.

> pred[2] # Predicted value
[1] 0.4664999

> y_embed[t_target+1, 1] # Observed value
[1] 0.4675357

When visualizing the predictions, it looks like the following. The light blue triangles represent Y(t1+1),Y(t2+1),Y(t3+1)\boldsymbol{Y}(t_1+1), \boldsymbol{Y}(t_2+1), \boldsymbol{Y}(t_3+1), and the X marks show the predicted values weighted by their respective weights. We can see that the prediction is close to the actual value of Y(ttarget+1)\boldsymbol{Y}(t_{target}+1) (the red triangle).
The prediction results are as follows, and we can see that they are very close to the actual values.

plot(y_embed$`y(t-1)`, y_embed$`y(t-0)`, col = rgb(0,0,0,0.5),
     xlab = "Y(t-1)", ylab = "Y(t)", las = 1,
     xlim = c(y_embed[t_target+1,2] - 0.001, y_embed[t_target+1,2] + 0.002),
     ylim = c(y_embed[t_target+1,1] - 0.02, y_embed[t_target+1,1] + 0.01))
points(y_embed[nn_index+1,2:1], col = "black", bg = "lightblue", pch = 24, cex = 1.2)
points(y_embed[t_target+1,2:1], col = "black", bg = "red3", pch = 24, cex = 1.2)
points(pred, col = "red3", pch = 4, cex = 2)

By repeating the above calculations for all points, we can calculate the prediction accuracy using simplex projection.

Implementation with rEDM::simplex()

We went through each step of the calculations, but we can easily perform the calculations using the rEDM::simplex() function. While there are several options available for the simplex() function, basically we can execute simplex projection with the following code:

# # Calculation with simplex()
simp_res <- simplex(d$y, E = 2, stats_only = F)

The predicted value at t=240t = 240 is as follows, and we can see that it is the same as the one calculated manually. By setting stats_only = F, it outputs statistical summary information and the predicted values.

> simp_res$model_output[[1]][240,]
    time       obs      pred     pred_var
240  241 0.4675357 0.4664999 6.384118e-06

The meaning of each option is as follows:

simplex(time_series,                     # Time series to analyze (1 variable)
        lib = c(1, NROW(time_series)),   # The sections of the time series to reconstruct the attracter
        pred = lib,                      # The sections of the time series to forecast
        norm = 2,                        # Method of calculating distances for weighting
        E = 1:10,                        # Range of embedding dimensions to test
        tau = -1,                        # The time-delay offset to use for time delay embedding
        tp = 1,                          # The prediction horizon (how far ahead to forecast)
        num_neighbors = "e+1",           # The number of nearest neighbors to use
        stats_only = TRUE,               # Whether to output only statistical information about the prediction or also the actual predicted values
        exclusion_radius = NULL,         # How close time to exclude from the predicted point when searching for nearest neighbors (specify a number)
        epsilon = NULL,                  # How far away from the predicted point to exclude when searching for nearest neighbors (specify a number)
        silent = FALSE)                  # Whether to output warning messages

Next, we will execute simplex projection by changing some options. The most important option is E, which will be explained in the next section.

  1. Display only statistical information and no warnings (some output is omitted as it can be lengthy). Since stats_only = T is the default, nothing is specified in the function.
> simplex(d$y, E = 2, silent = T)
  E tau tp nn num_pred       rho         mae        rmse perc p_val
1 2   1  1  3      899 0.9995513 0.002529848 0.005832568    1     0
  1. Specify the future time points to predict as tp = 1:10, i.e., predict from 1 time point ahead to 10 time points ahead. As the future to be predicted becomes further away, the prediction accuracy tends to decrease. This is one of the characteristics of chaotic dynamics. The correlation coefficient rho between the predicted values and the actual values decreases, and the error metrics mae (Mean Square Error) and rmse (Root Mean Square Error) increase.
for (i in 1:10) {
  print(simplex(d$y, E = 2, tp = i, stats_only = T, silent = T))
}

E  tau tp nn num_pred rho       mae         rmse        perc      p_val
2  -1  1  3  900      0.9995495 0.002543257 0.00584417  0.9988889 1.00E-10
2  -1  2  3  900      0.9989813 0.00378813  0.008795181 0.9977778 1.00E-10
2  -1  3  3  900      0.9974164 0.006473631 0.01400167  0.9966667 1.00E-10
2  -1  4  3  900      0.9959234 0.008903225 0.01763394  0.9955556 1.00E-10
2  -1  5  3  900      0.9936307 0.01215886  0.02195842  0.9944444 1.00E-10
2  -1  6  3  900      0.9905653 0.0153443   0.0267139   0.9933333 1.00E-10
2  -1  7  3  900      0.9886095 0.0182205   0.029347    0.9922222 1.00E-10
2  -1  8  3  900      0.9859055 0.02112637  0.03262786  0.9911111 1.00E-10
2  -1  9  3  900      0.9833255 0.02387106  0.03546839  0.99      1.00E-10
2  -1  10 3  900      0.9816006 0.02568902  0.03725363  0.9888889 1.00E-10
  1. We use half of the data as training data to reconstruct the dynamics and use it to predict the other half. This is often used to evaluate prediction performance when the data is long enough. When the data is short, we perform Leave-One-Out Cross Validation (LOOCV) to evaluate the prediction performance by completely matching lib and pred (this is the default setting). You can refer to Wikipedia for information on cross-validation.
> simplex(d$y, E = 2,
          lib = c(1, 450),     # Reconstruct the dynamics using points at t = 1-450
          pred = c(451, 901),  # Predict one time-step ahead of points at t = 451-901
          tp = 1, stats_only = T, silent = T)
  E tau tp nn num_pred       rho         mae        rmse perc p_val
1 2   1  1  3      449 0.9991972 0.003711699 0.007815253    1     0

Another variable that often becomes important is tau. tau is a variable that determines the value of the time lag when embedding. That is, when tau = -1, we embed the dynamics using {Y(t),Y(t1),Y(t2),...}\{Y(t), Y(t-1), Y(t-2), ...\}, but when tau = -2, we embed the dynamics using {Y(t),Y(t2),Y(t4),...}\{Y(t), Y(t-2), Y(t-4), ...\}. In practice, tau = -1 is used in most cases, but there are many cases where a value other than tau = -1 should be used depending on the characteristics of the target time series data. Practically, to determine the appropriate tau, you can try several tau values and compare prediction performance, or you can decide based on the characteristics of the target time series data. In addition, there are many methods proposed for determining tau using frameworks other than EDM (for example, methods using autocorrelation functions or mutual information).

Estimating the optimal embedding dimension

Now, by using the simplex() function, we can obtain the prediction accuracy for different values of E at once. Let’s think a little about what it means to have good prediction performance with simplex projection.

If the prediction based on the behavior of the nearest points is successful, it means that the behavior of the target point is similar to that of the nearest points, and that nearby points in the state space behave similarly. Therefore, it can be said that the dynamics are “smoothly” represented in the state space (one-to-one mapping). On the other hand, if the prediction accuracy is low, it seems that even if multiple points were in nearby locations in the state space, they moved to completely different locations at the next time point, and the trajectory was messy.

“Embedding” refers to a state where the “true dynamics” (which may not be observable) and the “reconstructed dynamics” correspond to each other in a “one-to-one and smooth” manner. Therefore, the prediction accuracy with simplex projection may be used as an indicator to determine whether the state of the system has been properly reconstructed by time-delay embedding. If a smooth trajectory is drawn in the reconstructed state space, the prediction accuracy with simplex projection should be high, and it can be said that the dynamics are represented “properly”.

Now, let’s change E to various values and see how the prediction accuracy changes.

simp_res2 <- simplex(d$y, E = 1:10, silent = T)

The change in prediction accuracy with respect to E is shown below. In this case, RMSE (Root Mean Square Error) is used as an indicator of prediction accuracy, and lower values indicate better prediction accuracy.

plot(simp_res2$E, simp_res2$rmse, type = "b", xlab = "E", ylab = "RMSE", las = 1)

The RMSE is lowest at E=2E=2, indicating the highest prediction accuracy at E=2E=2. Therefore, E=2E=2 seems to be the most appropriate embedding dimension for YY. The estimated embedding dimension obtained in this way is called the optimal embedding dimension.

In the framework of EDM, it is common to estimate the optimal embedding dimension using simplex projection, but other methods have also been proposed. For example, the False Nearest Neighbour is a common method. What is the most appropriate method in estimating the optimal embedding dimension is open question, and careful trial and error is necessary.

Note: Determining the optimal embedding dimension is an important step that affects almost all analyses using EDM, and careful consideration is necessary. Many people struggle with this when conducting analyses, so here are some tips to keep in mind:

  • What is the range of E to examine?

    • The lower limit can be examined from E=1 (but please read the last point), but the upper limit varies depending on the study. Some studies set E=10, but it is better to consider factors such as “data length,” and “characteristics of the target time series.”
    • According to Munch et al. (2019), the maximum embedding dimension to be explored tends to be about the square root of the data length (in the answer to Question 5 in Munch et al. 2019). For example, for a time series with 100 data points, we can examine up to E=10 (although I have not considered how far this can be generalized).
    • If E=2, it means that information from the past two time points affects future behavior, and if E=3, it means that information from the past three time points affects future behavior. Thinking about how far back in time the behavior of the variable of interest may be affected could help determine the upper limit of E.
    • In the case of real data, values inevitably contain observation errors to some extent. As E increases, errors also accumulate in the points (vectors) in the state space, making it difficult to examine large values of E for time series with large errors.
    • Sometimes E=1 is not examined. Since embedding is defined as a “smooth and one-to-one mapping,” it is difficult to imagine a situation where a one-dimensional time series is smoothly embedded at E=1 unless it is monotonically decreasing or increasing. Therefore, it is thought that situations where embedding occurs at E=1 are unlikely. Theoretically, E=1 can be the optimal embedding dimension, but there is still no consensus on this point, so I usually examine E=1 just to check the prediction accuracy when E=1.
  • Which prediction accuracy indicator should be used?

    • rEDM outputs three types of prediction accuracy indicators: correlation coefficient (“rho”), MAE, and RMSE. Initially, correlation coefficient was often used as an indicator (e.g., Sugihara et al. 2012), but since correlation coefficient is often affected by outliers, I think that MAE or RMSE is often better. Personally, I rely on RMSE more often.

In the next post, I will explain the S-map method, which is also a method for future prediction.

References

  • Chang CW, Ushio M, Hsieh C. (2017) Empirical dynamic modeling for beginners. Ecological Research, 32, 785–796
  • Chang CW, Miki T, Ushio M, Ke PJ, Lu HP, Shiah FK, Hsieh CH* (2021) Reconstructing large interaction networks from empirical time series data. Ecology Letters 24, 2763–2774
  • Deyle ER, May RM, Munch SB, Sugihara G. (2016) Tracking and forecasting ecosystem interactions in real time. Proc. R. Soc. Lond. B 283, 20152258
  • Hsieh CH, Glaser SM, Lucas AJ, Sugihara G. (2005) Distinguishing random environmental fluctuations from ecological catastrophes for the North Pacific Ocean. Nature 435, 336–340
  • Munch, S. B., Brias, A., Sugihara, G., Rogers, T. L. (2019) Frequently asked questions about nonlinear dynamics and empirical dynamic modelling. ICES Journal of Marine Science, doi:10.1093/icesjms/fsz209
  • Osada Y, Ushio M, Kondoh M (2023) A unified framework for nonparametric causality detection. bioRxiv doi:10.1101/2023.04.20.537743
  • Sugihara, G. et al. (2012) Detecting causality in complex ecosystems. Science 338, 496–500
  • Sugihara G, May RM. (1990) Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 344, 734–741
  • Sugihara G. (1994) Nonlinear forecasting for the classification of natural time series. Philos. Trans. R. Soc. A 348, 477–495
  • Ushio M et al. (2018). Fluctuating interaction network and time-varying stability of a natural fish community. Nature 554, 360–363.
  • Ye H, et al. (2015) Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. PNAS 112 (13) E1569-E1576
  • Ye H, Sugihara G. (2016). Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality. Science, 353, 922–925
  • Ye H, et al. (2018) rEDM: Applications of Empirical Dynamic Modeling from Time Series. doi:10.5281/zenodo.1935847

Written with StackEdit.