Forecasting Time-varying Value-at-Risk and Expected Shortfall Dependence: A Markov-switching Generalized Autoregressive Score Copula Approach

Recent financial catastrophes have highlighted the importance of accurately forecasting extreme financial losses and their effects on the institutions involved in a given financial market. The capacity of econometric models to forecast extreme events depends critically on the flexibility with which they can accommodate the highly non-linear and asymmetric dependence in financial returns. This study therefore forecasts time-varying Value-at-Risk (VaR) and expected shortfall (ES) dependence as a predictive density whose regime changes over time. To achieve this, a non-stationary Markov-switching generalized autoregressive score (MS-GAS) model nested with a copula is estimated using the expectation-maximization (EM) algorithm. Extending this non-stationary model is challenging, as it requires specifying not only how the usual parameters change over time but also the mass-distribution components. The dynamics of the estimated autoregressive score allow the copula parameters to respond rapidly to time-varying key systemic parameters and risk, because regime changes are allowed to oscillate between high and low regimes, a clear indication of regime shifts in the parameters of the estimated model. Using minimum score combining, six extreme value distributions are combined with the estimated MS(2)-GAS(1)-copula model, and the performance of each combined model is assessed for 5-day and 30-day forecasts of VaR and ES. The forecasting results indicate that the MS(2)-GAS(1)-GPD is the best model for modeling and forecasting VaR and ES for the Botswana stock market. This is a promising technique for the stochastic modeling of time-varying VaR and ES. In addition, a foundation is provided for future research on emerging markets, and the results are also important for risk managers and investors.


Introduction
Recent financial catastrophes have highlighted the necessity of precisely predicting extreme financial losses and their effects on an institution's financial stability and, more broadly, on the security of the wider economy. Major financial crises, like the Global Financial Crisis (GFC) of 2007-2008, the European Sovereign Debt Crisis (ESDC) of 2010-2011, and the 2019-2021 COVID-19 pandemic, typically affect the entire economy, causing severe downturns and recessions. In fact, during severe crisis episodes, banks and other financial institutions frequently fail, and their failures can set off the failure of other non-financial institutions via balance-sheet and liquidity channels, endangering the stability of the real economy; see for instance Karaś and Szczepaniak (2017). The ability of econometric models to predict such extreme events relies strongly on their flexibility to model the highly non-linear and asymmetric dependence structure of financial returns (McNeil, Frey, and Embrechts 2015). The simple linear correlation, despite being widely used, is unable to capture the crucial tail behavior of a joint probability distribution. Thus, in a multivariate environment, especially in light of recent crisis episodes, modeling the tail dependence and asymmetric dependence between pairs of assets has become increasingly important. Going beyond the multivariate elliptical assumption for the joint distribution of asset returns is typically required when using a measure of dependence other than linear correlation. The copula methodology enables the modeling of a wide range of dependence structures in this regard (Durante and Sempi 2015). The conditional correlation between asset returns rises during times of financial instability as a result of exposure to common shocks that affect all market participants, although the challenge of modeling the joint co-movement of stock returns was already present in Bollerslev (1988). Moreover, structural breaks are observed in the dependence structure,
which is more evident during crisis periods and other infrequent events. As regards dependence breaks, Markov-switching (MS) models have proven to effectively capture the non-smooth evolution of volatility and correlation dynamics. Unlike Degiannakis, Dent, and Floros (2014), who used the Fractionally Integrated Generalized Autoregressive Conditional Heteroscedasticity (FIGARCH) framework to forecast Value-at-Risk (VaR) and expected shortfall (ES), this study utilizes a regime-dependent dynamic of the copula parameters from a score-driven framework. The copula dependence parameters are allowed to depend on the realizations of a first-order Markovian process with a specific generalized autoregressive score (GAS) dynamic of Creal, Koopman, and Lucas (2013) and Harvey (2013) in each regime, while retaining an appropriate arbitrary specification of the conditional distribution dynamics. In this case, natural volatility is allowed by substituting an appropriate process for the generalized inverse Gaussian (GIG).
The main contribution of this study is the prediction and classification of regime shifts using an MS-GAS copula, in light of the tremendous number of recent contributions pertaining to the modeling of stochastic parameters. Creal et al. (2013) put forth a group of observation-driven time series models and updated the model parameters over time using a scaled score of the likelihood function. These authors' approach offers a streamlined and cogent framework for adding time-varying parameters to a large class of non-linear models. However, the MS-GAS approach used in this study takes real-time prediction systems into account, which increases forecasting accuracy and makes it easier to spot sudden changes, especially when dealing with economic conditions that are constantly changing over time. Most researchers do not consider this factor when modeling regime shifts and time-varying parameters. The remarkable ability of the GAS to filter and approximate complex non-linear data-generating processes in a straightforward and efficient manner is particularly useful for the dependence modeling with copulas discussed here. In particular, the use of a score-driven model helps in situations where it is not clear how to update the dynamics of the parameters of an Archimedean link function. It is important to note that estimating this path-dependent model is a difficult task because an exact likelihood computation is practically impossible. Due to this challenge, estimation techniques either simplify the model or do not depend on a likelihood function. In contrast to the study of Bazzi, Blasques, Koopman, and Lucas (2017), this study adopts the expectation-maximization (EM) approach to estimate the model, as the EM captures uncertainty better than direct maximum likelihood estimation (MLE). However, there is currently no way to obtain the maximum likelihood estimator without first altering the model. The asymptotic variance-covariance matrix and maximum likelihood estimator of the Markov-switching GAS model are computed in this study using a novel method that combines the Monte Carlo expectation-maximization (MCEM) algorithm with importance sampling.
Another related contribution of this paper is the implementation and evaluation of time-varying VaR and ES risk measures. An appealing characteristic of these time-varying measures is that they inherit the flexibility of the dynamic switching copula framework developed here. The copula approach naturally adapts to environments characterized by different kinds of upper and lower tail dependence, enabling VaR and ES to serve as effective measures of extreme co-movements among financial variables. The literature on co-movement risk measures has proliferated during the last few years; see Bernardi, Gayraud, and Petrella (2015), Sordo, Suárez-Llorens, and Bello (2015), and Castro and Ferrari (2014), among others, who have provided extensive and up-to-date surveys of the systemic risk measures that have been proposed.

Markov-switching generalized autoregressive score
Let $y_{1:t-1}$ denote the past history up to time $t-1$ of a non-stationary stochastic process $\{Y_s, s > 0\}$. Assume also that there exists a first-order ergodic Markov chain $\{S_s, s > 0\}$ defined on the discrete space $\Omega = \{1, \dots, L\}$ with transition probability matrix $\mathbf{P} = \{p_{\ell,k}\}$, where $p_{\ell,k} = \Pr(S_t = k \mid S_{t-1} = \ell)$, $\forall \ell, k \in \Omega$, is the probability that state $k$ is visited at time $t$ given that the chain was in state $\ell$ at time $t-1$, and $\boldsymbol{\delta}$ is the initial vector of state probabilities. After marginalizing the latent state $S_t$ in model 1, the distribution of $U_t$ is recovered conditionally on the past observations $Y_{1:t-1} = y_{1:t-1}$. Bernardi and Catania (2019) showed that this is a mixture of component-specific predictive cumulative distributions,
$$F(u_t \mid y_{1:t-1}) = \sum_{m=1}^{L} \pi_{m,t}\, C(u_t \mid y_{1:t-1}, S_t = m),$$
with mixing weights $\pi_{m,t} = \sum_{\ell=1}^{L} p_{\ell,m} \Pr(S_{t-1} = \ell \mid y_{1:t-1})$, where $\Pr(S_t = m \mid y_{1:t})$ is the filtered probability at time $t$ for state $m$, which is evaluated as
$$\Pr(S_t = m \mid y_{1:t}) = \frac{h(y_t \mid y_{1:t-1}, S_t = m)\, \pi_{m,t}}{h(y_t \mid y_{1:t-1})},$$
with $h(y_t \mid y_{1:t-1}, S_t = m)$ being the density of $y_t$ conditionally on the Markov chain being in state $m$ and $h(y_t \mid y_{1:t-1})$ being the density of $y_t$ after having marginalized out $S_t$. Exploiting the copula structure, $h(y_t \mid y_{1:t-1})$ can be represented as a probability-weighted sum of the state-specific copula densities times the marginal densities, so that $\Pr(S_t = m \mid y_{1:t})$, and consequently $\pi_{m,t}$, depend only on the copula density. It is worth noting that the copula specification is parameterized in terms of $k_t^{\ell}$ and $\Psi^{\ell}$ for $\ell = 1, 2, \dots, L$.
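The filtering recursion above is the standard Hamilton filter. A minimal Python sketch (the function name and array layout are illustrative, not the paper's own code) computes the filtered probabilities from the state-conditional densities and the transition matrix:

```python
import numpy as np

def hamilton_filter(dens, P, delta):
    """Filtered state probabilities Pr(S_t = m | y_{1:t}).

    dens  : (T, L) array, dens[t, m] = h(y_t | y_{1:t-1}, S_t = m)
    P     : (L, L) transition matrix, P[l, k] = Pr(S_t = k | S_{t-1} = l)
    delta : (L,) initial state probabilities
    """
    T, L = dens.shape
    filt = np.zeros((T, L))
    pred = delta                       # Pr(S_t = m | y_{1:t-1})
    for t in range(T):
        joint = pred * dens[t]         # state density times predicted weight
        filt[t] = joint / joint.sum()  # normalize by h(y_t | y_{1:t-1})
        pred = filt[t] @ P             # propagate through the chain
    return filt
```

The normalizing constant `joint.sum()` is exactly the marginal density $h(y_t \mid y_{1:t-1})$ appearing in the denominator above.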
In the Markov-switching generalized autoregressive score model, $k_t^{\ell}$ follows an AR (autoregressive) dynamic in which the score of the conditional distribution of the data defined in model 5 acts as the forcing variable. Firstly, $\lambda : \Re^n \to D_k$ is defined as an absolutely continuous deterministic vector-valued function that maps $\Re^n$ into the natural parameter space $D_k$. A possible choice for $\lambda$ is the hyperspherical coordinate transformation introduced by Catania (2021), where a similar scheme is used. The process for $k_t^{\ell}$, $\ell = 1, \dots, L$, is driven by the score of the density defined in model 5 with respect to $\tilde{k}_t^{\ell}$, evaluated at $u_t$, $\nabla(u_t; \tilde{k}_t^{\ell}, \Psi^{\ell})$, pre-multiplied by a positive definite scaling matrix $\Gamma_t^{\ell}$. Therefore, Creal et al. (2013) suggested
$$\Gamma_t^{\ell} = \mathcal{I}(\tilde{k}_t^{\ell}; \Psi^{\ell})^{-\xi},$$
where $\mathcal{I}(\tilde{k}_t^{\ell}; \Psi^{\ell})$ is the information matrix of $\tilde{k}_t^{\ell}$ and $\xi$ is a scalar chosen in $\{0, \tfrac{1}{2}, 1\}$. In this way, if $\xi = 1$, the score is scaled by the inverse of its variance-covariance matrix; if $\xi = \tfrac{1}{2}$, it is scaled by the inverse of its square root; and if $\xi = 0$, there is no scaling, so that $\Gamma_t^{\ell} = I_n$, where $I_n$ is the identity matrix (Bernardi and Catania 2019). Note that, if the model is correctly specified and the information matrix exists, the third component of process 2 can be written with $B$ being a back-shift operator. Applying the chain rule in model 10, the score and Fisher information matrix of $\tilde{k}_t^{\ell}$ are recovered as
$$\tilde{\nabla}(u_t; \tilde{k}_t^{\ell}, \Psi^{\ell}) = J_t^{\ell\,\prime}\, \nabla(u_t; k_t^{\ell}, \Psi^{\ell}), \qquad \tilde{\mathcal{I}}(\tilde{k}_t^{\ell}; \Psi^{\ell}) = J_t^{\ell\,\prime}\, \mathcal{I}(k_t^{\ell}; \Psi^{\ell})\, J_t^{\ell},$$
where $\nabla(u_t; k_t^{\ell}, \Psi^{\ell})$ and $\mathcal{I}(k_t^{\ell}; \Psi^{\ell})$ are the score and Fisher information matrix with respect to the original vector of parameters, and $J_t^{\ell}$ denotes the Jacobian of the mapping function $\lambda(\cdot)$ evaluated at $\tilde{k}_t^{\ell}$. An interesting feature of the MS-GAS model nested with copulas, which differentiates it from the usual MS models with a dynamic state-dependent conditional density, is that the MS-GAS embeds a natural mechanism for updating the parameters in each state of the chain according to the
posterior probability of being in that state; that is, for state $\ell$, $\Pr(S_t = \ell \mid y_{1;t})$. Indeed, the score $\nabla(u_t; k_t^{\ell}, \Psi^{\ell})$ is the score of the $\ell$-th regime copula distribution weighted by the posterior probability of that regime. Rewriting the third portion of process 2 and fixing $\xi = 0$ for simplicity, it becomes evident how the information provided by $u_{t-1}$ is shared across the Markovian regimes according to the posterior probabilities $\Pr(S_{t-1} = \ell \mid y_{1;t-1})$. In this way, if a state has not been visited by the chain, the dynamic system will not update its parameters until the state is revisited, that is, until the relevant information about the state is updated. Catania (2021) and Bazzi et al. (2017) made similar arguments in related contexts where the conditional distribution of the data is a mixture of distributions.
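A univariate sketch of one score-driven update may clarify the mechanism: the score is scaled by an inverse power of the information (the $\xi$ choices above) and weighted by the posterior probability of the regime being updated. The function name and the coefficients `omega`, `A`, `B` are illustrative, not estimated values from the paper:

```python
def gas_step(k_tilde, score, info, omega, A, B, xi=1.0, post=1.0):
    """One score-driven update of the (transformed) copula parameter.

    score : gradient of the log copula density at the current parameter
    info  : Fisher information at the current parameter
    xi    : scaling exponent: 1 -> inverse information, 0.5 -> inverse
            square root of the information, 0 -> no scaling
    post  : posterior probability of the regime being updated, so that
            information is shared across regimes as described above
    """
    scaled = score / info ** xi if xi > 0 else score
    return omega + A * post * scaled + B * k_tilde
```

With `post = 0` (a regime not currently visited) the forcing term vanishes and the parameter simply mean-reverts, matching the updating mechanism described above.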

A novel approach to estimate MS-GAS model
A two-step procedure described in the work of Maaziz and Kharfouchi (2018) is followed for estimating the parameters of the model. The first step consists of estimating the parameters of the marginals $(\vartheta_1, \dots, \vartheta_n)$, followed by a second step in which the parameters governing the evolution of the dependence structure are estimated; the asymptotic consistency of the proposed two-step procedure follows Blazsek, Escribano, and Licht (2020) for conditional copulas. The EM algorithm of Figueroa-Zúñiga, Niklitschek-Soto, Leiva, and Liu (2022) is modified in this study to estimate the dependence parameters subject to both the GAS dynamic and the Markovian structure. Assuming that the parameters of the marginal distributions $\vartheta_i$ have been consistently estimated by maximum likelihood (see, for example, Asheri, Hosseini, and Araabi (2021)), the EM algorithm is presented here with the pseudo-observations in place of the unobserved uniform margins. To apply the EM algorithm, the vector of observations $y_{1;T}$, where $T$ is the sample size, is regarded as incomplete. Following the implementation described in the work of Şimşek and Topaloglu (2018), two missing data structures are now introduced. The class membership is unknown and is conveniently treated as the value taken by a latent multinomial variable with one trial and $L$ classes, where the temporal evolution of class membership is driven by the hidden Markov chain $S_t$ for $t = 1, \dots, T$, similar to latent class approaches. Augmenting the observations $y_{1;T}$ with the latent variables $Z_t$, $ZZ_t$, $t = 1, \dots, T$, replaces the log-likelihood function with a complete-data log-likelihood whose parameter vector $\Xi$ contains the parameters of the GAS dynamics for the copula dependence parameters $k_t^{\ell}$ and $\Psi^{\ell}$ for $\ell = 1, \dots, L$.
According to Ahmad and Bladt (2022), the EM algorithm alternates between an expectation (E) step, which constructs the expected log-likelihood evaluated using the current estimates of the parameters, and a maximization (M) step, which updates the parameters by maximizing the expected log-likelihood found in the E-step. The distribution of the latent variables in the subsequent E-step is then computed using these parameter estimates. On the $(m+1)$-th iteration, the EM algorithm proceeds as follows. The E-step computes the so-called Q-function, the conditional expectation of the complete-data log-likelihood given the observations and the current estimate of the parameter vector $\Xi^{(m)}$; in this representation, the current smoothed probabilities of the states, for all $t = 1, \dots, T$, are evaluated using the well-known forward-filtering backward-smoothing (FFBS) algorithm. The M-step in model 19 then maximizes $Q(\Xi, \Xi^{(m)})$ with respect to $\Xi$ to determine the next set of parameters $\Xi^{(m+1)}$. The updated estimates of the hidden Markov model (HMM), that is, the elements of the transition probability matrix $Q$ of the hidden Markov chain, are available in closed form for $\ell = 1, \dots$
$, L$, while the parameters $\Xi$ can be obtained as the solution of the corresponding optimization problem. Since the last optimization step provides parameter updates that increase the log-likelihood function, convergence of the algorithm to the ML estimator is guaranteed. After running the direct maximum likelihood estimation of the hidden Markov model, the standard errors of the MS-GAS dependence and marginal parameters can be estimated numerically. However, this procedure does not take into account the estimation errors of the marginal parameters. Therefore, the fixed bootstrap procedure of Shang (2018) can be used as an alternative to the above procedure. In the empirical application, this study takes the former approach.
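The closed-form transition-matrix update of the M-step can be sketched directly from the smoothed joint state probabilities produced by the FFBS pass. This is a minimal illustration, assuming the smoothed probabilities are already available as an array (the function name and layout are hypothetical):

```python
import numpy as np

def update_transition_matrix(xi_joint):
    """Closed-form M-step update of the HMM transition matrix.

    xi_joint : (T, L, L) array of smoothed joint probabilities
               xi_joint[t, l, k] = Pr(S_{t-1} = l, S_t = k | y_{1:T}),
               as delivered by the FFBS pass of the E-step.
    """
    counts = xi_joint.sum(axis=0)                  # expected transition counts
    return counts / counts.sum(axis=1, keepdims=True)  # row-normalize
```

Each row of the result is the expected number of transitions out of a state, normalized to a probability distribution, which is the standard Baum-Welch update.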

Time-varying risk measures
Risk measures are statistical measures that are historical predictors of investment risk and volatility, and they are also major components of modern portfolio theory (MPT). The MPT is a standard financial and academic methodology for assessing the performance of a stock or a stock fund against its benchmark index. The economic relevance of this topic follows partly from the Basel Accords (currently the Basel III Accords), which impose that banks and financial institutions meet capital requirements and rely on state-of-the-art risk systems. In particular, they must assess uncertainty about future values of their portfolios and estimate the extent and likelihood of potential losses using a risk measure. Nowadays, the VaR and ES risk measures are the standards (Ardia, Boudt, and Catania 2018).
Estimating the VaR and ES thus first requires accurately estimating the conditional distribution of future portfolio or asset returns. Formally, assuming a continuous cumulative distribution function (cdf) $F(\cdot; \theta_t, \psi)$ with time-varying parameters $\theta_t \in \Re^d$ and additional static parameters $\psi \in \Re^q$ for the logarithmic returns $r_t \in \Re$ at time $t$, $VaR_t(\alpha)$ is given by
$$VaR_t(\alpha) = F^{-1}(\alpha; \theta_t, \psi),$$
where $F^{-1}(\cdot)$ denotes the inverse of the cdf, that is, the quantile function. It follows that $VaR_t(\alpha)$ is nothing more than the $\alpha$-quantile of the return distribution at time $t$. The ES metric measures the expected loss after a violation of the VaR level and is defined as
$$ES_t(\alpha) = \mathbb{E}\left[r_t \mid r_t \le VaR_t(\alpha)\right].$$
It follows that a crucial point for a correct VaR and ES assessment is the determination of $F(\cdot)$ and its parameters $\theta_t$ and $\psi$. For a general overview of existing methods, the reader is referred to Nieto and Ruiz (2016). This study illustrates how this can be achieved using the framework of the Markov-switching generalized autoregressive score model introduced by Bernardi and Catania (2019) and Blazsek, Ho, and Liu (2018). The GAS models are also referred to as score-driven (SD) and dynamic conditional score (DCS) models and have been used extensively for financial risk management purposes. For a comparison of the accuracy of VaR and ES estimates obtained by the GAS approach against alternative volatility models, see Ardia et al. (2018), Gao and Zhou (2016), and Bernardi and Catania (2016), among others.
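The two definitions above can be made concrete with a simple historical-simulation sketch, which replaces the parametric cdf $F$ with the empirical distribution of a return sample (illustrative only; the paper itself uses the MS-GAS predictive distribution):

```python
import numpy as np

def var_es(returns, alpha=0.05):
    """Historical-simulation VaR and ES at level alpha: the VaR is the
    alpha-quantile of the return sample, and the ES is the mean of the
    returns at or below that quantile, i.e. E[r | r <= VaR]."""
    var = np.quantile(returns, alpha)
    es = returns[returns <= var].mean()
    return var, es
```

The same two quantities are obtained in the parametric case by evaluating the quantile function and the conditional tail expectation of the fitted distribution instead of the sample.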

Evaluating downside risk forecasts and backtesting risk measures
The recursive prediction method is usually used to backtest the appropriateness of a statistical model and to perform model comparisons in terms of VaR and ES predictions (Ardia et al. 2018). The purpose of a backtesting analysis is to verify the accuracy of the predictions by separating the estimation window from the evaluation period. The goal of a model comparison analysis is usually to rank the models according to a loss function. For this purpose, the full sample of returns $T$ is split into an in-sample period of length $S$ and an out-of-sample period of length $H$. First, the parameters of the model are estimated over the in-sample period; then the $h$-step-ahead forecast of the distribution of returns at time $S + h$ is generated along with the associated VaR and ES measures. These steps are repeated, recursively extending the in-sample period with new observations until the end of the series $T$ is reached. If previous observations are eliminated as the window advances, a moving window is used; otherwise an expanding window is used. In this study, a standard approach from the day-to-day risk management of a typical trading desk is followed, using the moving-window setup with $h = 1$.
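The moving-window scheme with $h = 1$ can be sketched as follows. The forecaster inside the loop is plain historical simulation purely for illustration; in the paper that slot is filled by the MS(2)-GAS(1)-copula predictive distribution:

```python
import numpy as np

def rolling_backtest(returns, window, alpha=0.05):
    """One-step-ahead (h = 1) VaR forecasts from a moving window,
    returning the forecasts and the out-of-sample violation indicators."""
    T = len(returns)
    var_fc = np.full(T, np.nan)
    for t in range(window, T):
        # re-estimate on the most recent `window` observations only,
        # dropping older data as the window advances
        var_fc[t] = np.quantile(returns[t - window:t], alpha)
    hits = returns < var_fc            # VaR violations
    return var_fc, hits[window:]
```

An expanding-window variant simply replaces `returns[t - window:t]` with `returns[:t]`.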
Once a series of VaR predictions is available, forecast adequacy is assessed through backtesting procedures. VaR backtesting procedures usually check the correct coverage of the unconditional and conditional left tail of the log-return distribution. Correct unconditional coverage (UC) was first considered by Kupiec (1995), and correct conditional coverage (CC) by Christoffersen (1998). The main difference between UC and CC concerns the distribution on which one focuses. For instance, UC considers correct coverage of the left tail of the unconditional log-return distribution $f(r_t)$, while CC deals with the conditional density $f(r_t \mid \tau_{t-1})$. From an inferential perspective, UC compares the number of realized VaR violations observed in the data with the expected number of VaR violations implied by the chosen risk level $\alpha$ during the forecast period, that is, $\alpha H$. Nevertheless, Escanciano and Olmo (2010) showed that the use of standard unconditional and independence backtesting procedures can be misleading, because they do not take into account the uncertainty associated with parameter estimation. They quantify this risk in a very general class of dynamic parametric VaR models and propose a correction of the standard backtests that takes it into account. These authors showed that one of the main determinants of the corrected asymptotic variance is the forecasting scheme used to generate VaR forecasts, i.e., whether one uses recursive, rolling, or fixed parameter estimates. The backtesting methodologies described above focus only on the number of VaR and ES exceptions and totally disregard their magnitudes. Hence, Nieto and Ruiz (2016) criticize the statistics proposed by Christoffersen (1998) because they are two-tailed and, as a consequence, can reject a risk model for being over-conservative. As mentioned above, risk models can also be rejected for being over-conservative, since excessive conservatism is not desirable for financial
institutions. Alternatively, the tail risk (TR) statistic is proposed; it tells risk managers the size of the aggregate tail losses that a portfolio may incur over the period considered, and its asymptotic distribution is derived under the assumption of normal returns. The Dynamic Quantile (DQ) test by Engle and Manganelli (2004) regresses the sequence of demeaned VaR violations on a constant, its own lags, and the VaR forecast. Under the null hypothesis of correct unconditional and conditional coverage, the resulting Wald test statistic is asymptotically chi-square distributed with $L + 2$ degrees of freedom. Engle and Manganelli (2004) set $L = 4$ lags, which has become the standard choice.
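A compact sketch of the DQ statistic, under the standard formulation with a constant, $L$ lagged hit indicators, and the VaR forecast as regressors (the function name and interface are illustrative):

```python
import numpy as np

def dq_statistic(hits, var_fc, alpha, L=4):
    """Engle and Manganelli (2004) Dynamic Quantile Wald statistic.

    Under correct unconditional and conditional coverage the statistic
    is asymptotically chi-square with L + 2 degrees of freedom.
    """
    hits = np.asarray(hits, dtype=float)
    d = hits - alpha                              # demeaned hit sequence
    T = len(d)
    X = np.column_stack(
        [np.ones(T - L)]
        + [hits[L - k - 1:T - k - 1] for k in range(L)]   # lagged hits
        + [np.asarray(var_fc)[L:]]                        # VaR regressor
    )
    beta, *_ = np.linalg.lstsq(X, d[L:], rcond=None)
    # Wald form: beta' X'X beta / (alpha (1 - alpha))
    return float(beta @ X.T @ X @ beta) / (alpha * (1 - alpha))
```

A large statistic relative to the $\chi^2_{L+2}$ critical value indicates that past hits or the VaR level itself help predict future violations, i.e., incorrect conditional coverage.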
With the Kupiec likelihood ratio test, the assumption is that the number of violations follows a binomial distribution. The Kupiec LR statistic is given by
$$LR_{uc} = -2 \ln\!\left[(1-\alpha)^{T-n}\, \alpha^{n}\right] + 2 \ln\!\left[\left(1 - \tfrac{n}{T}\right)^{T-n} \left(\tfrac{n}{T}\right)^{n}\right],$$
where $n/T$ is the violation rate observed in a sample of $T$ forecasts with $n$ violations. The null hypothesis is rejected if the calculated probability value is less than the chosen significance level of 1%, 5%, or 10%, in which case the model is not correct, implying that the risk computed from the risk model, whether VaR or ES, is unreliable and can give false signals to risk managers.
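The Kupiec statistic above is straightforward to implement; this sketch uses the convention $0 \cdot \log 0 = 0$ at the boundary cases:

```python
from math import log

def kupiec_lr(n, T, alpha):
    """Kupiec (1995) unconditional-coverage LR statistic for n VaR
    violations in T forecasts; asymptotically chi-square with 1 df
    under the null of correct coverage."""
    p_hat = n / T
    log_h0 = (T - n) * log(1 - alpha) + (n * log(alpha) if n > 0 else 0.0)
    log_h1 = ((T - n) * log(1 - p_hat) if n < T else 0.0) \
        + (n * log(p_hat) if n > 0 else 0.0)
    return -2.0 * (log_h0 - log_h1)
```

When the observed violation rate equals the nominal level exactly, the statistic is zero; it grows as the observed rate departs from $\alpha$ in either direction, which is why the test is two-tailed.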
The Christoffersen test is a backtesting tool based on the length of time between VaR violations. The main insight is that, where the VaR model is correctly specified for coverage $p$, the expected duration between violations should be a constant $1/p$ days. Since the test aims at establishing the dependence of violations, the notation $\eta_{ij}$ is used to denote the number of days on which condition $j$ occurred given that condition $i$ occurred on the previous day. The Christoffersen test statistic compares the likelihood under independent violations with the likelihood under a first-order Markov alternative, where $\pi_{ij}$ denotes the probability of a violation occurring conditionally on state $i$ at time $t-1$.
The null hypothesis is rejected if the calculated probability value is less than the 1%, 5%, or 10% significance level, in which case the estimated risk measure is not a good measure of the specified risk.
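The independence part of the Christoffersen test can be sketched from the transition counts $\eta_{ij}$ of the 0/1 violation sequence (illustrative implementation assuming all four transition counts are positive):

```python
import numpy as np

def christoffersen_independence(hits):
    """Christoffersen (1998) independence LR statistic from a 0/1
    violation sequence; chi-square with 1 df under the null that
    violations are independent over time."""
    hits = np.asarray(hits, dtype=int)
    n = np.zeros((2, 2))
    for i, j in zip(hits[:-1], hits[1:]):
        n[i, j] += 1.0                     # transition counts eta_ij
    pi = n[:, 1].sum() / n.sum()           # unconditional violation prob.
    pi_i = n[:, 1] / n.sum(axis=1)         # prob. given previous state i
    log_h0 = n[:, 0].sum() * np.log(1 - pi) + n[:, 1].sum() * np.log(pi)
    log_h1 = sum(n[i, 0] * np.log(1 - pi_i[i]) + n[i, 1] * np.log(pi_i[i])
                 for i in range(2))
    return -2.0 * (log_h0 - log_h1)
```

If violations cluster (a violation today makes one tomorrow more likely), $\pi_{11}$ exceeds $\pi_{01}$, the restricted likelihood falls short of the Markov likelihood, and the statistic becomes large.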
The Basel Committee on Banking Supervision required internal banking and financial market backtesting procedures in 2007. Financial institutions need to regulate their capital requirements to cover the market risk arising from their trading activities, and this is done through backtesting. The size of the capital requirement, following Mwamba, Hammoudeh, and Gupta (2014), is computed as
$$C_t = Fact \times \max\!\left(VaR_t(0.99),\; \frac{1}{60}\sum_{i=1}^{60} VaR_{t-i}(0.99)\right),$$
where $Fact$ is the multiplication factor reported in Table 1. In other words, the required capital is equal to the multiplication factor multiplied by the larger of today's 99% VaR and the mean of the last 60 days' 99% VaR. Subsequently, a good market risk model will produce four or fewer exceptions during a trading-day period. If the number of exceptions falls into the green zone, as shown in Table 1, that is an indication that the market risk model is probably good. If it falls into the yellow zone, there is uncertainty about the correctness of the market risk model.
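The capital-requirement formula and zone classification above can be sketched as follows (a minimal illustration of the formulation described here; the zone boundaries match the 0-4 / 5-9 / 10+ exception counts used later in the paper):

```python
import numpy as np

def basel_zone(n_exceptions):
    """Basel traffic-light zone from the number of VaR exceptions:
    0-4 green, 5-9 yellow, 10 or more red."""
    if n_exceptions <= 4:
        return "green"
    if n_exceptions <= 9:
        return "yellow"
    return "red"

def capital_requirement(var_today, var_last60, fact):
    """Required capital: the multiplication factor Fact times the larger
    of today's 99% VaR and the mean of the last 60 days' 99% VaR."""
    return fact * max(var_today, float(np.mean(var_last60)))
```

The multiplication factor itself depends on the zone, so in practice the two functions are used together: the backtest determines the zone, the zone determines $Fact$, and $Fact$ scales the capital charge.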

Results and data analysis
In this empirical analysis, the MS(k)-GAS(p) model is fitted to the five-day FTSE Botswana Stock Exchange Domestic Company Index (FTSE/BSE-DCI) financial time series for the period January 2010 to June 2023. The use of a five-day frequency is based on the fact that stock markets, the Botswana Stock Exchange included, are closed on weekends and holidays, with no trading on those days; hence the recorded market prices run from Monday to Friday. To avoid any exchange-rate fluctuations, Makatjane and Moroke (2021) suggested that the index should be kept in its original currency; hence, the FTSE/BSE-DCI used in this study is kept in Botswana Pula (BWP). As shown in Table 2, the distribution of FTSE/BSE-DCI returns is asymmetric and has fat tails. The skewness results suggest that the MS(k)-GAS(p) model should follow a skewed distribution, while all the normality tests used reject the null hypothesis of normality, concluding that the returns on the FTSE/BSE-DCI are not normally distributed. By allowing asymmetry across the regimes in this situation, the MS(k)-GAS(p) model should predict more accurately than conventional generalized autoregressive conditional heteroscedasticity (GARCH) or GAS models.

Markov-switching generalized autoregressive score framework
To begin the main analysis, the MS(k)-GAS(p) is fitted to the FTSE/BSE-DCI returns. It is assumed that every regime shift in the parameters is valid and that, during the sample period of the study, a model would be correct 95% to 99% of the time. The non-linear unit root test of Kapetanios, Shin, and Snell is applied, and the likelihood ratio (LR) test of Kasahara and Shimotsu (2018) is also utilized to test the number of regime shifts. The FTSE/BSE-DCI does not contain any non-linear unit roots; therefore, the series is non-linear but stationary. This is in accordance with the results of the proposed modified Wald test reported in Table 3. In addition, the null hypothesis of one regime shift is rejected and that of k > 2 is also rejected, concluding that there are two regime shifts in the BSE-DCI index. Moreover, regime dependence is imposed on the parameters {α, β} and {γ} to derive the Markov-switching behavior of the dependence structure. The results provide strong evidence of regime dependence, with corroboration of state dependence at both extremes. Because these parameters are significant, as reported in Table 4, this is a clear indication of a Markovian structure in which returns on the FTSE/BSE-DCI display significant switches in both the unconditional mean and the persistence of the dependence parameter. δ represents a posterior unconditional volatility mean with a 96.8% acceptance rate, implying that the mean is clustered around the rate at which the FTSE/BSE-DCI switches from the low to the high regime. This results in stable probabilities of 23% in regime 1 and 77% in regime 2.
The location parameter among the unconditional parameters, which denotes an asymmetric and stable distribution, further supports this. The volatility of the study series is therefore said to be centered on the mean. To assess goodness of fit (GoF), the mean, kurtosis, skewness, and the Jarque-Bera normality test are computed on the residuals of the fitted model. It is worth noting that these results, as reported in Table 5, confirm that the residuals of the fitted MS(2)-GAS(1) are not normally distributed, corroborating the preliminary results reported in Table 2. Additionally, these residuals follow heavy-tailed distributions, given the observed kurtosis above three, and they are negatively skewed, as reported by the skewness measure of -3.0895.

Forecasting performance of value-at-risk and expected shortfall using heavy-tailed distributions
Using the minimum score combining proposed by Trucíos and Taylor (2020), the following distributions are combined with the estimated MS(2)-GAS(1): the generalized extreme value (GEV) distribution, the generalized Pareto distribution (GPD), the generalized logistic distribution (GLD), the log-Pearson type 3, a phased bi-exponential, and finally the Wakeby distribution. Based on the estimates of the combined models, an out-of-sample analysis evaluates the ability of regime-switching methods coupled with these fat-tailed distributions to forecast VaR and ES. Volatility predictions and extremes are produced for five- and thirty-day-ahead forecasts. The out-of-sample data concern the last 5% of the FTSE/BSE-DCI log-return series, using a 168-observation log-return sample for the rolling-window estimation with an iterated strategy, as suggested by Maciel (2021). The parameters of the combined model(s) are updated at every observation over the out-of-sample data, producing the Value-at-Risk and expected shortfall forecasts h_{t+5} and h_{t+30}.
Table 6 to Table 8 present the combined VaR and ES forecasts, the number of VaR and ES violations, the tail risk test, the dynamic quantile test, the Kupiec likelihood ratio test, the Christoffersen test, and the Basel III zone results for the different model combinations on the BSE index. All VaR and ES computations are performed at the common 95% and 99% levels for short trading positions. These results are vital in determining the adequacy of VaR and ES forecasting for the different distributions coupled with the MS-GAS copula model discussed in section 2. Additionally, they allow comparison across distributions that are not necessarily fitted over the same part of the data. In this case, the GEV distribution and the GPD are compared with the other distributions. This empirical analysis differs from that of Huang, Huang, and Chinhamu (2014), who employed both short and long-term trading positions. For the GEV distribution, a block size of 5 is used, producing weekly maxima, while the GPD is estimated at a threshold of the 95% quantile, locating 5% of the observations as exceedances. The negative tails are fitted by applying the same procedure to the negated returns. It should be noted that, as one might anticipate given the leptokurtic nature of these data series, normality is rejected almost everywhere. Therefore, the normal distribution assumption frequently results in underestimates of VaR and ES and an excess of violations. The standard Student's t-distribution, which is frequently advised, is a better candidate for VaR and ES estimation, but it is still (in most cases) comparably weaker than the heavy-tailed distributions discussed here (Huang et al. 2014).
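The two data-preparation steps for the extreme value fits described above (weekly block maxima for the GEV, peaks over the 95% quantile for the GPD) can be sketched as follows; the function names are illustrative, and the resulting samples would then be passed to the respective maximum likelihood fitters:

```python
import numpy as np

def block_maxima(losses, block=5):
    """Block maxima for the GEV fit: blocks of 5 trading days give
    weekly maxima (any incomplete final block is dropped)."""
    T = (len(losses) // block) * block
    return losses[:T].reshape(-1, block).max(axis=1)

def threshold_exceedances(losses, q=0.95):
    """Peaks-over-threshold sample for the GPD fit: the threshold is
    the q-quantile of the data and the exceedances are the amounts
    above it (here the upper 5% of observations)."""
    u = np.quantile(losses, q)
    return u, losses[losses > u] - u
```

The negative tail is handled the same way by feeding the negated return series through the same two functions.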
For the BSE-DCI (Table 6) there is a varying pattern of VaR and ES estimates. According to the TR, DQ, Kupiec, and Christoffersen tests, the best model at the 95% and 99% VaR levels is MS-GAS-GPD, while the best model at the 95% and 99% ES levels is MS-GAS-GEV. It can also be observed that the clustering of VaR and ES violations starts to occur at less extreme VaR and ES levels (lower p-values for the Christoffersen test). This may simply be caused by an increase in the number of observations that exceed these risk measures' estimates. Similar observations can be found in the study of Huang et al. (2014).
Table 8 reports the number of backtest exceptions over the 5-day and 30-day windows for VaR and ES across the three Basel zones. A window is classified as (1) green if 0 to 4 exceptions are reported over the 5 or 30 trading days, (2) yellow if there are 5 to 9 exceptions, and (3) red if there are more than 9 exceptions. None of the risk measures avoids a regulatory penalty, with all of them slipping into the yellow zone, which imposes a penalty of between 0.40 and 0.85 times the VaR in the calculation of the market risk charge (Bee and Trapin 2018).
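The zone classification above can be expressed as a small lookup. The zone boundaries follow the text (green 0-4, yellow 5-9, red above 9), and the yellow-zone add-ons follow the standard Basel traffic-light schedule, which steps from 0.40 to 0.85 as exceptions accumulate:

```python
def basel_zone(exceptions):
    """Map a backtest exception count to its Basel traffic-light zone and
    the multiplier add-on applied to the market risk charge."""
    if exceptions <= 4:
        return "green", 0.0
    if exceptions <= 9:
        # standard Basel yellow-zone schedule: add-on rises with exceptions
        addons = {5: 0.40, 6: 0.50, 7: 0.65, 8: 0.75, 9: 0.85}
        return "yellow", addons[exceptions]
    return "red", 1.0
```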
Interestingly, the hybrid models performed uniformly across all the risk measures over both the 5-day and 30-day windows, giving an average penalty that ranges from 4.3% to 28%. Overall, all the risk measures indicate that both MS-GAS-GEV and MS-GAS-GPD are good models for estimating extreme losses on FTSE/BSE-DCI stock returns. At the 95% and 99% confidence levels, neither the VaR nor the ES estimates from MS-GAS-GEV and MS-GAS-GPD produced worrying results. For example, MS-GAS-GEV produced zero exceptions out of 30 observations, while MS-GAS-GPD produced zero exceptions out of 5 observations.
Pigildin (2009) noted that there is only a very small likelihood, less than 0.01%, that an accurate model with correct 99% coverage would produce 10 or more exceptions. Hence, the current study finds that the risk models estimated extreme tail losses with adequate accuracy for the FTSE/BSE-DCI. It is further worth noting that, for the five-day and thirty-day windows, both MS-GAS-GEV and MS-GAS-GPD incur smaller penalties. For MS-GAS-GEV at five-day VaR forecasting, a penalty of only 8.7% is observed, while the ES penalty is 4.3%. The same interpretation can be made for the thirty-day window and for the other distributions.
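The order of magnitude of this likelihood can be checked with a direct binomial tail computation. The window length behind Pigildin's figure is not stated, so a conventional 250-trading-day backtest window is assumed here purely for illustration:

```python
import math

def prob_at_least(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p): the chance that a model with
    true exception probability p produces k or more exceptions in n days."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# chance of 10+ exceptions under correct 99% coverage (assumed 250-day window)
tail = prob_at_least(10, 250, 0.01)
```

With these assumptions the probability is indeed on the order of a few hundredths of one percent, consistent with the claim that 10 or more exceptions are strong evidence against correct 99% coverage.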

Comparing the performance of the fitted models
The purpose of this section is to determine which model best mimics the data and produces the smallest forecast errors for both VaR and ES, which in turn supports risk management decisions in the Botswana financial sector. Four error metrics are used in this study to measure the performance of each model, and the results are summarized in Table 9. Some tentative conclusions can be drawn from this table. The results show that combining point forecasts from different MS(2)-GAS(1) models with heavy-tailed distributions improves performance in predicting and forecasting VaR and ES. According to Table 9, the MS(2)-GAS(1)-GPD is the best model: its key performance indicators are far below those of the MS(2)-GAS(1)-GEV.
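The four error metrics used for this comparison (RMSE, MAE, MAPE and MPE, as listed later in the conclusions) can be computed in a few lines; this sketch shows their standard definitions applied to realized versus forecast series:

```python
import numpy as np

def forecast_errors(actual, forecast):
    """RMSE, MAE, MAPE (%) and MPE (%) of a forecast against realized values.
    MPE keeps the sign of errors, so it also reveals systematic bias."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    e = a - f
    return {
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        "MAE": float(np.mean(np.abs(e))),
        "MAPE": float(np.mean(np.abs(e / a)) * 100.0),
        "MPE": float(np.mean(e / a) * 100.0),
    }
```

Lower values on all four metrics indicate a better fit, which is the sense in which MS(2)-GAS(1)-GPD dominates in Table 9.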

Discussion and conclusion
Under the regulation of the Basel Accords, risk managers of financial institutions need to rely on state-of-the-art methodologies for predicting and evaluating their downside risk. This becomes a complex issue when regime shifts must be incorporated, because it requires substantial planning before applying machine learning algorithms. Nonetheless, it is also an application of data science for the good, ensuring that stochastic parameters are correctly modeled and identified. Mitigation of financial hazard and management planning play a critical role when regime shifts depend on time-varying parameters in the financial and economic world. Due to the asymmetry and leptokurtic nature of financial time series, non-linear models have become standard tools for modeling and forecasting time-varying VaR and ES. The statistical contribution of this study lies in forecasting the stochastic time-varying VaR and ES of the FTSE/BSE-DCI by employing an MS-GAS model nested with copulas and heavy-tailed distributions. Few studies take this approach; to current knowledge, this is the first study to use such a model to characterize the performance of stochastic time-varying VaR and ES under regime shifts and extreme value distributions. The analysis explicitly accounts for the presence of different Markovian regimes as well as a smooth within-regime evolution of the dependence parameters. Specifically, exploiting recent advances in score-driven processes, the state-dependent copula parameters are updated using the scaled score of the conditional copula distribution. This choice achieves a greater level of flexibility in the context of dynamic copula models, and it introduces extreme stochastic behavior for the class of VaR and ES models in a natural and effective way. A distinct advantage of the estimated MS(2)-GAS(1) model is that it exploits full likelihood information: by taking a local density score step as the driving mechanism, the time-varying parameters become non-stationary and give a clear signal of extreme distributional behavior. The empirical properties of this study confirm this behavior, with the residuals of the MS(2)-GAS(1)-copula following heavy-tailed distributions.
The results obtained in this study show that MS-GAS-GEV and MS-GAS-GPD are the best models for forecasting 5-day and 30-day VaR and ES, with MS-GAS-GPD outperforming its counterpart MS-GAS-GEV because it produced smaller forecast errors as measured by RMSE, MAE, MAPE and MPE. Hence the conclusion that MS(2)-GAS(1)-GPD nested with copulas is the model that best mimics the FTSE/BSE-DCI data. This is a promising technique for detecting susceptibility to regime shifts and for measuring extreme time-varying risk.
For further research, a Heavy-GAS-tF model subject to regime switching could be used to estimate deeper correlation dynamics in the heavy tails of the GAS model. It would be interesting to see what results could be obtained if more sophisticated machine learning procedures were used to model stochastic time-varying parameters as a regime shift process, using Markov chain Monte Carlo and bootstrapping of credible intervals for the estimated time-varying parameters. A probabilistic description and modeling of extreme peak time-varying losses using a Poisson point process is another area for future research; this approach would help estimate the frequency of occurrence of peak time-varying losses. A sensitivity analysis with respect to daily time variation, together with the development of two-stage stochastic integer recourse models aimed at optimizing parameter distributions, is another interesting direction for future research with stock market data. This will be studied elsewhere.
Christoffersen (1998) proposed a test on the series of VaR exceedances {d_t, t = 1, ..., s + H}, where d_t ≡ 1{r_t < VaR_t(α)} or d_t ≡ 1{r_t < ES_t(α)}, usually referred to as the hit series. Specifically, if the model achieves correct conditional coverage, VaR exceedances should be independently distributed over time. For further reading on backtesting VaR and ES, the reader is referred to Bayer and Dimitriadis (2022).
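The two likelihood-ratio statistics behind this framework can be sketched directly from the hit series. This is a minimal illustration (not the study's implementation): the Kupiec statistic compares the observed exception rate with the nominal rate α, and the Christoffersen independence statistic compares a first-order Markov model of the hits against an i.i.d. alternative; both are asymptotically chi-square with one degree of freedom:

```python
import math
import numpy as np

def christoffersen_tests(hits, alpha):
    """Return (LR_uc, LR_ind) for a 0/1 hit series at nominal level alpha."""
    hits = np.asarray(hits, dtype=int)
    n, x = hits.size, int(hits.sum())
    pi = x / n                                   # observed exception rate
    # Kupiec unconditional-coverage LR (guards avoid log(0) when x is 0 or n)
    lr_uc = -2.0 * (((n - x) * math.log((1 - alpha) / (1 - pi)) if n - x else 0.0)
                    + (x * math.log(alpha / pi) if x else 0.0))
    # transition counts for the first-order Markov independence test
    prev, curr = hits[:-1], hits[1:]
    n00 = int(np.sum((prev == 0) & (curr == 0)))
    n01 = int(np.sum((prev == 0) & (curr == 1)))
    n10 = int(np.sum((prev == 1) & (curr == 0)))
    n11 = int(np.sum((prev == 1) & (curr == 1)))
    if n10 + n11 == 0:                           # no exceptions: nothing to test
        return lr_uc, 0.0
    p01, p11 = n01 / (n00 + n01), n11 / (n10 + n11)
    p1 = (n01 + n11) / (n00 + n01 + n10 + n11)

    def ll(p, zeros, ones):                      # Bernoulli log-likelihood
        return ((zeros * math.log(1 - p) if zeros else 0.0)
                + (ones * math.log(p) if ones else 0.0))

    lr_ind = -2.0 * (ll(p1, n00 + n10, n01 + n11)
                     - ll(p01, n00, n01) - ll(p11, n10, n11))
    return lr_uc, lr_ind
```

Large values of LR_ind signal clustered exceptions, the pattern flagged in the Table 6 discussion above.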
The dynamic quantile (DQ) test assesses the joint hypothesis that E[d_t] = α and that the hit variables are independently distributed. The implementation of the test involves the de-meaned process Hit_t^α ≡ d_t − α. Under correct model specification, Hit_t^α has zero mean, unconditionally and conditionally, and is serially uncorrelated. The DQ test is then the traditional Wald test of the joint nullity of all coefficients in a linear regression of Hit_t^α on its own lags.
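A simplified version of this Wald statistic can be sketched as follows; it regresses the de-meaned hit series on a constant and its own lags only (the full Engle-Manganelli test typically also includes the contemporaneous VaR forecast among the regressors):

```python
import numpy as np

def dq_statistic(hits, alpha, lags=4):
    """Simplified DQ Wald statistic: regress Hit_t = d_t - alpha on a constant
    and its own `lags` lags; under H0 the statistic is asymptotically
    chi-square with (lags + 1) degrees of freedom."""
    hit = np.asarray(hits, dtype=float) - alpha
    n = hit.size
    # design matrix: column of ones plus lagged hits hit[t-1], ..., hit[t-lags]
    X = np.column_stack(
        [np.ones(n - lags)]
        + [hit[lags - k - 1 : n - k - 1] for k in range(lags)]
    )
    y = hit[lags:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    # Wald quadratic form scaled by the H0 variance alpha * (1 - alpha)
    return float(beta @ X.T @ X @ beta) / (alpha * (1.0 - alpha))
```

The statistic equals ||Xβ||² scaled by the null variance, so it is non-negative by construction and grows when lagged hits help predict current hits.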

Table 1 :
The Basel III zones and the exceptions

Table 2 :
Descriptive statistics. Significance codes: *** 0.001, ** 0.01, * 0.05. To address unit roots and the number of regime shifts in the non-linear time series, a modified Wald test of Güriş is applied.

Table 3 :
Modified Wald and likelihood ratio tests

Table 6 :
VaR and ES Estimates using heavy-tailed

Table 7 :
Backtesting five-step and thirty-step ahead value-at-risk and expected shortfall

Table 8 :
Basel III zones backtest results

Table 9 :
Key performance indicators for the fitted models