Comparing two block estimation procedures for the extremal index: An application

When extending the analysis of the limiting behaviour of extreme values from independent and identically distributed sequences to stationary sequences, a key parameter appears: the extremal index θ, whose accurate estimation is difficult and remains an open problem. Here we focus on the estimation of θ using blocks estimators, which can be constructed from disjoint or sliding blocks. Both block constructions require the choice of a threshold and a block length. The main objective of this work is to revisit another block estimation procedure that depends only on the block length, although some conditions on the underlying process need to be verified. An application is presented to illustrate the proposed procedure.


INTRODUCTION
In many real situations a pronounced temporal clustering of the extreme values can be seen, indicating the presence of local dependence in the extremes. This motivates a search for reliable tools to describe these features, because quantifying the nature of the dependence structure, as well as the duration of extreme events, is an essential part of understanding such time series. The extremal index (EI) is the main parameter that describes and quantifies the clustering of the extreme values in many stationary time series. Its formal definition is given next.
Definition 1 ([1]) Suppose that $\{X_n\}_{n\ge 1}$ is a strictly stationary sequence of random variables with marginal distribution function (d.f.) $F$. This sequence is said to have an EI $\theta \in [0, 1]$ if, for each $\tau > 0$, there exists a sequence of levels $u_n \equiv u_n(\tau)$ such that, as $n \to \infty$,
$$n\,(1 - F(u_n)) \to \tau \quad \text{and} \quad P(M_{1,n} \le u_n) \to e^{-\theta\tau}, \tag{1}$$
where $M_{1,n} = \max\{X_1, \dots, X_n\}$.
An informal interpretation of θ is given in [1]: θ is approximately the reciprocal of the mean cluster size. A value close to 0 indicates very strong short-range extremal dependence, while a value close to 1 indicates rather weak dependence. The case θ = 0 appears only in pathological situations. For almost all cases of interest we have θ > 0, which is the situation considered here.
Dependence in stationary sequences can take different forms, and it is impossible to develop a general characterization of the behaviour of extremes unless some constraints are imposed. It is usual to assume a condition that limits the extent of long-range dependence at extreme levels, so that the events $X_i > u$ and $X_j > u$ are approximately independent, provided that $u$ is high enough and the time points $i$ and $j$ are far apart. This condition is called the $D(u_n)$ condition, see [1]. This paper analyses blocks estimation procedures for θ. To compare the two blocks estimation procedures, we first apply both to a stationary model. The conditions under which the second procedure holds are also verified. An application to a time series of daily mean flow discharge rates and some comments are also presented.

CLASSICAL BLOCKS ESTIMATOR versus ANOTHER APPROACH
Interpreting $\theta^{-1}$ as the limiting mean cluster size of the exceedances leads to the blocks method. This method partitions the $n$ observations into $k_n = [n/r_n]$ contiguous blocks of length $r_n = o(n)$. In each block, the number of exceedances over a high threshold $u_n$ is counted, and the blocks estimator is defined as the reciprocal of the average number of exceedances per block among blocks with at least one exceedance. The blocks estimator, in [2], is given by
$$\hat{\theta}_n(u_n) = \frac{\sum_{i=1}^{k_n} 1\{M_{(i-1)r_n+1,\, i r_n} > u_n\}}{\sum_{i=1}^{k_n r_n} 1\{X_i > u_n\}}, \tag{2}$$
where $M_{i,j} = \max\{X_i, \dots, X_j\}$.
Blocks estimators can be constructed using disjoint blocks or sliding blocks. For both procedures, however, the blocks estimator requires the choice of a threshold, $u_n$, and a block size, $r_n$, and the behaviour of the estimates depends strongly on $r_n$ and $u_n$. Some recent works that try to deal with this situation are [3,4,5,6].
Let us consider a Max-Autoregressive Process model ([7]) to illustrate how the estimates depend on $r_n$ and $u_n$. Let $\{Y_n\}_{n\ge 1}$ be a sequence of independent, standard Gumbel distributed random variables. For fixed α, the process $\{X_n\}_{n\ge 1}$ is defined through a max-autoregressive recursion driven by $\{Y_n\}$ (see [7]). The EI of this process is $\theta = 1 - \exp(-\alpha)$, see [7]. Given the sample $(X_1, \dots, X_n)$ and the associated ascending order statistics, $X_{1:n} \le \dots \le X_{n:n}$, we replace the level $u_n$ in (2) by the stochastic level $X_{n-k:n}$.
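The blocks method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the recursion in `simulate_max_ar` is one common max-autoregressive form with standard Gumbel margins and extremal index 1 − exp(−α), chosen here because the exact definition used in [7] is not reproduced in the text, and the function names, the seed and the burn-in length are hypothetical.

```python
import math
import random


def simulate_max_ar(n, alpha, seed=0, burn_in=500):
    """Simulate X_t = max(X_{t-1} - alpha, Y_t + log(1 - exp(-alpha))).

    ASSUMPTION: this recursion is one common max-autoregressive form with
    standard Gumbel margins and extremal index 1 - exp(-alpha); it is not
    necessarily the exact definition used in [7].
    """
    rng = random.Random(seed)
    shift = math.log(1.0 - math.exp(-alpha))

    def gumbel():
        # -log(E) with E ~ Exp(1) is standard Gumbel; guard against E == 0.
        return -math.log(max(rng.expovariate(1.0), 1e-300))

    x = gumbel()  # start from the stationary (standard Gumbel) law
    path = []
    for t in range(burn_in + n):
        x = max(x - alpha, gumbel() + shift)
        if t >= burn_in:
            path.append(x)
    return path


def disjoint_blocks_estimate(x, r, k):
    """Disjoint blocks estimate with random threshold u = X_{n-k:n}."""
    n = len(x)
    u = sorted(x)[n - k - 1]           # X_{n-k:n}: k values exceed it if no ties
    n_blocks = n // r
    used = x[: n_blocks * r]           # drop the incomplete last block
    exceedances = sum(1 for v in used if v > u)
    if exceedances == 0:
        return float("nan")
    blocks_hit = sum(
        1 for i in range(n_blocks) if max(used[i * r:(i + 1) * r]) > u
    )
    # Ratio of blocks with at least one exceedance to total exceedances.
    return blocks_hit / exceedances


sample = simulate_max_ar(1000, alpha=0.7)   # true EI is 1 - exp(-0.7)
estimate = disjoint_blocks_estimate(sample, r=50, k=100)
print(estimate)
```

Repeating the last two lines over a grid of `r` and `k` values gives estimate paths of the kind discussed in the text.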
It seems difficult to decide which $r_n$ should be chosen: for some block sizes the estimate paths do not cross the true value of the parameter. Moreover, the region in which the extremal index estimates are stable around the true value depends on $r_n$, and even for a given $r_n$ it is not obvious how to choose the threshold appropriately.
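For comparison with the disjoint construction, a sliding blocks variant can be sketched as follows. The formula used here (proportion of sliding windows of length r whose maximum exceeds u, divided by r times the empirical exceedance rate) is one common form of the sliding blocks estimator and is an assumption of this sketch; the text does not state which form is used. The function name is hypothetical.

```python
def sliding_blocks_estimate(x, r, u):
    """Sliding blocks estimate of the extremal index at threshold u.

    ASSUMPTION: uses the form
        [share of length-r windows with maximum > u] / [r * share of X_i > u],
    which is one common sliding blocks estimator, not necessarily the
    one used in the paper.
    """
    n = len(x)
    exceed_rate = sum(1 for v in x if v > u) / n
    if exceed_rate == 0:
        return float("nan")
    n_windows = n - r + 1
    # Naive O(n * r) scan over all overlapping windows.
    hits = sum(1 for i in range(n_windows) if max(x[i:i + r]) > u)
    return (hits / n_windows) / (r * exceed_rate)


toy = [0, 5, 6, 0, 0, 0, 0, 7]
print(sliding_blocks_estimate(toy, r=4, u=1))
```

In finite samples this ratio is not constrained to lie in [0, 1], so in practice it may need to be truncated at 1.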
In this section, we introduce another method for estimating θ, see [8], that does not depend on the choice of a threshold, because the threshold is defined inside each block. The validity of this estimator can be extended to dependent processes satisfying the long-range approximate independence condition of [9], called the $D(u_n)$ condition, and the $D^{(2)}(u_n)$ condition of [10].

Definition 3 ([10]) Let $\{X_n\}_{n\ge 1}$ be a stationary sequence of random variables. The $D^{(2)}(u_n)$ condition is said to be satisfied if
$$n\,P\left(X_j > u_n,\ X_{j+1} \le u_n,\ M_{j+2,r_n} > u_n\right) \to 0 \quad \text{as } n \to \infty,$$
with $u_n$ satisfying the $D(u_n)$ condition and $r_n$ a sequence of block sizes such that $n/r_n \to \infty$ and $r_n = o(n)$.

The proposed estimator is defined as follows. Let $k_n$ denote the number of blocks and $r_n$ the respective block size. Let $v_{ni}$ be a sequence of levels such that $r_n P\{X_1 \le v_{ni} < X_2\} \to 1$ as $n \to \infty$. With $N_i(r_n, v_{ni})$ denoting the number of up-crossings of $v_{ni}$ in the $i$th block, the estimator, referred to below as (6), is built from these up-crossing counts; see [8] for its expression. The estimator in (6) relies on the validity of the $D^{(2)}(u_n)$ condition, which can be checked by calculating, for a range of thresholds $u_n$ and block sizes $r_n$, the proportion of anti-$D^{(2)}(u_n)$ events $\{X_{j+1} \le u_n, M_{j+2,r_n} > u_n \mid X_j > u_n\}$ among the exceedances. By [11], the proportion of anti-$D^{(2)}(u_n)$ events is
$$p(u_n, r_n) = \frac{\sum_{j} 1\{X_j > u_n,\ X_{j+1} \le u_n,\ M_{j+2,r_n} > u_n\}}{\sum_{j} 1\{X_j > u_n\}}$$
for the observed sequence $\{X_1, \dots, X_n\}$.
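The diagnostic described above, the proportion of anti-D(2)(u_n) events among the exceedances, can be sketched as below. Two details are assumptions of this sketch rather than statements from the text: the maximum is taken over the window X_{j+2}, ..., X_{j+r}, and only indices j for which that window fits inside the sample are counted, in both numerator and denominator. The function name is hypothetical.

```python
def anti_d2_proportion(x, u, r):
    """Proportion of exceedances X_j > u followed by X_{j+1} <= u and a
    further exceedance of u within the window X_{j+2}, ..., X_{j+r}.

    ASSUMPTIONS: the window convention and the restriction to indices j
    whose window lies fully inside the sample are choices made here.
    """
    if r < 2:
        raise ValueError("r must be at least 2 so the window is non-empty")
    n = len(x)
    exceedances = 0
    anti_events = 0
    for j in range(n - r):
        if x[j] > u:
            exceedances += 1
            if x[j + 1] <= u and max(x[j + 2:j + r + 1]) > u:
                anti_events += 1
    return anti_events / exceedances if exceedances else float("nan")


toy = [5, 0, 6, 0, 0, 0, 0]
print(anti_d2_proportion(toy, u=1, r=3))
```

Evaluating this function on a grid of thresholds and block sizes gives the kind of check described in the text: values close to zero are compatible with the condition, while large values cast doubt on it.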
Under the validity of the $D^{(2)}(u_n)$ condition, it seems reasonable to replace $v_{ni}$, in each block, by a level such that the number of up-crossings is equal to 1, but low enough to identify exceedances. More precisely, we can define $V_{ni} = \inf\{u : N_i(r_n, u) = 1\}$, with $i = 1, \dots, k_n$.
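The block-wise levels V_ni can be computed directly from the data, as in the following sketch. It assumes that an up-crossing of u at time j means X_j ≤ u < X_{j+1}, in line with the event {X_1 ≤ v < X_2} above, and that only up-crossings inside a block are counted. Under that convention the up-crossing count is a right-continuous step function of u that changes only at observed values, so the infimum is the smallest observed value in the block with exactly one up-crossing. The function names are hypothetical, and the sketch stops at the levels; it does not implement estimator (6).

```python
def upcrossings(block, u):
    """Number of j with block[j] <= u < block[j + 1]."""
    return sum(1 for a, b in zip(block, block[1:]) if a <= u < b)


def lowest_single_upcrossing_level(block):
    """inf{u : N(u) = 1} for one block, or None if no such level exists.

    N(u) only changes at observed values and is right-continuous, so it is
    enough to scan the distinct observed values in increasing order.
    """
    for v in sorted(set(block)):
        if upcrossings(block, v) == 1:
            return v
    return None  # e.g. a non-increasing block has no up-crossing at all


def block_levels(x, r):
    """Levels V_ni for the disjoint blocks of length r (last partial block dropped)."""
    return [
        lowest_single_upcrossing_level(x[i * r:(i + 1) * r])
        for i in range(len(x) // r)
    ]


print(block_levels([1, 3, 2, 5, 0, 3, 2, 1, 0, 0], r=5))
```

Blocks for which no level with a single up-crossing exists return `None`; how such blocks should be handled is not specified in the text and is left open here.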
For the Max-Autoregressive Process with several values of θ (θ = 0.1813, 0.5034, 0.9093), a sample of size n = 1000 was generated and the estimator in (6) was applied. The results are shown in the accompanying figure. As can be seen, the procedure gives very good results: a large stability region, very close to the true value of the parameter θ. The observed proportions $p(u_n, r_n)$ depend on the value of θ, being higher for high values of θ and smaller for small values of θ.
In [12] we applied a path stability algorithm (see [13] and [14]), now adapted to the choice of $r_n$ and to obtaining an estimate of θ; this algorithm has led to quite good results in the estimation of extreme value parameters.

CASE STUDY
We conclude with an application of the blocks estimator and of the other blocks procedure described above to a time series of daily mean flow discharge rates (m³/s) from 1 October 1946 to 30 April 2012 ("SNIRH: Sistema Nacional de Informação dos Recursos Hídricos"). Stationarity of the data can be assumed from November until April (n = 11947). We shall estimate the extremal index. In Figure 3, we depict estimates of the extremal index as a function of the block length parameter, ranging from $r_n = 120$ to $r_n = 1500$. We also checked the proportion of anti-$D^{(2)}(u_n)$ events in our data and found a very low proportion of such events.

A FEW COMMENTS
The extremal index governs the clustering of the extremes of a univariate observational series, so its estimation is of central importance. This work applies block estimation procedures to estimate this parameter, one of which does not depend on the choice of a threshold. The comparison of the two procedures above needs further research, mainly regarding the $D^{(2)}(u_n)$ condition.