Goodness-of-fit Testing for Left-truncated Two-parameter Weibull Distributions with Known Truncation Point

The left-truncated Weibull distribution is used in life-time analysis and has many applications, ranging from financial market analysis and insurance claims to earthquake inter-arrival times. We present a comprehensive analysis of the left-truncated Weibull distribution when the shape parameter, the scale parameter, or both are unknown and are determined from the data by maximum likelihood estimation. We demonstrate that if both Weibull parameters are unknown then there are sets of sample configurations, with measure greater than zero, for which the maximum likelihood equations do not possess non-trivial solutions. The modified critical values of the Kolmogorov-Smirnov goodness-of-fit test for unknown parameters are obtained from a quantile analysis. We find that the critical values depend on the sample size and the truncation level, but not on the actual Weibull parameters. To confirm this behavior, we present a complementary analysis using the Brownian bridge approach as an asymptotic limit of the Kolmogorov-Smirnov statistic and find that both approaches are in good agreement. A power study is performed for our Kolmogorov-Smirnov goodness-of-fit test, and issues related to left-truncated data are discussed. We conclude the paper by showing the importance of left-truncated Weibull hypothesis testing for the duration of failed marriages in the US, worldwide terrorist attacks, waiting times between stock market orders, and time intervals of radioactive decay.


Introduction and preliminaries
The Weibull distribution with scale and shape parameters, α > 0 and β > 0 respectively, is widely used in areas such as statistics, engineering, finance, insurance and biology (e.g. Weibull (1951), Balakrishnan and Cohen (1991), Rinne (2009)), mainly in the context of life-time analysis (survival analysis in medical studies and reliability analysis in engineering). In practical applications, truncated statistical distributions must very often be used (see also Nadarajah and Kotz (2006)). These arise when a random variable τ follows a known distributional model, except that a portion of the sample space cannot be observed or is removed (for example, in radioactive decay phenomena a Geiger-Müller counter does not permit detection of decay events within its dead time). An independent identically distributed (i.i.d.) left-truncated data set τ = (τ_1, ..., τ_n) of sample size n has the property that τ_L < τ_i, i = 1, ..., n, for a given non-negative parameter τ_L, the truncation point (Kendall and Stuart (1979), pp. 551, section 32.15). The left-truncated cumulative Weibull distribution function (cdf) is given by Wingo (1989) as

$$F(\tau \mid \alpha, \beta, \tau_L) = 1 - \exp\left[\left(\frac{\tau_L}{\alpha}\right)^{\beta} - \left(\frac{\tau}{\alpha}\right)^{\beta}\right], \qquad \tau > \tau_L, \tag{1}$$

and the left-truncated probability density function (pdf) is

$$f(\tau \mid \alpha, \beta, \tau_L) = \frac{\beta}{\alpha}\left(\frac{\tau}{\alpha}\right)^{\beta - 1} \exp\left[\left(\frac{\tau_L}{\alpha}\right)^{\beta} - \left(\frac{\tau}{\alpha}\right)^{\beta}\right], \qquad \tau > \tau_L. \tag{2}$$

Putting τ_L = 0 in Equation (1) and Equation (2), the cdf and pdf of the complete Weibull distribution are recovered, respectively. Throughout this paper we use the term complete Weibull distribution to refer to the untruncated Weibull distribution, and in our investigation we assume that the truncation point τ_L is known or can be set.

The literature on data analysis tends to focus either on complete or on censored data, with much less attention paid to truncated data. Moreover, truncation as formally defined in Kendall and Stuart (1979) (pp. 551, section 32.15) is sometimes confused with censoring. Confusingly, in the literature Type I censoring is sometimes called truncation and Type II censoring is sometimes known simply as censoring; see for example Koziol and Byar (1975), Dufour and Maag (1978), Barr and Davidson (1973). We define censoring as the situation in which all of the data are used to generate the empirical cdf, but only the uncensored data are used to estimate the parameters and calculate the goodness-of-fit statistics. In this paper we concentrate on left-truncated data (as defined in the paragraph above) only.
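The cdf, pdf and a sampler for the left-truncated Weibull distribution can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard forms F = 1 − exp[(τ_L/α)^β − (τ/α)^β] and f = (β/α)(τ/α)^(β−1) exp[(τ_L/α)^β − (τ/α)^β] for τ > τ_L; the function names `ltw_cdf`, `ltw_pdf` and `ltw_sample` are hypothetical and not taken from the paper.

```python
import numpy as np

def ltw_cdf(tau, alpha, beta, tau_l):
    """Left-truncated Weibull cdf (assumed form); zero at or below tau_l."""
    tau = np.maximum(np.asarray(tau, dtype=float), tau_l)
    eta = (tau_l / alpha) ** beta
    return 1.0 - np.exp(eta - (tau / alpha) ** beta)

def ltw_pdf(tau, alpha, beta, tau_l):
    """Left-truncated Weibull pdf (assumed form); zero at or below tau_l."""
    tau = np.asarray(tau, dtype=float)
    safe = np.maximum(tau, tau_l)  # avoid evaluating below the truncation point
    eta = (tau_l / alpha) ** beta
    dens = (beta / alpha) * (safe / alpha) ** (beta - 1.0) * np.exp(eta - (safe / alpha) ** beta)
    return np.where(tau > tau_l, dens, 0.0)

def ltw_sample(n, alpha, beta, tau_l, rng):
    """Inverse-cdf sampling: (tau/alpha)^beta = eta + y with y standard exponential."""
    eta = (tau_l / alpha) ** beta
    y = rng.standard_exponential(n)
    return alpha * (eta + y) ** (1.0 / beta)
```

Setting `tau_l = 0.0` reduces all three functions to the complete Weibull case.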
When dealing with sample data obtained from observations, one may wish to test the hypothesis that these data are drawn from a left-truncated Weibull distribution, even if the scale parameter α and shape parameter β are unknown. A common method for estimating the parameters of a pdf from a sample data set is maximum likelihood estimation (MLE). Note that the left-truncated Weibull pdf, Equation (2), is continuously differentiable to any order in the argument τ and in its two parameters, 0 < α, β < ∞, and thus f ∈ C^∞((τ_L, ∞) × (0, ∞) × (0, ∞)). Also, f and all its derivatives with respect to τ, α, β vanish for τ → ∞, at least like exp[−(τ/α)^β'] for α > 0 and any β' ∈ (0, β). These regularity conditions are essential for the good behaviour of the MLE.
To determine how well the sampled data fit the hypothesized distribution, one must measure the goodness-of-fit (gof). Studies using the Kolmogorov-Smirnov gof test to determine whether sampled data belong to an untruncated Weibull distribution began in the late 1970s with Littell, McClave, and Offen (1979), Chandra, Singpurwalla, and Stephens (1981), and Parsons and Wirsching (1982). In performing the hypothesis test it is crucial to use the correct critical values. When the Weibull parameters are estimated from the sample data, the standard Kolmogorov-Smirnov test tables (Smirnov (1948); Miller (1956)) for the case where the parameters are known cannot be used, because the probability integral transform using the estimated parameters destroys the independence of the transformed random variables, as demonstrated by David and Johnson (1948).
In the literature there are very few studies dedicated to left-truncated Weibull distributions (LTWD); examples are Wingo (1989) and Balakrishnan and Mitra (2012). However, the MLE approach in the first reference is rather heuristic, whereas the second reference is more concerned with an expectation-maximisation approach to handle left-truncation and right-censoring. For theoretical investigations of the Weibull distribution the reader is referred to D'Agostino and Stephens (1986) and Lehmann and Casella (1998).
For the left-truncated 2-parameter Weibull distribution we shall distinguish four cases throughout this article:
Case I: Both parameters, the scale parameter, α > 0, and the shape parameter, β > 0, are known a-priori.
Case II: Both parameters, the scale parameter, α > 0, and the shape parameter, β > 0, are unknown a-priori and need to be estimated from the sample data.
Case IIIa: The scale parameter, α > 0, is unknown and needs to be estimated from the sample data, but β > 0 is known.
Case IIIb: The shape parameter, β > 0, is unknown and needs to be estimated from the sample data, but α > 0 is known.
In the next section we briefly review the maximum likelihood estimation for Cases II-III and comment on the consistency, asymptotic normality and efficiency of the MLE when applied to data sampled from a left-truncated Weibull distribution. Details on these issues have been discussed in Kreer, Kizilersu, Thomas, and dos Reis (2015). In Section 3 we discuss and develop the Kolmogorov-Smirnov (KS) goodness-of-fit (gof) statistics for the left-truncated Weibull distribution to decide whether the sample data could belong to the hypothetical distribution. In Section 4 we present an asymptotic analysis exploiting the Brownian bridge character of the KS statistics, following prior work of Durbin (1973) and Stephens (1977) on untruncated distributions, and give our results for the left-truncated Weibull distribution for all cases. The quantile analysis to determine the modified critical values using Monte Carlo simulations is given in Section 5, where we discuss our numerical algorithm and present our results on the left-truncated data for the four cases listed above.
All the results obtained on modified critical values are discussed and analysed in Section 6. In Section 7 we give a procedure for interpreting the results and a power study for Case I and Case II. Section 8 applies the methods discussed throughout the paper to failed US marriages, worldwide terrorist attacks, a sample of stock market data from the New York Stock Exchange, and the radioactive α-decay of Americium-241. All the results are discussed in the concluding section.

Maximum likelihood estimation of left-truncated Weibull parameters
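The maximum likelihood estimates of the left-truncated Weibull parameters differ from the complete ones because the left-truncated pdf f(·) with left-truncation point τ_L > 0 has an additional multiplicative factor exp[(τ_L/α)^β] in comparison to the complete one. In this paper, the left-truncation point τ_L is assumed to be known. From Equation (2) we obtain the likelihood function for the left-truncated Weibull distribution as

$$L(\tau \mid \alpha, \beta, \tau_L) = \prod_{i=1}^{n} f(\tau_i \mid \alpha, \beta, \tau_L) = \exp\left[n\left(\frac{\tau_L}{\alpha}\right)^{\beta}\right] L(\tau \mid \alpha, \beta, 0), \tag{3}$$

and consequently the logarithm of the likelihood as

$$\log L(\tau \mid \alpha, \beta, \tau_L) = n\left(\frac{\tau_L}{\alpha}\right)^{\beta} + \log L(\tau \mid \alpha, \beta, 0), \tag{4}$$

where L(τ|α, β, 0) is the likelihood function for the untruncated distribution.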
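The Weibull parameters that maximize the likelihood function, Equation (3), are the same as those that maximise the log-likelihood function, Equation (4), and are obtained by setting the partial derivatives with respect to α and β to zero:

$$\frac{\partial \log L}{\partial \alpha} = -\frac{n\beta}{\alpha}\left[1 + \left(\frac{\tau_L}{\alpha}\right)^{\beta} - \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\tau_i}{\alpha}\right)^{\beta}\right] = 0, \tag{5}$$

$$\frac{\partial \log L}{\partial \beta} = \frac{n}{\beta} + \sum_{i=1}^{n}\log\frac{\tau_i}{\alpha} + n\left(\frac{\tau_L}{\alpha}\right)^{\beta}\log\frac{\tau_L}{\alpha} - \sum_{i=1}^{n}\left(\frac{\tau_i}{\alpha}\right)^{\beta}\log\frac{\tau_i}{\alpha} = 0. \tag{6}$$

Rearranging Equation (5) we get

$$\alpha^{\beta} = \frac{1}{n}\sum_{i=1}^{n}\left(\tau_i^{\beta} - \tau_L^{\beta}\right). \tag{7}$$

Note that Equation (7) is one of two MLE equations in Case II but is the only MLE equation in Case IIIa. There always exists a solution for α in Case IIIa for a given β. Rewriting Equation (6) we obtain

$$\frac{1}{\beta} + \frac{1}{n}\sum_{i=1}^{n}\log\frac{\tau_i}{\alpha} + \left(\frac{\tau_L}{\alpha}\right)^{\beta}\log\frac{\tau_L}{\alpha} - \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\tau_i}{\alpha}\right)^{\beta}\log\frac{\tau_i}{\alpha} = 0. \tag{8}$$

Equation (8) is the second MLE equation in Case II but the only MLE equation in Case IIIb, where α is known. Eliminating α in Equation (8) using Equation (7), we obtain (after some algebraic manipulation) the following equation for β in Case II (Wingo (1989) and the arXiv version of Malevergne, Pisarenko, and Sornette (2005)):

$$\frac{1}{\beta} + \frac{1}{n}\sum_{i=1}^{n}\log\tau_i - \frac{\sum_{i=1}^{n}\tau_i^{\beta}\log\tau_i - n\,\tau_L^{\beta}\log\tau_L}{\sum_{i=1}^{n}\tau_i^{\beta} - n\,\tau_L^{\beta}} = 0. \tag{9}$$

Equations (7) and (9) reduce, in the limit τ_L → 0, to those given in Cohen (1965) for the untruncated MLE equations. The solutions for α and β to the simultaneous Equations (7) and (9) are denoted by β̂_n = β̂_n(τ_1, ..., τ_n|τ_L) and α̂_n = α̂_n(τ_1, ..., τ_n|τ_L). For convenience we shall suppress the dependence on the sample τ_1, ..., τ_n and the left-truncation value τ_L and simply write α̂ and β̂. The existence and uniqueness of a non-trivial MLE solution is almost trivial for Case IIIa, whereas Case II and Case IIIb are dealt with in Lemma I of Kreer et al. (2015). To assert the existence of a non-vanishing MLE solution, the sample data need to satisfy the inequality given in Equation (10). If the condition in Equation (10) is not satisfied, then the only solution to the MLE equation for β, Equation (9), is the trivial solution α = β = 0. This can be shown by inserting α = β^(1/β) and taking the limit β → 0 in Equation (3). Only in this case is the likelihood, Equation (3), positive and non-vanishing.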
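A numerical solution of the Case II equations can be sketched as follows. The sketch assumes the profile equation for β in the form 1/β + mean(log τ_i) − [Σ τ_i^β log τ_i − n τ_L^β log τ_L] / [Σ τ_i^β − n τ_L^β] = 0 and the scale equation α^β = mean(τ_i^β − τ_L^β), which is our reading of Equations (9) and (7). Plain bisection is used here for illustration only; the bracket [0.05, 20] and the names `beta_equation` and `fit_ltw` are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def beta_equation(beta, tau, tau_l):
    """Left-hand side of the assumed profile equation for beta (Case II)."""
    tb = tau ** beta
    num = np.sum(tb * np.log(tau))
    den = np.sum(tb)
    if tau_l > 0.0:  # the truncation terms vanish for complete data
        num -= tau.size * tau_l ** beta * np.log(tau_l)
        den -= tau.size * tau_l ** beta
    return 1.0 / beta + np.mean(np.log(tau)) - num / den

def fit_ltw(tau, tau_l, lo=0.05, hi=20.0, iters=200):
    """Return (alpha_hat, beta_hat), or None if the bracket holds no sign change."""
    tau = np.asarray(tau, dtype=float)
    f_lo = beta_equation(lo, tau, tau_l)
    f_hi = beta_equation(hi, tau, tau_l)
    if f_lo * f_hi > 0.0:
        return None  # no non-trivial root found inside the bracket
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = beta_equation(mid, tau, tau_l)
        if f_lo * f_mid <= 0.0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    beta = 0.5 * (lo + hi)
    alpha = np.mean(tau ** beta - tau_l ** beta) ** (1.0 / beta)
    return alpha, beta
```

Returning `None` when no sign change is found mirrors the situation described above in which the sample admits only the trivial solution, although the bracket test is only a numerical proxy for the condition in Equation (10).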
Table 1 was generated using a Monte Carlo simulation with 10,000 steps, and without loss of generality the parameters were chosen as α = 1 and β = 1. It gives the percentage of left-truncated Weibull distributed random samples of size n satisfying Equation (10), for which the MLE provides a non-trivial solution. Note that the various truncation points τ_L were chosen in such a way that η = (τ_L/α)^β yields the desired truncation probability p = 1 − exp(−η) of 10%, ..., 90% respectively. The truncated Weibull numbers were generated by Equation (30).
The consistency, asymptotic normality and efficiency of the MLE method for the left-truncated Weibull distribution are discussed in Theorem 1 of Kreer et al. (2015), where the relevant proofs are also provided. Key to the proof is the smoothness of the left-truncated Weibull distribution. Denoting the true parameter vector by (α_0, β_0), we note in particular that all the asymptotic properties follow in this case from the asymptotic normality, i.e. √n [(α̂_n, β̂_n) − (α_0, β_0)] is asymptotically normal with mean vector zero and covariance matrix [Z(α_0, β_0)]^(−1), the inverse of the Fisher information matrix, Equation (11). The elements of the Fisher information matrix are expressed in terms of the functions E_1(s) = ∫_s^∞ dy exp(−y)/y (i.e. the exponential integral) and E_2(s) = ∫_s^∞ dy exp(−y) log(y)/y.

Kolmogorov-Smirnov goodness-of-fit test for the left-truncated Weibull distribution
Let us test the following null hypothesis H_0: the i.i.d. sample τ_1, τ_2, ..., τ_n, satisfying τ_L < τ_i for i = 1, 2, ..., n for some positive τ_L and some integer n, is drawn from a left-truncated Weibull distribution F(τ) as given in Equation (1), with estimated parameters (α̂, β̂) obtained from MLE as discussed in the previous section. Using the empirical distribution function F_n(τ), defined as the proportion of the values of the order statistics τ_(1), τ_(2), ..., τ_(n) smaller than τ ∈ (τ_L, ∞), the Kolmogorov-Smirnov (KS) test statistic is given by (e.g. Kendall and Stuart (1979), sect. 30.49, and Shorack and Wellner (2009); Equations (12) and (13))

$$D_n = \sup_{\tau > \tau_L} \left| F_n(\tau) - F(\tau) \right|.$$

Here D_n is the KS distance. The scaled statistic √n D_n is compared with a critical value D_cv(n, p, 0.05) that depends on the sample size n, the truncation level p (the theoretical percentage removed from the untruncated distribution) and the significance level, 0.05, used throughout the paper. If √n D_n is greater than the critical value D_cv(n, p, 0.05), then H_0, the hypothesis that the set of values τ is sampled from a distribution with cdf F(τ), is rejected (Equation (14)).

The critical values used in the hypothesis test, Equation (14), depend on whether the parameters α, β are known, or are unknown and estimated from the data itself. The cases introduced in section 1 can be grouped into two categories for the purpose of KS statistics.
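For a continuous cdf the supremum defining the KS distance is attained at one of the sample points, so it can be evaluated from the order statistics alone. The following minimal sketch computes D_n for any vectorised cdf; the function name `ks_distance` is hypothetical.

```python
import numpy as np

def ks_distance(sample, cdf):
    """KS distance between the empirical cdf of `sample` and a continuous `cdf`."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f = cdf(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - f)         # empirical cdf above the model
    d_minus = np.max(f - (i - 1) / n)  # model above the empirical cdf
    return float(max(d_plus, d_minus))
```

The value √n · `ks_distance(...)` is then the quantity compared with the critical value D_cv(n, p, 0.05).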

Out-sample KS statistics
If the parameters of the distribution against which the sampled data are to be tested are known precisely, i.e. if F(τ) in Equation (12) is known, then D_n is referred to as an out-sample KS statistic. In this study this is Case I, for which the critical values (CVs) of Kolmogorov and Smirnov are recovered. Moreover, the CVs are independent of the distribution and of the range of its parameters.

In-sample KS statistics
If the parameters of the distribution must be estimated from the sampled data to construct the theoretical cdf (F(τ) in Equation (12)), then D_n is referred to as an in-sample KS statistic. It is well known that, when the parameters are estimated from the sample and the goodness-of-fit test is then performed, the probability integral transformation of the sample variables destroys their independence (see e.g. David and Johnson (1948)). Thus Kolmogorov's argument leading to Equation (13) for his universal critical values becomes invalid. We expect each of the three in-sample cases (II, IIIa and IIIb) to have different critical values, and in Case I we should recover Kolmogorov's values.
Making use of Equation (1) for F and of the representation of the τ_i given by Equation (30), Equation (13) can be rewritten as Equation (15), where α̂ and β̂ are the estimated parameters while α_0 and β_0 are the true ones, η ≡ (τ_L/α_0)^β_0 and likewise η̂ ≡ (τ_L/α̂)^β̂, and the y_i are standard exponential random variates, as described in Appendix A. From this point onwards we drop the index n and write α̂ and β̂ for the estimates. Equation (15) describes the modified critical values for all four cases above.
For an untruncated data set the critical values are in general a function of the sample size n only. For truncated data they clearly also depend on the truncation, expressed through the truncation level p or the truncation parameter η. However, for two cases we find simplified relations for D_n which are independent of the truncation (τ_L or η) and also independent of the true values α_0 and β_0: Case I (η̂ = η, α̂ = α_0 and β̂ = β_0), Equation (16), and Case IIIa, Equation (17). In Case IIIa, when the shape parameter β_0 is known, Equation (15) simplifies to Equation (17) and becomes independent of the truncation τ_L (or η): because β̂/β_0 = 1, the η-terms cancel.
Only for Case II and Case IIIb do we need to investigate the dependence of D_cv(n, p, 0.05) on the parameter η (η = x_L if (α_0, β_0) = (1, 1)) and on n in greater detail.
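The cancellation of the η-terms in Case IIIa can be checked numerically. The sketch below assumes the sampling representation (τ_i/α_0)^β_0 = η + y_i with standard exponential y_i, and the Case IIIa estimate α̂^β_0 = mean(τ_i^β_0 − τ_L^β_0); under these assumptions the fitted cdf at τ_i reduces to 1 − exp(−y_i/ȳ), so the KS distance depends on the y_i only. The function name `ks_case_iiia` is hypothetical.

```python
import numpy as np

def ks_case_iiia(y, alpha0, beta0, tau_l):
    """KS distance in Case IIIa (beta known, alpha estimated) for exponential draws y."""
    eta = (tau_l / alpha0) ** beta0
    tau = np.sort(alpha0 * (eta + y) ** (1.0 / beta0))
    # Assumed closed-form MLE of the scale parameter for known shape
    alpha_hat = np.mean(tau ** beta0 - tau_l ** beta0) ** (1.0 / beta0)
    f = 1.0 - np.exp((tau_l / alpha_hat) ** beta0 - (tau / alpha_hat) ** beta0)
    n = tau.size
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - f), np.max(f - (i - 1) / n)))

rng = np.random.default_rng(2)
y = rng.standard_exponential(200)
d_complete = ks_case_iiia(y, 1.0, 1.0, 0.0)   # no truncation
d_truncated = ks_case_iiia(y, 3.0, 0.7, 2.5)  # different parameters, heavy truncation
```

Both calls use the same exponential draws, and the two distances agree up to floating-point rounding, which illustrates why the Case IIIa critical values do not depend on τ_L, α_0 or β_0.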

Brownian bridge and Donsker's theorem
As in the discussion of the MLE in section 2, it is interesting to consider what happens to the KS test when n → ∞. The asymptotic behaviour of the KS test has been of interest from the 1940s onwards for calculating asymptotic critical values (Durbin (1973), Stephens (1977), and Shorack and Wellner (2009)). For a random variable τ distributed according to a theoretical Weibull distribution function F(τ|θ_0), one may define the difference between the theoretical (with or without estimated parameters θ̂_n = (α̂_n, β̂_n)) and empirical distributions as in Equation (19) (Durbin (1973), Equation (2)), where F̂_n(t) is the proportion of the transformed values F(τ_i|θ̂_n), i = 1, ..., n, that do not exceed t, and θ̂_n is the MLE estimate of the true parameter θ_0 = (α_0, β_0). Note that taking the supremum of the absolute value of Equation (19) yields the KS distance in Equation (13).
Viewing Equation (19) as a stochastic process in t ∈ [0, 1], Donsker's Theorem (also known as the functional central limit theorem) asserts the convergence in distribution to a limiting stochastic process which is Gaussian with zero mean and the covariance structure of a Brownian bridge (see Shorack and Wellner (2009)). For the case where the parameters are estimated from the sample itself, a modification (due to Durbin (1973)) has to be made. We may apply Theorem 2 of Durbin (1973) (here θ = (α, β)), where the limiting Gaussian process is denoted, in analogy to the above, by G_n(t), with mean zero, i.e. E(G_n(t)) = 0, and a covariance structure given by

$$\mathrm{Cov}\big(G_n(s), G_n(t)\big) = \min(s, t) - s\,t - u(s)^{\top}\, \Sigma\, u(t), \tag{20}$$

where Σ = Z^(−1) is the inverse of the Fisher information matrix Z(α, β) given in Equation (11), and u(·) are certain vector-valued functions given by Equation (21). Note that the supremum of this Gaussian process using only n points will converge to the asymptotic value of the KS distance D_n, as given in Equation (13) of the previous section, when n → ∞. This will be key in deriving the asymptotic values. We readily check that Durbin's assumptions (A2) and (A3) in Durbin (1973) are also satisfied for the truncated case with truncation point τ_L > 0, so that Theorem 2 of Durbin (1973) may be applied. Stephens (1977) studies the Brownian bridge with the covariance structure given in Equation (20) for complete data (i.e. τ_L = 0). The vector-valued function u(s) in Equation (20) for left-truncated Weibull distributions with τ_L > 0 is given in Equation (21), where s = F(τ) = F(τ|α, β, τ_L). In the following calculations, without loss of generality, we may choose for convenience (α, β) = (1, 1). Using the covariance, Equation (20), together with Equation (21) and the matrix Σ as the inverse of Z from Equation (11), we can now, for any m ∈ N, simulate a Brownian bridge on a discrete grid of points in [0, 1].

Numerical implementation of the Brownian bridge
We perform the following procedure, as described in Anderson and Stephens (1997), to calculate the critical values in the Brownian bridge approach:
2. The discrete covariance matrix C^(m+1) = (C_i,j)_i,j from Equation (20) is symmetric and positive definite.
6. Keep D_m in a list and sort in ascending order. Take the 95% quantile as the critical value D^BB_m(95%).
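For Case I (Σ = 0) the covariance reduces to min(s, t) − st, and the procedure can be sketched with a Cholesky factorisation of the discrete covariance matrix. This is an illustrative sketch under our own assumptions (an equally spaced interior grid, NumPy's Cholesky routine, and the hypothetical name `brownian_bridge_cv`); it is not the implementation of Anderson and Stephens (1997) or of the paper.

```python
import numpy as np

def brownian_bridge_cv(m=200, n_paths=5000, level=0.95, seed=0):
    """Quantile of sup|B(t)| over an m-point grid for a pure Brownian bridge."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, m) / m  # interior points; the bridge is zero at t = 0 and t = 1
    cov = np.minimum.outer(t, t) - np.outer(t, t)  # min(s, t) - s t
    chol = np.linalg.cholesky(cov)  # cov is symmetric positive definite
    paths = rng.standard_normal((n_paths, m - 1)) @ chol.T
    sup_abs = np.abs(paths).max(axis=1)  # discrete analogue of the KS distance
    return float(np.quantile(sup_abs, level))

cv_bb = brownian_bridge_cv()
```

Because the supremum is taken over a finite grid, the simulated quantile lies slightly below the continuous-time limit and approaches it as m grows. For the in-sample cases the term u(s)ᵀ Σ u(t) would have to be subtracted from `cov` before factorising.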

Results: asymptotical critical values from Brownian bridge
We apply the Brownian Bridge (BB) approach to find the asymptotic critical values for the following cases and present the results in Table 2.
Case I: Out-sample testing. Put Σ = 0 (because α and β are known precisely, the Fisher information matrix is irrelevant here) and sample a pure Brownian bridge.
Case II: In-sample testing for two unknown parameters (with truncation).
Case IIIa-IIIb: In-sample testing with one parameter known (with truncation). Take the one-dimensional Fisher information from Equation (11) for the unknown parameter and invert this element to obtain the corresponding Σ-matrix.

The Monte-Carlo algorithm
The quantile procedure to calculate the critical values is described below.
Algorithm 1: Procedure for calculating the mean and variance of the critical values of the KS-test

Input: The values of α and β are both set to 1.
Output: The mean and standard deviation of the critical values of the KS-test for a range of sample sizes n and truncation levels p, η.

for p = 0 to 0.9 in steps of 0.1 do
    for n = 30, 50, 100, 200, 500, 1000, 10000 do
        for j = 1 to 100 do
            for k = 1 to 1000 do
                • Draw n random numbers u_i from a uniform distribution, u_i ∼ U(0, 1). It follows directly from the discussion in Appendix A that the left-truncated Weibull distributed random variables are τ_i = τ_L − log u_i.
                • Estimate α and β using the MLE equations, Equations (7) and (9).
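One pass of the inner loop can be sketched for Case I, where no estimation step is needed. The sketch assumes α = β = 1 (so that η = τ_L and τ_i = τ_L − log u_i), uses √n D_n as the test statistic, and takes the 95% quantile over the replications; the quantile step and the names `ks_scaled` and `critical_value_case_i` are our assumptions based on the surrounding text.

```python
import numpy as np

def ks_scaled(tau, tau_l):
    """sqrt(n) * D_n against the left-truncated Weibull cdf with alpha = beta = 1."""
    x = np.sort(tau)
    n = x.size
    f = 1.0 - np.exp(tau_l - x)
    i = np.arange(1, n + 1)
    d = max(np.max(i / n - f), np.max(f - (i - 1) / n))
    return np.sqrt(n) * d

def critical_value_case_i(n, p, reps=2000, level=0.95, seed=0):
    """Monte Carlo quantile of sqrt(n) * D_n for known parameters (Case I)."""
    rng = np.random.default_rng(seed)
    tau_l = -np.log(1.0 - p)  # eta equals tau_L when alpha = beta = 1
    stats = np.empty(reps)
    for k in range(reps):
        u = 1.0 - rng.random(n)  # uniform on (0, 1], avoids log(0)
        tau = tau_l - np.log(u)  # left-truncated Weibull variates
        stats[k] = ks_scaled(tau, tau_l)
    return float(np.quantile(stats, level))

cv_p0 = critical_value_case_i(100, 0.0)
cv_p5 = critical_value_case_i(100, 0.5)
```

With a common seed the two calls use the same uniform draws, so the two quantiles coincide up to rounding, in line with the truncation independence expected in Case I. Repeating the call for j = 1, ..., 100 with different seeds would give the mean and standard deviation named in the algorithm's output.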

Results: critical values from Monte-Carlo simulations
Our results obtained for the modified critical values using the quantile analysis (outlined in Algorithm 1) for each sample size n = (30, 50, 100, 500, 1000, 10000) and truncation parameter η are summarised in Table 3 for Case I, in Table 4 for Case II, in Table 5 for Case IIIa, and in Table 6 for Case IIIb.
Table 3: The critical values, D^q_cv, calculated from the quantile analysis for Case I.
Table 4: The critical values, D^q_cv, from the quantile analysis for Case II.
Table 5: The critical values, D^q_cv, from the quantile analysis for Case IIIa.
Table 6: The critical values, D^q_cv, from the quantile analysis for Case IIIb.

Discussion of the results
The truncation in the analysis can be defined in three equivalent ways: 1) τ_L, the value below which all data are removed or absent, 2) p, the percentage of data removed or absent due to the truncation procedure, and 3) the generalised truncation parameter η ≡ (τ_L/α)^β. These three parameters are related by the equations

$$\eta = \left(\frac{\tau_L}{\alpha}\right)^{\beta}, \qquad p = 1 - \exp(-\eta).$$

All of these parameters will be used throughout this paper, depending on which is the most convenient.
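The conversions between the three truncation descriptions are short enough to write out. The helper names below are hypothetical; the formulas are η = (τ_L/α)^β and p = 1 − exp(−η), together with their inverses.

```python
import math

def eta_from_tau_l(tau_l, alpha, beta):
    """Generalised truncation parameter eta = (tau_L / alpha)^beta."""
    return (tau_l / alpha) ** beta

def p_from_eta(eta):
    """Truncation level p = 1 - exp(-eta)."""
    return -math.expm1(-eta)

def eta_from_p(p):
    """Inverse of p_from_eta, valid for 0 <= p < 1."""
    return -math.log1p(-p)

def tau_l_from_p(p, alpha, beta):
    """Truncation point that removes a fraction p of the complete distribution."""
    return alpha * eta_from_p(p) ** (1.0 / beta)
```

For α = β = 1 a truncation level of p = 0.5 corresponds to η = τ_L = log 2.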

Estimation of the Weibull parameters
Identifying and analyzing the distribution which represents the data set is our main focus, since it is the source of the predictability.In summary, the estimation is better in Cases II and IIIb when the sample size is larger and the truncation is smaller.The single parameter estimates are far better than the double parameter estimates as expected.Case IIIa, where the shape parameter β is known, and the scale parameter α is unknown, is superior to Case IIIb with the unknown shape parameter β and known scale parameter α, since in Case IIIa the CVs are independent of truncation and the estimation of α is more precise, which is the optimum scenario.Comparisons on estimation of parameters show that the variance is reduced by 75% in α and by 50% in β between the two parameter and one parameter cases.

Critical values as a function of sample size n
Figure 2 depicts, for Cases II and IIIb, the dependence of the critical values (given in Table 4 and Table 6) on n for a range of truncation levels p (or truncation parameter η). For clarity the x-axis is plotted on a log scale. Both cases show a distinct separation between the lines for different truncation levels, indicating a dependence on the truncation level p.
On the other hand, in Case I the critical values are independent of truncation and depend only on n. As predicted by the theory, Equation (17), truncation has no noticeable effect on the critical values in Case IIIa either. According to Miller's formula, Equation (24) (Miller (1956)), which was derived for the out-sample, untruncated case, namely Case I, the critical values are quadratic in 1/√n, with the leading term given by Smirnov's asymptotic value, 1.358. Although Miller's formula is designed to be used only for Case I, where both parameters are known a priori, we use it as a guide to investigate the functional dependence of the critical values on the sample size n for all cases. This can be achieved by fitting the critical values given in Tables 3-6, for each value of p, to the functions in Equations (25) and (26). The linear function, Equation (26), is a better fit to the data D^q_cv(p|n): there is no significant change in the adjusted r-squared goodness-of-fit statistic, but the standard deviation in B over all values of p is an order of magnitude smaller when Equation (26) is used instead of Equation (25). The results for Ã_1(p|n) and B̃_1(p|n) obtained by fitting D_cv(p|n) are in very good agreement with Miller's formula, Equation (24). The fit results are given in Table 8.
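A fit that is linear in 1/√n can be sketched with ordinary least squares. The example assumes the functional form D_cv ≈ A + B/√n; the input values are synthetic placeholders generated from A = 1.358 (the asymptotic value quoted above) and an arbitrary B = −0.2, and are not taken from the tables of this paper. The function name `fit_inverse_sqrt` is hypothetical.

```python
import numpy as np

def fit_inverse_sqrt(n_values, cv_values):
    """Least-squares fit of cv = A + B / sqrt(n); returns (A, B)."""
    x = 1.0 / np.sqrt(np.asarray(n_values, dtype=float))
    slope, intercept = np.polyfit(x, np.asarray(cv_values, dtype=float), 1)
    return float(intercept), float(slope)

# Synthetic placeholder data, NOT results from the paper
n_values = [30, 50, 100, 200, 500, 1000, 10000]
true_a, true_b = 1.358, -0.2
cv_values = [true_a + true_b / np.sqrt(n) for n in n_values]

a_fit, b_fit = fit_inverse_sqrt(n_values, cv_values)
```

Applied to one row of critical values per truncation level p, the intercept estimates the asymptotic critical value and the slope the leading finite-size correction.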

Critical values as a function of left-truncation parameter eta
To determine the relationship between the critical values and the truncation parameter η, we plot in Figure 3 the critical values given in Tables 3-6 as a function of √η for n = (30, 100, 1000, 10000) for all cases. We have also included a plot of the Brownian bridge results (with error bars) for all cases, since that provides an alternative way of estimating D_cv in the limit n → ∞. For out-sample data the critical values are independent of truncation, and this is verified in Figure 3a: there is no variation in the critical values as a function of √η. On the other hand, Figure 3b for Case II shows that the critical values initially decrease but then increase as the truncation level increases (boomerang shape), which is entirely different from the out-sample case (Case I). In Figure 3c for Case IIIa the CVs do not change as η increases; as in Case I, for a fixed value of n the critical values are independent of the truncation. These results are consistent with the theory outlined in Equation (17). Case IIIb in Figure 3d, on the other hand, shows that the CVs initially increase slightly and then decrease as the truncation level increases.
In summary, the CVs in Cases I and IIIa are truncation independent while in Cases II and IIIb they are not. For all cases the asymptotic critical value analysis from the Brownian bridge confirms the same η dependence as we found in the quantile analysis.
We now formulate the critical values as a function of the truncation parameter η. In both Case II and Case IIIb the CVs are truncation dependent, and among the many fit functions tried to describe the data we found that the quadratic ratio function, Equation (27), fitted best. Its parameters are given in Table 9 and plotted in Figs. 4c,g and 4d,h for n = 30 and n = 10000, respectively. In the figure the light grey, thick band shows the error range on the D^q_cv(η|n) values, whereas the darker grey area between the dashed lines is the error band on the fit values. In addition, the asymptotic critical values from the Brownian bridge analysis (squares) are shown in the figures for n = 10,000 only.
Table 9: The critical values obtained by fitting the ratio function to the data from the quantile analysis for various sample sizes, n. The fit parameters defined in Equation (27) are given for each value of n and for Cases II and IIIb.

The modified critical values as a function of n and eta
In this section the dependence on the sample size, n, and on the truncation, η, are combined to give one formula for the critical values as a function of n and η.
This applies to Case II and Case IIIb, both of which are sensitive to the truncation parameter. The critical values in Tables 4 and 6 can be fitted to a two-dimensional function, and the fit results are given in Table 10.

Exploring CV's for the dependence of Weibull parameter ranges
This section numerically explores the effects of the range of the Weibull parameters on the critical values, as discussed in section 3. For this purpose, we consider various combinations of the scale parameter α = 1, 400, 1000, 2000 and shape parameter β = 0.2, 0.35, 0.58, 0.8, 1.
The results are displayed in Figure 5, where the critical values are plotted as a function of η for sample size n = 30 (left) for Case II and Case IIIb. All the curves for different parameter combinations overlap, showing the insensitivity to the parameter values.
In Cases I and IIIa the CVs are independent of the parameters, as is well known.

Comparison of the results with literature
A comparison of our CVs with those already published is shown in Tables 11-14. We can see that there is excellent agreement. All of the previous studies in the literature considered only complete (untruncated) data, whereas our study considers a range of truncations, including the untruncated case. Therefore, we can only compare the complete case results with the literature. Also, we wish to remind the reader that the Weibull distribution is a special case of the generalised extreme value distribution.
Table 11: Comparison of our results with the available literature for Case I (α and β are known). To the best of our knowledge, no data are available for the CVs of the left-truncated Weibull distribution. For complete data sets (τ_L = 0, η = 0, p = 0) our error is ±0.025.
Table 13: Comparison of our results with the available literature for Case IIIa (α unknown and β known). To the best of our knowledge, no CVs of left-truncated Weibull distributions are available. For untruncated (complete) data sets (τ_L = 0, η = 0, p = 0) our error is ±0.020.

The eta-parameter in practical applications
If α and β are unknown, then p and hence η are estimated from the sample, so that η̂ = (τ_L/α̂)^β̂ and p̂ = 1 − exp(−η̂). As η̂ is a non-linear function of α̂ and β̂, it is a biased estimate of η. As discussed in Appendix B, for a sample size n the bias in η̂ is the difference between its expectation and the true value η, so that an unbiased estimate of η is given in Appendix B by Equation (32) in conjunction with Equation (36). Estimated η̂ and corrected (unbiased) η values for various sample sizes and truncation levels are given in Tables 22 and 23 for Case II and Case IIIb, respectively. Making use of these tables, we demonstrate the passing rates with and without the bias correction of η̂ in Tables 15 and 16 for Case II and Case IIIb, respectively. For large sample sizes the bias vanishes, in accordance with Theorem 1 in Kreer et al. (2015). Furthermore, for small truncation parameters η the bias is of no relevance. Only for small sample sizes (n = 30, 50, 100) and truncation levels p above 0.7 does the correction (unbiasing) formula need to be applied.

Power studies: comparison with other distributions in Case II
In order to answer the question " What is the chance that data drawn from some alternative distribution will pass the hypothesis test for a Weibull distribution?", the power test is employed.
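The quantity estimated in such a study is the rejection rate of the test when the data come from an alternative distribution. A minimal sketch for the simplest setting (Case I, complete data, null hypothesis Weibull with α = β = 1) is given below. The asymptotic critical value 1.358, the sample size, the log-normal alternative with parameters (0, 0.25), and the function names are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def ks_scaled_exp(x):
    """sqrt(n) * D_n against the standard exponential cdf (Weibull, alpha = beta = 1)."""
    x = np.sort(x)
    n = x.size
    f = 1.0 - np.exp(-x)
    i = np.arange(1, n + 1)
    return np.sqrt(n) * max(np.max(i / n - f), np.max(f - (i - 1) / n))

def rejection_rate(draw, n=100, reps=1000, cv=1.358, seed=0):
    """Fraction of samples produced by `draw` for which H_0 is rejected."""
    rng = np.random.default_rng(seed)
    rejections = sum(ks_scaled_exp(draw(rng, n)) > cv for _ in range(reps))
    return rejections / reps

# Data from the null itself: the rejection rate estimates the size of the test
size = rejection_rate(lambda rng, n: rng.standard_exponential(n))
# Data from a log-normal alternative: the rejection rate estimates the power
power = rejection_rate(lambda rng, n: rng.lognormal(0.0, 0.25, n))
```

The passing rate asked about in the question above is one minus the rejection rate. For the in-sample cases the parameters would have to be re-estimated for every sample and the modified critical values used instead of 1.358.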
We compare the power of our out-sample (Case I) and in-sample (Case II) tests by drawing the random numbers of our samples from alternative distributions commonly used in the literature for making goodness-of-fit comparisons. We follow Aho, Bain, and Engelhardt (1985) and consider as possible alternatives to the 2-parameter Weibull distribution those distributions defined on the positive range. In particular, we consider the log-normal, log-Cauchy, Pareto (power law), log-double exponential, log-logistic and chi-square distributions with 1, 3 and 4 degrees of freedom (note that the chi-square distribution with 2 degrees of freedom is the exponential distribution and is thus not in the scope here). We regard the chi-square distributions with 1, 3 and 4 degrees of freedom as academic only, as they permit only a single parameter, the degree of freedom k, to be fitted. As noted earlier by Aho et al. (1985), for the complete data set our test performs well for the log-Cauchy, Pareto, log-double-exponential and log-logistic distributions, in the sense that one can rule out these distributions as candidates for explaining the data set. On the other hand, we found that the power testing has problems ruling out χ²-distributions with 1, 3 and 4 degrees of freedom and log-normal distributions. The latter can be ruled out by a likelihood ratio test in the spirit of Dumonceaux and Antle (1973). The results are summarized in Table 17 for the complete and the truncated Case I and Case II.

Application of our modified KS test

US data on duration of ethnically mixed marriages
Data on the duration of marriages that end in divorce in the US are publicly available at http://data.princeton.edu/wws509/datasets/#divorce. Most states in the United States require a minimum legal separation time prior to divorce, although not all do. The durations of marriages that ultimately end in divorce in the database will therefore be a mixture of cases with different minimum durations (from 0 to 12 months). In order to determine the distribution that describes the duration of failed marriages in the US, it is therefore necessary to left-truncate the data.
We have taken a subset of 230 divorced couples where husband and wife belong to different ethnic groups. We then analyze the duration of the marriages for a range of left-truncation values, specifically τ_L = 0.25, 1, 5 and 10 years, in Table 18. We also observe from the data that the smallest lifetime is greater than 0.25 years. This is further evidence that the data are left-truncated. Before starting our Weibull analysis, we first generate a Q-Q plot for the most commonly used alternatives: the Weibull, Pareto and log-normal distributions. In our case the Pareto distribution can clearly be ruled out purely by looking at its curved graph in the Q-Q plot. Deciding between Weibull and log-normal is more delicate, as both graphs in the Q-Q plot are more or less straight lines. Here we use a likelihood ratio test, as first proposed by Dumonceaux and Antle (1973), for the discrimination between (un-truncated) log-normal and (un-truncated) Weibull distributions. As their table covers only sample sizes of n = 20, 30, 40, 50, we had to extend it to sample sizes n = 100, 200, 300. The likelihood ratio test gives a clear verdict in favor of the Weibull distribution. In the following Weibull analysis, truncation rates p are given as the percentage of data eliminated by the truncation procedure. From the estimated parameters α and β we obtain the estimate of η, which gives the critical value through Equation (28), and the KS distance D_n is calculated from the data using Equation (12). Because the truncation levels are moderate, we do not need to un-bias the value of η. We find that we cannot reject the hypothesis that the data come from a left-truncated Weibull distribution for a wide range of truncation levels, with β = 1.25 ± 0.07 and α = 11.4 ± 0.06 years. The details of this analysis are given in Table 18.
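The discrimination between un-truncated log-normal and Weibull fits described above rests on the ratio of the two maximized likelihoods. The following is a minimal sketch of that statistic only; the function names are hypothetical, the Weibull shape is found by bisection on the standard MLE equation, and the critical values of Dumonceaux and Antle (1973) and their extension to larger n are not reproduced.

```python
import math
import random

def weibull_mle(xs):
    """MLE (alpha, beta) for complete Weibull data with all x > 0."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)

    def g(beta):
        # Shape equation: sum(x^b ln x)/sum(x^b) - 1/b - mean(ln x) = 0,
        # with weights rescaled by max(x)^b for numerical stability.
        w = [math.exp(beta * (l - max_log)) for l in logs]
        num = sum(wi * l for wi, l in zip(w, logs))
        return num / sum(w) - 1.0 / beta - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0.0:          # g is increasing, so expand until bracketed
        hi *= 2.0
    for _ in range(100):        # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = (sum(x ** beta for x in xs) / len(xs)) ** (1.0 / beta)
    return alpha, beta

def weibull_loglik(xs, alpha, beta):
    """Log-likelihood of complete data under Weibull(alpha, beta)."""
    return sum(math.log(beta / alpha) + (beta - 1.0) * math.log(x / alpha)
               - (x / alpha) ** beta for x in xs)

def lognormal_loglik_max(xs):
    """Maximized log-likelihood of complete data under a log-normal model."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((l - mu) ** 2 for l in logs) / n
    return -sum(logs) - 0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def log_likelihood_ratio(xs):
    """Positive values favor Weibull, negative values favor log-normal."""
    alpha, beta = weibull_mle(xs)
    return weibull_loglik(xs, alpha, beta) - lognormal_loglik_max(xs)
```

The statistic alone does not give a verdict; it has to be compared with tabulated or simulated critical values for the sample size at hand.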

Time between major terrorist attacks with minimum 10 casualties
The worldwide probability distribution of terrorist attacks has been investigated by Clauset and Woodard (2013). We utilize the RAND-MIPT database (available at http://www.rand.org/nsrd/projects/terrorism-incidents/download.html) containing 13,274 terrorist events worldwide from 1968 to 2007. Like Clauset and Woodard (2013), we are interested in "major attacks", defined as terrorist events with at least 10 casualties. We investigate the times between these major attacks and find that a large proportion of their tail can be described as left-truncated Weibull. From the estimated parameters α and β we obtain the estimate of η, which gives the critical value through Equation (28), and the KS distance D_n is calculated from the data using Equation (12). Results are given in Table 19. We note that the tail of the distribution can be described by a Weibull distribution with shape parameter β ≈ 0.50, whereas the short end follows some other distribution and does not pass the Weibull hypothesis.

Truncation of arrival time differences is the process of taking the differences between consecutive arrival times and keeping only those differences greater than τ_L = 1, 2, 5, 10 milliseconds. As in the previous examples, having ruled out the Pareto and log-normal alternatives, we estimate the Weibull parameters and perform the hypothesis test; the results are given in Table 20. From Table 20 we see that we cannot reject the hypothesis that our truncated samples come from a Weibull distribution. However, when we analyse the complete (untruncated) sample, a similar computation leads to the rejection of the Weibull hypothesis, as the zero-inflated data with arrival time differences below 1 millisecond prevent the MLE from converging to a solution. Truncation at one millisecond seems to corrupt the estimation of the Weibull parameters because of the error in time measurement of ±1 millisecond. From a truncation of 2 milliseconds onwards one finds consistent parameter estimates. Taking the weighted means and errors from the truncated data sets with truncations of 2, 5 and 10 milliseconds, we find α = 179 ± 37 milliseconds and β = 0.53 ± 0.04.
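The truncation of arrival time differences and the combination of estimates can be sketched as follows. The function names are hypothetical, and the inverse-variance weighting is an assumption, since the text does not state which weighting was used for the weighted means.

```python
import math

def truncated_waiting_times(arrival_times, tau_L):
    """Differences between consecutive arrival times that exceed tau_L."""
    diffs = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return [d for d in diffs if d > tau_L]

def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its standard error (assumed scheme)."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)
```

Note that duplicate time stamps produce zero differences, which are removed by any positive τ_L; this is the zero-inflation effect mentioned in the text.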

Time intervals for radioactive decay of Americium-241
Since the pioneering work of Geiger and Rutherford (1910), the counting process of the particles arising from radioactive decay has been found to be described by a Poisson process. Due to the so-called "dead time" of the detection device, certain decay events might not be measured because the detector is still busy detecting the previous event. Thus, the data set will be incomplete due to truncation. This has given rise to certain corrections for the Poisson process. Only about 60 years later did it become possible to measure waiting times between radioactive decay events with acceptable accuracy using multichannel analyzers. Garfinkel and Mann (1968) made one of the first measurements, using a probe of 0.2 µCi Americium-241 as a nearly pure α-source. Their entire data set, comprising some 300,000 time intervals, was evaluated later by Berkson (1975), albeit under the assumption of a Poisson process and by performing a χ²-test on the binned data. Here, we want to demonstrate our analysis on a smaller sample, which is displayed in Garfinkel and Mann (1968) on page 709. We use the second, third and fourth blocks only, because the first block contains some control measurements for calibration.
Our data sample comprises 300 measurement points describing the time between subsequent α-particles. The dead time was estimated by the authors to be 2.54 T.U. (1 T.U. denotes a time unit and corresponds to the pulse frequency of 370 kHz). Our results are displayed in Table 21. As expected, we recover a shape parameter β = 1, indicating that the waiting times are exponentially distributed, which gives rise to the Poisson process discovered in Geiger and Rutherford (1910).
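For the special case β = 1 the left-truncated Weibull distribution reduces to a shifted exponential, and the scale estimate has a closed form. The sketch below illustrates this limiting case only; the function name and the simulated mean waiting time are hypothetical, and it is not the estimation procedure used for Table 21, where β is estimated as well.

```python
import random

def exponential_scale_mle(times, tau_L):
    """MLE of the mean waiting time for exponential data seen only above tau_L.

    For beta = 1 the truncated density is (1/alpha) * exp(-(t - tau_L)/alpha)
    on t > tau_L, so the estimate is the sample mean minus tau_L.
    """
    kept = [t for t in times if t > tau_L]
    return sum(kept) / len(kept) - tau_L

# Simulated check with a hypothetical mean waiting time of 10 T.U.
rng = random.Random(0)
simulated = [rng.expovariate(1.0 / 10.0) for _ in range(20000)]
alpha_hat = exponential_scale_mle(simulated, 2.54)
```

Because the exponential distribution is memoryless, discarding intervals shorter than the dead time leaves the estimate of the mean waiting time unchanged up to sampling error.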

Conclusion
The Weibull distribution with a shape parameter less than one is known as "heavy-tailed" because it has significant probabilities quite far from its mean. In insurance and other industries the cost of rare events due to "heavy tails" can be very high, so it is important to determine exactly how rare they actually are. This can only be done by taking the available data and testing them against hypothesized distributions.
Data obtained from real-life examples are often left-truncated. To test the hypothesis that the data are sampled from a left-truncated Weibull distribution, one can perform a Kolmogorov-Smirnov goodness-of-fit test. If the shape and scale parameters are not known, they must be estimated from the data themselves. The commonly used maximum likelihood estimator does not always give a non-trivial solution for the shape and scale parameters, especially for small sample sizes. For a small sample size there is a chance that the solution of the maximum likelihood equations lies on the trivial boundaries, where one or both of the parameters vanish. A criterion for determining when non-vanishing solutions for the parameters exist was given in this paper. We also demonstrated that with increasing sample size non-trivial estimates exist with probability tending to one, and that these estimates are consistent, asymptotically normal, and efficient. Having obtained non-trivial estimates, the goodness of fit can be judged using a Kolmogorov-Smirnov test. If the shape and/or scale parameters are unknown, the critical values differ significantly from those when the parameters are known. If both parameters or only the shape parameter are unknown, the critical values depend on the truncation value as well as on the sample size.
The modified critical values presented here should be used to test whether a set of data is sampled from a left-truncated Weibull distribution with a known truncation point but unknown shape and/or scale parameter. When both parameters or only the shape parameter are unknown and the truncation level is greater than 10%, the dependence of the critical value on the truncation level must be included; otherwise incorrect conclusions will be drawn from the hypothesis tests. We provided the modified CVs in Tables 3 to 6 for various sample sizes and truncation ranges, together with the formulas in Equation (27) and Equation (28), from which one can calculate them for any desired p (or η) at given n and for any combination of p (or η) and n, respectively.
Although the results presented here on the left-truncated Weibull distribution can be applied to a wide range of problems in many disciplines, we are not aware of any other comprehensive studies that discuss the effects of truncation dependence on the critical values and parameter estimation. We are in the process of applying our techniques to investigate financial, insurance, and real estate data using our tables and models for the critical values, which include the dependence on truncation and sample size.
both $\hat\eta$ and $\Delta\hat\eta$ are random variables whereas η is a fixed real number. By definition η ≥ 0; hence we use an un-biasing formula motivated by Equation (31), in which each individual $\hat\eta$ is unbiased by a correction term $E[\Delta\hat\eta]$, subject to the corrected value remaining non-negative.
Defining the parameter estimation vector (suppressing the index n) as $\hat\theta = (\hat\alpha, \hat\beta)$ and the true parameter vector as $\theta_0 = (\alpha_0, \beta_0)$, from Section 2 for large sample size n the difference $\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically normal with vector mean zero and covariance matrix $Z^{-1}(\theta_0)$, the inverse of the Fisher matrix, Equation (11). The effect of errors in $\hat\theta$ on $\hat\eta$ is estimated in the same way, leading to Equation (36). Note that in Equation (36) we take the expectations only over $\Delta\hat\alpha$ and $\Delta\hat\beta$ but not over the estimates $\hat\alpha$ or $\hat\beta$. Estimated (uncorrected) values of η and corrected (unbiased) values obtained using Equation (36) for various sample sizes and truncation levels are given in Table 22 for Case II and in Table 23 for Case IIIb.
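One way to see how the asymptotic normality of the parameter estimates translates into a bias of the estimated η is a second-order Taylor expansion about the true parameters. The following is a sketch under stated assumptions, not necessarily the form of Equation (36): it assumes η = (τ_L/α)^β, uses the covariance matrix Z^{-1}(θ_0)/n of the estimates, and neglects the bias of the parameter estimates themselves, which is of the same order 1/n.

```latex
\begin{align*}
\frac{\partial\eta}{\partial\alpha} &= -\frac{\beta}{\alpha}\,\eta, &
\frac{\partial\eta}{\partial\beta} &= \eta\,\ln\frac{\tau_L}{\alpha}, \\
\frac{\partial^2\eta}{\partial\alpha^2} &= \frac{\beta(\beta+1)}{\alpha^2}\,\eta, &
\frac{\partial^2\eta}{\partial\beta^2} &= \eta\left(\ln\frac{\tau_L}{\alpha}\right)^{2}, \\
\frac{\partial^2\eta}{\partial\alpha\,\partial\beta} &=
  -\frac{\eta}{\alpha}\left(1+\beta\,\ln\frac{\tau_L}{\alpha}\right), \\
E[\Delta\hat\eta] &\approx \frac{1}{2n}\sum_{i,j\in\{\alpha,\beta\}}
  \left.\frac{\partial^2\eta}{\partial\theta_i\,\partial\theta_j}\right|_{\theta_0}
  \left[Z^{-1}(\theta_0)\right]_{ij}.
\end{align*}
```

In this approximation the bias scales as 1/n and carries an overall factor η, which is consistent with the observations that it vanishes for large samples and is negligible for small truncation parameters.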

• Calculate the mean $D^{q}_{cv}(n, p)$ and variance $\sigma^{2}_{D^{q}_{cv}(n,p)}$ from the 100 values.

Figure 1 is an error-bar plot of the MLE estimates of the parameters α (upper left for Case II, lower left for Case IIIa) and β (upper right for Case II, lower right for Case IIIb) for various sample sizes n and truncation levels p. Here the true values are taken as α_0 = 400 and β_0 = 0.58. From these plots one can see that as the sample size increases the variance in the estimation of α and β decreases in all cases. Furthermore, as the truncation level increases the variance in the estimation of α and β increases continuously in Case II, while in Case IIIb it increases initially and then decreases. Finally, in Case IIIa the estimation of the parameter is totally insensitive to the truncation; see Figure 1c.

Figure 1: The mean value of the MLE of Weibull parameters α and/or β as a function of n and p (the percentage of data removed by truncation). The error bars show one standard deviation in the estimated values of the parameters. The horizontal dashed line shows the true value of the parameters used to generate the data, α_0 = 400 and β_0 = 0.58.

Figure 2: Critical values as a function of n for a range of truncation levels p.

Figure 3: The critical values as a function of √η for a range of n values. The circled dashed line with the error bars is the Brownian bridge calculation.

Figure 4: Critical values obtained from the quantile analysis and their fits, plotted as a function of √η for sample sizes n = 30 (left) and n = 10,000 (right).

Table 1: Percentage of left-truncated Weibull distributed random samples for which there exists a solution to the MLE equations for Case II.

Table 2: The asymptotic critical values from the BB approach for all cases.

Table 7
The fit results are tabulated in Table 7 for Case I, where the values of C(p) are quite variable and the standard deviation in C(p) is greater than the value itself. This suggests that $D^{q}_{cv}(p|n)$ is better approximated by a function that is linear in 1 /

Table 8: The critical values obtained from the quantile analysis fitted to the linear function for a range of truncation levels p for Case II, Case IIIa and Case IIIb. Ã1(p|n) and B1(p|n) are the fit parameters defined in Equation (26).
for Case II, Case IIIa and Case IIIb, respectively.

Table 10: The fit parameters in Equation (28) for Cases II and IIIb.

Table 12: Comparison of our results with the available literature for Case II (α and β are unknown). To the best of our knowledge, no CVs of left-truncated Weibull distributions are available. For complete data sets (τ_L = 0, η = 0, p = 0), our error is ±0.015.

Table 14: Comparison of our results with the available literature for Case IIIb (α known and β unknown). To the best of our knowledge, no CVs of left-truncated Weibull distributions are available. For untruncated (complete) data sets (τ_L = 0, η = 0, p = 0), our error is ±0.025.

Table 16: Percentage pass rates in the KS test with and without η-correction for 10,000 simulations in Case IIIb (error is less than ±0.5%).

Table 15: Percentage pass rates in the KS test with and without η-correction for 10,000 simulations in Case II (error is less than ±0.5%).

Table 17: Summary of the in-sample KS test for truncation rates p = 0 and 0.5 in Case I and Case II; number of simulations N = 1000.

Table 18: Duration of ethnically mixed marriages ending in divorce in the US. y indicates year as a unit.

Table 19: Time between major terrorist attacks with minimum 10 casualties. d indicates day as a unit.
21 seconds of data. The resolution of the arrival times is milliseconds.

Table 21: Time intervals for radioactive decay of Americium-241. T.U. indicates time unit.