M2 Collective Risk Modelling

Introduction: Models for aggregate losses #

A contract, or a portfolio of contracts, will potentially experience a sequence of losses $Y_1, Y_2, Y_3, \ldots$ We are interested in the aggregate sum $S$ of these losses over a certain period of time.

  • How many losses will occur?
    • if deterministic ($n$): individual risk model
    • if random ($N$): collective risk model
  • How do they relate to each other?
    • usual assumption: iid
  • When do these losses occur?
    • usual assumption: no time value of money (short term models)
  • How big are these losses?

The Individual Risk Model #

Definition #

The Individual Risk Model #

In the Individual Risk Model,
$$S = Y_1 + \cdots + Y_n = \sum_{i=1}^{n} Y_i,$$
where the $Y_i$, $i=1,2,\ldots,n$, are iid claims. There are several methods to get probabilities about $S$:

  • get the whole distribution of S (if possible)
    • Convolutions
    • Generating functions
  • approximate with the help of the moments of $S$ (Module 4)

Convolutions of random variables #

In probability, the operation of determining the distribution of the sum of two random variables is called a convolution. It is denoted by $F_{X+Y} = F_X * F_Y$. The result can then be convolved with the distribution of another random variable. For instance, $F_{X+Y+Z} = F_Z * F_{X+Y}$. This can be done for both discrete and continuous random variables. It is also possible for mixed rv’s, but it is more complicated.

Formulas #

In short

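  • Discrete case:
    • df: $F_{X+Y}(s) = \sum_{x} F_Y(s-x) f_X(x)$
    • pmf: $f_{X+Y}(s) = \sum_{x} f_Y(s-x) f_X(x)$
  • Continuous case (the upper limit $s$ assumes $Y \ge 0$):
    • cdf: $F_{X+Y}(s) = \int_{-\infty}^{s} F_Y(s-x) f_X(x)\,dx$
    • pdf: $f_{X+Y}(s) = \int_{-\infty}^{s} f_Y(s-x) f_X(x)\,dx$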

Examples:

  • discrete case: Bowers et al. (1997) Example 2.3.1 on page 35
  • continuous case: Bowers et al. (1997) Example 2.3.2 on page 36

Numerical example #

Consider 3 discrete r.v.’s with probability mass functions

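$$f_1(y) = \tfrac14, \tfrac12, \tfrac14 \text{ for } y = 0, 1, 2; \qquad f_2(y) = \tfrac12, \tfrac12 \text{ for } y = 0, 2; \qquad f_3(y) = \tfrac14, \tfrac12, \tfrac14 \text{ for } y = 0, 2, 4.$$

Calculate the pmf $f_{1+2+3}$ and the df $F_{1+2+3}$ of the sum of the three random variables.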

Solution #

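| $y$ | $f_1(y)$ | $f_2(y)$ | $f_{1+2}(y)$ | $f_3(y)$ | $f_{1+2+3}(y)$ | $F_{1+2+3}(y)$ |
|---|---|---|---|---|---|---|
| 0 | 1/4 | 1/2 | 1/8 | 1/4 | 1/32 | 1/32 |
| 1 | 1/2 | 0 | 2/8 | 0 | 2/32 | 3/32 |
| 2 | 1/4 | 1/2 | 2/8 | 1/2 | 4/32 | 7/32 |
| 3 | 0 | 0 | 2/8 | 0 | 6/32 | 13/32 |
| 4 | 0 | 0 | 1/8 | 1/4 | 6/32 | 19/32 |
| 5 | 0 | 0 | 0 | 0 | 6/32 | 25/32 |
| 6 | 0 | 0 | 0 | 0 | 4/32 | 29/32 |
| 7 | 0 | 0 | 0 | 0 | 2/32 | 31/32 |
| 8 | 0 | 0 | 0 | 0 | 1/32 | 32/32 |

For instance,
$$f_{1+2}(2) = \tfrac14\cdot\tfrac12 + \tfrac12\cdot 0 + \tfrac14\cdot\tfrac12 = \tfrac28,$$
$$f_{1+2+3}(4) = \tfrac18\cdot\tfrac14 + \tfrac28\cdot 0 + \tfrac28\cdot\tfrac12 + \tfrac28\cdot 0 + \tfrac18\cdot\tfrac14 = \tfrac{6}{32}.$$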

Using generating functions #

There is a 1-1 relation between a distribution and its mgf or pgf.

Because
$$M_S(t) = E[e^{tS}] = E[e^{t(Y_1+\cdots+Y_n)}] = E[e^{tY_1}\cdots e^{tY_n}],$$
if losses are independent then we have
$$M_S(t) = E[e^{tS}] = E[e^{tY_1}]\cdots E[e^{tY_n}] = M_{Y_1}(t)\cdots M_{Y_n}(t).$$
The same argument holds for the pgf’s.

  • Sometimes, $M_S(t)$ or $p_S(t)$ can be recognised: this is the case for infinitely divisible distributions (Normal, Poisson, Inverse Gaussian, ...) and certain other distributions (Binomial, Negative binomial).
  • Otherwise, $M_S(t)$ or $p_S(t)$ can be expanded numerically to get moments and/or probabilities.

Example #

Consider a portfolio of 10 contracts. The losses $Y_i$ for these contracts are iid rv’s with mean 100 and variance 100. Determine the distribution, the expected value and the variance of $S$ if these losses are

  1. Normal;
  2. Gamma;
  3. Poisson.
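One possible way to work through this example with the mgf argument above is sketched below. It assumes the Gamma is parametrised by shape $\alpha$ and rate $\beta$ (mean $\alpha/\beta$, variance $\alpha/\beta^2$), which is our choice of notation rather than something stated in the example.

```latex
\begin{aligned}
\text{1. Normal: } & Y_i \sim N(100, 100),\; M_Y(t) = e^{100t + 50t^2}
  \;\Rightarrow\; M_S(t) = M_Y(t)^{10} = e^{1000t + 500t^2}
  \;\Rightarrow\; S \sim N(1000, 1000). \\
\text{2. Gamma: } & \alpha/\beta = 100,\; \alpha/\beta^2 = 100
  \;\Rightarrow\; \alpha = 100,\ \beta = 1,\; M_Y(t) = (1-t)^{-100}
  \;\Rightarrow\; M_S(t) = (1-t)^{-1000}
  \;\Rightarrow\; S \sim \Gamma(1000, 1). \\
\text{3. Poisson: } & Y_i \sim \text{Poi}(100),\; M_Y(t) = e^{100(e^t - 1)}
  \;\Rightarrow\; M_S(t) = e^{1000(e^t - 1)}
  \;\Rightarrow\; S \sim \text{Poi}(1000).
\end{aligned}
```

In all three cases $E[S] = 10 \times 100 = 1000$ and, by independence, $Var(S) = 10 \times 100 = 1000$.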

Using R #

  • Contrary to Excel, convolutions are extremely easy to implement in R using vectors.
f1 <- c(1/4, 1/2, 1/4, 0, 0)
f2 <- c(1/2, 0, 1/2, 0, 0)
f12 <- c(f1[1] * f2[1], sum(f1[1:2] * f2[2:1]), sum(f1[1:3] *
  f2[3:1]), sum(f1[1:4] * f2[4:1]), sum(f1[1:5] * f2[5:1]))
f12
## [1] 0.125 0.250 0.250 0.250 0.125
  • The example above is generalised in Exercise los9R.
  • A more advanced R function is convolve. It actually involves the Fast Fourier Transform (a method that is related to that of the mgf’s) for efficiency. We do not discuss this here, but it is used in the implementation of convolutions in the function aggregateDist of the package actuar (introduced later).
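The element-by-element calculation of f12 above can be wrapped in a small helper. The following is a minimal sketch (the name `conv_pmf` is ours, not a base R or actuar function) which assumes that both pmfs are given on the grid 0, 1, 2, ...; it reproduces the numerical example with three pmfs, and the last line cross-checks f12 against base R's `convolve`.

```r
# pmf of X + Y when f[i] = Pr[X = i - 1] and g[j] = Pr[Y = j - 1]
conv_pmf <- function(f, g) {
  out <- numeric(length(f) + length(g) - 1)
  for (i in seq_along(f)) {
    idx <- i + seq_along(g) - 1   # positions of the outcomes (i - 1) + (j - 1)
    out[idx] <- out[idx] + f[i] * g
  }
  out
}

f1 <- c(1/4, 1/2, 1/4)        # support 0, 1, 2
f2 <- c(1/2, 0, 1/2)          # support 0, 2
f3 <- c(1/4, 0, 1/2, 0, 1/4)  # support 0, 2, 4

f12  <- conv_pmf(f1, f2)
f123 <- conv_pmf(f12, f3)
F123 <- cumsum(f123)

f123 * 32   # numerators of the pmf over 32, for s = 0, ..., 8
all.equal(f12, convolve(f1, rev(f2), type = "open"))
```

The loop version is easier to read than the FFT-based `convolve`, at the cost of speed for long vectors.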

The Collective Risk Model (Compound distributions, MW 2.1) #

Definition #

Introduction #

Two models, depending on the assumption on the number of losses:

  • deterministic - n
    • main focus on the claims of individual policies (whose number is a priori known)
    • Individual Risk Model
    • discussed in previous sections
  • random - N
    • main focus on claims of a whole portfolio (whose number is a priori unknown)
    • Collective Risk Model
    • this is another way of separating frequency and severity

In this section we focus on the Collective Risk Model.

Definition #

In the Collective Risk Model, aggregate losses become
$$S = Y_1 + \cdots + Y_N = \sum_{i=1}^{N} Y_i.$$
This is a random sum. We make the following assumptions:

  • $N$ is the number of claims
  • $Y_i$ is the amount of the $i$th claim
  • the $Y_i$’s are iid with
    • (c)df $G(y)$
    • p(d/m)f $g(y)$
  • the $Y_i$’s and $N$ are mutually independent

Moments of S #

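We have
$$E[S] = E[E[S|N]] = E[N E[Y]] = E[N]E[Y],$$
and

$$\begin{aligned}
Var(S) &= E[Var(S|N)] + Var(E[S|N]) \\
&= E[N\,Var(Y)] + Var(E[Y]\,N) \\
&= E[N]\,Var(Y) + E[Y]^2\,Var(N) \\
&= E[N]\left(E[Y^2] - E[Y]^2\right) + E[Y]^2\,Var(N) \\
&= E[N]\,E[Y^2] + E[Y]^2\left(Var(N) - E[N]\right).
\end{aligned}$$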
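As a quick numerical illustration of these two formulas, here is a minimal sketch in base R. The frequency and severity pmfs are those of the Bowers 12.2.2 example used in these notes; the variable names are ours.

```r
fn <- c(0.1, 0.3, 0.4, 0.2)   # Pr[N = 0], ..., Pr[N = 3]
fy <- c(0, 0.5, 0.4, 0.1)     # Pr[Y = 0], ..., Pr[Y = 3]
n <- seq_along(fn) - 1
y <- seq_along(fy) - 1

# First two moments of frequency and severity
EN <- sum(n * fn); VarN <- sum(n^2 * fn) - EN^2
EY <- sum(y * fy); VarY <- sum(y^2 * fy) - EY^2

# Moments of the random sum S
ES   <- EN * EY
VarS <- EN * VarY + EY^2 * VarN
c(ES = ES, VarS = VarS)
```

The mean agrees with the value 2.72 reported by `mean(Fs)` for the same example.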

Moment generating function of S #

It is possible to get $M_S(t)$ as a function of $M_Y(t)$ and $M_N(t)$:

$$M_S(t) = E[e^{tS}] = E\left[E[e^{t(Y_1+Y_2+\cdots+Y_N)}|N]\right] = E[M_Y(t)^N] = E[e^{N\ln M_Y(t)}] = M_N(\ln M_Y(t)).$$

Example (Bowers et al. (1997), 12.2.1) #

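Assume that $N$ is geometric with probability of success $p$:
$$\Pr[N=n] = pq^n, \quad n = 0, 1, \ldots,$$
where $0<q<1$ and $p=1-q$. We then have
$$M_N(t) = E[e^{tN}] = \sum_{n=0}^{\infty} pq^n e^{tn} = \frac{p}{1-qe^t},$$
and thus
$$M_S(t) = M_N(\ln M_Y(t)) = \frac{p}{1-qe^{\ln M_Y(t)}} = \frac{p}{1-qM_Y(t)}.$$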

Distribution of S #

It is possible to get a fairly general expression for the df of S by conditioning on the number of claims:

$$F_S(x) = \sum_{n=0}^{\infty} \Pr[S\le x|N=n]\Pr[N=n] = \sum_{n=0}^{\infty} G^{*n}(x)\Pr[N=n], \qquad (1)$$

where $G^{*n}(y)$ is the $n$-th convolution of $G$.

Note that

  • $N$ will always be discrete, so this works for any type of rv $Y$ (continuous, discrete or mixed).
  • However, the type of $S$ will depend on the type of $Y$.

Distribution of S if Y is continuous #

If $Y$ is continuous, $S$ will generally be mixed:

  • with a mass at 0 because of $\Pr[N=0]$ (if positive)
  • continuous elsewhere, but with a density integrating to $1-\Pr[N=0]$

Example, continued (Bowers et al. (1997), 12.2.3) #

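Assume now that $G(y) = 1-e^{-y}$ and hence $M_Y(t) = \frac{1}{1-t}$ for $t<1$. Now, we have that (remember $\Pr[N=0]=p$)
$$M_S(t) = \frac{p}{1-qM_Y(t)}.$$
It follows that
$$M_S(t) = \frac{p}{1-q\frac{1}{1-t}} = p + q\frac{p}{p-t} = pE[e^{t\cdot 0}] + (1-p)E[e^{tZ}],$$
where $Z$ is an exponential rv with parameter $p$. Therefore,
$$f_S(s) = \begin{cases} p = \Pr[N=0] \text{ (probability mass)} & s=0; \\ (1-p)\left(pe^{-ps}\right) \text{ (probability density)} & s>0. \end{cases}$$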
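The closed form above can be checked numerically against formula (1). The sketch below is only an illustration: the value p = 0.4, the evaluation points and the truncation of the series at n = 500 are arbitrary choices of ours. It uses the fact that the n-th convolution of an Exp(1) df is a Gamma(n, 1) df.

```r
p <- 0.4
q <- 1 - p
x <- c(0, 0.5, 1, 2, 5)
n <- 1:500   # truncation point of the series; q^500 is negligible

# Formula (1): F_S(x) = Pr[N = 0] + sum_{n >= 1} Pr[N = n] * G^{*n}(x)
F_series <- sapply(x, function(xx) p + sum(p * q^n * pgamma(xx, shape = n, rate = 1)))

# Closed form implied by the mgf: mass p at 0, plus (1 - p) times an Exp(p) df
F_closed <- p + q * (1 - exp(-p * x))

cbind(x, F_series, F_closed)
```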

Distribution of S if Y is mixed #

If $Y$ is mixed, $S$ will generally be mixed:

  • with a mass at 0 because of $\Pr[N=0]$ and $\Pr[Y=0]$ (if positive)
  • mixed (if $Y$ is not continuous for $x>0$) or continuous elsewhere
  • with a density integrating to something $\le 1-\Pr[N=0]$

Distribution of S if Y is discrete #

For discrete $Y$’s we can get an expression similar to (1) for the pmf of $S$:

$$f_S(s) = \sum_{n=0}^{\infty} \Pr[S=s|N=n]\Pr[N=n] = \sum_{n=0}^{\infty} g^{*n}(s)\Pr[N=n], \qquad (2)$$

where $g^{*0}(0)=1$ (and thus 0 anywhere else).

  • This can be implemented in a table and/or in a program.
  • However, if the range of $N$ goes really to infinity, calculating $f_S(s)$ may require an infinity of convolutions of $Y$.
  • This formula is more efficient if the number of possible outcomes for $N$ is small.
  • The pmf $g^{*n}(s)$ can be calculated using de Pril’s algorithm (see Module 4).

Example with tabular approach #

From Bowers et al. (1997), 12.2.2:

  • The convolutions are done in the usual way.
  • The number of columns depends on the range of $N$.
  • The $f_S(x)$ are the sumproduct of the row $x$ and row $\Pr[N=n]$:

$$f_S(3) = 0\cdot 0.1 + 0.1\cdot 0.3 + 0.4\cdot 0.4 + 0.125\cdot 0.2 = 0.215.$$
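The tabular approach is a direct implementation of formula (2). Here is a minimal base R sketch with the Bowers 12.2.2 inputs (the pmf vectors used with aggregateDist in these notes); the helper name `conv_pmf` is ours, and the code assumes both pmfs start at 0 and that N has a finite range.

```r
fy <- c(0, 0.5, 0.4, 0.1)     # Pr[Y = 0], ..., Pr[Y = 3]
fn <- c(0.1, 0.3, 0.4, 0.2)   # Pr[N = 0], ..., Pr[N = 3]

# pmf of the sum of two independent rv's on the grid 0, 1, 2, ...
conv_pmf <- function(f, g) {
  out <- numeric(length(f) + length(g) - 1)
  for (i in seq_along(f)) {
    idx <- i + seq_along(g) - 1
    out[idx] <- out[idx] + f[i] * g
  }
  out
}

smax <- (length(fy) - 1) * (length(fn) - 1)   # largest possible value of S
fs <- numeric(smax + 1)
gn <- 1                                       # g^{*0}: unit mass at 0
for (n in seq_along(fn) - 1) {
  idx <- seq_along(gn)
  fs[idx] <- fs[idx] + fn[n + 1] * gn         # add Pr[N = n] * g^{*n}(s)
  gn <- conv_pmf(gn, fy)                      # next column of the table
}
cbind(s = 0:smax, fs = fs, Fs = cumsum(fs))
```

Each pass through the loop corresponds to one column of the table, which is why the cost grows with the range of N.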

Using R #

We will make extensive use of the function aggregateDist from the package actuar (Dutang, Goulet, and Pigeon 2008):

  • This function allows for several different aggregate distribution approaches, which will be introduced here (and in Module 4 as the associated theory is presented).
  • Here, we show how the function can be used to implement formulas (1) and (2) (using the function convolve in the background). This corresponds to the method="convolution" approach.

actuar::aggregateDist(method="convolution"):

  • A discrete distribution for Y is required. Note that discretisation methods are discussed in Module 4. This is input as a vector of claim amount probability masses after the argument model.sev=. The first element must be Pr[Y=0].
  • There is no restriction on the shape of the frequency distribution, but it must have a finite range. This is input as a vector of claim number probability masses after the argument model.freq=. The first element must be Pr[N=0].
  • The outcome of the function is the df $F_S$ of formula (1). Additional outputs:
    • plot: to get a pretty plot of the df
    • summary: to get summary statistics
    • mean: to get the mean
    • diff: to get the pmf
  • Additional options are:
    • x.scale: currency units per unit of sev in the severity model (this allows calculations on multiples of $1)

library(actuar)  # provides aggregateDist()
# Bowers 12.2.2
fy <- c(0, 0.5, 0.4, 0.1)
fn <- c(0.1, 0.3, 0.4, 0.2)
Fs <- aggregateDist("convolution", model.freq = fn, model.sev = fy)
mean(Fs)
## [1] 2.72
pmf <- c(Fs(0), diff(Fs(0:9)))
cbind(s = c(0:9), fs = pmf, Fs = Fs(0:9))
##       s     fs     Fs
##  [1,] 0 0.1000 0.1000
##  [2,] 1 0.1500 0.2500
##  [3,] 2 0.2200 0.4700
##  [4,] 3 0.2150 0.6850
##  [5,] 4 0.1640 0.8490
##  [6,] 5 0.0950 0.9440
##  [7,] 6 0.0408 0.9848
##  [8,] 7 0.0126 0.9974
##  [9,] 8 0.0024 0.9998
## [10,] 9 0.0002 1.0000

summary(Fs)
## Aggregate Claim Amount Empirical CDF:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.00    3.00    2.72    4.00    9.00
plot(Fs)

Explicit claims count distributions (MW 2.2) #

Introduction #

Exposure #

  • It makes no sense to talk about frequency in an insurance portfolio without considering exposure. Chapter 4 of Werner and Modlin (2010) defines exposure as “the basic unit that measures a policy’s exposure to loss”.
  • One primary criterion for choosing an exposure base is that it “should be directly proportional to expected loss”. Here we are focussing on frequency, so exposure should be something directly proportional to the expected frequency.
  • Wuthrich (2023) calls exposure “volume”, denoted $v$, and defines the claims frequency as $N/v$.

Basic models for claims frequency #

  • In our case, we will assume that the volume $v$ directly affects the likelihood of a claim occurring (the frequency), such that $N/v$ is normalised.
  • MW defines $$p_k = \Pr[N=k], \quad \text{for } k\in\mathcal{A}\subset\mathbb{N}_0,$$ where $\mathcal{A}$ is the set of possible frequency outcomes.
  • There are three main assumptions for $p_k$:
    • binomial (with variance less than mean)
    • Poisson (with variance equal to the mean)
    • negative-binomial (a Poisson with random mean, so that variance is more than the mean)
  • A summary table of those distributions is also given in Bowers et al. (1997), see Table 12.3.1 on page 376.
  • These all belong to a class of distributions called $(a,b)$.

Binomial distribution #

  • fixed volume $v\in\mathbb{N}$
  • fixed default probability $p\in(0,1)$ (expected claims frequency)
  • pmf of $N\sim\text{Binom}(v,p)$ is $$p_k = \Pr[N=k] = \binom{v}{k}p^k(1-p)^{v-k}, \quad \text{for all } k\in\{0,\ldots,v\}=\mathcal{A}.$$
  • same as a sum of Bernoulli (which is the case $v=1$)
  • makes sense for a homogeneous portfolio with unique possible events, such as credit defaults, or deaths in a life insurance model
  • In R: dbinom, pbinom, qbinom, rbinom, where size is $v$, and where prob is $p$
  • Note that $\binom{v}{k}$ can be computed with the R function choose.

Compound binomial model #

The total claim amount $S$ has a compound binomial distribution $S\sim\text{CompBinom}(v,p,G)$ if $S$ has a compound distribution with $N\sim\text{Binom}(v,p)$ for given $v\in\mathbb{N}$ and $p\in(0,1)$ and individual claim size distribution $G$.

Corollary 2.7: Assume $S_1,\ldots,S_n$ are independent with $S_j\sim\text{CompBinom}(v_j,p,G)$ for all $j=1,\ldots,n$. The aggregated claim has a compound binomial distribution with
$$S = \sum_{j=1}^{n} S_j \sim \text{CompBinom}\left(\sum_{j=1}^{n} v_j, p, G\right).$$

Exercise NLI3 considers the decomposition of $S$ into small and large claims. It shows that $S_{lc}$, the sum of those claims exceeding a certain threshold $M$ only (see notation in Example 2.16 later in these slides), is compound binomial again.

Poisson distribution #

  • fixed volume $v>0$

  • expected claims frequency $\lambda>0$

  • pmf of $N\sim\text{Poi}(\lambda v)$ is $$p_k = \Pr[N=k] = e^{-\lambda v}\frac{(\lambda v)^k}{k!} \quad \text{for all } k\in\mathcal{A}=\mathbb{N}_0.$$

  • Lemma 2.9: increasing the volume while keeping $E[N]$ fixed in a binomial model leads to a Poisson distribution (more so for small $p$ compared to $v$).

  • In R: dpois, ppois, qpois, rpois, where lambda is $\lambda v$
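Lemma 2.9 can be illustrated numerically. The sketch below is ours (the fixed mean of 2 and the list of volumes are arbitrary illustrative values): it holds $E[N]$ fixed, lets the binomial volume grow, and tracks the largest pointwise gap between the binomial and Poisson pmfs.

```r
mean_N <- 2                    # E[N], held fixed (illustrative value)
k <- 0:20
vols <- c(10, 100, 1000, 1e5)  # increasing binomial volumes v, with p = mean_N / v

err <- sapply(vols, function(v) {
  max(abs(dbinom(k, size = v, prob = mean_N / v) - dpois(k, mean_N)))
})
cbind(v = vols, p = mean_N / vols, max_abs_diff = err)
```

The gap shrinks as the volume increases and the default probability decreases accordingly.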

Compound Poisson model #

The total claim amount $S$ has a compound Poisson distribution $S\sim\text{CompPoi}(\lambda v,G)$ if $S$ has a compound distribution with $N\sim\text{Poi}(\lambda v)$ for given $\lambda,v>0$ and individual claim size distribution $G$.

  • The compound Poisson distribution has nice properties such as:
    • The aggregation property
    • The disjoint decomposition property
  • These are reviewed in the next section, along with related new techniques for computing the distribution of S.

Mixed Poisson distribution #

Inhomogeneous portfolio #

  • So far we have seen distributions with variance less (binomial) or exactly equal (Poisson) to the mean.
  • In reality, actuarial data is often overdispersed, that is, variance is larger than mean.
  • This could be due to frequency or severity, but it makes sense that some of this extra variability would come from frequency.
  • If we believe in a Poisson frequency for known frequency parameter, then additional uncertainty such as heterogeneity of risks in a portfolio, uncertain conditions (weather, for instance) could be modelled with a random Poisson parameter, and could explain the extra variability.
  • This is the idea of a mixed Poisson.

The mixed Poisson distribution #

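  • Assume random $\Lambda\sim H$ with $H(0)=0$, $E[\Lambda]=\lambda$, and $Var(\Lambda)>0$.
  • Conditionally, given $\Lambda$, $N\sim\text{Poi}(\Lambda v)$ for fixed volume $v>0$.

We have then

$$\begin{aligned}
\Pr[N=n] &= \int_0^{\infty} \Pr[N=n|\Lambda=\lambda]\,dH(\lambda) = \int_0^{\infty} e^{-\lambda v}\frac{(\lambda v)^n}{n!}\,dH(\lambda); \\
E[N] &= E[E[N|\Lambda]] = E[\Lambda]v = \lambda v; \\
Var(N) &= E[Var(N|\Lambda)] + Var(E[N|\Lambda]) = \lambda v + v^2\,Var(\Lambda) > \lambda v; \\
M_N(t) &= E[e^{tN}] = E[E[e^{tN}|\Lambda]] = E[e^{\Lambda v(e^t-1)}] = M_{\Lambda}(v[e^t-1]).
\end{aligned}$$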

Example #

If $\Lambda\sim$ inverse Gaussian$(\alpha,\beta)$ (Example 12.3.2):

  • $N$ is Poisson Inverse Gaussian.
  • This distribution is the pig distribution in actuar (so that you can use dpig, ppig, etc.); see Section 5 of the vignette “distribution” of actuar.
  • $S$ will be compound Poisson inverse Gaussian.

Another example, which is very famous, is $\Lambda\sim\Gamma$, which leads to the negative-binomial distribution.

Negative-binomial distribution #

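Assume $\lambda$ is the mean, and will be “spread” according to a gamma distribution:

  • Define $\Lambda=\lambda\Theta$.
  • Now, $\Theta\sim\Gamma(\gamma,\gamma)$ such that $$E[\Theta]=1 \quad\text{and}\quad Var(\Theta)=\frac{1}{\gamma},$$ and $$E[\Lambda]=\lambda \quad\text{and}\quad Var(\Lambda)=\frac{\lambda^2}{\gamma}.$$
  • If conditionally, given $\Theta$, $N\sim\text{Poi}(\Theta\lambda v)$, then $N\sim\text{NegBin}(\lambda v,\gamma)$ with volume $v>0$, expected claims frequency $\lambda>0$, and dispersion parameter $\gamma>0$.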

Proof:

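$$M_N(t) = E[e^{tN}] = E[E[e^{tN}|\Theta]] = E[e^{\Theta\lambda v(e^t-1)}] = \left(\frac{\gamma}{\gamma-\lambda v(e^t-1)}\right)^{\gamma} = \left(\frac{\gamma}{\gamma+\lambda v-\lambda ve^t}\right)^{\gamma} = \left(\frac{\frac{\gamma}{\lambda v+\gamma}}{1-\frac{\lambda v}{\lambda v+\gamma}e^t}\right)^{\gamma},$$

which can be recognised as a negative-binomial with probability of “failure” $p=\frac{\lambda v}{\lambda v+\gamma}$ (if we count failures until the $\gamma$-th success) so that
$$p_k = \Pr[N=k] = \binom{k+\gamma-1}{k}p^k(1-p)^{\gamma}.$$
In R, use dnbinom, pnbinom, qnbinom, rnbinom, where size is $\gamma$ and prob is the probability of success $1-p$ (note that the volume is hidden in $p$ and will affect the scale of the distribution).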
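The gamma mixing result can also be checked numerically. The following is a minimal sketch with illustrative values of ours for $\lambda v$ and $\gamma$: it integrates the conditional Poisson pmf against the $\Gamma(\gamma,\gamma)$ density of $\Theta$ and compares the result with dnbinom, using size $\gamma$ and prob equal to the probability of success $1-p$.

```r
lambda_v <- 5   # illustrative expected number of claims, lambda * v
gam      <- 2   # illustrative dispersion parameter gamma
k        <- 0:15

# Pr[N = k] as the integral of Poisson(k; theta * lambda * v) against the Gamma(gam, gam) density
mixed <- sapply(k, function(kk) {
  integrate(function(theta) dpois(kk, theta * lambda_v) * dgamma(theta, shape = gam, rate = gam),
            lower = 0, upper = Inf)$value
})

p_fail <- lambda_v / (lambda_v + gam)   # probability of "failure" p
nb <- dnbinom(k, size = gam, prob = 1 - p_fail)

max(abs(mixed - nb))   # small, up to the numerical integration error
```

The same pmf is obtained with `dnbinom(k, size = gam, mu = lambda_v)`, which avoids computing $p$ by hand.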

Interpretation #

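  • $\Theta$ reflects the uncertainty about the “true” parameter of the Poisson distribution. Alternatively, it describes the distribution of “$\lambda$’s” in the population.
  • In the end we have

$$E[N]=\lambda v, \qquad Var(N)=\lambda v\left(1+\frac{\lambda v}{\gamma}\right)>\lambda v, \qquad Vco\left(\frac{N}{v}\right)=\sqrt{(\lambda v)^{-1}+\gamma^{-1}}.$$

  • This additional uncertainty is not diversifiable (remains even for large $v$): $$Vco\left(\frac{N}{v}\right)=\sqrt{(\lambda v)^{-1}+\gamma^{-1}} \to \gamma^{-1/2}>0 \quad\text{for } v\to\infty.$$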

Compound negative-binomial model #

The total claim amount $S$ has a compound negative-binomial distribution $S\sim\text{CompNB}(\lambda v,\gamma,G)$ if $S$ has a compound distribution with $N\sim\text{NegBin}(\lambda v,\gamma)$ for given $\lambda,v,\gamma>0$ and individual claim size distribution $G$.

Additional properties and applications of Poisson frequencies #

Theorem 2.12: Aggregation property #

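Assume $S_1,\ldots,S_n$ are independent with $S_j\sim\text{CompPoi}(\lambda_jv_j,G_j)$ for all $j=1,\ldots,n$. Aggregated claims have a compound Poisson distribution
$$S=\sum_{j=1}^{n}S_j\sim\text{CompPoi}(\lambda v,G),$$
with
$$v=\sum_{j=1}^{n}v_j, \qquad \lambda=\sum_{j=1}^{n}\frac{v_j}{v}\lambda_j, \qquad G=\sum_{j=1}^{n}\frac{\lambda_jv_j}{\lambda v}G_j.$$

So what?

  • $n$ independent portfolios of losses can be easily aggregated.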
  • Alternatively (or in addition), total claims paid over n years are compound Poisson, even if the severity and frequency of losses vary across years.
  • “Bottom-up” modelling
  • In Bowers et al. (1997), this is Theorem 12.4.1.

Example 12.4.1 of Bowers et al. (1997) #

Suppose that $N_1,N_2,\ldots,N_m$ are independent random variables. Further, suppose that $N_i$ follows Poisson($\lambda_i$). Let $y_1,y_2,\ldots,y_m$ be deterministic numbers. What is the distribution of $y_1N_1+\cdots+y_mN_m$?

Theorem 2.14: Disjoint decomposition property #

Preliminary 1: Add LoBs in the CompPoi formulation #

Let us introduce Lines of Business (“LoB”) in the notation:

  • Let the set $\{1,\ldots,m\}$ be a partition of the portfolio, or different lines of business (“LoB” thereafter). For instance, we could have $j\in\{1,2,3\}$ for car ($j=1$), building ($j=2$) and liability ($j=3$) LoBs.
  • Let $(p_j^+)_{j=1,\ldots,m}$ be a discrete probability distribution on the finite set of sub-portfolios/LoBs $\{1,\ldots,m\}$ (thereafter just “LoB”).
  • We assume $p_j^+>0$ for all $j$, that is, the probability of having claims in any of the $m$ LoBs is strictly positive.
  • We further assume that $G_j$ is the claim size distribution of LoB $j$, with $G_j(0)=0$.

  • Finally, we define the mixture distribution by $$G(y)=\sum_{j=1}^{m}p_j^+G_j(y) \quad\text{for } y\in\mathbb{R}.$$ This is the distribution of a claim, if we don’t know which LoB it comes from.
  • Note that this matches the formulation in the aggregation property Theorem 2.12 with $p_j^+=\frac{\lambda_jv_j}{\lambda v}$.
  • Now, define a discrete random variable $I$ which indicates which sub-portfolio/LoB a randomly selected claim $Y$ belongs to: $$\Pr[I=j]=p_j^+ \quad\text{for all } j\in\{1,\ldots,m\}.$$

We are now ready to define the following extended compound Poisson model:

  • The total claims $S=\sum_{i=1}^{N}Y_i$ has a compound Poisson distribution as defined earlier.
  • In addition, we assume that $(Y_i,I_i)_{i\ge 1}$ are
    • mutually i.i.d. and independent of $N$,
    • with $Y_i$ having marginal distribution function $G$ with $G(0)=0$, and
    • $I_i$ having marginal distribution function given by $\Pr[I=j]=p_j^+$ for all $j\in\{1,\ldots,m\}$.

Preliminary 2: Partition #

  • The random vector $(Y_1,I_1)$ takes values in $\mathbb{R}_+\times\{1,\ldots,m\}$.
  • On this set we choose a finite sequence of sets $A_1,\ldots,A_n$ such that

$$A_k\cap A_l=\emptyset \text{ for all } k\ne l \text{ (no overlap)}; \qquad \bigcup_{k=1}^{n}A_k=\mathbb{R}_+\times\{1,\ldots,m\} \text{ (all-inclusive)}.$$

  • Such a sequence is called a “measurable disjoint decomposition” or “partition” of $\mathbb{R}_+\times\{1,\ldots,m\}$.
  • This partition is called “admissible” for $(Y_1,I_1)$ if for all $k=1,\ldots,n$ $$p^{(k)}=\Pr[(Y_1,I_1)\in A_k]>0.$$ Note $\sum_{k=1}^{n}p^{(k)}=1$ due to the properties of the partition above (no overlap and all-inclusive).

We have two levels of partition:

  • Into LoBs:
    • Claims are classified according to a sub-portfolio or LoB
    • For instance: domestic motor and commercial motor
    • The probability of a claim being in LoB $j$ is $p_j^+$
    • The indicator for the claim to be in LoB $j$ is $I_j$ (with probability $p_j^+$ of being 1)
  • Into a second level:
    • Claims are classified according to another set of criteria
    • For instance: geographical areas NSW and VIC
    • The probability of a claim being in geographical area $k$ is $p^{(k)}$

Theorem 2.14: Disjoint decomposition #

Assume S is “doubly partitioned” as described above:

  • $S$ fulfills the extended compound Poisson model assumptions above (Preliminary 1).
  • We chose an admissible partition $A_1,\ldots,A_n$ for $(Y_1,I_1)$ (Preliminary 2).

Then the random variable (sum of claims for partition $k$)
$$S_k=\sum_{i=1}^{N}Y_i1_{\{(Y_i,I_i)\in A_k\}}\sim\text{CompPoi}(\lambda_kv_k,G_k), \quad\text{for } k=1,\ldots,n,$$
with
$$\lambda_kv_k=\lambda vp^{(k)}>0, \qquad G_k(y)=\Pr[Y_1\le y\,|\,(Y_1,I_1)\in A_k].$$
Furthermore, the $S_k$’s are independent (over $k$).

Thinning of the Poisson process #

  • Assume that $m=1$ (only one LoB)
  • The disjoint decomposition theorem implies that $$Y_i=Y_i1_{\{Y_i\in A_1\}}+\cdots+Y_i1_{\{Y_i\in A_n\}}.$$
  • For each partition $A_k$ (defined on the claims) a natural choice is
    • $v_k=v$
    • $\lambda_k=\lambda p^{(k)}$
  • This means that the volume remains constant in each partition, but the expected claims frequencies $\lambda_k$ change proportionally to the probabilities of falling in partition $A_k$, $k=1,\ldots,n$.
  • This is called thinning of the Poisson process.

Sparse vector algorithm #

If $S\sim$ compound Poisson$(\lambda,g(y_i)=\pi_i)$, $i=1,\ldots,m$, then
$$S=y_1N_1+\cdots+y_mN_m,$$
where the $N_i$’s

  • represent the number of claims of amount $y_i$;
  • are mutually independent;
  • are Poi$(\lambda_i=\lambda\pi_i)$.

Proof: see tutorial exercise los18. Note also that this is a special case of Theorem 2.14, and is Theorem 12.4.2 of Bowers et al. (1997).

So what?

  • Sparse vector algorithm: this yields an alternative method for tabulating the distribution of $S$, which is more efficient when $m$ is small.
  • $S$ can be used to approximate the Individual Risk Model if $X=Ib$ (see Module 3).

The sparse vector algorithm #

(Bowers et al. 1997, Example 12.4.2) Suppose $S$ has a compound Poisson distribution with $\lambda=0.8$ and individual claim amount distribution

| $y_i$ | $\Pr[Y=y_i]$ |
|---|---|
| 1 | 0.250 |
| 2 | 0.375 |
| 3 | 0.375 |

Compute $f_S(s)=\Pr[S=s]$ for $s=0,1,\ldots,6$.

This can be done in two ways:

  • Basic method (seen earlier in the lecture): requires calculating up to the 6th convolution of $Y$.
  • Sparse vector algorithm: requires no convolution of $Y$.

Solution - Basic Method

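| $x$ | $g^{*0}(x)$ | $g(x)$ | $g^{*2}(x)$ | $g^{*3}(x)$ | $g^{*4}(x)$ | $g^{*5}(x)$ | $g^{*6}(x)$ | $f_S(x)$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | - | - | - | - | - | - | 0.4493 |
| 1 | - | 0.250 | - | - | - | - | - | 0.0899 |
| 2 | - | 0.375 | 0.0625 | - | - | - | - | 0.1438 |
| 3 | - | 0.375 | 0.1875 | 0.0156 | - | - | - | 0.1624 |
| 4 | - | - | 0.3281 | 0.0703 | 0.0039 | - | - | 0.0499 |
| 5 | - | - | 0.2813 | 0.1758 | 0.0234 | 0.0010 | - | 0.0474 |
| 6 | - | - | 0.1406 | 0.2637 | 0.0762 | 0.0073 | 0.0002 | 0.0309 |
| $n$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | |
| $\Pr[N=n]=e^{-0.8}\frac{(0.8)^n}{n!}$ | 0.4493 | 0.3595 | 0.1438 | 0.0383 | 0.0077 | 0.0012 | 0.0002 | |

  • The convolutions are done in the usual way.
  • The $f_S(x)$ are the sumproduct of the row $x$ and row $\Pr[N=n]$.
  • The number of convolutions (and thus of columns) will increase by 1 for each new value of $f_S(x)$, without bound!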

Solution - Sparse vector algorithm

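Thanks to Theorem 2.12, we can write $S=N_1+2N_2+3N_3$.

| (1) $x$ | (2) $\Pr[N_1=x]$ | (3) $\Pr[2N_2=x]$ | (4) $\Pr[3N_3=x]$ | (5) $\Pr[N_1+2N_2=x]$ | (6) $f_S(x)$ |
|---|---|---|---|---|---|
| 0 | 0.818731 | 0.740818 | 0.740818 | 0.606531 | 0.449329 |
| 1 | 0.163746 | 0 | 0 | 0.121306 | 0.089866 |
| 2 | 0.016375 | 0.222245 | 0 | 0.194090 | 0.143785 |
| 3 | 0.001092 | 0 | 0.222245 | 0.037201 | 0.162358 |
| 4 | 0.000055 | 0.033337 | 0 | 0.030974 | 0.049906 |
| 5 | 0.000002 | 0 | 0 | 0.005703 | 0.047360 |
| 6 | 0.000000 | 0.003334 | 0.033337 | 0.003288 | 0.030923 |

Columns (2) to (4) are built from

| $i$ | 1 | 2 | 3 |
|---|---|---|---|
| $\lambda_i=\lambda\pi_i$ | 0.2 | 0.3 | 0.3 |
| $\Pr[N_i=x/i]$ | $e^{-0.2}\frac{(0.2)^x}{x!}$ | $e^{-0.3}\frac{(0.3)^{x/2}}{(x/2)!}$ | $e^{-0.3}\frac{(0.3)^{x/3}}{(x/3)!}$ |

Columns (5) and (6) are convolutions, e.g.:
$$(5)[3]=.818731\cdot 0+.163746\cdot .222245+.016375\cdot 0+.001092\cdot .740818,$$
$$(6)[3]=.740818\cdot .037201+0\cdot .194090+0\cdot .121306+.222245\cdot .606531.$$

Note that only two convolutions are needed: columns (5) and (6).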
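Both solutions can be reproduced in a few lines of base R. The following is a minimal sketch; the helper names `scaled_pois` and `conv_trunc` are ours, and all pmfs are truncated to the grid 0, ..., 6 since only those values of $f_S$ are required.

```r
lambda <- 0.8
y      <- c(1, 2, 3)
pi_y   <- c(0.250, 0.375, 0.375)
smax   <- 6

# pmf of y_i * N_i on the grid 0..smax, with N_i ~ Poisson(lambda_i)
scaled_pois <- function(yi, lambda_i, smax) {
  f <- numeric(smax + 1)
  k <- 0:(smax %/% yi)
  f[k * yi + 1] <- dpois(k, lambda_i)
  f
}

# convolution of two pmfs on 0..smax, truncated at smax
conv_trunc <- function(f, g) {
  sapply(seq_along(f), function(s) sum(f[1:s] * g[s:1]))
}

# Sparse vector algorithm: two convolutions (columns (5) and (6))
fs_sparse <- scaled_pois(y[1], lambda * pi_y[1], smax)
for (i in 2:3) {
  fs_sparse <- conv_trunc(fs_sparse, scaled_pois(y[i], lambda * pi_y[i], smax))
}

# Basic method: formula (2); N <= smax suffices for s <= smax because Y >= 1
gy <- c(0, pi_y, rep(0, smax - 3))   # Pr[Y = 0], ..., Pr[Y = 6]
gn <- c(1, rep(0, smax))             # g^{*0}
fs_basic <- numeric(smax + 1)
for (n in 0:smax) {
  fs_basic <- fs_basic + dpois(n, lambda) * gn
  gn <- conv_trunc(gn, gy)
}

round(cbind(s = 0:smax, sparse = fs_sparse, basic = fs_basic), 6)
```

The two columns should agree to machine precision, and with the tables above up to rounding in the last displayed digit.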

Example 2.16: Large claim separation #

  • This is a very important (and convenient) application of the Disjoint decomposition property (Theorem 2.14).
  • Attritional and catastrophic claims often have very different distributions (different G’s); see also https://www.actuaries.digital/2022/01/10/catastrophe-vs-standard-loss-modelling/
  • The idea here is to divide the claims into different layers with different distributions:
    • Small claims are modelled using a parametric distribution for which it is easy to obtain the distribution of the compound distribution, potentially even approximated with a normal distribution thanks to volume and light right tail;
    • Large claims are typically modelled with a Pareto distribution with threshold $M$ and tail parameter $\alpha>1$ (see Module 6 for a justification of this, and for the choice of an appropriate $M$). They could also be “modelled” (see article above).

Assuming two layers:

  • We choose a large claims threshold $M>0$ such that $0<G(M)<1$, that is, there is probability mass on either side of $M$.
  • We define the partition $$A=A_{sc}=\{Y_1\le M\} \quad\text{and}\quad A^c=A_{lc}=\{Y_1>M\}.$$
  • Assume that $S\sim\text{CompPoi}(\lambda v,G)$.
  • We now define the small and large claims layers as $$S_{sc}=\sum_{i=1}^{N}Y_i1_{\{Y_i\le M\}}, \quad\text{and}\quad S_{lc}=\sum_{i=1}^{N}Y_i1_{\{Y_i>M\}},$$ respectively.

  • Theorem 2.14 implies that $S_{sc}$ and $S_{lc}$ are independent and compound Poisson distributed with $$S_{sc}\sim\text{CompPoi}\left(\lambda_{sc}v=\lambda G(M)v,\; G_{sc}(y)=\Pr[Y_1\le y|Y_1\le M]\right),$$ and $$S_{lc}\sim\text{CompPoi}\left(\lambda_{lc}v=\lambda(1-G(M))v,\; G_{lc}(y)=\Pr[Y_1\le y|Y_1>M]\right),$$ respectively.
  • The distribution of $S=S_{sc}+S_{lc}$ can then be obtained by a simple convolution of distributions of $S_{sc}$ and $S_{lc}$ (thanks to independence); see Module 4 for examples.
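The separation into layers can be illustrated by simulation. The sketch below is only a sanity check under assumptions of ours: unit exponential claim sizes, $\lambda v=50$, threshold $M=2$, and 20,000 simulated periods. None of these values comes from the example.

```r
set.seed(1)
lambda_v <- 50      # illustrative lambda * v
M        <- 2       # illustrative large claims threshold
nsim     <- 20000
GM       <- pexp(M) # G(M) for Exp(1) claim sizes

N <- rpois(nsim, lambda_v)
sim <- t(sapply(N, function(n) {
  y <- rexp(n)
  c(n_sc = sum(y <= M), n_lc = sum(y > M),
    s_sc = sum(y[y <= M]), s_lc = sum(y[y > M]))
}))

# Claim counts per layer: sample means against lambda * G(M) * v and lambda * (1 - G(M)) * v
rbind(simulated   = colMeans(sim[, c("n_sc", "n_lc")]),
      theoretical = c(lambda_v * GM, lambda_v * (1 - GM)))

# Poisson counts have variance equal to the mean; the layers should be uncorrelated
var(sim[, "n_lc"]) / mean(sim[, "n_lc"])
cor(sim[, "s_sc"], sim[, "s_lc"])
```

A correlation near zero is consistent with, but of course does not prove, the independence stated in Theorem 2.14.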

Parameter estimation (MW 2.3) #

Introduction #

Estimation methods #

You should be familiar with the main estimation methods:

  • Method of moments
  • Maximum likelihood estimation

Here the problem is slightly complicated because our observations may not be directly comparable due to varying exposures v’s.

Assume that $(N_1,\ldots,N_T)$ is the vector of observations.

What to do with volumes? Lemma 2.26 #

  • The key idea here is to find the minimum variance method of moments estimator, when the volumes across the observations can vary.
  • This is what is different from a straight method of moments estimator, and explains why we need to think it through: how to deal with those volumes?
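  • Assume there exist strictly positive volumes $v_1,\ldots,v_T$ such that the components of $(N_1/v_1,\ldots,N_T/v_T)$ are independent with $$\lambda=E\left[\frac{N_t}{v_t}\right] \quad\text{and}\quad \tau_t^2=Var\left(\frac{N_t}{v_t}\right)\in(0,\infty),$$ for all $t=1,\ldots,T$.

Lemma 2.26 states that the unbiased, linear estimator for $\lambda$ with minimal variance is given by
$$\hat\lambda_T^{MV}=\left(\sum_{t=1}^{T}\frac{1}{\tau_t^2}\right)^{-1}\sum_{t=1}^{T}\frac{N_t/v_t}{\tau_t^2},$$
with variance
$$Var(\hat\lambda_T^{MV})=\left(\sum_{t=1}^{T}\frac{1}{\tau_t^2}\right)^{-1}.$$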

Note:

  • We haven’t made any distributional assumption yet: this estimates $E\left[\frac{N_t}{v_t}\right]$ via method of moments, taking the $v_t$’s into account in an optimal way (in the sense that it minimises the variance of the estimator).
  • The superscript “MV” stands for “minimal variance”.
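Lemma 2.26 is an inverse-variance weighting of the observed frequencies $N_t/v_t$. Here is a minimal sketch; the function name `mv_estimator` and the toy data are ours. In the Poisson case $\tau_t^2=\lambda/v_t$, so the unknown $\lambda$ cancels in the weights and the estimator reduces to total claims over total volume.

```r
# Minimal variance, unbiased, linear estimator of lambda (Lemma 2.26)
mv_estimator <- function(N, v, tau2) {
  w <- 1 / tau2                        # inverse-variance weights
  list(estimate = sum(w * N / v) / sum(w),
       variance = 1 / sum(w))
}

# Toy data (hypothetical): claim counts and volumes over T = 3 periods
N <- c(5, 30, 25)
v <- c(100, 200, 300)

# Poisson case: tau_t^2 = lambda / v_t; any positive lambda gives the same estimate
pois_1 <- mv_estimator(N, v, tau2 = 0.1 / v)
pois_2 <- mv_estimator(N, v, tau2 = 7 / v)
c(pois_1$estimate, pois_2$estimate, sum(N) / sum(v))
```

Only the reported variance depends on the value of $\lambda$ plugged into $\tau_t^2$, not the point estimate.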

Method of moments #

Binomial and Poisson cases #

Unbiased, minimal variance estimators:

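  • binomial case for $p$: $$\hat p_T^{MV}=\frac{1}{\sum_{s=1}^{T}v_s}\sum_{t=1}^{T}N_t=\sum_{t=1}^{T}\frac{v_t}{\sum_{s=1}^{T}v_s}\frac{N_t}{v_t}.$$ Furthermore, $\sum_{t=1}^{T}N_t\sim\text{Binom}\left(\sum_{s=1}^{T}v_s,p\right)$, which means we know the distribution of $\hat p_T^{MV}$.
  • Poisson case for $\lambda$: $$\hat\lambda_T^{MV}=\frac{1}{\sum_{s=1}^{T}v_s}\sum_{t=1}^{T}N_t=\sum_{t=1}^{T}\frac{v_t}{\sum_{s=1}^{T}v_s}\frac{N_t}{v_t}.$$ Here, $\sum_{t=1}^{T}N_t\sim\text{Poi}\left(\lambda\sum_{s=1}^{T}v_s\right)$.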

Negative binomial case #

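More complicated, because
$$E\left[\frac{N_t}{v_t}\right]=\lambda \quad\text{and}\quad Var\left(\frac{N_t}{v_t}\right)=\lambda/v_t+\lambda^2/\gamma=\tau_t^2.$$
Unbiased (but not guaranteed minimal variance):
$$\hat\lambda_T^{NB}=\frac{1}{\sum_{s=1}^{T}v_s}\sum_{t=1}^{T}N_t=\sum_{t=1}^{T}\frac{v_t}{\sum_{s=1}^{T}v_s}\frac{N_t}{v_t}.$$

We need a sense of the dispersion for estimating the dispersion parameter $\gamma$.

Let the weighted sample variance be
$$\hat V_T^2=\frac{1}{T-1}\sum_{t=1}^{T}v_t\left(\frac{N_t}{v_t}-\hat\lambda_T^{NB}\right)^2.$$
Then we have
$$\hat\gamma_T^{NB}=\frac{(\hat\lambda_T^{NB})^2}{\hat V_T^2-\hat\lambda_T^{NB}}\,\frac{1}{T-1}\left(\sum_{t=1}^{T}v_t-\frac{\sum_{t=1}^{T}v_t^2}{\sum_{t=1}^{T}v_t}\right),$$
ONLY if $\hat V_T^2>\hat\lambda_T^{NB}$. Otherwise use Poisson or binomial.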
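The two moment estimators translate directly into code. Here is a minimal sketch; the function name `nb_moment_estimators` and the toy data are ours. It returns NA for $\gamma$ when the condition $\hat V_T^2>\hat\lambda_T^{NB}$ fails.

```r
nb_moment_estimators <- function(N, v) {
  T_obs <- length(N)
  lambda_hat <- sum(N) / sum(v)
  # weighted sample variance of the observed frequencies N_t / v_t
  V2 <- sum(v * (N / v - lambda_hat)^2) / (T_obs - 1)
  if (V2 <= lambda_hat) {
    # no evidence of overdispersion: fall back on Poisson or binomial
    return(list(lambda = lambda_hat, V2 = V2, gamma = NA_real_))
  }
  c_v <- (sum(v) - sum(v^2) / sum(v)) / (T_obs - 1)
  list(lambda = lambda_hat, V2 = V2,
       gamma = lambda_hat^2 / (V2 - lambda_hat) * c_v)
}

# Toy data (hypothetical): claim counts and volumes over T = 3 periods
est <- nb_moment_estimators(N = c(5, 30, 25), v = c(100, 200, 300))
unlist(est)
```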

Maximum likelihood estimators #

Binomial and Poisson cases #

Estimators are identical to method of moments estimators. Or conversely, the MLE estimators are actually unbiased.

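  • binomial case for $p$: $$\hat p_T^{MLE}=\frac{1}{\sum_{s=1}^{T}v_s}\sum_{t=1}^{T}N_t=\sum_{t=1}^{T}\frac{v_t}{\sum_{s=1}^{T}v_s}\frac{N_t}{v_t}=\hat p_T^{MV}$$
  • Poisson case for $\lambda$: $$\hat\lambda_T^{MLE}=\frac{1}{\sum_{s=1}^{T}v_s}\sum_{t=1}^{T}N_t=\sum_{t=1}^{T}\frac{v_t}{\sum_{s=1}^{T}v_s}\frac{N_t}{v_t}=\hat\lambda_T^{MV}$$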

Negative binomial case #

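Assume $N_1,\ldots,N_T$ are independent and NegBin$(\lambda v_t,\gamma)$. The MLE $(\hat\lambda_T^{MLE},\hat\gamma_T^{MLE})$ are the solution of
$$\frac{\partial}{\partial(\lambda,\gamma)}\sum_{t=1}^{T}\left[\log\binom{N_t+\gamma-1}{N_t}+\gamma\log(1-p_t)+N_t\log p_t\right]=0,$$
with $p_t=\lambda v_t/(\gamma+\lambda v_t)\in(0,1)$.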
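There is no closed form here, so in practice the log-likelihood is maximised numerically. The following is a minimal sketch on simulated data, with illustrative parameter values, volumes and sample size of our own choosing. It relies on the mean parametrisation of dnbinom (size $\gamma$, mu $\lambda v_t$) and optimises over log-parameters so that both stay positive.

```r
set.seed(42)
v     <- rep(c(500, 1000, 2000), length.out = 300)   # hypothetical volumes
lam0  <- 0.05                                        # "true" lambda used to simulate
gam0  <- 4                                           # "true" gamma used to simulate
N     <- rnbinom(length(v), size = gam0, mu = lam0 * v)

# Negative log-likelihood in (log lambda, log gamma)
negll <- function(par) {
  lam <- exp(par[1])
  gam <- exp(par[2])
  -sum(dnbinom(N, size = gam, mu = lam * v, log = TRUE))
}

# Start from the moment estimator of lambda and gamma = 1
fit <- optim(c(log(sum(N) / sum(v)), 0), negll)
mle <- exp(fit$par)
c(lambda = mle[1], gamma = mle[2], convergence = fit$convergence)
```

A convergence code of 0 indicates that optim reports success; the estimates should be near the values used for simulation, up to sampling error.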

The (a,b,0) and (a,b,1) classes of distributions #

The (a,b) class of Panjer distributions (4.2.1) #

A class of distributions has the following property

$$\Pr[N=n]=\left(a+\frac{b}{n}\right)\Pr[N=n-1], \quad\text{or}\quad \frac{p_k}{p_{k-1}}=a+\frac{b}{k}.$$

This is the $(a,b)$ class of “Panjer distributions”. This means that $\Pr[N=n]$ can be obtained recursively with initial value $\Pr[N=0]$; see Wuthrich (2023), Definition 4.6.

The exhaustive list of its members (see Wuthrich 2023 Lemma 4.7) is


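| Distribution | $a$ | $b$ | $\Pr[N=0]$ |
|---|---|---|---|
| Poisson ($\lambda$) | $0$ | $\lambda$ | $e^{-\lambda}$ |
| Neg Bin ($\gamma,p$) | $p$ | $(\gamma-1)p$ | $(1-p)^{\gamma}$ |
| Binomial ($m,p$) | $-p/(1-p)$ | $(m+1)p/(1-p)$ | $(1-p)^m$ |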

Exercise: prove the results in the above table!

(Note the Negative Binomial is parametrised as per Proposition 2.20 in Wuthrich (2023) (second definition))
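The recursion and the table can be checked numerically against the base R pmfs. Here is a minimal sketch; the function name `panjer_pmf` and the parameter values are ours. For the negative binomial, $p$ is the probability of “failure”, so dnbinom is called with prob equal to $1-p$.

```r
# pmf on 0..kmax from the (a, b) recursion p_k = (a + b / k) * p_{k-1}
panjer_pmf <- function(a, b, p0, kmax) {
  p <- numeric(kmax + 1)
  p[1] <- p0
  for (k in seq_len(kmax)) p[k + 1] <- (a + b / k) * p[k]
  p
}

k <- 0:10

lam <- 2.5                                   # Poisson(lambda)
pois <- panjer_pmf(0, lam, exp(-lam), 10)

g <- 3; p <- 0.4                             # Neg Bin(gamma, p)
nb <- panjer_pmf(p, (g - 1) * p, (1 - p)^g, 10)

m <- 10; pb <- 0.3                           # Binomial(m, p)
bin <- panjer_pmf(-pb / (1 - pb), (m + 1) * pb / (1 - pb), (1 - pb)^m, 10)

c(poisson  = all.equal(pois, dpois(k, lam)),
  negbin   = all.equal(nb, dnbinom(k, size = g, prob = 1 - p)),
  binomial = all.equal(bin, dbinom(k, m, pb)))
```

This is a numerical check only; the exercise above still asks for a proof.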

First three cumulants of the (a,b) family #

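| Distribution | $E[N]$ | $Var(N)$ | $E[(N-E[N])^3]$ |
|---|---|---|---|
| Poisson ($\lambda$) | $\lambda$ | $\lambda$ | $\lambda$ |
| Neg Bin ($\gamma,p$) | $\frac{\gamma p}{1-p}$ | $\frac{\gamma p}{(1-p)^2}$ | $\frac{\gamma p(1+p)}{(1-p)^3}$ |
| Binomial ($m,p$) | $mp$ | $mpq$ | $mpq(q-p)$ |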

Exercise:

  • check these results using the cgf
  • find the first 3 cumulants of $S$, as well as $\varsigma_S$, for each member of the family

actuar and the (a,b,1) class #

  • The package actuar extends the definition above to allow for zero-truncated and zero-modified distributions.
  • The Poisson, binomial and negative-binomial (and special case geometric) are all well supported in base R with the d, p, q and r functions.
  • If one takes the Panjer equation for granted, then we can think of $p_0$ as the mass that will make the pmf add up to one: given the Panjer recursion, $p_0$ is such that $\sum_{k=0}^{\infty}p_k=1$.
  • We introduce here the $(a,b,1)$ class which extends the idea above so that we have more freedom on the mass at 0.
  • The reference for this section is Section 4 of the vignette “distribution” of actuar.

The (a,b,1) class of distributions #

A discrete random variable is a member of the **$(a,b,1)$ class of distributions** if there exist constants $a$ and $b$ such that
$$\frac{p_k}{p_{k-1}}=a+\frac{b}{k}, \quad k=2,3,\ldots.$$
Note:

  • The recursion starts at $k=2$ for the $(a,b,1)$ class.
  • The extra freedom allows the probability at zero to be set to any arbitrary number $0\le p_0\le 1$.

Zero-truncated distributions #

  • Setting $p_0=0$ in the $(a,b,1)$ class defines the subclass of zero-truncated distributions.
  • Members are the zero-truncated Poisson (actuar::ztpois), zero-truncated binomial (actuar::ztbinom), zero-truncated negative-binomial (actuar::ztnbinom), and the zero-truncated geometric (actuar::ztgeom).
  • Let $p_k^T$ denote the probability mass at $k$ for a zero-truncated distribution (“$T$” for truncated). We have $$p_k^T=\begin{cases}0, & k=0; \\ \frac{p_k}{1-p_0}, & k=1,2,\ldots,\end{cases}$$ where $p_k$ is the probability mass of the corresponding member of the $(a,b,0)$ class (that is, the $(a,b)$ class).
  • actuar provides the d, p, q, and r functions of the zero-truncated distributions mentioned above.

Zero-modified distributions #

  • Setting $p_0\equiv p_0^M$ ($0<p_0^M<1$) in the $(a,b,1)$ class defines the subclass of zero-modified distributions (“$M$” for “modified”).
  • These distributions are discrete mixtures between a degenerate distribution at zero, and the corresponding distribution from the $(a,b,0)$ class.
  • Let $p_k^M$ denote the probability mass at $k$ for a zero-modified distribution. We have then $$p_k^M=\left(1-\frac{1-p_0^M}{1-p_0}\right)1_{\{k=0\}}+\frac{1-p_0^M}{1-p_0}p_k.$$ Alternatively, $$p_k^M=\begin{cases}p_0^M, & k=0; \\ \frac{1-p_0^M}{1-p_0}p_k, & k=1,2,\ldots,\end{cases}$$ where $p_k$ is the probability mass of the corresponding member of the $(a,b,0)$ class.

  • Quite obviously, zero-truncated distributions are zero-modified distributions with $p_0^M=0$, and $$p_k^M=p_0^M1_{\{k=0\}}+(1-p_0^M)p_k^T.$$
  • Members are the zero-modified Poisson (actuar::zmpois), zero-modified binomial (actuar::zmbinom), zero-modified negative-binomial (actuar::zmnbinom), and the zero-modified geometric (actuar::zmgeom). actuar provides the d, p, q, and r functions of the zero-modified distributions mentioned above.

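library(actuar)  # provides dztpois() and dzmpois()
k <- 0:7
# Poisson(2.5) pmf (red), zero-truncated (blue), zero-modified with p0M = 2 * p0 (green)
plot(k, dpois(k, 2.5), pch = 20, col = "red", ylim = c(0, 0.3),
  cex = 1.5, type = "b", xlab = "k", ylab = "probability mass")
points(k, dztpois(k, 2.5), pch = 20, col = "blue", type = "b")
points(k, dzmpois(k, 2.5, 2 * dpois(0, 2.5)), pch = 20, col = "green",
  type = "b")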
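The zero-truncated and zero-modified pmfs can also be computed from the formulas above in base R alone, which makes the construction explicit. Here is a minimal sketch for the Poisson(2.5) case of the plot, with the same choice $p_0^M=2p_0$; the variable names are ours and the support is truncated at 50, where the remaining Poisson mass is negligible.

```r
lam <- 2.5
k   <- 0:50
pk  <- dpois(k, lam)   # member of the (a, b, 0) class
p0  <- pk[1]

# Zero-truncated: no mass at 0, the rest rescaled by 1 / (1 - p0)
pT <- c(0, pk[-1] / (1 - p0))

# Zero-modified: mass p0M at 0, the rest rescaled by (1 - p0M) / (1 - p0)
p0M <- 2 * p0
pM  <- c(p0M, (1 - p0M) / (1 - p0) * pk[-1])

c(sum_truncated = sum(pT), sum_modified = sum(pM))
all.equal(pM, p0M * (k == 0) + (1 - p0M) * pT)   # mixture representation
```

With actuar loaded, `pT` and `pM` can be compared with `dztpois(k, lam)` and `dzmpois(k, lam, p0M)`.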

References #

Bowers, Newton L. Jr, Hans U. Gerber, James C. Hickman, Donald A. Jones, and Cecil J. Nesbitt. 1997. Actuarial Mathematics. Second. Schaumburg, Illinois: The Society of Actuaries.

Dutang, Christophe, Vincent Goulet, and Mathieu Pigeon. 2008. “actuar: An R Package for Actuarial Science.” Journal of Statistical Software 25 (7).

Werner, Geoff, and Claudine Modlin. 2010. Basic Ratemaking. Casualty Actuarial Society.

Wuthrich, Mario V. 2023. “Non-Life Insurance: Mathematics & Statistics.” Lecture notes. RiskLab, ETH Zurich; Swiss Finance Institute.