Prerequisite Knowledge

Summary of some of the prerequisite knowledge required for this subject:

The first two parts (Mathematics and Probability) below are basic reminders, and could probably be skipped by most students.

The third part is more advanced and corresponds to Section 1.2 of the prescribed textbook MW.

Students are encouraged to review all of these concepts prior to week 1, and to ask questions at the week 1 tutorial if needed.

Mathematics #

Functions and their derivatives #

  1. Be familiar with the functions $x^\alpha$, $e^{\alpha x}$, $\ln(1+x)$.
  2. Basic derivatives: $(x^\alpha)' = \alpha x^{\alpha-1}$, $(e^{\alpha x})' = \alpha e^{\alpha x}$, $(\ln(1+x))' = \frac{1}{1+x}$, $(a^x)' = a^x \ln(a)$.
  3. Taylor's expansion (here for the exponential function): $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots$

Be able to find expressions for the following summations #

$\sum_{i=1}^n x^i$, $\sum_{i=1}^n i x^i$, $\sum_{i=1}^n i$. See Tutorial 0 (Revisions) for solutions.
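
These three sums have standard closed forms (stated here from memory, not copied from Tutorial 0 — check them against your own solutions). A quick Python check with illustrative values $n = 10$ and $x = 0.7$:

```python
# Numeric check of the usual closed forms for the three sums above.
n, x = 10, 0.7

s1 = sum(x**i for i in range(1, n + 1))          # sum of x^i
s2 = sum(i * x**i for i in range(1, n + 1))      # sum of i * x^i
s3 = sum(range(1, n + 1))                        # sum of i

f1 = x * (1 - x**n) / (1 - x)                            # geometric sum
f2 = x * (1 - (n + 1) * x**n + n * x**(n + 1)) / (1 - x)**2
f3 = n * (n + 1) // 2                                    # triangular number

assert abs(s1 - f1) < 1e-12 and abs(s2 - f2) < 1e-12 and s3 == f3
```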

Change the order of double summation #

$$\sum_{k=1}^n \sum_{j=1}^k a_{k,j} = \sum_{j=1}^n \sum_{k=j}^n a_{k,j}$$
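
Swapping the order of a double summation only re-groups the same terms, so the total is unchanged. A sketch of the identity in Python, with an arbitrary illustrative array $a_{k,j}$:

```python
# Both orderings sum the same triangular set of terms {(k, j) : 1 <= j <= k <= n}.
n = 5
a = [[10 * k + j for j in range(n + 1)] for k in range(n + 1)]  # a[k][j], values arbitrary

lhs = sum(a[k][j] for k in range(1, n + 1) for j in range(1, k + 1))
rhs = sum(a[k][j] for j in range(1, n + 1) for k in range(j, n + 1))
assert lhs == rhs
```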

The solution to a quadratic equation #

The equation $ax^2 + bx + c = 0$ has two solutions when $b^2 > 4ac$:

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$

If $b^2 - 4ac = 0$, the equation has a double solution: $x = -\frac{b}{2a}$.

Example #

For example, find a value for $v$ such that $v \in (0,1)$ and $v$ satisfies the equation $v - 2v^{0.5} + \frac{3}{4} = 0$.

Solution: let $v^{\frac{1}{2}} = x$. Then the equation above simplifies to $x^2 - 2x + \frac{3}{4} = 0$, which has two solutions:

$$x_1 = \frac{2 + \sqrt{4-3}}{2} = 1.5, \qquad x_2 = \frac{2 - \sqrt{4-3}}{2} = 0.5$$

We reject the solution $x_1 = 1.5$ as $x_1 > 1$ (we need $x = \sqrt{v} < 1$). Then $v = x_2^2 = 0.5^2 = 0.25$ is the required solution.
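
The worked example can be verified numerically. The sketch below (Python; variable names are illustrative) applies the quadratic formula to $x^2 - 2x + \frac{3}{4} = 0$ and substitutes back:

```python
import math

# Solve v - 2*sqrt(v) + 3/4 = 0 for v in (0, 1) via the substitution x = sqrt(v).
a, b, c = 1.0, -2.0, 0.75
disc = b**2 - 4 * a * c                   # discriminant: 4 - 3 = 1 > 0
x1 = (-b + math.sqrt(disc)) / (2 * a)     # 1.5, rejected since x = sqrt(v) < 1
x2 = (-b - math.sqrt(disc)) / (2 * a)     # 0.5
v = x2**2                                 # 0.25

assert abs(v - 2 * math.sqrt(v) + 0.75) < 1e-12   # v solves the original equation
```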

Be able to solve simple differential equations #

Example 1 #

For example, solve $f'(x) = 2x$ with initial condition $f(0) = 1$.

Solution:

$$f(x) = f(0) + \int_0^x f'(t)\,dt = 1 + \int_0^x 2t\,dt = 1 + t^2 \Big|_0^x = 1 + x^2$$

Example 2 #

Solve $f'(x) = 2f(x)$ with initial condition $f(0) = 1$.

Solution:

$$\frac{f'(x)}{f(x)} = 2 \iff (\ln f(x))' = 2.$$

$$\ln f(x) - \ln f(0) = \int_0^x (\ln f(t))'\,dt = \int_0^x 2\,dt = 2x$$

$$\ln f(x) = 2x \iff f(x) = e^{2x}$$
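
If a closed form is not available, a simple Euler scheme approximates such ODEs numerically. A minimal sketch for Example 2 (the step size `h` is an arbitrary choice, and Python is used for convenience):

```python
import math

# Euler scheme for f'(x) = 2 f(x), f(0) = 1, integrated up to x = 1;
# the exact solution is f(x) = e^(2x), so f(1) should be close to e^2.
h, steps = 1e-4, 10_000
f = 1.0
for _ in range(steps):
    f += h * (2 * f)              # f(x + h) ~ f(x) + h * f'(x)

assert abs(f - math.exp(2)) < 0.01 * math.exp(2)   # within 1% of e^2
```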

Integrals #

  1. We have $\int_a^b f(x)\,dx = F(b) - F(a)$, where $F(x)$ is the anti-derivative of $f(x)$, such that $F'(x) = f(x)$.
  2. The integration variable is just a dummy, that is, $\int_a^b f(x)\,dx = \int_a^b f(y)\,dy$; it does not matter whether we use $x$ or $y$.
  3. We have $\int_0^\infty f(x)\,dx = \lim_{b\to\infty} \int_0^b f(x)\,dx = \lim_{b\to\infty} F(b) - F(0)$. For example: $\int_0^\infty e^{-x}\,dx = \lim_{b\to\infty} \int_0^b e^{-x}\,dx = \lim_{b\to\infty} \left( -e^{-x} \Big|_0^b \right) = 1 - \lim_{b\to\infty} e^{-b} = 1$
  4. Integration by parts: $\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b g(x)f'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b g(x)f'(x)\,dx$

The average of a function on [a,b] #

Let $f(x)$ be a continuous function on $[a,b]$. Then there exists a point $c \in [a,b]$ such that

$$\int_a^b f(x)\,dx = f(c)(b-a), \qquad c \in [a,b]$$

Interpretation: $\int_a^b f(x)\,dx$ is the area of the region enclosed by $f(x)$, the $x$-axis, $x=a$ and $x=b$; $f(c)(b-a)$ is the area of the rectangle of height $f(c)$ and length $(b-a)$.

Definition: $\frac{\int_a^b f(x)\,dx}{b-a} = f(c)$: the average value of $f(x)$ on the interval $[a,b]$.

Example 1: #

$$\frac{\int_{-1}^1 x\,dx}{2} = 0$$

The average value of $f(x) = x$ on $[-1,1]$ is $0$.

Example 2: #

$$\frac{\int_0^1 x\,dx}{1-0} = \frac{1}{2}$$

The average value of $f(x) = x$ on $[0,1]$ is $\frac{1}{2}$.

Example 3: #

$$\frac{\int_{-1}^1 x^2\,dx}{2} = \frac{1}{3}$$

The average value of $f(x) = x^2$ on $[-1,1]$ is $\frac{1}{3}$.

The trapezoid rule in integration #

$$\int_a^b f(x)\,dx \approx \frac{1}{2}\left[ f(b) + f(a) \right](b-a) \qquad \Longleftrightarrow \qquad \frac{\int_a^b f(x)\,dx}{b-a} \approx \frac{1}{2}\left[ f(b) + f(a) \right]$$

The average value of $f(x)$ on $[a,b]$ can be approximated by $\frac{1}{2}[f(b) + f(a)]$.
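
As a quick sanity check of the trapezoid rule, the sketch below (Python, with $e^{-x}$ on $[0,1]$ as an illustrative integrand) compares the one-panel approximation with the exact value $1 - e^{-1}$:

```python
import math

# One-panel trapezoid approximation of the integral of e^(-x) over [0, 1].
g = lambda x: math.exp(-x)
a, b = 0.0, 1.0

approx = 0.5 * (g(b) + g(a)) * (b - a)   # trapezoid rule
exact = 1 - math.exp(-1)                 # antiderivative: -e^(-x)

# e^(-x) is convex, so the trapezoid overestimates; the error is small but visible.
assert 0 < approx - exact < 0.06
```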

The definition of $\int_a^b f(x)\,dx$ and its numerical calculations #

$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{k=0}^{n-1} f(x_k)\,\frac{b-a}{n}, \qquad (1)$$

where $x_0 = a$, $x_1 = x_0 + \frac{b-a}{n}$, ..., $x_{k+1} = x_k + \frac{b-a}{n}$, ..., $x_n = b$.

In the summation in (1), each term represents the area of a rectangle: $f(x_k)\,\frac{b-a}{n}$ is the area of the $k$-th rectangle.

Approximations:

  1. $\int_a^b f(x)\,dx \approx f(a)(b-a)$       $(n=1)$
  2. $\int_a^b f(x)\,dx \approx \frac{b-a}{2}\left[ f(a) + f\!\left(\frac{b+a}{2}\right) \right]$       $(n=2)$
  3. $\int_a^b f(x)\,dx \approx (b-a)\,\frac{f(x_0) + f(x_1) + \dots + f(x_{n-1})}{n}$: the average of $f(x_0), f(x_1), \dots, f(x_{n-1})$ times $(b-a)$.
  4. If $a=0$, $b=1$: $\int_0^1 f(x)\,dx \approx \frac{f(x_0) + f(x_1) + \dots + f(x_{n-1})}{n}$.

Alternatively,

$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{k=1}^{n} f(x_k)\,\frac{b-a}{n}, \qquad (2)$$

where $x_1 = x_0 + \frac{b-a}{n}$, $x_2 = x_1 + \frac{b-a}{n}$, ..., $x_n = b$ (with $x_0 = a$).

Approximations:

  1. $\int_a^b f(x)\,dx \approx f(b)(b-a)$
  2. $\int_a^b f(x)\,dx \approx \frac{b-a}{2}\left[ f\!\left(\frac{b+a}{2}\right) + f(b) \right]$
  3. $\int_a^b f(x)\,dx \approx (b-a)\,\frac{f(x_1) + f(x_2) + \dots + f(x_n)}{n}$
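
Both Riemann-sum forms are easy to test numerically. A sketch comparing the left-endpoint sum (1) and the right-endpoint sum (2) for $\int_0^1 x^2\,dx = \frac{1}{3}$, with an illustrative $n$:

```python
# Left- and right-endpoint Riemann sums for the integral of x^2 over [0, 1].
n, a, b = 1000, 0.0, 1.0
h = (b - a) / n
xs = [a + k * h for k in range(n + 1)]    # x_0 = a, ..., x_n = b

left = sum(x**2 for x in xs[:-1]) * h     # form (1): uses x_0 .. x_{n-1}
right = sum(x**2 for x in xs[1:]) * h     # form (2): uses x_1 .. x_n

# x^2 is increasing on [0, 1], so the left sum under- and the right sum over-estimates.
assert left < 1/3 < right
assert abs(left - 1/3) < 1e-3 and abs(right - 1/3) < 1e-3
```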

The average of n numbers #

Let $x_1, x_2, \dots, x_n$ be $n$ numbers. Then $\frac{x_1 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^n x_i$ is the average value of $x_1, x_2, \dots, x_n$.

Example 1 #

The average value of $1, 2, \dots, n$ is $\frac{1}{n}\sum_{i=1}^n i = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}$, where $1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2}$ is given in Tutorial 0.

Example 2 #

One student took 8 subjects in his first year at University of Melbourne. The results are as follows: Semester 1: 75, 83, 65, 90; Semester 2: 60, 76, 80, 50.

Then

  • $75+83+65+90+60+76+80+50 = 579$ is the total of the marks from year 1.
  • The average mark is $\frac{579}{8} = 72.375 \approx 72.4$.
  • The average mark for Semester 1 is $\frac{75+83+65+90}{4} = \frac{313}{4} = 78.25$.
  • The average mark for Semester 2 is $\frac{60+76+80+50}{4} = \frac{266}{4} = 66.5$.

The weighted average of n numbers #

Let $x_1, x_2, \dots, x_n$ be $n$ real numbers.

Let $\theta_1, \theta_2, \dots, \theta_n$ be $n$ numbers such that $0 \le \theta_i \le 1$ and $\sum_{i=1}^n \theta_i = 1$. Then $\sum_{i=1}^n \theta_i x_i$ is called the weighted average of $x_1, x_2, \dots, x_n$.

Note:

  • $\theta_i$ is the weight attached to $x_i$.
  • If $\theta_i = \frac{1}{n}$, then $\sum_{i=1}^n \frac{1}{n} x_i = \frac{1}{n}\sum_{i=1}^n x_i$ is the average of $x_1, x_2, \dots, x_n$ (equally weighted).

Example #

In the assessment of ACTL10001, the assignments account for 20%, the mid-semester exam accounts for 10%, and the final exam accounts for 70%. A student got 70 out of 100 for the mid-semester exam, 95 out of 100 for the assignments, and 80 for the final exam. Then the overall weighted average mark is $70 \times 10\% + 95 \times 20\% + 80 \times 70\% = 82$.
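
The same calculation sketched in Python (marks and weights copied from the example above):

```python
# Weighted average of assessment marks; weights must sum to 1.
marks = {"assignments": 95, "mid-semester": 70, "final": 80}
weights = {"assignments": 0.20, "mid-semester": 0.10, "final": 0.70}

assert abs(sum(weights.values()) - 1.0) < 1e-12   # valid set of weights

overall = sum(weights[k] * marks[k] for k in marks)
assert abs(overall - 82.0) < 1e-9
```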

Probability #

The following content is covered in a video recorded in August 2021 for the subject ACTL10001 Introduction to Actuarial Studies: annotated pdf

If you wish to watch the embedded videos from Lecture Capture, you need to have logged in and entered Lecture Capture via Canvas once for each session. This is to restrict access to students enrolled at the University of Melbourne only.

Events and Probability #

Vocabulary: events vs probability #

It is important to understand the difference between events and probability:

  • Event: something that could actually happen, in real life;
  • Probability: our understanding of the “likelihood” (or frequency) of an event.

So when we are building a mathematical model for uncertain outcomes:

  1. The first step is to work out all the possible things that could occur (for instance, “rain” or “no rain”). The full set of those is denoted $\Omega$.
  2. The second step is to make assumptions about how likely those things are to occur. Here $\Pr$ is an operator that maps an event to a probability. For instance, $\Pr[\text{rain}] = 0.2$ means that the likelihood corresponding to the event “rain” is 20%.

In what follows we outline basic results and axioms around events and their probabilities. Often logic means that a result or definition on one side (e.g. events) can be translated on the other side (e.g. probabilities).
For instance, the complement of an event is exactly whatever could happen that is not the event. Hence, the probability of the complement must be 1 minus the probability of the original event; see item 4 of the list below.

Events, operations of events, probability of an event #

  1. $\emptyset$: the empty set, that is, an impossible event: $\Pr(\emptyset) = 0$.
  2. $\Omega$: the full set of possible outcomes, that is, a certain event: $\Pr(\Omega) = 1$.
  3. $A$: an event (within $\Omega$), $0 \le \Pr(A) \le 1$.
  4. $A^C$: the event that $A$ does not occur (called the “complement” of $A$): $\Pr(A^C) = 1 - \Pr(A)$.
  5. $A \cap B$: $A$ and $B$, the event that both $A$ and $B$ occur.
  6. $A \cup B$: $A$ or $B$, the event that either $A$ or $B$, or both events, occur.
  7. $A \subseteq B$: if $A$ occurs, $B$ must occur, and:
    • $\Pr(A) \le \Pr(B)$;
    • $A \cap B = A$.
      Example: $A$ = {a 20-year old survives to age 70}, $B$ = {the 20-year old survives to age 50}. Then $A \subseteq B$.

Mutually exclusive events #

If $A \cap B = \emptyset$, then $A$ and $B$ are mutually exclusive. Also,
$$\Pr(A \cup B) = \Pr(A) + \Pr(B).$$

Independent events A and B #

If $A$ and $B$ are independent, then $\Pr(A \cap B) = \Pr(A)\,\Pr(B)$.

Conditional probability formula #

We have

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$

This leads to Bayes’ theorem.

Also,

  1. If $A \subseteq B$, then $A \cap B = A$ and $\Pr(A \mid B) = \frac{\Pr(A)}{\Pr(B)}$.
  2. If $A$ and $B$ are independent, then $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{\Pr(A)\,\Pr(B)}{\Pr(B)} = \Pr(A)$.
  3. If $B \subseteq A$, then $A \cap B = B$ and $\Pr(A \mid B) = \frac{\Pr(B)}{\Pr(B)} = 1$. Given $B$ has occurred, $A$ is certain.
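
The conditional probability formula can be checked by exhaustive enumeration on a small sample space. A Python sketch with two fair dice, where the events $A$ and $B$ are illustrative choices (not from the notes above):

```python
from fractions import Fraction

# All 36 equally likely outcomes of two fair dice.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {(i, j) for (i, j) in outcomes if i + j == 8}   # A: the sum is 8
B = {(i, j) for (i, j) in outcomes if i >= 4}       # B: first die shows at least 4

p = lambda E: Fraction(len(E), len(outcomes))       # exact probability of an event

# Pr(A|B) = Pr(A ∩ B) / Pr(B) = (3/36) / (18/36) = 1/6
assert p(A & B) / p(B) == Fraction(1, 6)
assert p(A) != p(A & B) / p(B)                      # so A and B are not independent
```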

Random variables and their distribution #

Definition #

A random variable, denoted by capital letters $X, Y, Z$, is a quantity whose value is subject to variations due to chance.

Distribution Function #

Definition: $$F(x) = \Pr(X \le x), \qquad x \in \mathbb{R}.$$ $F(x)$ is called the distribution function of $X$, and it has the following properties:

  1. $F(-\infty) = 0$, $F(\infty) = 1$.
  2. $F(x_1) \le F(x_2)$, if $x_1 \le x_2$.
  3. $F(x)$ is right-continuous (aka “càdlàg”), i.e., $\lim_{x \to x_0^+} F(x) = F(x_0)$.
  4. $F(b) - F(a) = \Pr(a < X \le b)$.
  5. $F(b^-) - F(a) = \Pr(a < X < b)$, where $F(b^-) = \lim_{x \to b^-} F(x)$.
  6. $F(b) - F(a^-) = \Pr(a \le X \le b)$.
  7. $F(b) - F(b^-) = \Pr(X = b) \ge 0$

In our subject, we generally assume that $X \ge 0$, so that $F(x) = 0$ for $x < 0$.

Difference between continuous and discrete random variables #

As an introduction to differences between continuous and discrete random variables, review this video:

Continuous random variables ($X \ge 0$) #

$X$ is said to be a continuous r.v. if $X$ has a probability density function $f(x)$, $x \ge 0$, with the following properties:

  1. $f(x) = F'(x)$.
  2. $F(x) = \int_0^x f(y)\,dy$.
  3. $\Pr(a < X \le b) = \Pr(a < X < b) = \Pr(a \le X \le b) = \Pr(a \le X < b) = \int_a^b f(x)\,dx$
  4. $F(x)$ is typically an increasing S-shaped curve from 0 to 1 (figure omitted), but note that it does not need to be concave.
  5. $E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty [1 - F(x)]\,dx$

Discrete random variables #

A random variable $X$ is said to be a discrete random variable if $X$ takes values from a countable set of numbers $\{x_1, x_2, \dots, x_n, \dots\}$.

  1. The probability distribution of $X$ is given by $p_n = \Pr(X = x_n)$, $n = 1, 2, 3, \dots$

  2. $E(X) = \sum_{n=1}^{\infty} x_n p_n$

  3. The distribution function $F(x)$ is a piece-wise constant function (also called a step function).

Moments of a random variable #

Expectation and variance #

Expectation of $X$: $E(X) = \int_0^\infty x f(x)\,dx$ (for a continuous $X \ge 0$). Variance of $X$: $Var(X) = E\{[X - E(X)]^2\} = E(X^2) - [E(X)]^2$.

Furthermore:

  1. $Var(X)$ measures the variability of $X$. The larger the variance, the more variability $X$ has.
  2. If $Var(X) = 0$, then $X \equiv E(X)$: there is no variability for $X$, and $X$ is a constant.
  3. If $X$ and $Y$ are independent, then $Var(X+Y) = Var(X) + Var(Y)$.
  4. $Var(aX) = a^2\,Var(X)$

Moments of the average of iid rv’s #

Assume $X_1, X_2, \dots, X_n$ are independently and identically distributed, with $E(X_1) = \mu$ and $Var(X_1) = \sigma^2$. Define $Y_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$ to be the average of $X_1, X_2, \dots, X_n$. Then

$$E(Y_n) = \frac{1}{n}\left( E(X_1) + E(X_2) + \dots + E(X_n) \right) = \frac{1}{n}(\mu + \mu + \dots + \mu) = \mu$$

and

$$Var(Y_n) = \frac{1}{n^2}\left( Var(X_1) + Var(X_2) + \dots + Var(X_n) \right) = \frac{1}{n^2}(\sigma^2 + \sigma^2 + \dots + \sigma^2) = \frac{\sigma^2}{n}.$$

When $n \to \infty$, $Var(Y_n) \to 0$. That is to say, as $n \to \infty$, $Y_n \to \mu$: with an infinite sample of $X$'s, you can estimate $\mu$ with certainty.
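
A seeded Monte Carlo sketch of this result (in Python, for convenience), using iid uniforms on $[0,1]$ so that $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$; the sample sizes are illustrative:

```python
import random
import statistics

# Simulate many averages Y_n of n iid U(0,1) variables and compare the
# empirical mean and variance of Y_n with mu = 1/2 and sigma^2 / n = 1/(12n).
random.seed(42)                          # fixed seed for reproducibility
n, reps = 50, 10_000
averages = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

assert abs(statistics.fmean(averages) - 0.5) < 0.005          # E(Y_n) = mu
assert abs(statistics.pvariance(averages) - (1/12) / n) < 5e-4  # Var(Y_n) = sigma^2/n
```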

Selected distributions #

Binomial distribution #

If $X \sim \text{Bin}(n,p)$, then $\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, $\ 0 < p < 1$, $\ k = 0, 1, 2, \dots, n$.

Note:

  1. $E(X) = np$, $Var(X) = np(1-p)$.
  2. $F(k) = \Pr(X \le k) = \sum_{j=0}^{k} \Pr(X = j) = \sum_{j=0}^{k} \binom{n}{j} p^j (1-p)^{n-j}$, $\ k = 0, 1, 2, \dots, n$.
  3. $F(2.5) = \Pr(X \le 2.5) = \Pr(X \le 2) = F(2)$.
  4. $X$ represents the number of successes out of $n$ independent trials, where each trial has two outcomes: success with probability $p$ or failure with probability $1-p$.
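
These facts can be checked by direct summation over the pmf (in Python, with illustrative $n$ and $p$):

```python
from math import comb

# Recover the mean and variance of Bin(n, p) from the pmf itself.
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))
var = sum(k**2 * pmf[k] for k in range(n + 1)) - mean**2

assert abs(sum(pmf) - 1) < 1e-12                   # probabilities sum to 1
assert abs(mean - n * p) < 1e-9                    # E(X) = np
assert abs(var - n * p * (1 - p)) < 1e-9           # Var(X) = np(1-p)
```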

Exponential distribution #

If $X \sim \text{Exp}(\lambda)$ then $f(x) = \lambda e^{-\lambda x}$, $x > 0$, $\lambda > 0$.

Note:

  1. $F(x) = 1 - e^{-\lambda x}$, $x \ge 0$.
  2. $E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty [1 - F(x)]\,dx = \frac{1}{\lambda}$.

Uniform distribution #

If $X \sim U(0,M)$ then $f(x) = \frac{1}{M}$, $0 \le x \le M$.

Note:

  1. $F(x) = \frac{x}{M}$, $0 \le x \le M$.
  2. $E(X) = \int_0^M x\,\frac{1}{M}\,dx = \int_0^M \left( 1 - \frac{x}{M} \right) dx = \frac{M}{2}$.

Probability theory and statistics (MW 1.2) #

Please review:

  • Random variables and distribution functions (MW 1.2.1)
  • Terminology in statistics (MW 1.2.2)

Tutorial exercises for Module 1 are focused on revisions of assumed known materials.

Continuous vs discrete random variables #

Random variables are either

  • discrete
    • distribution function (df) is $\Pr[X \le x] \equiv F_X(x)$ (càdlàg)
    • probability mass function (pmf) is $\Pr[X = x]$
  • continuous
    • cumulative distribution function (cdf) is $F_X(x)$ (càdlàg)
    • probability density function (pdf) is defined by $f_X(x) \equiv \frac{dF_X(x)}{dx}$. This is NOT a probability.
  • mixed

The Riemann-Stieltjes integral #

The Riemann-Stieltjes notation allows us to write expressions for any type of rv. For instance, $E[g(X)] = \int g(x)\,dF_X(x)$, where $dF_X(x)$ is to be interpreted as

  • $f_X(x)\,dx$ for the continuous bits of $F$, and
  • $F_X(x) - F_X(x^-) = \Pr[X = x]$ (remember $F$ is càdlàg) for the discrete bits of $F$.

Moments of random variables #

There are two types of moments:

  • the moments around the origin: $E[X^k]$, $k > 0$
    • $k=1$: the mean $E(X) = \mu_X = \int x\,dF_X(x)$
  • the central moments: $E[(X - E[X])^k]$, $k > 0$
    • $k=2$: the variance $Var(X) = \sigma_X^2 = E[(X - \mu_X)^2] = E(X^2) - \mu_X^2$

Some extremely useful formulas:

$$E[S] = E\big[E[S \mid N]\big] \qquad \text{(LIE)}$$
$$Var(S) = Var(E[S \mid N]) + E[Var(S \mid N)] \qquad \text{(DVR)}$$
$$Cov(X,Y) = E[Cov(X,Y \mid Z)] + Cov(E[X \mid Z], E[Y \mid Z]) \qquad \text{(DCR)}$$

Tail value method of calculating expectation #

For positive random variables we have

$$E[X] = \begin{cases} \int_0^\infty [1 - F_X(x)]\,dx & \text{if } X \text{ is continuous} \\ \sum_{k=0}^\infty [1 - F_X(k)] & \text{if } X \text{ is discrete (integer-valued)} \end{cases}$$
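
For an integer-valued rv the tail-sum version is easy to verify by brute force. A Python sketch with $X \sim \text{Bin}(n, p)$ as an illustrative choice:

```python
from math import comb

# Check E[X] = sum_{k >= 0} (1 - F(k)) for a Bin(n, p) random variable.
n, p = 12, 0.4
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
F = [sum(pmf[: k + 1]) for k in range(n + 1)]      # F(k) = Pr(X <= k)

direct = sum(k * pmf[k] for k in range(n + 1))     # E[X] from the pmf
tail = sum(1 - F[k] for k in range(n + 1))         # sum of survival probabilities
                                                   # (terms vanish for k >= n)
assert abs(direct - n * p) < 1e-9
assert abs(tail - direct) < 1e-9
```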

Descriptive statistics #

Some indicators are functions of the moments:

  • Coefficient of variation (measure of spread): $Vco(X) = \frac{\sigma_X}{\mu_X}$.
  • Skewness (measure of… skewness!): $\varsigma_X = \frac{E[(X - \mu_X)^3]}{\sigma_X^3}$.
    • If symmetric, $\varsigma_X = 0$ [vice versa not true]
    • $\varsigma_X > 0$ indicates a heavy right tail [skewed to the right]
    • $\varsigma_X < 0$ indicates a heavy left tail [skewed to the left]

  • **Excess kurtosis** (measure of peakedness): $\gamma_2(X) = \frac{E[(X - \mu_X)^4]}{\sigma_X^4} - 3$.
    • $\gamma_2(X) = 0$: mesokurtic [like the Normal, or the Binomial with $p=0.5$ for large $n$]
    • $\gamma_2(X) > 0$: leptokurtic [fatter tails]
    • $\gamma_2(X) < 0$: platykurtic [thinner tails]

Note that these indicators have no units, which allows comparisons between distributions.

Generating functions #

Probability generating function (pgf) — only for discrete rv!

$$p_X(t) = E(t^X) = \Pr[X=0] + \Pr[X=1]\,t + \Pr[X=2]\,t^2 + \Pr[X=3]\,t^3 + \dots$$

Moment generating function (mgf):

$$M_X(t) = E(e^{tX}) = 1 + E[X]\,t + E[X^2]\,\frac{t^2}{2} + E[X^3]\,\frac{t^3}{6} + \dots + E[X^k]\,\frac{t^k}{k!} + \dots$$

and thus $E[X^k] = \frac{d^k}{dt^k} M_X(t) \Big|_{t=0}$.


Example: for $X \sim N(\mu, \sigma^2)$,

$$E[e^{tX}] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$$


Cumulant generating function (cgf):

$$\kappa_X(t) = \ln[M_X(t)] = E[X]\,t + Var(X)\,\frac{t^2}{2} + E[(X - \mu_X)^3]\,\frac{t^3}{6} + \gamma_2(X)\,[Var(X)]^2\,\frac{t^4}{4!} + \dots$$

and thus $E[(X - \mu_X)^k] = \frac{d^k}{dt^k} \kappa_X(t) \Big|_{t=0}$ for $k = 2$ and $3$.

  • CAUTION: the second and third cumulants are the second and third central moments, but NOT the following ones!
  • Cumulants are additive: the $k$-th cumulant of a sum of independent random variables is the sum of the $k$-th cumulants.
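
As a tiny illustration of the pgf defined above, the mean of a fair die can be recovered from $p_X'(1) = E[X]$ (a standard pgf property); the polynomial derivative below is computed by hand in Python:

```python
from fractions import Fraction

# pgf of a fair die: p(t) = (t + t^2 + ... + t^6) / 6, stored as {power: coefficient}.
coeffs = {k: Fraction(1, 6) for k in range(1, 7)}

# derivative of sum c_k t^k is sum k c_k t^(k-1); evaluated at t = 1 this is sum k c_k.
mean = sum(k * c for k, c in coeffs.items())
assert mean == Fraction(7, 2)     # E[X] = 3.5 for a fair die
```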

References: MW Chapter 1

Use of the R software #

R is required prior knowledge for this course, and part of the prerequisite course MAST20005 (Statistics).

This is because R is a required software for the actuarial professional exams CS1 and CS2 (see https://www.actuaries.org.uk/studying/curriculum/actuarial-statistics and also https://www.actuaries.org.uk/studying/curriculum/frequently-asked-questions-curriculum ). It is used very widely by actuaries in industry (see, for instance, http://insightriskconsulting.co.uk/blog/r-for-actuaries/ or https://www.actuaries.digital/2019/09/26/my-top-10-r-packages-for-data-analytics/ ). Some companies also use R to produce presentations and documentation more generally.

In order to help you with R I have put together a website that summarises all the things I think you should know before starting your first grad role: https://communicate-data-with-r.netlify.app At the very least, you need to know what is under “Base R.” Learning the “tidyverse” will be most useful, as well as “ggplot2” for better visualisations. “htmlwidgets” is more advanced, and not required for the course. You may want to create your assignment with “R Markdown” (under “Communicate Data”), although this is not required either.

The main reference for Base R is the book http://biostatisticien.eu/springeR/index-en.html, which is also available in other languages (including Mandarin). The English version can be downloaded for free from the Unimelb library: https://go.openathens.net/redirector/unimelb.edu.au?url=http%3A%2F%2Fdx.doi.org%2F10.1007%2F978-1-4614-9020-3

I strongly recommend you review the R materials mentioned above before the semester starts.

See also the Actuaries Institute Analytics Cookbook: https://www.actuaries.digital/2021/11/30/the-actuaries-analytics-cookbook-recipes-for-data-success/ https://actuariesinstitute.github.io/cookbook/docs/index.html

Credit #

The initial versions of Parts 1 and 2 were developed by Professor Shuanming Li in 2018. These were then transcribed, modified, and augmented by Professor Benjamin Avanzi in 2021 for the subject ACTL10001.