Introduction to Short Term Actuarial Mathematics (STAM)

This article provides an overview of actuarial and financial topics related to Short Term Actuarial Mathematics, following the syllabus of the STAM exam given by the SOA. It is also based on the book Loss Models: From Data to Decisions by Stuart A. Klugman, Harry H. Panjer and Gordon E. Willmot.

1. Basic distributional quantities

1.a. Moments

Let’s consider a random variable X. Its \( k^{th} \) raw moment is defined by \( \mu’_k = E[X^k] \) whereas its \( k^{th} \) central moment is defined by \( \mu_k = E[(X - \mu )^k] \).

The first raw moment is called the mean and denoted \( \mu \). The second central moment is called the variance and denoted \( \sigma^2 \). Its square root \( \sigma \) is the standard deviation (SD). The coefficient of variation \( \sigma / \mu \) measures how the distribution spreads relative to its mean. The skewness \(\gamma_1 = \mu_3 / \sigma^3 \) measures the distribution’s asymmetry. A symmetric distribution has a skewness equal to 0 while a right-skewed distribution has a positive skewness. The kurtosis \( \gamma_2 = \mu_4 / \sigma^4 \) measures the distribution’s flatness compared to the normal distribution. A normal distribution has a kurtosis equal to 3. A distribution with kurtosis higher than 3 has more probability assigned to values far away from its mean, compared to a normal distribution with the same standard deviation.

The mean excess loss function (a.k.a. mean residual life function or complete expectation of life, depending on the context) is defined as: \[ e_X(d) = E[X-d \mid X > d ] \] The variable \( Y = X - d\) given that \( X > d \) is called a left truncated and shifted variable. In insurance contexts, we can think of (Y, X, d) as (payment amount, loss, deductible) or (remaining lifetime, age at death, current age), etc.

If \(X\) has a continuous distribution with density \(f\), we can rewrite its mean excess loss function as: \[ e_X(d) = \frac{ \int_{d}^{\infty} (x-d) f(x)dx }{ 1 - F(d)} = \frac{ \int_{d}^{\infty} S(x)dx }{ S(d) } \]
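As a quick sanity check, for an exponential distribution with mean \( \theta \) (so \( S(x) = e^{-x/\theta} \)), the formula gives \[ e_X(d) = \frac{ \int_{d}^{\infty} e^{-x/\theta} dx }{ e^{-d/\theta} } = \frac{ \theta e^{-d/\theta} }{ e^{-d/\theta} } = \theta, \] i.e. the mean excess loss does not depend on the deductible, which is the memoryless property of the exponential distribution.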

The left censored and shifted variable is \( Y = (X - d)_{+} = X - d \) if \(X \ge d \) and 0 otherwise.

Note that the left truncated and shifted variable is not defined for \( X < d \) whereas the left censored and shifted variable is 0 for \( X < d \). Thus, the formula linking the \( k^{th}\) moment of the left censored and shifted variable to the \( k^{th}\) moment of the left truncated and shifted variable is: \( E[(X-d)^k_+] = e^k_X(d) S(d) \).

The right censored variable (or limited loss variable) is \( Y = min(X,u)\), with expected value \( E[Y] = E[min(X,u)]\) called the limited expected value. Note that the right censored variable and the left censored and shifted variable are complementary: \( X = (X - d)_{+} + min(X,d) \). In insurance terms, one can be fully covered by simultaneously purchasing a policy with a deductible of d and another with a policy limit of d.

A formula that is worth remembering is \[ E[min(X,u)] = - \int_{- \infty}^{0} F(x)dx + \int_{0}^{u} S(x)dx \] For a nonnegative variable, the first integral vanishes.
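Below is a minimal numerical sketch of this formula for a nonnegative loss (the exponential distribution, the limit \(u\) and the parameter values are illustrative, and NumPy/SciPy are assumed to be available):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

theta, u = 10.0, 25.0                         # hypothetical exponential mean and policy limit
X = stats.expon(scale=theta)

# Monte Carlo estimate of E[min(X, u)]
samples = X.rvs(size=1_000_000, random_state=42)
mc_estimate = np.minimum(samples, u).mean()

# Survival-function formula: for a nonnegative variable, E[min(X, u)] = int_0^u S(x) dx
formula_value, _ = quad(X.sf, 0, u)

print(mc_estimate, formula_value)             # both close to theta * (1 - exp(-u / theta)) ~ 9.18
```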

1.b. Percentile

The percentile function is the inverse of the distribution function. The \(100p^{th} \) percentile of a variable is a value \( \pi_p \) such that \( F(\pi_{p-}) \le p \le F(\pi_{p})\). The median corresponds to the \(50^{th} \) percentile \( \pi_{0.5} \).

1.c. Generating functions

For a random variable \( X \) we define its probability generating function (pgf) as \( P_X(z) = E[z^X] \) and its moment generating function (mgf) as \( M_X(t) = E[e^{tX}] = P_X(e^t)\). We usually work with the pgf if \( X \) is discrete and with the mgf if \( X \) is continuous.

Now consider k independent random variables \( (X_i, i = 1,…,k) \), each with mgf \( M_{X_i}(.) \). One can show that their sum \( S_k = \sum_{i=1}^{k} X_i \) has mgf \( M_{S_k}(t) = \prod_{i=1}^{k} M_{X_i}(t) \) and pgf \( P_{S_k}(z) = \prod_{i=1}^{k} P_{X_i}(z) \).

Calculating the mgf or pgf is a common way to identify the distribution of a sum of independent variables (e.g. by calculating the mgf, one can show that a sum of independent gamma variables with a common scale parameter is still a gamma variable).
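For instance, if \( X_1 \sim Gamma(\alpha_1, \theta) \) and \( X_2 \sim Gamma(\alpha_2, \theta) \) are independent and share the same scale parameter \( \theta \), then \[ M_{X_1+X_2}(t) = (1-\theta t)^{-\alpha_1} (1-\theta t)^{-\alpha_2} = (1-\theta t)^{-(\alpha_1 + \alpha_2)}, \] which is the mgf of a \( Gamma(\alpha_1 + \alpha_2, \theta) \) distribution.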

2. Classifying and creating distributions

2.1. Parametric distributions

Parametric and scale distributions

A parametric distribution is characterized by distribution functions that depend on a finite number of parameters. A parametric distribution is a scale distribution if, when the variable is multiplied by a positive constant, the resulting distribution stays in the same family (e.g. the exponential distribution). For such a distribution (with nonnegative support), a scale parameter is a parameter that is multiplied by the same constant when the variable is multiplied by that constant. A parametric distribution family is a set of parametric distributions related to each other in some way.
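For example, if \( X \) is exponential with mean \( \theta \), then for any constant \( c > 0 \) we have \( S_{cX}(x) = P(cX > x) = S_X(x/c) = e^{-x/(c\theta)} \), so \( cX \) is again exponential with mean \( c\theta \): the exponential distribution is a scale distribution and \( \theta \) is its scale parameter.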

Mixture distribution

When modeling a complex phenomenon that can be decomposed into other phenomena with respective probabilities, one can use a mixture distribution. A random variable \( Y \) is a k-point mixture of k other random variables \( (X_i, i = 1,…,k)\) if its cumulative distribution function is \( F_Y(t) = \sum_{i=1}^{k} a_i F_{X_i}(t) \) with \( a_i > 0 \space \forall i = 1,…,k \) and \( \sum_{i=1}^k a_i = 1 \). A more flexible distribution where the number of underlying variables is itself a random variable \(K\) is called a variable-component mixture distribution. It’s also called a semi-parametric distribution because its complexity lies in between parametric and nonparametric distributions. The number of parameters is neither fixed nor limited because \( K \) is not fixed.
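As an illustration, here is a minimal sketch of a 2-point mixture of exponentials (the weights and component parameters are made up for the example, and NumPy/SciPy are assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical 2-point mixture: 70% "small" losses, 30% "large" losses
weights = np.array([0.7, 0.3])                             # a_1, a_2 with a_1 + a_2 = 1
components = [stats.expon(scale=5.0), stats.expon(scale=50.0)]

def mixture_cdf(t):
    """F_Y(t) = sum_i a_i * F_{X_i}(t)."""
    return sum(a * c.cdf(t) for a, c in zip(weights, components))

def mixture_rvs(n, seed=0):
    """Sample by first picking a component with probability a_i, then drawing from it."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(components), size=n, p=weights)
    draws = np.column_stack([c.rvs(size=n, random_state=rng) for c in components])
    return draws[np.arange(n), idx]

print(mixture_cdf(10.0))                                   # 0.7 * F_1(10) + 0.3 * F_2(10)
print(mixture_rvs(5))
```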

Data-dependent distribution

A data-dependent distribution is at least as complex as the data (or knowledge) that produced it, and its number of parameters increases with the number of data points (or the amount of knowledge). One example is the empirical distribution, which assigns probability \( 1/n \) to each of the \(n\) observed data points. Another example, which is the continuous version of the first, is the kernel smoothing model, which places a continuous density with area \( 1/n \) around each of the \(n\) observed data points.
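A small sketch of both ideas on a toy data set (the five losses are invented, and SciPy's Gaussian kernel density estimator stands in for a generic kernel smoothing model):

```python
import numpy as np
from scipy import stats

data = np.array([3.0, 7.5, 12.0, 12.0, 40.0])      # n = 5 hypothetical observed losses

# Empirical distribution: probability 1/n at each observation
def ecdf(x):
    return np.mean(data[:, None] <= np.atleast_1d(x), axis=0)

# Kernel smoothing: a continuous density placing area 1/n around each observation
kde = stats.gaussian_kde(data)

print(ecdf(12.0))                                   # 0.8 (4 of the 5 points are <= 12)
print(kde.integrate_box_1d(-np.inf, 12.0))          # smoothed analogue of the same quantity
```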

2.2 Tail weight

In actuarial science and many other quantitative domains, when assessing loss, one is usually interested in the right tail of the loss distribution. For the rest of the article, when we mention the tail of a distribution, we refer to the right tail. There are several common ways of judging whether or not a distribution’s tail is heavy. Please note that tail heaviness is a relative notion: when we say a distribution has a heavy or a light tail, it is relative to some reference distribution or class of distributions.

Moments

Existence of all positive moments indicates that the distribution has a relatively light tail. Conversely, if positive moments do not exist, or only exist up to a certain order, the distribution has a relatively heavy tail. For example, the gamma distribution has all positive moments, while a Pareto distribution with shape parameter \( \alpha \) only has moments of order \( k < \alpha \), so its tail is heavier.

Limiting ratio

To compare the tails of two distributions, one can check the convergence of their limiting ratio (ratio of survival functions or ratio of density functions; they have the same limit by L’Hôpital’s rule): \[ \lim_{x \to \infty} \frac{S_1(x)}{S_2(x)} = \lim_{x \to \infty} \frac{S_1'(x)}{S_2'(x)} = \lim_{x \to \infty} \frac{-f_1(x)}{-f_2(x)} = \lim_{x \to \infty} \frac{f_1(x)}{f_2(x)} \] If this ratio diverges, the first distribution has a heavier tail.
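For example, comparing a Pareto survival function \( S_1(x) = \left( \frac{\theta}{x+\theta} \right)^{\alpha} \) with an exponential survival function \( S_2(x) = e^{-x/\theta} \), \[ \lim_{x \to \infty} \frac{S_1(x)}{S_2(x)} = \lim_{x \to \infty} \left( \frac{\theta}{x+\theta} \right)^{\alpha} e^{x/\theta} = \infty \] because the exponential term grows faster than any power decays, so the Pareto distribution has the heavier tail.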

Hazard rate

If the hazard rate function \( x \to h(x)= f(x)/S(x)\) is decreasing, the distribution has a heavy tail. In fact, loosely speaking, the hazard rate measures the ratio between the probability assigned to a point \(x\) and the probability assigned to all points larger than \(x\).

Mean residual life function

If the mean residual life function (a.k.a. the mean excess loss function) \(d \to e(d)= \frac{\int_{x=d}^{\infty} S(x)dx }{S(d)} \) is increasing, the distribution has a heavy tail.

It’s worth noting some relationships between the hazard rate function and the mean residual life function:

  • If the hazard rate function is decreasing, the mean residual life function is increasing
  • The reverse is not true. An increasing mean residual life function does not necessarily imply a decreasing hazard rate function.
  • \( \lim_{d \to \infty} e(d) = \lim_{d \to \infty} \frac{\int_{x=d}^{\infty} S(x)dx }{S(d)} = \lim_{d \to \infty} \frac{-S(d) }{-f(d)} = \lim_{d \to \infty} \frac{1}{h(d)} \)
  • For any positive random variable with density \(f\) and survival function \(S\), one can introduce the equilibrium distribution \( f_e(x)= S(x)/E[X], x > 0 \). It is a density function because \(f_e(x) > 0 \space \forall x > 0 \) and \( \int_{x=0}^{\infty} f_e(x)dx = \frac{1}{E[X]} \int_{x=0}^{\infty} S(x)dx = 1 \). The hazard rate function of the equilibrium distribution is \( h_e(x) = \frac{f_e(x)}{S_e(x)} = \frac{1}{e(x)} \). Finally, we note that \( \frac{e(x)}{e(0)} = \frac{S_e(x)}{S(x)} \).
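The following sketch illustrates the hazard rate and mean residual life criteria above on a Pareto distribution (SciPy's lomax parameterization and the numerical values are illustrative): the hazard rate \( \alpha / (\theta + x) \) decreases while the mean excess loss \( (\theta + d)/(\alpha - 1) \) increases, both signalling a heavy tail.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Pareto (SciPy's "lomax") with shape alpha = 3 and scale theta = 10 -- a hypothetical heavy-tailed loss
alpha, theta = 3.0, 10.0
X = stats.lomax(c=alpha, scale=theta)

def hazard(x):
    """h(x) = f(x) / S(x)."""
    return X.pdf(x) / X.sf(x)

def mean_excess(d):
    """e(d) = int_d^inf S(x) dx / S(d)."""
    numerator, _ = quad(X.sf, d, np.inf)
    return numerator / X.sf(d)

for d in (1.0, 10.0, 100.0):
    print(d, hazard(d), mean_excess(d))   # hazard decreases while the mean excess loss increases
```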

2.3 Creating new distributions

Given a random variable \(X\) following a continuous distribution with density \(f\) and cdf \(F\), one can create new ones by multiplying \(X\) by a constant, raising it to a power, taking its exponential, etc.

Multiplication by a constant

Let \(Y = \theta X\) with \(\theta > 0\) called the scale parameter. We have \(F_Y(y) = F_X(y/\theta)\) and, by taking the derivative on both sides, \( f_Y(y) = 1/\theta \space f_X(y/\theta) \).

Raising to a power

Let \(Y = X^{1/\tau}\) with \(\tau \neq 0\). For \(\tau > 0\) we have \(F_Y(y) = F_X(y^{\tau})\) and, by taking the derivative on both sides, \( f_Y(y) = \tau y^{\tau - 1} f_X(y^\tau) \). If \( \tau < 0\), express \(S_Y\) in terms of \(S_X\) and then take the derivative on both sides to find an expression for \(f_Y\).

  • If \( \tau > 0\), the new distribution is called transformed.
  • If \( \tau = -1\), the new distribution is called inverse.
  • If \( \tau < 0\) and \( \tau \neq -1\), the new distribution is called inverse transformed.
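For example, if \( X \) is exponential with \( F_X(x) = 1 - e^{-x/\theta} \) and \( \tau > 0 \), then \( Y = X^{1/\tau} \) has \( F_Y(y) = 1 - e^{-y^{\tau}/\theta} \), which is a Weibull distribution: the Weibull is the transformed exponential.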

Exponentiation

Let \(Y = exp(X)\). We have \(F_Y(y) = F_X(ln(y))\) and, by taking the derivative on both sides, \( f_Y(y) = 1/y \space f_X(ln(y)) \).
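For example, if \( X \sim N(\mu, \sigma^2) \), then \( Y = e^X \) has \( F_Y(y) = \Phi \left( \frac{ln(y) - \mu}{\sigma} \right) \), i.e. \( Y \) follows a lognormal distribution.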

Mixing

When a parameter \( \lambda \) of a distribution \(f_X\) is a realization of a random variable \( \Lambda \), \(f_X\) is called a mixture distribution with density function: \[f_X(x) = E_{\Lambda}[f_{X \mid \Lambda}(x)] = \int f_{X \mid \Lambda} (x \mid \lambda) f(\lambda) d \lambda \] and cdf: \[F_X(x) = E_{\Lambda}[F_{X \mid \Lambda}(x)] = \int F_{X \mid \Lambda} (x \mid \lambda) f(\lambda) d \lambda \] The \(k^{th}\) moment is \( E[X^k] = E[E[X^k \mid \Lambda]]\) and the variance is \(Var(X) = E[Var(X \mid \Lambda)] + Var(E[X \mid \Lambda]) \).

A mixture distribution usually has a heavy tail. It is often used to deal with parameter uncertainty.
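A classic example is the Poisson–gamma mixture: if \( X \mid \Lambda \sim Poisson(\Lambda) \) and \( \Lambda \sim Gamma(\alpha, \theta) \), the marginal distribution of \( X \) is negative binomial, and the conditional variance formula gives \( E[X] = E[\Lambda] = \alpha\theta \) and \( Var(X) = E[\Lambda] + Var(\Lambda) = \alpha\theta + \alpha\theta^2 > E[X] \), showing how mixing inflates the variance.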

Frailty models

Frailty models form a subset of mixture distributions. Let \( \Lambda > 0 \) be a frailty variable, introduced to quantify the uncertainty in the hazard rate function. We define the conditional hazard rate function of a random variable \(X\) as \(h_{X \mid \Lambda} (x \mid \lambda) = \lambda a(x)\). The conditional survival function is: \[S_{X \mid \Lambda}(x\mid\lambda) = e^{-\int_{0}^{x} h_{X \mid \Lambda} (t \mid \lambda)dt} = e^{-\lambda A(x)}\] where \(A(x) = \int_{0}^{x} a(t)dt \).

The marginal survival function is then: \[S_{X}(x) = E_{\Lambda}[S_{X \mid \Lambda}(x\mid \Lambda)] = E_{\Lambda}[e^{-\Lambda A(x)}] = M_{\Lambda} (-A(x)) \] where \(M_{\Lambda}\) is the moment generating function of \(\Lambda\).
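For example, with a gamma frailty \( \Lambda \sim Gamma(\alpha, \theta) \) we have \( M_{\Lambda}(t) = (1 - \theta t)^{-\alpha} \), so \( S_X(x) = (1 + \theta A(x))^{-\alpha} \); if in addition the baseline hazard is constant, \( a(x) = 1 \) and \( A(x) = x \), the marginal distribution is a Pareto with survival function \( (1 + \theta x)^{-\alpha} \).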

Splicing

A k-component spliced distribution has density function: \[ f(x) = \sum_{i=1}^{k} a_i f_i(x) \mathbb{1}(c_{i-1} < x < c_i)\] with \( a_i > 0 \space \forall i = 1,…, k \) and \( \sum_{i=1}^{k} a_i = 1\), where each \(f_i\) is a density function well defined over the interval \( (c_{i-1}, c_i)\).

A good application of spliced distributions is when the behaviour of large losses (the tail) is very different from the behaviour of small losses.
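A minimal sketch of a 2-component splice with an exponential body and a Pareto tail (the breakpoint, weights and parameters are invented, and NumPy/SciPy are assumed):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hypothetical 2-component splice: exponential body on (0, c), Pareto (lomax) tail on (c, inf)
c, a1, a2 = 20.0, 0.8, 0.2                        # breakpoint and weights, a1 + a2 = 1
body = stats.expon(scale=10.0)
tail = stats.lomax(c=2.0, scale=15.0)

def spliced_pdf(x):
    # Each component density is renormalised so that it integrates to 1 on its own interval
    if x < c:
        return a1 * body.pdf(x) / body.cdf(c)
    return a2 * tail.pdf(x) / tail.sf(c)

total = quad(spliced_pdf, 0, c)[0] + quad(spliced_pdf, c, np.inf)[0]
print(total)                                       # 1.0: the spliced density is a proper density
```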

2.4 Selected distributions and their relationships

Parametric families

Many parametric distributions are special cases of other ones. We present here two well-known examples.

Example 1: Transformed beta family

Example 2: Transformed/inverse transformed gamma family

Limiting distributions

One distribution can be a limiting case of another (obtained by pushing the latter’s parameters to their limiting values of zero or infinity).

2.5 Discrete distributions

Some notations

Let N be a random variable for modeling discrete phenomena. The probability function (pf) is \(p_k = P(N=k)\) with \(k = 0,1,2, …\) The probability generating function (pgf) is \(P(z)=E[z^N]=\sum_{k=0}^{\infty} p_kz^k\). It can be used to generate the probabilities \(p_m\) and the factorial moments, since \[P^{(m)}(z)= \sum_{k=m}^\infty k(k-1)…(k-m+1)z^{k-m}p_k \] For example, \(P^{(m)}(0) = m! \, p_m\), \(P’(1)= E[N]\) and \(P”(1) = E[N(N-1)]\).

The Poisson distribution

If \(N \sim Poisson(\lambda)\) with \(\lambda > 0\), then \[p_k = e^{-\lambda}\lambda^k / k!\] From the pgf \(P(z)=e^{\lambda(z-1)}\), one can compute: \[ E[N]=P’(1)= \lambda \] \[ E[N(N-1)]=P”(1)= \lambda^2 \] \[ Var(N) = E[N(N-1)] + E[N] - E[N]^2 = \lambda \]
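These pgf manipulations can be checked symbolically, for instance with SymPy (assumed available):

```python
import sympy as sp

z, lam = sp.symbols('z lambda', positive=True)
P = sp.exp(lam * (z - 1))                  # pgf of a Poisson(lambda) variable

EN = sp.diff(P, z).subs(z, 1)              # P'(1)  = E[N]
EN_fact2 = sp.diff(P, z, 2).subs(z, 1)     # P''(1) = E[N(N-1)]
var = sp.simplify(EN_fact2 + EN - EN**2)   # Var(N) = E[N(N-1)] + E[N] - E[N]^2

print(EN, EN_fact2, var)                   # lambda, lambda**2, lambda
```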

The sum of independent Poisson variables is a Poisson variable. Indeed, if \(N_k \sim Poisson(\lambda_k)\) with \(k=1,…,n\) and all of the \((N_k)\) are independent, one can show that \( N= \sum_{k=1}^{n} N_k \sim Poisson(\sum_{k=1}^{n} \lambda_k) \).
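The proof is immediate with the pgf: \[ P_N(z) = \prod_{k=1}^{n} P_{N_k}(z) = \prod_{k=1}^{n} e^{\lambda_k (z-1)} = e^{\left( \sum_{k=1}^{n} \lambda_k \right)(z-1)}, \] which is the pgf of a Poisson distribution with mean \( \sum_{k=1}^{n} \lambda_k \).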
