# Stable Distributions: Generalised CLT and Series Representation Theorem

This is a continuation of the previous post that can be found here. In this post we discuss the Generalised CLT and give a heuristic (non-rigorous) proof of the theorem. We end with a series representation theorem for the Stable distributions.

The Generalised Central Limit Theorem

Theorem 9: Let ${X_1,X_2,\ldots}$ be iid symmetric random variables with ${P(X_1>\lambda) \sim k\lambda^{-\alpha}}$ as ${\lambda \rightarrow\infty}$ for some ${k>0}$ and ${\alpha \in (0,2)}$. Then

$\displaystyle \frac{X_1+X_2+\cdots+X_n}{n^{1/\alpha}} \stackrel{d}{\rightarrow} Y$

where ${Y\sim S\alpha S}$ with appropriate constants.

Motivation for the proof: Recall the proof of the usual CLT. We assume the ${X_i}$ are iid random variables with mean ${0}$, variance ${\sigma^2}$ and characteristic function ${\phi}$. We use the fact that, under the finite variance assumption, as $t \to 0$ we have

$\displaystyle \frac{1-\phi(t)}{t^2} \rightarrow \frac{\sigma^2}{2}$

and hence using this we get

\displaystyle \begin{aligned} E\left[e^{i\frac{t}{\sqrt{n}}(X_1+X_2+\cdots+X_n)}\right] = \left[E\left(e^{i\frac{t}{\sqrt{n}}X_1}\right)\right]^n & = \left[\phi\left(\frac{t}{\sqrt{n}}\right)\right]^n \\ & = \left[1-\left(1-\phi\left(\frac{t}{\sqrt{n}}\right)\right)\right]^n \\ & \rightarrow \exp(-t^2\sigma^2/2) \end{aligned}

We will use this idea in our proof. We will omit some of the technical details; the proof presented here is not rigorous. We primarily focus on the methodology and the tricks used in the proof.
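Before diving in, the statement is easy to probe by simulation. The sketch below is illustrative only and not part of the argument: the tail index $\alpha=1.5$, the sampler and all names are our own choices. We draw symmetric Pareto-type variables with $P(X>\lambda)=\tfrac12\lambda^{-\alpha}$ for $\lambda \ge 1$ and observe that the spread of $S_n/n^{1/\alpha}$ stabilises as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.5  # tail index in (0, 2)

def symmetric_pareto(size):
    # symmetric heavy tails: P(X > lam) = 0.5 * lam**(-alpha) for lam >= 1
    magnitude = rng.uniform(size=size) ** (-1.0 / alpha)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def normalized_sums(n, reps=2000):
    # reps independent copies of (X_1 + ... + X_n) / n^(1/alpha)
    return symmetric_pareto((reps, n)).sum(axis=1) / n ** (1.0 / alpha)

def iqr(z):
    q75, q25 = np.percentile(z, [75, 25])
    return q75 - q25

# the interquartile range stays of the same order as n grows,
# reflecting convergence to a (non-normal) S-alpha-S limit
print(iqr(normalized_sums(100)), iqr(normalized_sums(10000)))
```

With the usual $\sqrt{n}$ scaling the same experiment degenerates, which is exactly why the theorem scales by $n^{1/\alpha}$.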

# Stable Distributions: Properties

This is a continuation of the previous post that can be found here. We will discuss some of the properties of the Stable distributions in this post.

Theorem 2. $Y$ is the limit in distribution of $\displaystyle\frac{X_1+X_2+\cdots+X_n-b_n}{a_n}$ for some iid sequence $X_i$ and sequences $a_n >0$ and $b_n$ if and only if $Y$ has a stable law.

Remark: This kind of explains why Stable laws are called Stable!

Proof: The if part follows by taking $X_i$ to be stable itself. We focus on the only if part.

Let $\displaystyle Z_n=\frac{X_1+X_2+\cdots+X_n-b_n}{a_n}$. Fix a $k\in \mathbb{N}$. Let

$\displaystyle S_n^{j}= \sum_{i=1}^n X_{(j-1)n+i}= X_{(j-1)n+1}+X_{(j-1)n+2}+\cdots+X_{jn} \ \ \mbox{for} \ (1\le j \le k)$

Now consider $Z_{nk}$. Observe that

$\displaystyle Z_{nk} = \frac{S_n^1+S_n^2+\cdots+S_n^k-b_{nk}}{a_{nk}}$

$\displaystyle \implies \frac{a_{nk}Z_{nk}}{a_n} = \frac{S_n^1-b_n}{a_n}+\frac{S_n^2-b_n}{a_n}+\cdots+\frac{S_n^k-b_n}{a_n}+\frac{kb_n-b_{nk}}{a_n}$

As $n\to \infty$,

$\displaystyle \frac{S_n^1-b_n}{a_n}+\frac{S_n^2-b_n}{a_n}+\cdots+\frac{S_n^k-b_n}{a_n} \stackrel{d}{\to} Y_1+Y_2+\cdots+Y_k$

where the $Y_i$'s are iid copies of $Y$. Let $W_n:=Z_{nk}$ and let

\displaystyle \begin{aligned} W_n' & :=\frac{a_{nk}Z_{nk}}{a_n}-\frac{kb_n-b_{nk}}{a_n} \\ & = \alpha_nW_n+\beta_n \end{aligned}

where $\displaystyle\alpha_n:=\frac{a_{nk}}{a_n}$ and $\displaystyle\beta_n:=-\frac{kb_n-b_{nk}}{a_n}$.

Now $W_n \stackrel{d}{\to} Y$ and $W_n'\stackrel{d}{\to} Y_1+Y_2+\cdots+Y_k$. If we can show that $\alpha_n \to \alpha$ and $\beta_n \to \beta$ (this is guaranteed by the convergence of types theorem, since both limits are non-degenerate), then this will ensure

$\alpha Y+\beta \stackrel{d}{=} Y_1+Y_2+\cdots+Y_k$

Note that $\alpha$ and $\beta$ depend only on $k$. Hence the law of $Y$ is stable.

$\square$
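As a concrete illustration of the relation $\alpha Y+\beta \stackrel{d}{=} Y_1+\cdots+Y_k$ just obtained: the standard Cauchy law is strictly stable with characteristic function $e^{-|t|}$, so the relation holds with $\alpha=k$ and $\beta=0$. A quick numerical sanity check (the grid of $t$ values is arbitrary):

```python
import numpy as np

t = np.linspace(-5.0, 5.0, 201)
phi = np.exp(-np.abs(t))  # characteristic function of the standard Cauchy law
for k in (2, 3, 7):
    # cf of Y_1 + ... + Y_k equals cf of k * Y, i.e. alpha = k and beta = 0 here
    assert np.allclose(phi ** k, np.exp(-np.abs(k * t)))
print("Cauchy: Y_1 + ... + Y_k has the same law as k * Y")
```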

# Stable Distributions: An introduction

In this series of posts I will introduce the stable distributions and discuss some of their properties. The theorems and proofs presented here are written in a lucid manner, so that they are accessible to most readers. I have tried my best to make this discussion as self-contained as possible, so that readers don't have to chase up different references to follow the proofs. It is based on a presentation I gave in the Measure Theory course in my MStat 1st year.

## Introduction

In statistics, while modelling continuous data we assume that the data are iid observations coming from some 'nice' distribution, and then we do statistical inference. The strongest tool we usually invoke is the Central Limit Theorem, which states that the sum of a large number of iid variables from a finite-variance distribution is approximately normally distributed. But this finite-variance assumption is not always true: many real-life data, such as returns on financial assets, exhibit fat tails, and the finite-variance assumption fails for such heavy-tailed data. So there is a need for an alternative model. One such alternative is the Stable distribution. Of course there are other alternatives, but one good reason to use the Stable distribution is that it is supported by the Generalised Central Limit Theorem.

Notations: $U\stackrel{d}{=} V$ means $U$ and $V$ have the same distribution. Throughout this section we assume

$\displaystyle X,X_1,X_2,\ldots \stackrel{iid}{\sim} F \ \ \ \mbox{and} \ S_n=\sum_{i=1}^n X_i$

Motivation: Suppose $F= \mbox{N}(0,\sigma^2)$. In that case we know

$S_n\stackrel{d}{=} \sqrt{n}X$

Motivated by the above relation, we ask ourselves: can we find a distribution such that the above relation holds with some other constant, like $n$, $\log n$ or $n^{1/3}$, in place of $\sqrt{n}$? Hence we generalise the above relation in the following manner.

Definition: The distribution $F$ is stable (in the broad sense) if for each $n$ there exist constants $c_n >0$ and $\gamma_n$ such that

$S_n \stackrel{d}{=} c_nX+\gamma_n$

and $F$ is non-degenerate. $F$ is stable in the strict sense if the above relation holds with $\gamma_n=0$.
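For the motivating normal example the definition can be verified directly through characteristic functions: with $F=\mbox{N}(0,\sigma^2)$ we have $\phi(t)=e^{-\sigma^2t^2/2}$, and $S_n \stackrel{d}{=} \sqrt{n}X$ is exactly the identity $\phi(t)^n=\phi(\sqrt{n}\,t)$. A small numerical sketch (the value of $\sigma$ and the grid are arbitrary):

```python
import numpy as np

t = np.linspace(-5.0, 5.0, 201)
sigma = 1.7
phi = np.exp(-sigma**2 * t**2 / 2)  # cf of N(0, sigma^2)
for n in (2, 5, 10):
    # cf of S_n versus cf of sqrt(n) * X: here c_n = sqrt(n) and gamma_n = 0
    assert np.allclose(phi ** n, np.exp(-sigma**2 * (np.sqrt(n) * t) ** 2 / 2))
print("the normal law is stable with c_n = sqrt(n)")
```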

# Tracy-Widom Law: Vague convergence and Ledoux Bound

This is a continuation of the previous post available here. In the previous post, we developed the ingredients required for the vague convergence proof. Let us now return to the random matrix scenario.

Proof of Theorem 4: Let $X_n$ be a sequence of $n\times n$ GUE matrices. Let $\lambda_1,\ldots, \lambda_n$ be the eigenvalues of $X_n$. Fix $-\infty < t< t' < \infty$. Let us quickly evaluate the limit

$\displaystyle \lim_{n\to \infty} P\left[n^{2/3}\left(\frac{\lambda_i}{\sqrt{n}}-2\right) \not\in (t,t'), \ \ i=1,2, \ldots, n\right]$

Observe that by using Theorem 3, we have

\displaystyle \begin{aligned} & P\left[n^{2/3}\left(\frac{\lambda_i}{\sqrt{n}}-2\right) \not\in (t,t'), \ \ i=1,2, \ldots, n\right] \\ & = P\left[\lambda_i \not\in \left(2\sqrt{n}+\frac{t}{n^{1/6}},2\sqrt{n}+\frac{t'}{n^{1/6}}\right), \ \ i=1,2, \ldots, n \right] \\ & = 1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!}\int\limits_{2\sqrt{n}+\frac{t}{n^{1/6}}}^{2\sqrt{n}+\frac{t'}{n^{1/6}}}\cdots\int\limits_{2\sqrt{n}+\frac{t}{n^{1/6}}}^{2\sqrt{n}+\frac{t'}{n^{1/6}}} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^{k}dx_i \\ & = 1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!} \int_{t}^{t'}\cdots\int_{t}^{t'} \det_{i,j=1}^k \left[\frac1{n^{1/6}}K^{(n)}\left(2\sqrt{n}+\frac{x_i}{n^{1/6}},2\sqrt{n}+\frac{x_j}{n^{1/6}}\right)\right]\prod_{i=1}^k dx_i \end{aligned}

where in the last line we use the change of variables formula. Let us define

$\displaystyle A^{(n)}(x,y):=\frac1{n^{1/6}}K^{(n)}\left(2\sqrt{n}+\frac{x}{n^{1/6}},2\sqrt{n}+\frac{y}{n^{1/6}}\right)$

Note that the $A^{(n)}$ are kernels since $K^{(n)}$ is a kernel. Let us also define $\displaystyle \phi_n(x) = n^{1/12}\psi_n\left(2\sqrt{n}+\frac{x}{n^{1/6}}\right)$. Then using Proposition 1(c) we have

$\displaystyle A^{(n)}(x,y)=\frac{\phi_n(x)\phi_n'(y)-\phi_n'(x)\phi_n(y)}{x-y}-\frac1{2n^{1/3}}\phi_n(x)\phi_n(y)$

Note that Proposition 4 implies that for every $C>1$, $\phi_n \to Ai$ uniformly over a ball of radius $C$ in the complex plane. The $\phi_n$ are entire functions, so $\phi_n' \to Ai'$ and $\phi_n'' \to Ai''$ uniformly on compacts as well. Hence for $t\le x,y \le t'$ we have

$\displaystyle |A^{(n)}(x,y)-A(x,y)| \to 0$

Hence by the continuity lemma for Fredholm determinants we have

\displaystyle \begin{aligned} & \lim_{n\to \infty} P\left[n^{2/3}\left(\frac{\lambda_i}{\sqrt{n}}-2\right) \not\in (t,t'), \ \ i=1,2, \ldots, n\right] \\ & = \lim_{n\to\infty} \Delta(A^{(n)}) \\ & = \Delta(A) \\ & = 1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!} \int_{t}^{t'}\cdots\int_{t}^{t'} \det_{i,j=1}^k A(x_i,x_j) \prod_{i=1}^k dx_i \end{aligned}

where the measure is the Lebesgue measure on the bounded interval $\displaystyle (t,t')$. This completes the proof of Theorem 4.

$\square$
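Incidentally, the limit $\Delta(A)$ is not just an abstract series: it can be computed numerically. The sketch below is not part of the argument above; it discretises $\det(I-A)$ for the Airy kernel on a truncated interval $(t, t+L)$ with Gauss-Legendre quadrature, in the spirit of Bornemann's quadrature method for Fredholm determinants. Letting $t'\to\infty$, the resulting function of $t$ is the Tracy-Widom distribution function, commonly denoted $F_2$. The truncation length and the number of nodes are our own ad hoc choices.

```python
import numpy as np
from scipy.special import airy

def airy_kernel(x, y):
    # A(x, y) = (Ai(x) Ai'(y) - Ai'(x) Ai(y)) / (x - y), with the diagonal limit
    ai_x, aip_x, _, _ = airy(x)
    ai_y, aip_y, _, _ = airy(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        k = (ai_x * aip_y - aip_x * ai_y) / (x - y)
    diag = aip_x ** 2 - x * ai_x ** 2  # limit of the kernel as y -> x
    return np.where(np.isclose(x, y), diag, k)

def fredholm_det(t, m=60, length=12.0):
    # Nystrom discretisation of det(I - A) on (t, t + length)
    nodes, weights = np.polynomial.legendre.leggauss(m)
    x = t + (nodes + 1.0) * length / 2.0
    w = weights * length / 2.0
    K = airy_kernel(x[:, None], x[None, :])
    sw = np.sqrt(w)
    return np.linalg.det(np.eye(m) - sw[:, None] * K * sw[None, :])

print(fredholm_det(-4.0), fredholm_det(-2.0), fredholm_det(0.0))
```

The values increase in $t$ and approach $1$, consistent with the fact that the rescaled largest eigenvalue rarely exceeds $0$ in the limit.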

# Tracy-Widom Law: Fredholm determinants and Airy functions

This is a continuation of the previous post available here. This post is quite different from the previous one. Here we present a series of propositions whose proofs are not included; I may add them later. However, the proofs of the propositions (except Proposition 4) are not needed for our argument.

In the previous post, we introduced Hermite polynomials and wave functions. We noticed that the Hermite polynomials are orthogonal polynomials. There is a plethora of results on orthogonal polynomials; we state only the few of them that we will need in our proof.

Proposition 1: Let $H_n(x)$ be the Hermite polynomial of degree $n$. Let $\psi_n(x)$ be the corresponding wave function. Let $\displaystyle K^{(n)}(x,y)=\sum_{k=0}^{n-1} \psi_k(x)\psi_k(y)$. We have the following results.

(a) $H_n(x+t)=\sum_{k=0}^{n} \binom{n}{k}H_k(x)t^{n-k}$

(b) $\displaystyle \frac1{\sqrt{n}}K^{(n)}(x,y)= \frac{\psi_n(x)\psi_{n-1}(y)-\psi_{n-1}(x)\psi_n(y)}{x-y}$

(c) $\displaystyle K^{(n)}(x,y)=\frac{\psi_n(x)\psi_n'(y)-\psi_n'(x)\psi_n(y)}{x-y}-\frac12\psi_n(x)\psi_n(y)$

(d) $\displaystyle \frac{d}{dx}K^{(n)}(x,x)=-\sqrt{n}\psi_n(x)\psi_{n-1}(x)$

Proof: Not included
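Though the proof is omitted, identity (b) (the Christoffel-Darboux formula) is easy to verify numerically. The sketch below assumes the convention used in this series: $He_k$ are the monic (probabilists') Hermite polynomials, orthogonal with respect to $e^{-x^2/2}\,dx$, and $\psi_k(x)=He_k(x)e^{-x^2/4}\big/\sqrt{\sqrt{2\pi}\,k!}$. The test points are arbitrary.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial, pi, sqrt

def psi(k, x):
    # wave function built from the monic (probabilists') Hermite polynomial He_k,
    # normalised so that the integral of psi_k(x)^2 over the real line is 1
    c = np.zeros(k + 1)
    c[k] = 1.0
    return hermeval(x, c) * np.exp(-x ** 2 / 4) / sqrt(sqrt(2 * pi) * factorial(k))

def K(n, x, y):
    return sum(psi(j, x) * psi(j, y) for j in range(n))

n, x, y = 8, 0.7, -0.3
lhs = K(n, x, y) / sqrt(n)
rhs = (psi(n, x) * psi(n - 1, y) - psi(n - 1, x) * psi(n, y)) / (x - y)
print(lhs, rhs)  # the two sides agree
```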

Let us recall Theorem 3 once again.

Theorem 3: For any measurable subset $A$ of $\mathbb{R}$,

$\displaystyle P\left(\bigcap_{i=1}^n \{\lambda_i \in A\}\right) =1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!}\int_{A^c}\cdots\int_{A^c} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^k dx_i$

Let us replace $A$ by $A^c$ and rewrite the above result as

$\displaystyle P\left(\bigcap_{i=1}^n \{\lambda_i \not\in A\}\right) =1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!}\int_{A}\cdots\int_{A} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^k dx_i$

Now is the right time to see what exactly we are trying to show in order to prove weak convergence. For that we need some basic ideas about the Airy function.

# Tracy-Widom Law: The starting point

Hi there!

Introduction: RMT deals with matrices whose entries are random variables. RMT is a fascinating field which has been expanding rapidly in recent years. In this series of posts, I will be proving the vague convergence of the celebrated Tracy-Widom Law. I will be mostly following the book 'An Introduction to Random Matrices' by Anderson et al.

Ensembles:

Let $T,W$ be independent $N(0,1/2)$ variables. Define $Z=T+iW$ to be a complex random variable. We say that $Z$ follows the standard complex normal distribution and denote it as $Z \sim CN(0,1)$, since $E(Z)=0$ and $E(Z\bar{Z})=1$. Let $U$ be an $n\times n$ random matrix with iid $CN(0,1)$ entries and $V$ be an $n\times n$ random matrix with iid $N(0,1)$ entries. Define

$\displaystyle X=\frac{V+V^T}{\sqrt{2}},\ \ Y=\frac{U+U^*}{\sqrt2}$.

Let $P_n^{(1)}$ denote the law of $X$. The random matrix $X$ is said to belong to the Gaussian orthogonal ensemble [$\mathcal{H}_n^{(1)}$] (GOE(n)). Observe that $X$ is symmetric, $X_{1,1}$ is a $N(0,2)$ random variable and $X_{1,2}$ is a $N(0,1)$ random variable.

Let $P_n^{(2)}$ denote the law of $Y$. Then the random matrix $Y$ is said to belong to the Gaussian unitary ensemble [$\mathcal{H}_n^{(2)}$] (GUE(n)). Observe that $Y$ is Hermitian, $Y_{1,1}$ is a $N(0,1)$ random variable and $Y_{1,2}$ is a $CN(0,1)$ random variable.

Why is the law $P_n^{(\beta)}$ special?

Because the density at $H \in \mathcal{H}_n^{(\beta)}$ is proportional to $\exp\left(-\mbox{tr}\frac{\beta H^2}{4}\right)$. Thus for the law $P_n^{(1)}$, the density at $H$ and at $OHO^T$ is the same for any orthogonal matrix $O$, and for the law $P_n^{(2)}$, the density at $H$ and at $UHU^{*}$ is the same for any unitary matrix $U$ (hence the names GOE and GUE). This also enables us to find the joint distribution of the eigenvalues exactly!

Target: Let $\lambda_1^n\le \lambda_2^n \le \cdots \le \lambda_n^n$ be the eigenvalues of a matrix $X$ from GUE/GOE. Then it is known that the empirical distribution of the eigenvalues $L_n:=\frac1n\sum_{i=1}^n \delta_{\lambda_i^n/\sqrt{n}}$ converges weakly, in probability, to the semicircle distribution $\sigma(x)\,dx$ with density $\displaystyle \sigma(x) =\frac1{2\pi}\sqrt{4-x^2}\mathbb{I}_{|x|\le 2}$. We also have that $\displaystyle \frac1{\sqrt{n}}\lambda_n^n \stackrel{a.e.}{\to} 2$. What we are interested in is the limiting distribution of this largest eigenvalue. In fact, what we intend to show is that $\displaystyle n^{2/3}\left(\frac{\lambda_n^n}{\sqrt{n}}-2\right)$ converges weakly. In this initial series of posts we will only show the vague convergence of the above expression.
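This target is easy to visualise by simulation. The sketch below (the matrix size and the seed are arbitrary choices of ours) samples a GUE matrix exactly as defined above and checks that the largest eigenvalue divided by $\sqrt{n}$ sits close to $2$:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400
# U has iid CN(0,1) entries: (N(0,1) + i N(0,1)) / sqrt(2); then Y = (U + U*) / sqrt(2)
u = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
y = (u + u.conj().T) / np.sqrt(2)
lam = np.linalg.eigvalsh(y)  # sorted eigenvalues of the Hermitian matrix
print(lam[-1] / np.sqrt(n))  # close to 2, slightly below for finite n
```

The small downward shift of order $n^{-2/3}$ is precisely the Tracy-Widom fluctuation we are after.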

Let us adopt this convention: if we use the notation $\lambda_1^n, \ldots, \lambda_n^n$ for eigenvalues then they are ordered; if we use $\lambda_1, \ldots, \lambda_n$ then they are unordered.

Notations: Let $A=((a_{ij}))_{i,j=1}^n$ be a square matrix of order $n$. The determinant of $A$ will be denoted as

$|A|=\det(A)=\det\limits_{i,j=1}^n a_{ij}$

By the Leibniz expansion we also have the following formula for determinants.

$\det\limits_{i,j=1}^n a_{i,j}=\sum\limits_{\sigma \in S_n} \epsilon_{\sigma} \prod\limits_{i=1}^{n} a_{i,\sigma(i)}$
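This permutation-sum formula can be checked directly against a library determinant. In the sketch below (matrix entries arbitrary) the sign $\epsilon_{\sigma}$ is computed by counting inversions:

```python
import numpy as np
from itertools import permutations

def perm_sum_det(a):
    # determinant via the sum over permutations, with epsilon_sigma = (-1)^inversions
    n = a.shape[0]
    total = 0.0
    for sigma in permutations(range(n)):
        inversions = sum(
            sigma[i] > sigma[j] for i in range(n) for j in range(i + 1, n)
        )
        total += (-1) ** inversions * np.prod([a[i, sigma[i]] for i in range(n)])
    return total

a = np.array([[2.0, 1.0, 0.5], [0.0, 3.0, 1.0], [1.0, -1.0, 2.0]])
print(perm_sum_det(a), np.linalg.det(a))  # both 13.5
```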

Theorem 1: Let $X\in \mathcal{H}_n^{(\beta)}$ be a random matrix with law $P_n^{(\beta)}$ $(\beta=1,2)$. Let $\lambda_1^n \le \lambda_2^n \le \ldots \le \lambda_n^n$ denote the ordered eigenvalues of the random matrix $X$. The joint distribution of the ordered eigenvalues has a density with respect to Lebesgue measure which equals

$\displaystyle \hat\rho_{n}^{(\beta)}(x_1,x_2,\ldots,x_n):=n!C_n^{(\beta)}|\Delta(x)|^{\beta}\prod_{i=1}^n e^{-\beta x_i^2/4}\mathbb{I}_{x_1\le x_2\le \ldots \le x_n}$

where $C_n^{(\beta)}$ is the normalizing constant and $\Delta(x)$ is the Vandermonde determinant associated with $x=(x_1,x_2,\ldots,x_n)$, which equals $\displaystyle \det_{i,j=1}^n x_i^{j-1}=\prod_{1\le i<j\le n}(x_j-x_i)$.

Proof: The proof will be skipped. Essentially it involves the change of variables formula: one takes a suitable transformation with a smooth inverse and obtains the required density by integrating out the extra variables. The $\Delta(x)$ part emerges as the Jacobian of the transformation, and $e^{-\beta x_i^2/4}$ appears from the change of variables. However, quite a lot more mathematics is needed to justify this properly.
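As a quick numerical check of the Vandermonde identity $\det_{i,j=1}^n x_i^{j-1}=\prod_{1\le i<j\le n}(x_j-x_i)$ used in Theorem 1 (the sample points below are arbitrary):

```python
import numpy as np
from itertools import combinations

x = np.array([0.5, 1.3, -2.0, 3.1])
n = len(x)
V = x[:, None] ** np.arange(n)[None, :]  # V[i, j] = x_i^j for columns j = 0, ..., n-1
det_v = np.linalg.det(V)
prod_v = np.prod([x[j] - x[i] for i, j in combinations(range(n), 2)])
print(det_v, prod_v)  # the two values agree
```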

Remark: What we are really interested in is the distribution of the unordered eigenvalues, which is obtained by symmetrizing the density $\hat\rho_n^{(\beta)}$. We say the joint density of the unordered eigenvalues is given by

$\displaystyle \rho_n^{(\beta)}:=C_n^{(\beta)}|\Delta(x)|^{\beta}\prod_{i=1}^n e^{-\beta x_i^2/4}$

Technically speaking, this is just a symmetrized valid density. It should not really be called the joint density of the unordered eigenvalues: integrating it $n-1$ times gives us a marginal density, but the marginal density of which eigenvalue? The best we can say is that if we choose one eigenvalue uniformly at random, it is the density of that randomly chosen eigenvalue.

# Symmetric Inequalities

Symmetric Sums: Let $x_1,x_2, \ldots, x_n$ be $n$ real numbers. We define the symmetric sum $s_k$ as the coefficient of $x^{n-k}$ in the polynomial

$P(x)=(x_1+x)(x_2+x)(x_3+x)\cdots (x_n+x)$

and the symmetric average is defined by $d_k=\frac{s_k}{\binom{n}k}$. For example, for $4$ variables we have

$s_1=x_1+x_2+x_3+x_4$
$s_2=x_1x_2+x_2x_3+x_3x_4+x_4x_1+x_1x_3+x_2x_4$
$s_3=x_2x_3x_4+x_1x_3x_4+x_1x_2x_4+x_1x_2x_3$
$s_4=x_1x_2x_3x_4$

$\boxed{1}$ Newton’s Inequality: Let $x_1,x_2,\cdots,x_n$ be $n$ non-negative reals. Then for all $k \in \{1,2,\cdots,n-1\}$ we have

$d_k^2 \ge d_{k-1}d_{k+1}$

$\boxed{2}$ Maclaurin’s Inequality: Let $x_1,x_2,\ldots,x_n$ be $n$ nonnegative reals. Let their symmetric averages be $d_1,d_2,\ldots,d_n$. Then we have

$d_1 \ge \sqrt{d_2} \ge \sqrt[3]{d_3} \ge \cdots \ge \sqrt[n]{d_n}$

Proof: We will use induction and Newton’s Inequality. Let us prove that $d_1\ge \sqrt{d_2}$ holds. This inequality is equivalent to

$\displaystyle (x_1+x_2+\cdots+x_n)^2 \ge \frac{2n}{n-1}\underset{i<j}{\sum} x_ix_j$

$\displaystyle \Longleftrightarrow (n-1)\left[\sum_{i=1}^n x_i^2+2\underset{i<j}{\sum} x_ix_j\right] \ge 2n\underset{i<j}{\sum} x_ix_j$

$\displaystyle \Longleftrightarrow (n-1)\sum_{i=1}^n x_i^2 \ge 2\underset{i<j}{\sum} x_ix_j$

$\displaystyle \Longleftrightarrow \underset{i<j}{\sum}(x_i-x_j)^2 \ge 0$

which is obvious.

Suppose the chain of inequalities is true up to $\sqrt[k-1]{d_{k-1}} \ge \sqrt[k]{d_k}$. Then by applying Newton's inequality we have

$d_{k-1}^k \ge d_k^{k-1} \ge \left[\sqrt{d_{k-1}d_{k+1}}\right]^{k-1}\implies d_{k-1}^{2k} \ge d_{k-1}^{k-1}d_{k+1}^{k-1} \implies d_{k-1}^{k+1} \ge d_{k+1}^{k-1}$

Hence

$d_k^{2(k+1)} \ge d_{k-1}^{k+1}d_{k+1}^{k+1} \ge d_{k+1}^{k-1+k+1} \implies \sqrt[k]{d_k} \ge \sqrt[k+1]{d_{k+1}}$

Thus by induction the proof is complete.
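Both Newton's and Maclaurin's inequalities are easy to test numerically. In the sketch below (sample values arbitrary) the symmetric sums are read off as the coefficients of $P(x)=\prod_i(x+x_i)$, exactly as in the definition above:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 5.0, size=6)
n = len(x)
coeffs = np.poly(-x)  # coefficients of prod(t + x_i); coeffs[k] = s_k
d = [coeffs[k] / comb(n, k) for k in range(n + 1)]  # symmetric averages, d[0] = 1
newton = all(d[k] ** 2 >= d[k - 1] * d[k + 1] for k in range(1, n))
maclaurin = all(d[k] ** (1 / k) >= d[k + 1] ** (1 / (k + 1)) for k in range(1, n))
print(newton, maclaurin)
```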

$\boxed{3}$ (own): Now let $s_k$ denote the symmetric average (the quantity called $d_k$ above) of the $k$-th degree symmetric sum of $n$ positive variables. If $k \ge 2$, and if $m+1 > k$ for $B$ and $m+k-1 \le n$ for $A$, we have the following inequalities:

$s_m^k \ge s_{m-1}^{k-1}s_{m+k-1} \ \ \ \left[A_{(k,m)}\right], \ \ \ s_m^k \ge s_{m+1}^{k-1}s_{m-k+1} \ \ \ \left[B_{(k,m)}\right]$

Proof: We use induction on $k$. For $k=2$, both inequalities reduce to $s_m^2 \ge s_{m-1}s_{m+1}$, which is Newton's inequality. Suppose both hold for all $k \le t$. Then for $k=t+1$:

$s_m^{t} \stackrel{A_{(t,m)}}{\ge} s_{m-1}^{t-1}s_{m+t-1} = s_{m-1}^{t-1}\sqrt[t]{s_{m+t-1}^t} \stackrel{B_{(t,m+t-1)}}{\ge} s_{m-1}^{t-1}\sqrt[t]{s_{m+t}^{t-1}s_m}$

$\implies s_m^{t+1} \ge s_{m-1}^ts_{m+t} \ \ \ (A_{(t+1,m)})$

And

$s_m^t \stackrel{B_{(t,m)}}{\ge} s_{m+1}^{t-1}s_{m-t+1}=s_{m+1}^{t-1}\sqrt[t]{s_{m-t+1}^t} \stackrel{A_{(t,m-t+1)}}{\ge} s_{m+1}^{t-1}\sqrt[t]{s_{m-t}^{t-1}s_{m}}$

$\implies s_m^{t+1} \ge s_{m+1}^ts_{m-t} \ \ \ (B_{(t+1, m)})$

Hence by induction the two inequalities are true.
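These two families of inequalities can also be checked numerically over their stated index ranges (random positive inputs, with a tiny tolerance for floating point; here $s_k$ are the symmetric averages as in the statement):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
x = rng.uniform(0.2, 4.0, size=8)
n = len(x)
coeffs = np.poly(-x)  # coefficients of prod(t + x_i): the elementary symmetric sums
s = [coeffs[k] / comb(n, k) for k in range(n + 1)]  # symmetric averages
tol = 1.0 - 1e-12

# A_(k,m): s_m^k >= s_{m-1}^{k-1} s_{m+k-1}, needs m >= 1 and m + k - 1 <= n
ok_A = all(s[m] ** k >= tol * s[m - 1] ** (k - 1) * s[m + k - 1]
           for k in range(2, n) for m in range(1, n - k + 2))
# B_(k,m): s_m^k >= s_{m+1}^{k-1} s_{m-k+1}, needs m + 1 > k and m + 1 <= n
ok_B = all(s[m] ** k >= tol * s[m + 1] ** (k - 1) * s[m - k + 1]
           for k in range(2, n) for m in range(k, n))
print(ok_A, ok_B)
```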