Stable Distributions: Generalised CLT and Series Representation Theorem

This is a continuation of the previous post that can be found here. We discuss the Generalised CLT in this post. We give a sloppy proof of the theorem. Then we end with a series representation theorem for the Stable distributions.

The Generalised Central Limit Theorem

Theorem 9: Let {X_1,X_2,\ldots} be iid symmetric random variables with {P(X_1>\lambda) \sim k\lambda^{-\alpha}} as {\lambda \rightarrow\infty} for some {k>0} and {\alpha \in (0,2)}. Then

\displaystyle \frac{X_1+X_2+\cdots+X_n}{n^{1/\alpha}} \stackrel{d}{\rightarrow} Y

where {Y\sim S\alpha S} with an appropriate scale parameter.

Motivation for the proof: Recall the proof of the usual CLT. We assume the {X_i} are iid random variables with mean {0}, variance {\sigma^2} and characteristic function {\phi}. We use the fact that, under the finite variance assumption,

\displaystyle \frac{1-\phi(t)}{t^2} \rightarrow \frac{\sigma^2}{2} \ \ \mbox{as} \ t\to 0

(which follows from the second-order Taylor expansion {\phi(t)=1-\sigma^2t^2/2+o(t^2)}), and hence we get

\displaystyle \begin{aligned} E\left[e^{i\frac{t}{\sqrt{n}}(X_1+X_2+\cdots+X_n)}\right] = \left[E\left(e^{i\frac{t}{\sqrt{n}}X_1}\right)\right]^n & = \left[\phi\left(\frac{t}{\sqrt{n}}\right)\right]^n \\ & = \left[1-\left(1-\phi\left(\frac{t}{\sqrt{n}}\right)\right)\right]^n \\ & \rightarrow \exp(-t^2\sigma^2/2) \end{aligned}

We will use this idea in our proof. We will omit some of the technical details; the proof presented here is not rigorous. We primarily focus on the methodology and the tricks used in the proof.
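Before going into the proof, here is a small simulation sketch (added for illustration, not part of the original argument) of the statement of Theorem 9, assuming exact symmetric Pareto tails P(|X_1|>x)=x^{-\alpha} for x\ge 1 (so k=1/2 in the notation of the theorem); the helper name and the particular parameters are my own choices.

```python
# Symmetric Pareto-type variables with P(|X| > x) = x^{-alpha}, alpha in (0, 2).
# If Theorem 9 holds, the law of S_n / n^{1/alpha} should stabilise as n grows
# (approaching an S-alpha-S law); we compare a few empirical quantiles.
import numpy as np

rng = np.random.default_rng(42)
alpha, n_samples = 1.5, 5_000

def scaled_sum(n):
    # |X| = U^{-1/alpha} with U ~ Uniform(0,1), multiplied by a random sign
    u = rng.uniform(size=(n_samples, n))
    signs = np.sign(rng.uniform(size=(n_samples, n)) - 0.5)
    x = signs * u ** (-1.0 / alpha)
    return x.sum(axis=1) / n ** (1.0 / alpha)

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.round(np.quantile(scaled_sum(500), qs), 2))
print(np.round(np.quantile(scaled_sum(2000), qs), 2))
```

The two rows of quantiles should be close to each other, reflecting convergence in distribution to a common S\alpha S limit.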


Stable Distributions: Properties

This is a continuation of the previous post that can be found here. We will discuss some of the properties of the Stable distributions in this post.

Theorem 2. Y is the limit in distribution of \displaystyle\frac{X_1+X_2+\cdots+X_n-b_n}{a_n} for some iid sequence X_i and sequences of constants a_n >0 and b_n, if and only if Y has a stable law.

Remark: This kind of explains why Stable laws are called Stable!

Proof: The ‘if’ part follows by taking the X_i to be iid with the stable law of Y. We focus on the ‘only if’ part.

Let \displaystyle Z_n=\frac{X_1+X_2+\cdots+X_n-b_n}{a_n}. Fix a k\in \mathbb{N}. Let

\displaystyle S_n^{j}= \sum_{i=1}^n X_{(j-1)n+i}= X_{(j-1)n+1}+X_{(j-1)n+2}+\cdots+X_{jn} \ \ \mbox{for} \ (1\le j \le k)

Now consider Z_{nk}. Observe that

\displaystyle Z_{nk} = \frac{S_n^1+S_n^2+\cdots+S_n^k-b_{nk}}{a_{nk}}

\displaystyle \implies \frac{a_{nk}Z_{nk}}{a_n} = \frac{S_n^1-b_n}{a_n}+\frac{S_n^2-b_n}{a_n}+\cdots+\frac{S_n^k-b_n}{a_n}+\frac{kb_n-b_{nk}}{a_n}

As n\to \infty,

\displaystyle \frac{S_n^1-b_n}{a_n}+\frac{S_n^2-b_n}{a_n}+\cdots+\frac{S_n^k-b_n}{a_n} \stackrel{d}{\to} Y_1+Y_2+\cdots+Y_k

where the Y_i’s are iid copies of Y. Let W_n:=Z_{nk} and let

\displaystyle \begin{aligned} W_n' & :=\frac{a_{nk}Z_{nk}}{a_n}-\frac{kb_n-b_{nk}}{a_n} \\ & = \alpha_nW_n+\beta_n \end{aligned}

where \displaystyle\alpha_n:=\frac{a_{nk}}{a_n} and \displaystyle\beta_n:=-\frac{kb_n-b_{nk}}{a_n}.

Now W_n \stackrel{d}{\to} Y and W_n'\stackrel{d}{\to} Y_1+Y_2+\cdots+Y_k. If we can show that \alpha_n \to \alpha and \beta_n \to \beta, then this will ensure

\alpha Y+\beta \stackrel{d}{=} Y_1+Y_2+\cdots+Y_k

That \alpha_n \to \alpha>0 and \beta_n \to \beta indeed hold follows from the convergence of types theorem, since both limits Y and Y_1+Y_2+\cdots+Y_k are non-degenerate. Note that \alpha and \beta depend only on k. Hence the law of Y is stable.

\square


Stable Distributions: An introduction

In this series of posts I will introduce the stable distributions and discuss some of their properties. The theorems and proofs presented here are written in a lucid manner, so that they are accessible to most readers. I have tried my best to keep the discussion as self-contained as possible, so that readers don’t have to consult different references to follow the proofs. It is based on a presentation I gave in the Measure Theory course in the first year of the MStat programme.

Introduction

In statistics, while modelling continuous data we usually assume that the data are iid observations coming from some ‘nice’ distribution, and then we carry out statistical inference. The strongest tool we usually invoke is the Central Limit Theorem, which states that the sum of a large number of iid variables from a finite-variance distribution is approximately normally distributed. But this finite variance assumption is not always realistic: many real-life data, such as returns on financial assets, exhibit fat tails, and the finite variance assumption fails for such heavy-tailed data. So there is a need for an alternative model. One such alternative is the Stable distribution. Of course there are other alternatives, but one good reason to use the Stable distribution is that it is supported by the Generalised Central Limit Theorem.

Notations: U\stackrel{d}{=} V means U and V have the same distribution. Throughout this section we assume

\displaystyle X,X_1,X_2,\ldots \stackrel{iid}{\sim} F \ \ \ \mbox{and} \ S_n=\sum_{i=1}^n X_i

Motivation: Suppose F= \mbox{N}(0,\sigma^2). In that case we know

S_n\stackrel{d}{=} \sqrt{n}X
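This relation can be checked in one line with characteristic functions (a verification added here for completeness):

\displaystyle E\left[e^{itS_n}\right]=\left[E\left(e^{itX}\right)\right]^n=e^{-n\sigma^2t^2/2}=E\left[e^{it\sqrt{n}X}\right]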

Motivated by the above relation, we ask ourselves: can we find a distribution for which such a relation holds with some other constant, like n or \log n or n^{1/3}, instead of \sqrt{n}? Hence we generalise the above relation in the following manner.

Definition: The distribution F is stable (in the broad sense) if for each n there exist constants c_n >0 and \gamma_n such that

S_n \stackrel{d}{=} c_nX+\gamma_n

and F is non-degenerate. F is stable in the strict sense if the above relation holds with \gamma_n=0.
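As a quick numerical illustration (a sketch added here, not part of the original post): the standard Cauchy distribution is strictly stable with c_n=n and \gamma_n=0, so S_n/n should again be standard Cauchy. A short check in Python:

```python
# Illustrative sketch: the standard Cauchy law is strictly stable with
# c_n = n and gamma_n = 0, i.e. X_1 + ... + X_n has the same law as n*X.
# We compare S_n / n with a standard Cauchy via a Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_samples = 50, 20_000

samples = rng.standard_cauchy(size=(n_samples, n))   # each row: (X_1, ..., X_n)
scaled_sums = samples.sum(axis=1) / n                 # S_n / n

ks_stat, p_value = stats.kstest(scaled_sums, "cauchy")
print(f"KS statistic = {ks_stat:.4f}, p-value = {p_value:.3f}")
```

Since S_n/n is exactly standard Cauchy here, the KS statistic stays small (of the order of n_samples^{-1/2}).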


Tracy-Widom Law: Vague convergence and Ledoux Bound

This is a continuation of the previous post available here. In the previous post, we developed the ingredients required for the vague convergence proof. Let us now return to the random matrix scenario.

Proof of Theorem 4: Let X_n be a sequence of n\times n GUE matrices. Let \lambda_1,\ldots, \lambda_n be the eigenvalues of X_n. Fix -\infty < t< t' < \infty. Let us quickly evaluate the limit

\displaystyle \lim_{n\to \infty} P\left[n^{2/3}\left(\frac{\lambda_i}{\sqrt{n}}-2\right) \not\in (t,t'), \ \ i=1,2, \ldots, n\right]

Observe that by using Theorem 3, we have

\displaystyle \begin{aligned} & P\left[n^{2/3}\left(\frac{\lambda_i}{\sqrt{n}}-2\right) \not\in (t,t'), \ \ i=1,2, \ldots, n\right] \\ & = P\left[\lambda_i \not\in \left(2\sqrt{n}+\frac{t}{n^{1/6}},2\sqrt{n}+\frac{t'}{n^{1/6}}\right), \ \ i=1,2, \ldots, n \right] \\ & = 1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!}\int\limits_{2\sqrt{n}+\frac{t}{n^{1/6}}}^{2\sqrt{n}+\frac{t'}{n^{1/6}}}\cdots\int\limits_{2\sqrt{n}+\frac{t}{n^{1/6}}}^{2\sqrt{n}+\frac{t'}{n^{1/6}}} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^{k}dx_i \\ & = 1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!} \int_{t}^{t'}\cdots\int_{t}^{t'} \det_{i,j=1}^k \left[\frac1{n^{1/6}}K^{(n)}\left(2\sqrt{n}+\frac{x_i}{n^{1/6}},2\sqrt{n}+\frac{x_j}{n^{1/6}}\right)\right]\prod_{i=1}^k dx_i \end{aligned}

where in the last line we use the change of variables x_i \mapsto 2\sqrt{n}+x_i/n^{1/6}; each dx_i contributes a factor of n^{-1/6}, and these k factors are absorbed into the entries of the k\times k determinant. Let us define

\displaystyle A^{(n)}(x,y):=\frac1{n^{1/6}}K^{(n)}\left(2\sqrt{n}+\frac{x}{n^{1/6}},2\sqrt{n}+\frac{y}{n^{1/6}}\right)

Note that the A^{(n)} are kernels since K^{(n)} is a kernel. Let us also define \displaystyle \phi_n(x) = n^{1/12}\psi_n\left(2\sqrt{n}+\frac{x}{n^{1/6}}\right). Then using Proposition 1 (c) we have

\displaystyle A^{(n)}(x,y)=\frac{\phi_n(x)\phi_n'(y)-\phi_n'(x)\phi_n(y)}{x-y}-\frac1{2n^{1/3}}\phi_n(x)\phi_n(y)
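To see why this follows from Proposition 1 (c) (a short verification added for the reader), write X=2\sqrt{n}+\frac{x}{n^{1/6}} and Y=2\sqrt{n}+\frac{y}{n^{1/6}} and note that

\displaystyle \psi_n(X)=n^{-1/12}\phi_n(x), \ \ \psi_n'(X)=n^{1/12}\phi_n'(x), \ \ X-Y=\frac{x-y}{n^{1/6}}

Substituting these into Proposition 1 (c) and multiplying by n^{-1/6} gives the displayed formula.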

Note that Proposition 4 implies that for every C>1, \phi_n \to Ai uniformly over the ball of radius C in the complex plane. Since the \phi_n are entire functions, Cauchy's estimates then give \phi_n' \to Ai' and \phi_n'' \to Ai'' uniformly on compact sets. Hence, uniformly over t\le x,y \le t', we have

\displaystyle \left|A^{(n)}(x,y)-A(x,y)\right| \to 0

where \displaystyle A(x,y)=\frac{Ai(x)Ai'(y)-Ai'(x)Ai(y)}{x-y} is the Airy kernel (the \frac1{2n^{1/3}}\phi_n(x)\phi_n(y) term vanishes in the limit).

Hence by the continuity lemma for Fredholm determinants we have

\displaystyle \begin{aligned} & \lim_{n\to \infty} P\left[n^{2/3}\left(\frac{\lambda_i}{\sqrt{n}}-2\right) \not\in (t,t'), \ \ i=1,2, \ldots, n\right] \\ & = \lim_{n\to\infty} \Delta(A^{(n)}) \\ & = \Delta(A) \\ & = 1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!} \int_{t}^{t'}\cdots\int_{t}^{t'} \det_{i,j=1}^k A(x_i,x_j) \prod_{i=1}^k dx_i \end{aligned}

where the measure is the Lebesgue measure on the bounded interval \displaystyle (t,t'). This completes the proof of Theorem 4.

\square
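The same Fredholm determinant, taken with respect to Lebesgue measure on the half-line (s,\infty) instead of a bounded interval, gives the Tracy-Widom (GUE) distribution function F_2(s). As an aside (not part of the proof), here is a small numerical sketch approximating it by a Nystrom/Gauss-Legendre discretisation in the spirit of Bornemann's quadrature method; the function names, truncation point and number of nodes are ad hoc choices of mine.

```python
# Numerical sketch (illustrative): approximate F_2(s), the Fredholm determinant
# of the Airy kernel on L^2(s, infinity), by Gauss-Legendre quadrature on
# the truncated interval (s, s + cutoff).
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.special import airy

def airy_kernel(x, y):
    """Airy kernel A(x, y); on the diagonal use the limit Ai'(x)^2 - x*Ai(x)^2."""
    ax, apx, _, _ = airy(x)      # Ai(x), Ai'(x)
    ay, apy, _, _ = airy(y)
    diff = x - y
    safe = np.where(diff == 0.0, 1.0, diff)            # avoid 0/0 on the diagonal
    off_diag = (ax * apy - apx * ay) / safe
    diag = apx ** 2 - x * ax ** 2
    return np.where(diff == 0.0, diag, off_diag)

def tracy_widom_cdf(s, n_quad=60, cutoff=14.0):
    """Nystrom approximation of det(I - A) on (s, s + cutoff)."""
    nodes, weights = leggauss(n_quad)                   # nodes/weights on [-1, 1]
    x = s + (nodes + 1.0) * cutoff / 2.0                # map to [s, s + cutoff]
    w = weights * cutoff / 2.0
    X, Y = np.meshgrid(x, x, indexing="ij")
    M = np.sqrt(np.outer(w, w)) * airy_kernel(X, Y)
    return np.linalg.det(np.eye(n_quad) - M)

for s in (-3.0, -1.0, 0.0, 1.0):
    print(f"F_2({s:+.1f}) is approximately {tracy_widom_cdf(s):.4f}")
```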


Tracy-Widom Law: Fredholm determinants and Airy functions

This is a continuation of the previous post available here. This section is quite different from the previous post. Here we present a series of propositions. The proofs of the propositions are not included here; I may add them later. However, the proofs of the propositions (except that of Proposition 4) are not necessary for our argument.

In the previous post, we introduced Hermite polynomials and wave functions. We noticed that the Hermite polynomials are orthogonal polynomials. There is a plethora of results on orthogonal polynomials; we state only the few of them that we will need in our proof.

Proposition 1: Let H_n(x) be the Hermite polynomial of degree n. Let \psi_n(x) be the corresponding wave function. Let \displaystyle K^{(n)}(x,y)=\sum_{k=0}^{n-1} \psi_k(x)\psi_k(y). We have the following results.

(a) H_n(x+t)=\sum_{k=0}^{n} \binom{n}{k}H_k(x)t^{n-k}

(b) \displaystyle \frac1{\sqrt{n}}K^{(n)}(x,y)= \frac{\psi_n(x)\psi_{n-1}(y)-\psi_{n-1}(x)\psi_n(y)}{x-y}

(c) \displaystyle K^{(n)}(x,y)=\frac{\psi_n(x)\psi_n'(y)-\psi_n'(x)\psi_n(y)}{x-y}-\frac12\psi_n(x)\psi_n(y)

(d) \displaystyle \frac{d}{dx}K^{(n)}(x,x)=-\sqrt{n}\psi_n(x)\psi_{n-1}(x)

Proof: Not included

Let us recall Theorem 3 once again.

Theorem 3: For any measurable subset A of \mathbb{R},

\displaystyle P\left(\bigcap_{i=1}^n \{\lambda_i \in A\}\right) =1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!}\int_{A^c}\cdots\int_{A^c} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^k dx_i

Let us replace A by A^c and rewrite the above result as

\displaystyle P\left(\bigcap_{i=1}^n \{\lambda_i \not\in A\}\right) =1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!}\int_{A}\cdots\int_{A} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^k dx_i

Now is the right time to see what exactly we are trying to show in order to prove the weak convergence. For that we need some basic facts about the Airy function.
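For reference (a standard definition, added here as a reminder), the Airy function can be given by the convergent improper integral

\displaystyle Ai(x)=\frac1{\pi}\int_0^{\infty}\cos\left(\frac{t^3}{3}+xt\right)dt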


Tracy-Widom Law: The starting point

Hi there!

Introduction: Random matrix theory (RMT) deals with matrices whose entries are random variables. RMT is a fascinating field which has been expanding rapidly in recent years. In this series of posts, I will be proving the vague convergence in the celebrated Tracy-Widom Law, mostly following the book ‘An Introduction to Random Matrices’ by Anderson et al.

Ensembles: 

Let T,W be independent N(0,1/2) variables. Define Z=T+iW to be a complex random variable. We say that Z follows the standard complex normal distribution, and denote it by Z \sim CN(0,1), since E(Z)=0 and E(Z\bar{Z})=1. Let U be an n\times n random matrix with iid CN(0,1) entries and V be an n\times n random matrix with iid N(0,1) entries. Define

\displaystyle  X=\frac{V+V^T}{\sqrt{2}},\ \ Y=\frac{U+U^*}{\sqrt2}.

Let P_n^{(1)} denote the law of X. The random matrix X is said to belong to the Gaussian orthogonal ensemble [\mathcal{H}_n^{(1)}] (GOE(n)). Observe that X is symmetric, and X_{1,1} is a N(0,2) random variable while X_{1,2} is a N(0,1) random variable.

Let P_n^{(2)} denote the law of Y. Then the random matrix Y is said to belong to the Gaussian unitary ensemble [\mathcal{H}_n^{(2)}] (GUE(n)). Observe that Y is Hermitian, and Y_{1,1} is a N(0,1) random variable while Y_{1,2} is a CN(0,1) random variable.
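Here is a small sampling sketch in Python following the definitions above (an illustration added here; the helper names and the seed are my own):

```python
# Sampling sketch for the two ensembles, following X = (V + V^T)/sqrt(2)
# and Y = (U + U^*)/sqrt(2) with the normalisations defined above.
import numpy as np

def sample_goe(n, rng):
    v = rng.standard_normal((n, n))                       # iid N(0,1) entries
    return (v + v.T) / np.sqrt(2)

def sample_gue(n, rng):
    # iid CN(0,1) entries: real and imaginary parts are independent N(0, 1/2)
    u = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (u + u.conj().T) / np.sqrt(2)

rng = np.random.default_rng(0)
X, Y = sample_goe(4, rng), sample_gue(4, rng)
print(np.allclose(X, X.T), np.allclose(Y, Y.conj().T))    # symmetric / Hermitian
```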

Why is the law P_n^{(\beta)} special?

Because the density at H \in \mathcal{H}_n^{(\beta)} is proportional to \exp\left(-\mbox{tr}\frac{\beta H^2}{4}\right). Thus for the law P_n^{(1)}, the density at H and at OHO^T is the same for any orthogonal matrix O, and for the law P_n^{(2)}, the density at H and at UHU^{*} is the same for any unitary matrix U (hence the names GOE and GUE). This also enables us to find the joint distribution of the eigenvalues exactly!

Target: Let \lambda_1^n\le \lambda_2^n \le \cdots \le \lambda_n^n be the eigenvalues of a matrix X from GUE/GOE. Then it is known that the empirical distribution of the eigenvalues L_n:=\frac1n\sum_{i=1}^n \delta_{\lambda_i^n/\sqrt{n}} converges weakly, in probability, to the semicircle distribution \sigma(x)\,dx with density \displaystyle \sigma(x) =\frac1{2\pi}\sqrt{4-x^2}\mathbb{I}_{|x|\le 2}. We also have that \displaystyle \frac1{\sqrt{n}}\lambda_n^n \stackrel{a.s.}{\to} 2. What we are interested in is the limiting distribution of this largest eigenvalue. In fact, what we intend to show is that \displaystyle n^{2/3}\left(\frac{\lambda_n^n}{\sqrt{n}}-2\right) converges weakly. In the initial series of posts we will only be showing the vague convergence of the above expression.
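A quick numerical look at this target (an illustrative sketch; the helper gue and the choices n=400, reps=200 are arbitrary):

```python
# For GUE matrices as defined above, lambda_max / sqrt(n) should be close to 2,
# and n^(2/3) * (lambda_max / sqrt(n) - 2) should fluctuate on a scale of order one.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 400, 200

def gue(n):
    u = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (u + u.conj().T) / np.sqrt(2)

fluct = np.array([
    n ** (2 / 3) * (np.linalg.eigvalsh(gue(n)).max() / np.sqrt(n) - 2.0)
    for _ in range(reps)
])
print(f"typical lambda_max / sqrt(n): {2.0 + fluct.mean() / n ** (2 / 3):.3f}")
print(f"rescaled fluctuations: mean {fluct.mean():.2f}, sd {fluct.std():.2f}")
```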

Let us adopt this convention: if we are using the notation \lambda_1^n, \ldots, \lambda_n^n for eigenvalues, then they are ordered. If we use \lambda_1, \ldots, \lambda_n for eigenvalues, then they are unordered.

Notations: Let A=((a_{ij}))_{i,j=1}^n be a square matrix of order n. The determinant of A will be denoted as

|A|=\det(A)=\det\limits_{i,j=1}^n a_{ij}

By the Leibniz formula (the expansion over permutations) we also have the following expression for the determinant.

\det\limits_{i,j=1}^n a_{i,j}=\sum\limits_{\sigma \in S_n} \epsilon_{\sigma} \prod\limits_{i=1}^{n} a_{i,\sigma(i)}

Theorem 1: Let X\in \mathcal{H}_n^{(\beta)} be a random matrix with law P_n^{(\beta)} (\beta=1,2). Let \lambda_1^n \le \lambda_2^n \le \ldots \le \lambda_n^n denote the ordered eigenvalues of the random matrix X. The joint distribution of the ordered eigenvalues has a density with respect to Lebesgue measure, which equals

\displaystyle \hat\rho_{n}^{(\beta)}(x_1,x_2,\ldots,x_n):=n!C_n^{(\beta)}|\Delta(x)|^{\beta}\prod_{i=1}^n e^{-\beta x_i^2/4}\mathbb{I}_{x_1\le x_2\le \ldots \le x_n}

where C_n^{(\beta)} is the normalizing constant and \Delta(x) is the Vandermonde determinant associated with x=(x_1,x_2,\ldots,x_n), which equals \displaystyle \det_{i,j=1}^n x_i^{j-1}=\prod_{1\le i<j\le n} (x_j-x_i).

Proof: The proof will be skipped. Essentially it involves the change of variables formula: one takes a suitable transformation with a smooth inverse and then obtains the required density by integrating out the extra variables. The \Delta(x) factor emerges as the Jacobian of the transformation, and the factors e^{-\beta x_i^2/4} appear from the change of variables. However, to justify this carefully, a lot more mathematics is needed.

Remark: What we are interested in is the distribution of the unordered eigenvalues, which is obtained by symmetrizing the density \hat\rho_n^{(\beta)}. We say that the joint density of the unordered eigenvalues is given by

\displaystyle \rho_n^{(\beta)}:=C_n^{(\beta)}|\Delta(x)|^{\beta}\prod_{i=1}^n e^{-\beta x_i^2/4}

Technically speaking, this is just a valid symmetrized density. It should not really be called the joint density of the unordered eigenvalues: integrating this density n-1 times gives a marginal density, but the marginal density of which eigenvalue? The best we can say is that it is the density of an eigenvalue chosen uniformly at random.
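For instance (a small worked case added here), for n=2 in the GUE case (\beta=2) the symmetrized density reads

\displaystyle \rho_2^{(2)}(x_1,x_2)=C_2^{(2)}(x_1-x_2)^2e^{-(x_1^2+x_2^2)/2}

while the ordered density \hat\rho_2^{(2)} carries the extra factor 2!\,\mathbb{I}_{x_1\le x_2}.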


Symmetric Inequalities

Symmetric Sums: Let x_1,x_2, \ldots, x_n be n real numbers. We define the k-th symmetric sum s_k as the coefficient of x^{n-k} in the polynomial

P(x)=(x_1+x)(x_2+x)(x_3+x)\cdots (x_n+x)


and the k-th symmetric average is defined by d_k=\frac{s_k}{\binom{n}{k}}. For example, for 4 variables we have

s_1=x_1+x_2+x_3+x_4
s_2=x_1x_2+x_2x_3+x_3x_4+x_4x_1+x_1x_3+x_2x_4
s_3=x_2x_3x_4+x_1x_3x_4+x_1x_2x_4+x_1x_2x_3
s_4=x_1x_2x_3x_4


\boxed{1} Newton’s Inequality: Let x_1,x_2,\cdots,x_n be n non-negative reals. Then for all k \in \{1,2,\cdots,n-1\} we have

d_k^2 \ge d_{k-1}d_{k+1}

Proof: See http://www.artofproblemsolving.com/Wiki/index.php/Newton%27s_Inequality

\boxed{2} Maclaurin’s Inequality: Let x_1,x_2,\ldots,x_n be n nonnegative reals. Let their symmetric averages be d_1,d_2,\ldots,d_n. Then we have

d_1 \ge \sqrt{d_2} \ge \sqrt[3]{d_3} \ge \cdots \ge \sqrt[n]{d_n}

Proof: We will use induction and Newton’s Inequality. Let us prove that d_1\ge \sqrt{d_2} holds. This inequality is equivalent to

\displaystyle (x_1+x_2+\cdots+x_n)^2 \ge \frac{2n}{n-1}\underset{i<j}{\sum_{i=1}^n\sum_{j=1}^n} x_ix_j

\displaystyle \Longleftrightarrow (n-1)\left[\sum_{i=1}^n x_i^2+2\underset{i<j}{\sum_{i=1}^n\sum_{j=1}^n} x_ix_j\right] \ge 2n\underset{i<j}{\sum_{i=1}^n\sum_{j=1}^n} x_ix_j

\displaystyle \Longleftrightarrow (n-1)\sum_{i=1}^n x_i^2 \ge 2\underset{i<j}{\sum_{i=1}^n\sum_{j=1}^n} x_ix_j

\displaystyle \Longleftrightarrow \underset{i<j}{\sum_{i=1}^n\sum_{j=1}^n} (x_i-x_j)^2 \geq 0

which is obvious.

Suppose the chain of inequalities is true up to \sqrt[k-1]{d_{k-1}} \ge \sqrt[k]{d_k}. Then by applying Newton’s inequality we have

d_{k-1}^k \ge d_k^{k-1} \ge \left[\sqrt{d_{k-1}d_{k+1}}\right]^{k-1}\implies d_{k-1}^{2k} \ge d_{k-1}^{k-1}d_{k+1}^{k-1} \implies d_{k-1}^{k+1} \ge d_{k+1}^{k-1}

Hence

d_k^{2(k+1)} \ge d_{k-1}^{k+1}d_{k+1}^{k+1} \ge d_{k+1}^{k-1+k+1} \implies \sqrt[k]{d_k} \ge \sqrt[k+1]{d_{k+1}}

 Thus by induction the proof is complete.
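As a quick numerical sanity check of the two boxed inequalities above (an illustrative sketch, not part of the original note; the tolerance and the random batch of six reals are arbitrary), one can compute the symmetric sums as polynomial coefficients and test the inequalities on random positive reals:

```python
# Compute symmetric sums s_k as coefficients of P(x) = (x + x_1)...(x + x_n),
# form the symmetric averages d_k = s_k / C(n, k), and check Newton's and
# Maclaurin's inequalities for a random batch of positive reals.
import numpy as np
from math import comb

rng = np.random.default_rng(0)
xs = rng.uniform(0.1, 5.0, size=6)
n = len(xs)

coeffs = np.poly(-xs)                     # coefficients of prod (x + x_i); coeffs[k] = s_k
d = [coeffs[k] / comb(n, k) for k in range(n + 1)]

newton_ok = all(d[k] ** 2 >= d[k - 1] * d[k + 1] - 1e-12 for k in range(1, n))
maclaurin_ok = all(d[k] ** (1 / k) >= d[k + 1] ** (1 / (k + 1)) - 1e-12 for k in range(1, n))
print(newton_ok, maclaurin_ok)
```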

\boxed{3} (own): Now let s_k denote the arithmetic mean of the k-th degree symmetric sum (that is, the symmetric average d_k above) of n positive variables. If k \ge 2, and if m+k-1 \le n for A and m+1 > k for B, we have the following inequalities:

s_m^k \ge s_{m-1}^{k-1}s_{m+k-1} \ \ \ \left[A_{(k,m)}\right], \ \ \ s_m^k \ge s_{m+1}^{k-1}s_{m-k+1} \ \ \ \left[B_{(k,m)}\right]

Proof: To prove this we use induction on k.
For k=2, both inequalities are equivalent to s_m^2 \ge s_{m-1}s_{m+1}, which is Newton’s inequality. Suppose both hold for all k \le t.
Then for k=t+1,

s_m^{t} \stackrel{A_{(t,m)}}{\ge} s_{m-1}^{t-1}s_{m+t-1} = s_{m-1}^{t-1}\sqrt[t]{s_{m+t-1}^t} \stackrel{B_{(t,m+t-1)}}{\ge} s_{m-1}^{t-1}\sqrt[t]{s_{m+t}^{t-1}s_m}

\implies s_m^{t+1} \ge s_{m-1}^ts_{m+t} \ \ \ (A_{(t+1,m)})

And

s_m^t \stackrel{B_{(t,m)}}{\ge} s_{m+1}^{t-1}s_{m-t+1}=s_{m+1}^{t-1}\sqrt[t]{s_{m-t+1}^t} \stackrel{A_{(t,m-t+1)}}{\ge} s_{m+1}^{t-1}\sqrt[t]{s_{m-t}^{t-1}s_{m}}

\implies s_m^{t+1} \ge s_{m+1}^ts_{m-t} \ \ \ (B_{(t+1, m)})

Hence by induction the two inequalities are true.
