# Tracy-Widom Law: The starting point

Hi there!

Introduction: Random matrix theory (RMT) deals with matrices whose entries are random variables. It is a fascinating field which has been expanding rapidly in recent years. In this series of posts, I will be proving the vague convergence associated with the celebrated Tracy-Widom law, mostly following the book ‘An Introduction to Random Matrices’ by Anderson et al.

Ensembles:

Let $T,W$ be independent $N(0,1/2)$ variables. Define $Z=T+iW$ to be a complex random variable. We say that $Z$ follows the standard complex normal distribution and denote it as $Z \sim CN(0,1)$, since $E(Z)=0$ and $E(Z\bar{Z})=1$. Let $U$ be an $n\times n$ random matrix with iid $CN(0,1)$ entries and $V$ be an $n\times n$ random matrix with iid $N(0,1)$ entries. Define

$\displaystyle X=\frac{V+V^T}{\sqrt{2}},\ \ Y=\frac{U+U^*}{\sqrt2}$.

Let $P_n^{(1)}$ denote the law of $X$. The random matrix $X$ is said to belong to the Gaussian orthogonal ensemble [$\mathcal{H}_n^{(1)}$] (GOE(n)). Observe that $X$ is symmetric, and that $X_{1,1}$ is an $N(0,2)$ random variable while $X_{1,2}$ is an $N(0,1)$ random variable.

Let $P_n^{(2)}$ denote the law of $Y$. Then the random matrix $Y$ is said to belong to the Gaussian unitary ensemble [$\mathcal{H}_n^{(2)}$] (GUE(n)). Observe that $Y$ is Hermitian, and that $Y_{1,1}$ is an $N(0,1)$ random variable while $Y_{1,2}$ is a $CN(0,1)$ random variable.
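As a quick numerical sanity check (not part of the original derivation; it assumes numpy is available), we can sample these ensembles and verify the entry variances just stated:

```python
# Sketch: sample GOE(n) and GUE(n) as defined above and check entry variances.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 20000

x11, x12, y11 = [], [], []
for _ in range(reps):
    # X = (V + V^T)/sqrt(2) with iid N(0,1) entries in V
    V = rng.standard_normal((n, n))
    X = (V + V.T) / np.sqrt(2)
    # U has iid CN(0,1) entries: real and imaginary parts are N(0, 1/2)
    U = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Y = (U + U.conj().T) / np.sqrt(2)
    assert np.allclose(X, X.T)           # GOE matrices are symmetric
    assert np.allclose(Y, Y.conj().T)    # GUE matrices are Hermitian
    x11.append(X[0, 0]); x12.append(X[0, 1]); y11.append(Y[0, 0].real)

# Empirical variances: roughly 2 for X_{1,1}, 1 for X_{1,2}, 1 for Y_{1,1}
print(np.var(x11), np.var(x12), np.var(y11))
```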

Why is the law $P_n^{(\beta)}$ special?

Because the density at $H \in \mathcal{H}_n^{(\beta)}$ is proportional to $\exp\left(-\mbox{tr}\frac{\beta H^2}{4}\right)$. Thus for the law $P_n^{(1)}$, the density at $H$ and at $OHO^T$ is the same for any orthogonal matrix $O$, and for the law $P_n^{(2)}$, the density at $H$ and at $UHU^{*}$ is the same for any unitary matrix $U$ (hence the names GOE and GUE). This also enables us to find the joint distribution of the eigenvalues exactly!

Target: Let $\lambda_1^n\le \lambda_2^n \le \cdots \le \lambda_n^n$ be the eigenvalues of a matrix $X$ from GUE/GOE. Then it is known that the empirical distribution of the eigenvalues $L_n:=\frac1n\sum_{i=1}^n \delta_{\lambda_i^n/\sqrt{n}}$ converges weakly, in probability, to the semicircle distribution $\sigma(x)\,dx$ with density $\displaystyle \sigma(x) =\frac1{2\pi}\sqrt{4-x^2}\mathbb{I}_{|x|\le 2}$. We also have that $\displaystyle \frac1{\sqrt{n}}\lambda_n^n \stackrel{a.s.}{\to} 2$. What we are interested in is the limiting distribution of this largest eigenvalue. In fact, what we intend to show is that $\displaystyle n^{2/3}\left(\frac{\lambda_n^n}{\sqrt{n}}-2\right)$ converges weakly. In the initial series of posts we will only be showing the vague convergence of the above expression.
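A small simulation (a sketch, not from the book; numpy assumed) illustrates the scaling of the largest eigenvalue:

```python
# For a single GUE(n) sample, the largest eigenvalue over sqrt(n) should be
# close to 2, with fluctuations of order n^(-2/3).
import numpy as np

rng = np.random.default_rng(42)
n = 400
U = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
Y = (U + U.conj().T) / np.sqrt(2)
eigs = np.linalg.eigvalsh(Y)        # real, sorted eigenvalues of a Hermitian matrix
lam_max = eigs[-1] / np.sqrt(n)
print(lam_max)                      # close to 2
```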

Let us adopt the following convention: if we use the notation $\lambda_1^n, \ldots, \lambda_n^n$ for the eigenvalues, then they are ordered. If we use $\lambda_1, \ldots, \lambda_n$, then they are unordered.

Notations: Let $A=((a_{ij}))_{i,j=1}^n$ be a square matrix of order $n$. The determinant of $A$ will be denoted as

$|A|=\det(A)=\det\limits_{i,j=1}^n a_{ij}$

By the Laplace expansion we also have the following formula for the determinant.

$\det\limits_{i,j=1}^n a_{i,j}=\sum\limits_{\sigma \in S_n} \epsilon_{\sigma} \prod\limits_{i=1}^{n} a_{i,\sigma(i)}$
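This permutation-sum formula can be checked directly against a library determinant. The sketch below (numpy assumed; the helper name `det_leibniz` is mine) does so for a random $4\times 4$ matrix:

```python
# Determinant via the signed sum over permutations, compared with numpy's det.
import numpy as np
from itertools import permutations

def det_leibniz(a):
    """Determinant computed from the permutation-sum formula."""
    n = a.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # sign of the permutation = (-1)^(number of inversions)
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        sign = -1.0 if inversions % 2 else 1.0
        prod = 1.0
        for i in range(n):
            prod *= a[i, perm[i]]
        total += sign * prod
    return total

a = np.random.default_rng(1).standard_normal((4, 4))
assert np.isclose(det_leibniz(a), np.linalg.det(a))
```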

Theorem 1: Let $X\in \mathcal{H}_n^{(\beta)}$ be a random matrix with law $P_n^{(\beta)}$ $(\beta=1,2)$. Let $\lambda_1^n \le \lambda_2^n \le \ldots \le \lambda_n^n$ denote the ordered eigenvalues of the random matrix $X$. The joint distribution of the ordered eigenvalues has a density with respect to Lebesgue measure, which equals

$\displaystyle \hat\rho_{n}^{(\beta)}(x_1,x_2,\ldots,x_n):=n!C_n^{(\beta)}|\Delta(x)|^{\beta}\prod_{i=1}^n e^{-\beta x_i^2/4}\mathbb{I}_{x_1\le x_2\le \ldots \le x_n}$

where $C_n^{(\beta)}$ is the normalizing constant and $\Delta(x)$ is the Vandermonde determinant associated with $x=(x_1,x_2,\ldots,x_n)$, which equals $\displaystyle \det_{i,j=1}^n x_i^{j-1}=\prod_{1\le i<j\le n}(x_j-x_i)$.
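The Vandermonde identity is easy to check numerically (a sketch assuming numpy; the test points are arbitrary):

```python
# Check det(x_i^{j-1}) = prod_{i<j} (x_j - x_i) for some test points.
import numpy as np

x = np.array([0.3, -1.2, 2.5, 0.9])
n = len(x)
V = np.vander(x, increasing=True)   # V[i, j] = x_i**j for j = 0, ..., n-1
lhs = np.linalg.det(V)
rhs = np.prod([x[j] - x[i] for i in range(n) for j in range(i + 1, n)])
assert np.isclose(lhs, rhs)
```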

Proof: The proof will be skipped. Essentially it involves the change of variables formula: one takes a suitable transformation with a smooth inverse and obtains the required density by integrating out the extra variables. The $\Delta(x)$ part emerges as the Jacobian of the transformation, and $e^{-\beta x_i^2/4}$ appears from the change of variables. However, justifying this carefully requires a lot more mathematics.

Remark: What we are interested in is the distribution of the unordered eigenvalues, which is obtained by symmetrizing the density $\hat\rho_n^{(\beta)}$. We say the joint density of the unordered eigenvalues is given by

$\displaystyle \rho_n^{(\beta)}:=C_n^{(\beta)}|\Delta(x)|^{\beta}\prod_{i=1}^n e^{-\beta x_i^2/4}$

Technically speaking, this is just a valid symmetrized density. It should not really be called the joint density of the unordered eigenvalues: integrating out $n-1$ of its variables gives a marginal density, but that marginal is the density of which eigenvalue? The best we can say is that if we pick one eigenvalue uniformly at random, it is the density of that randomly chosen eigenvalue.

### Hermite Polynomials

From now on, we will concentrate only on GUE. We now introduce Hermite Polynomials and associated wave function.

The $n$-th Hermite polynomial is defined as $H_n(x): =(-1)^ne^{x^2/2}\frac{d^n}{dx^n}e^{-x^2/2}$. Hermite polynomials are actually polynomials! In fact $H_n(x)$ is a monic polynomial of degree $n$. The associated $n$-th normalized wave function is defined as

$\displaystyle \psi_n(x):=\frac{e^{-x^2/4}H_n(x)}{\sqrt{\sqrt{2\pi}n!}}$

That weird $\sqrt{\sqrt{2\pi}n!}$ scaling factor is used so that $\{\psi_n(x)\}$ forms an orthonormal basis for $Sp\{x^k\exp(-x^2/4)\}$, that is, we have

$\int_{\mathbb{R}}\psi_k(x)\psi_l(x)\,dx=\delta_{kl}$
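This orthonormality can be verified numerically. The sketch below (numpy assumed) uses numpy's `HermiteE` polynomials, which are exactly the monic probabilists' Hermite polynomials defined above, together with Gauss quadrature for the weight $e^{-x^2/2}$:

```python
# Verify int psi_k psi_l dx = delta_{kl} for the normalized wave functions.
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval, hermegauss

# Gauss quadrature with weight e^{-x^2/2}: sum(w * f(x)) = int f(x) e^{-x^2/2} dx
nodes, weights = hermegauss(40)

def inner(k, l):
    """int psi_k(x) psi_l(x) dx computed by the quadrature above."""
    Hk = hermeval(nodes, [0.0] * k + [1.0])   # H_k at the quadrature nodes
    Hl = hermeval(nodes, [0.0] * l + [1.0])
    norm = np.sqrt(np.sqrt(2 * np.pi) * factorial(k)) \
         * np.sqrt(np.sqrt(2 * np.pi) * factorial(l))
    return np.sum(weights * Hk * Hl) / norm

gram = np.array([[inner(k, l) for l in range(5)] for k in range(5)])
assert np.allclose(gram, np.eye(5))   # orthonormal, up to quadrature error
```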

Let us define  $\displaystyle K^{(n)}(x,y)=\sum_{k=0}^{n-1}\psi_k(x)\psi_k(y)$ to be the Hermite Kernel. With this $K^{(n)}$ in hand, we have a different expression for $\rho_n^{(2)}$.

Theorem 2: Consider the density $\rho_n^{(\beta)}(x_1,x_2,\ldots,x_n)$ stated in the remark of the previous theorem. We have

$\displaystyle \rho_{n}^{(2)}(x_1,x_2,\ldots,x_n)= \frac1{n!}\det_{k,l=1}^n K^{(n)}(x_k,x_l)$

where $\displaystyle K^{(n)}(x,y):=\sum_{k=0}^{n-1}\psi_k(x)\psi_k(y)$.

Proof: We need the following lemma to prove this theorem.

Lemma 1: For any square-integrable functions $f_1,f_2,\ldots,f_n$ and $g_1,g_2,\ldots,g_n$ on the real line, we have

$\displaystyle \frac1{n!}\int\cdots\int \det_{i,j=1}^n \left(\sum_{k=1}^n f_k(x_i)g_k(x_j)\right)\prod_{i=1}^n dx_i$
$\displaystyle =\frac1{n!}\int\cdots\int\det_{i,j=1}^n f_i(x_j)\det_{i,j=1}^n g_i(x_j)\prod_{i=1}^n dx_i=\det_{i,j=1}^n\int f_i(x)g_j(x)\,dx$

Proof of Lemma: The first equality follows easily from the fact that $|AB|=|A||B|$. It is enough to prove the last equality. By the Laplace expansion we have

$\displaystyle \begin{aligned} & \int\cdots\int\det_{i,j=1}^n f_i(x_j)\det_{i,j=1}^n g_i(x_j)\prod_{i=1}^n dx_i\\ & =\int\cdots\int \sum_{\sigma,\tau\in S_n} \epsilon_{\sigma} \epsilon_{\tau} \prod_{i=1}^n f_{\sigma(i)}(x_i)\prod_{i=1}^n g_{\tau(i)}(x_i)\prod_{i=1}^n dx_i \ \ \ \mbox{(Laplace Expansion)} \\ & = \sum_{\sigma,\tau\in S_n} \epsilon_{\sigma} \epsilon_{\tau} \int\cdots\int \prod_{i=1}^n f_{\sigma(i)}(x_i)g_{\tau(i)}(x_i)\prod_{i=1}^n dx_i\\ & = \sum_{\sigma,\tau\in S_n} \epsilon_{\sigma} \epsilon_{\tau} \prod_{i=1}^n\int f_{\sigma(i)}(x_i)g_{\tau(i)}(x_i) dx_i \ \ \ \mbox{(The expression factors out)} \\ & = \sum_{\sigma,\tau\in S_n} \epsilon_{\sigma} \epsilon_{\tau} \prod_{i=1}^n\int f_{\sigma(i)}(x)g_{\tau(i)}(x)\,dx\ \ \ (x_i \mbox{ is a dummy variable}) \\ & = n!\sum_{\tau\in S_n} \epsilon_{\tau} \prod_{i=1}^n\int f_{i}(x)g_{\tau(i)}(x) dx \\ & = n!\det_{i,j=1}^n\int f_i(x)g_j(x)\,dx \ \ \ \mbox{(Laplace expansion in reverse direction)}\end{aligned}$

In the penultimate line, we reordered each product so that the $f_i$‘s are in order; summing over $\sigma$ then gives rise to the factor $n!$. Note that the $\tau$ there is different from the $\tau$ in the line above (it is $\tau\sigma^{-1}$).

$\square$
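Since Lemma 1 is purely algebraic, it also holds with the integrals replaced by finite sums over a grid, which makes it easy to verify by brute force (a sketch, numpy assumed, for $n=2$):

```python
# Discrete version of Lemma 1: integrals replaced by sums over m grid points.
import numpy as np
from math import factorial
from itertools import product

rng = np.random.default_rng(3)
n, m = 2, 6                       # n functions, m grid points
F = rng.standard_normal((n, m))   # F[i, p] = f_i(grid point p)
G = rng.standard_normal((n, m))   # G[i, p] = g_i(grid point p)

# Right-hand side: det of the overlap matrix ( sum_x f_i(x) g_j(x) )
rhs = np.linalg.det(F @ G.T)

# Left-hand side: (1/n!) * sum over all n-tuples of points of
# det(f_i(x_j)) * det(g_i(x_j))
lhs = 0.0
for pts in product(range(m), repeat=n):
    pts = list(pts)
    lhs += np.linalg.det(F[:, pts]) * np.linalg.det(G[:, pts])
lhs /= factorial(n)

assert np.isclose(lhs, rhs)
```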

Proof of Theorem 2: The Vandermonde matrix is given by

$\begin{pmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \ldots & x_2^{n-1}\\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^{n-1} \end{pmatrix}$

Note that if we replace the $i$-th column ($C_i$ say) of the matrix by $[C_i + \mbox{some linear combination of the first } (i-1) \mbox{ columns}]$, then the value of the determinant remains the same. So we can replace the $i$-th column, whose entries are $x_j^{i-1}$, by any monic polynomial of degree $i-1$ evaluated at the $x_j$'s. We will replace the columns with the Hermite polynomials $H_{i-1}$.

So, we have $\displaystyle |\Delta(x)|^2=\left(\det_{i,j=1}^n x_i^{j-1} \right)^2 = \left(\det_{i,j=1}^n H_{j-1}(x_i)\right)^2$.

Hence $\displaystyle \rho_n^{(2)}(x_1,x_2,\ldots,x_n) \propto \left(\det_{i,j=1}^n H_{j-1}(x_i)\right)^2\prod_{i=1}^n e^{-x_i^2/2}=\left(\det_{i,j=1}^n H_{j-1}(x_i)e^{-x_i^2/4}\right)^2$.

Again, $\displaystyle \left(\det_{i,j=1}^n H_{j-1}(x_i)e^{-x_i^2/4}\right)^2 \propto \left(\det_{i,j=1}^n \psi_{j-1}(x_i)\right)^2=\det_{i,j=1}^n \sum_{k=0}^{n-1} \psi_k(x_i)\psi_k(x_j)=\det_{i,j=1}^n K^{(n)}(x_i,x_j)$.

where the penultimate equality follows from the fact that $|AB|=|A||B|$.
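The chain of equalities above can be sanity-checked numerically (a sketch assuming numpy; the points $x_i$ are arbitrary), by writing the kernel matrix as $\Psi\Psi^T$ with $\Psi$ the matrix of wave-function values:

```python
# Check (det psi_{j-1}(x_i))^2 = det K^(n)(x_i, x_j), using K = Psi Psi^T.
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

def psi(k, x):
    """k-th normalized wave function from the post (HermiteE = monic Hermite)."""
    return np.exp(-x**2 / 4) * hermeval(x, [0.0] * k + [1.0]) \
           / np.sqrt(np.sqrt(2 * np.pi) * factorial(k))

n = 4
x = np.array([-1.0, 0.2, 0.7, 1.5])
Psi = np.array([[psi(k, xi) for k in range(n)] for xi in x])  # Psi[i,k] = psi_k(x_i)
K = Psi @ Psi.T                    # K[i, j] = K^(n)(x_i, x_j)
assert np.isclose(np.linalg.det(Psi) ** 2, np.linalg.det(K))
```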

Now if we plug $f_i=g_i=\psi_{i-1}$ in the lemma, then we have

$\displaystyle \begin{aligned} & \frac1{n!}\int\cdots\int \det_{i,j=1}^n K^{(n)}(x_i,x_j)\prod_{i=1}^n dx_i\\ & =\frac1{n!}\int\cdots\int \det_{i,j=1}^n \left(\sum_{k=1}^n \psi_{k-1}(x_i)\psi_{k-1}(x_j)\right)\prod_{i=1}^n dx_i \\ & = \det_{i,j=1}^n \int \psi_{i-1}(x)\psi_{j-1}(x)\,dx =\det_{i,j=1}^n \delta_{ij}=1\end{aligned}$

Hence $\frac1{n!}$ is the normalising constant. This completes the proof of Theorem 2.

$\square$

Theorem 3: For any measurable subset $A$ of $\mathbb{R}$,

$\displaystyle P\left(\bigcap_{i=1}^n \{\lambda_i \in A\}\right) =1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!}\int_{A^c}\cdots\int_{A^c} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^k dx_i$

where $\lambda_i$‘s are the unordered eigenvalues of $X$ belonging to GUE. [Note that it doesn’t matter whether it is ordered or unordered.]

Proof:  By theorem 2, we have

$\displaystyle \begin{aligned} P_n^{(2)}\left(\bigcap_{i=1}^n \{\lambda_i \in A\}\right) & = \int_A\cdots \int_A \rho_n^{(2)}(x_1,x_2,\ldots,x_n)\prod_{i=1}^n\,dx_i \\ & = \frac1{n!}\int_A\cdots\int_A \det_{i,j=1}^n K^{(n)}(x_i,x_j) \prod_{i=1}^n dx_i\\ & = \frac1{n!}\int_A\cdots\int_A \det_{i,j=1}^n \left(\sum_{k=0}^{n-1} \psi_k(x_i)\psi_k(x_j)\right)\prod_{i=1}^n dx_i\\ & = \det_{i,j=0}^{n-1} \int_A \psi_i(x)\psi_j(x)\,dx\end{aligned}$

The last equality above follows from Lemma 1.

======================================================================
Let us now pause for a bit and digress to discuss a formula involving determinants. Consider a square matrix $A=((a_{ij}))_{i,j=1}^n$. Suppose we want to write $\det(I-A)$ in increasing degrees of the $a_{ij}$‘s. Indeed, $\det( I - A)$ is an inhomogeneous polynomial in the $a_{ij}$. The term of degree $0$ comes from the degree-$0$ term in the product of the diagonal elements of $I-A$, and is therefore $1$.

The term of degree $1$ is obtained by taking the $1$ from $n-1$ diagonal entries and multiplying it by the variable $a_{ii}$ in the remaining diagonal entry, in all possible ways (there are $\binom{n}{1} = n$ of them); this gives rise to

$\displaystyle (-1)^1 \sum\limits_{1\leq v_1\leq n} a_{v_1 v_1}$

The term of degree $2$ is obtained by taking $n-2$ diagonal $1$‘s multiplied by the $2 \times 2$ minor formed from the remaining two rows/columns; there are $\binom{n}{2}$ possible ways to do that, as many as the $2 \times 2$ minors centered on the diagonal; this gives rise to

$\displaystyle (-1)^2 \sum\limits_{1\leq v_1<v_2\leq n} \det\begin{pmatrix} a_{v_1 v_1} & a_{v_1 v_2}\\ a_{v_2 v_1} & a_{v_2 v_2}\end{pmatrix}$

In general, the term of degree $k$ is obtained by taking $n-k$ diagonal $1$‘s multiplied by the complementary $k\times k$ minor centered on the diagonal. This gives rise to

$\displaystyle (-1)^k \sum\limits_{1\leq v_1<v_2<\cdots<v_k\leq n} \det_{i,j=1}^k a_{v_i v_j}$

So we have the following formula

$\displaystyle \det(I-A)=1+\sum_{k=1}^{n} (-1)^k\sum\limits_{1\leq v_1<\cdots<v_k\leq n} \det_{i,j=1}^k a_{v_i v_j}$
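This principal-minor expansion of $\det(I-A)$ is easy to verify for a random matrix (a sketch, numpy assumed):

```python
# Check det(I - A) = 1 + sum_k (-1)^k * (sum of k x k principal minors of A).
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))

expansion = 1.0
for k in range(1, n + 1):
    for rows in combinations(range(n), k):
        # principal k x k minor of A on the chosen rows/columns
        minor = np.linalg.det(A[np.ix_(rows, rows)])
        expansion += (-1) ** k * minor

assert np.isclose(expansion, np.linalg.det(np.eye(n) - A))
```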

======================================================================

Coming back to the proof of the theorem, let us observe that using the above formula we have

$\displaystyle \begin{aligned} & \det_{i,j=0}^{n-1} \int_{A} \psi_{i}(x)\psi_j(x)\,dx \\ & = \det_{i,j=0}^{n-1} \left(\delta_{ij}-\int_{A^c} \psi_i(x)\psi_j(x)\,dx\right)\\ & = 1+\sum_{k=1}^{n} (-1)^k \sum\limits_{0\leq v_1<\cdots<v_k\leq n-1} \det_{i,j=1}^k \int_{A^c} \psi_{v_i}(x)\psi_{v_j}(x)\,dx\\ & = 1+\sum_{k=1}^{n} (-1)^k \sum\limits_{0\leq v_1<\cdots<v_k\leq n-1} \frac1{k!}\int_{A^c}\cdots\int_{A^c} \left(\det_{i,j=1}^k \psi_{v_i}(x_j)\right)^2\prod_{i=1}^k dx_i\end{aligned}$

where the last equality follows again from Lemma 1.

$\displaystyle \begin{aligned} & 1+\sum_{k=1}^n (-1)^k \sum\limits_{0\leq v_1<\cdots<v_k\leq n-1} \frac1{k!}\int_{A^c}\cdots\int_{A^c} \left(\det_{i,j=1}^k \psi_{v_i}(x_j)\right)^2\prod_{i=1}^k dx_i\\ & = 1+\sum_{k=1}^n \frac{(-1)^k}{k!} \int_{A^c}\cdots\int_{A^c} \sum\limits_{0\leq v_1<\cdots<v_k\leq n-1} \left(\det_{i,j=1}^k \psi_{v_i}(x_j)\right)^2\prod_{i=1}^k dx_i\end{aligned}$

Note that $((K^{(n)}(x_i,x_j)))_{i,j=1}^k=((\psi_{j}(x_i)))_{\stackrel{i=1,\ldots,k}{j=0,\ldots,n-1}}\,((\psi_{i}(x_j)))_{\stackrel{i=0,\ldots,n-1}{j=1,\ldots,k}}$. The expression inside the integral above is just the Cauchy-Binet expansion of $\displaystyle \det_{i,j=1}^k K^{(n)}(x_i,x_j)$. Hence we have

$\displaystyle \begin{aligned} & 1+\sum_{k=1}^n \frac{(-1)^k}{k!} \int_{A^c}\cdots\int_{A^c} \sum\limits_{0\leq v_1<\cdots<v_k\leq n-1} \left(\det_{i,j=1}^k \psi_{v_i}(x_j)\right)^2\prod_{i=1}^k dx_i\\ & = 1+\sum_{k=1}^n \frac{(-1)^k}{k!} \int_{A^c}\cdots\int_{A^c} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^k dx_i\\ & = 1+\sum_{k=1}^{\infty} \frac{(-1)^k}{k!} \int_{A^c}\cdots\int_{A^c} \det_{i,j=1}^k K^{(n)}(x_i,x_j)\prod_{i=1}^k dx_i\end{aligned}$

where the sum can be taken to infinity because the matrix $((K^{(n)}(x_i,x_j)))_{i,j=1}^k$ is of rank at most $n$. Hence the determinant is zero for $k>n$. This completes the proof.

$\square$
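The rank argument at the end of the proof is also easy to see numerically (a sketch, numpy assumed): any $k\times k$ Gram matrix built from $n<k$ vectors has determinant zero.

```python
# For k > n points, ((K^(n)(x_i, x_j)))_{i,j=1}^k = Psi Psi^T with Psi of
# shape (k, n), so it has rank at most n and its determinant vanishes.
import numpy as np

rng = np.random.default_rng(11)
n, k = 3, 5
Psi = rng.standard_normal((k, n))   # any k x n matrix plays the role of psi values
K = Psi @ Psi.T                     # k x k, rank <= n < k
assert abs(np.linalg.det(K)) < 1e-10
```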

Why is this nasty expression in Theorem 3 useful at all? It turns out that the right hand side of Theorem 3 is very closely related to the Fredholm determinant of $K^{(n)}$, which has been studied extensively. We will look into some of the key properties of Fredholm determinants in the next post.