
Rules for Nice t-SNE Embeddings





Introduction

\(t\)-SNE is a nice embedding algorithm: it helps to reveal existing clusters. There are a few rules of thumb to make it work better. I will present them shortly. Then, I will present a method to tackle the problem of low contrast, i.e., when all clusters seem connected to each other.

Rules of Thumb

It is recommended to create an embedding following these two steps:

  1. If possible, use PCA to initialize \(t\)-SNE with the first two principal components (or more if your embedding has more dimensions):
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Project the data onto the first two principal components
X1 = PCA(n_components=2).fit_transform(X)
# Use this projection as the initialization of the t-SNE embedding
Y = TSNE(init=X1).fit_transform(X)

When \(t\)-SNE is given a data array, PCA initialization is used by default in recent versions of scikit-learn (see "init" in the scikit-learn TSNE documentation).
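If you do not need to control the PCA step yourself, a shorter form, assuming a scikit-learn version where init="pca" is available, lets the library handle the initialization internally:

# Let scikit-learn compute the PCA initialization itself
Y = TSNE(init="pca").fit_transform(X)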

  2. Do two successive embeddings: one with a high perplexity followed by another with a low perplexity. The first embedding focuses on the large-scale structure, while the second refines the item locations.

# First embedding: high perplexity, to capture the large-scale structure
Y0 = TSNE(perplexity=300).fit_transform(X)
# Second embedding: low perplexity, initialized with the first one
Y1 = TSNE(perplexity=30, init=Y0).fit_transform(X)

Problem of Contrast

Sometimes, clusters do not separate from each other and we get a kind of single fully connected component. Such an embedding can still be clustered, but with low accuracy.

Low Perplexity Embedding

Step 1 with a low perplexity

Without the Trick

Step 2 with a low perplexity

Trick Description

As you can see, there is a continuum between the different high-density locations.

In some way, this is “the truth”, what you really get from your data. However, \(t\)-SNE is a tool for data analysis / data visualization. The embedding will not be reused afterwards, therefore modifying the outcome to amplify the clusters is not a big deal.

The idea of the trick is the following: \(t\)-SNE is unable to separate the items correctly because the distances between them are not sufficiently discriminative.

To improve the contrast, we will modify the distances. Normally, \(t\)-SNE works with a distance matrix whose entries are:

\[D_{i, j} = d(\mathbf{x}_i, \mathbf{x}_j)\]

To enhance the differences, the second embedding (the one with the small perplexity) is computed using the modified distance:

\[D'_{i, j} = d(\mathbf{x}_i, \mathbf{x}_j) \times d^{tSNE}(\mathbf{y}_i, \mathbf{y}_j)\]

where \(d^{tSNE}(\mathbf{y}_i, \mathbf{y}_j) = \|\mathbf{y}_i - \mathbf{y}_j\|^2\) is the squared Euclidean distance between items, measured in the first high-perplexity embedding.

This way, thanks to the first embedding \(Y\), we have a rough notion of proximity that we can take into account.

Code

import numpy as np

# D: square (n x n) pairwise distance matrix of the dataset

# First embedding: high perplexity, on the original distances
# (PCA initialization cannot be used with a precomputed metric, hence init="random")
Y0 = TSNE(perplexity=300, metric="precomputed", init="random").fit_transform(D)

# Squared Euclidean distances measured in the first embedding
D2 = ((Y0[:, np.newaxis] - Y0)**2).sum(axis=2)

# Enhanced embedding: low perplexity, on the modified distances
Y2 = TSNE(perplexity=30, init=Y0, metric="precomputed").fit_transform(D * D2)

Disadvantage

The drawback of this method is its slowness. Indeed, when TSNE() is given an \(n\)-D data array, it can use an optimization trick. Here, because we feed it a distance matrix, we first need memory to store this matrix (for example, with \(50{,}000\) items, an \(n \times n\) float64 matrix already takes about 20 GB), and then the optimization trick cannot be used.

As a suggestion, it is possible to concatenate the first embedding to the dataset and perform the second embedding the following way:

# Initial embedding, with a high perplexity
Y0 = TSNE(perplexity=300).fit_transform(X)

k = 10  # scaling factor
# Augment the dataset with the (scaled) first embedding
X1 = np.concatenate([X, Y0 * k], axis=1)

# Second embedding, with a low perplexity, on the augmented dataset
Y1 = TSNE(perplexity=30, init=Y0).fit_transform(X1)

Here, k should be adjusted according to the number of dimensions in X and the desired effect.

Result With the Trick

After Clustering (using Mean-Shift)

Enhanced Embedding

Low Perplexity

You can see that the clusters found in the enhanced embedding are consistent with what exists in the primary embedding.

High Perplexity (without trick)

You can see that the main dense areas are well recognized. However, the assignment is less reliable for items near the boundaries.
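For reference, here is a minimal sketch of how such a clustering can be obtained, assuming the enhanced embedding Y2 from the Code section above and scikit-learn's MeanShift with its default bandwidth estimation:

from sklearn.cluster import MeanShift

# Cluster the 2D enhanced embedding; the bandwidth is estimated automatically
labels = MeanShift().fit_predict(Y2)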

Adjusting the Contrast

Instead of D * D2, we could modify the strength of D2 by replacing D2 with D2**x.
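As a sketch of how this enters the code, reusing D, D2, and Y0 from the Code section above (the exponent value below is only illustrative):

x = 0.5  # contrast exponent (illustrative value)
Y2 = TSNE(perplexity=30, init=Y0, metric="precomputed").fit_transform(D * D2**x)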

When increasing x above 1, the effect is limited.


