Locally linear embedding

Contrary to Isomap, which works with the pairwise distances, this algorithm is based on the assumption that a high-dimensional dataset lying on a smooth manifold has local linear structures that it tries to preserve during the dimensionality reduction process. Locally Linear Embedding (LLE), like Isomap, is based on three steps. The first one applies the k-nearest neighbor algorithm to create a directed graph (in Isomap, it was undirected), whose vertices are the input samples and whose edges represent a neighborhood relationship. As the graph is directed, a point xi can be a neighbor of xj while the converse is false, which means that the weight matrix can be asymmetric.
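As a quick illustration of this asymmetry, the following sketch (based on Scikit-Learn's kneighbors_graph; the toy dataset and the parameter values are arbitrary assumptions) builds the directed k-NN graph and checks whether its adjacency matrix is symmetric:

import numpy as np
from sklearn.neighbors import kneighbors_graph

# Arbitrary toy dataset: 100 points sampled uniformly in 5 dimensions
rng = np.random.RandomState(1000)
X = rng.uniform(-1.0, 1.0, size=(100, 5))

# Directed k-NN graph: row i marks the neighbors of x_i
G = kneighbors_graph(X, n_neighbors=10, mode='connectivity', include_self=False)

# x_j can be a neighbor of x_i without x_i being a neighbor of x_j,
# so the adjacency (and hence the weight) matrix is generally asymmetric
A = G.toarray()
print('Symmetric: {}'.format(np.allclose(A, A.T)))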

The second step is based on the main assumption of local linearity. For example, consider the following graph:

Graph where a neighborhood is marked with a shaded rectangle

The rectangle delimits a small neighborhood. If we consider the point x5, the local linearity assumption allows us to write x5 ≈ w56x6 + w53x3, without considering the cyclic relationship. This concept can be formalized for all N P-dimensional points through the minimization of the following function:
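The function in question is the standard LLE reconstruction cost introduced by Saul and Roweis which, denoting by ν(i) the set of neighbors of xi, can be written as:

\Phi(W) = \sum_{i=1}^{N} \left\| x_i - \sum_{j \in \nu(i)} w_{ij} x_j \right\|_2^2 \quad \text{subject to} \quad \sum_{j \in \nu(i)} w_{ij} = 1 \;\; \forall i

The normalization constraint on each row of W makes the weights invariant to translations of the neighborhood, and the optimal weights can be computed in closed form from the local Gram matrices.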

In order to address the problem of low-rank neighborhood (Gram) matrices (think about the previous example, with a number of neighbors equal to 20), Scikit-Learn also implements a regularizer based on a small arbitrary constant added to the diagonal of the local Gram matrices, as well as a more robust variant called Modified LLE (MLLE), which computes multiple weight vectors for each neighborhood. At the end of this step, the matrix W that best matches the linear relationships among the neighbors is selected for the next phase.
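As a usage sketch (the Swiss roll dataset and all the parameter values are arbitrary assumptions made only for the example), both the regularization constant and the MLLE variant are exposed by Scikit-Learn's LocallyLinearEmbedding class:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Arbitrary toy manifold: a 3-dimensional Swiss roll
X, _ = make_swiss_roll(n_samples=1000, random_state=1000)

# Standard LLE: 'reg' is the small constant used to regularize the
# local Gram matrices when they are (almost) singular
lle = LocallyLinearEmbedding(n_neighbors=20, n_components=2, reg=0.001, method='standard')
X_lle = lle.fit_transform(X)

# Modified LLE (MLLE), which employs multiple weight vectors per neighborhood
mlle = LocallyLinearEmbedding(n_neighbors=20, n_components=2, method='modified')
X_mlle = mlle.fit_transform(X)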

In the third step, Locally Linear Embedding tries to determine the low-dimensional (Q < P) representation that best reproduces the original relationships among the nearest neighbors. This is achieved by minimizing the following function:
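In the standard formulation, the weights wij found in the previous step are kept fixed and the same reconstruction error is minimized with respect to the low-dimensional points yi (together with zero-mean and unit-covariance constraints that remove the translational and scaling degeneracies):

\Phi(Y) = \sum_{i=1}^{N} \left\| y_i - \sum_{j \in \nu(i)} w_{ij} y_j \right\|_2^2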

The solution to this problem is obtained through the Rayleigh-Ritz method, an algorithm to extract a subset of eigenvectors and eigenvalues from a very large sparse matrix. For further details, read A spectrum slicing method for the Kohn–Sham problem, Schofield G., Chelikowsky J. R., Saad Y., Computer Physics Communications, 183. The initial part of the final procedure consists of determining the matrix D:
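In the standard derivation (Saul and Roweis call this matrix M; it is assumed here that D denotes the same quantity), it is built directly from the weight matrix:

D = (I - W)^T (I - W)

D is a sparse, symmetric, positive semidefinite N × N matrix, and the embedding is obtained from the eigenvectors associated with its smallest eigenvalues.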

It's possible to prove that the last eigenvector (if the eigenvalues are sorted in descending order, it's the bottom one) has all equal components, v1(N) = v2(N) = ... = vN(N) = v, and that the corresponding eigenvalue is null. As Saul and Roweis (An introduction to locally linear embedding, Saul L. K., Roweis S. T.) pointed out, all the other Q eigenvectors (taken from the bottom) are orthogonal to it, and this guarantees a zero-centered embedding. Hence, the last eigenvector is discarded, while the remaining Q eigenvectors determine the embedding vectors φi.
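A minimal NumPy sketch of this final step (assuming a dense, row-stochastic weight matrix W computed in the second step; a real implementation would work with sparse matrices and the partial eigensolvers discussed above) is:

import numpy as np

def lle_embedding(W, Q):
    # W: (N, N) weight matrix whose rows sum to 1
    N = W.shape[0]
    D = (np.eye(N) - W).T @ (np.eye(N) - W)

    # Eigenvalues in ascending order: the smallest one is (numerically) zero
    # and its eigenvector has all equal components
    eigenvalues, eigenvectors = np.linalg.eigh(D)

    # Discard the constant eigenvector and keep the next Q ones; being
    # orthogonal to it, they yield a zero-centered embedding
    return eigenvectors[:, 1:Q + 1]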

For further details about MLLE, please refer to MLLE: Modified Locally Linear Embedding Using Multiple Weights, Zhang Z., Wang J., http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382.