gNE

Python package for geometric Neighbour Embeddings

This is work in progress and the descriptions below are subject to change. The package is still at an early stage of development and the GitHub repo will remain private for now. But I’m always happy to share my progress!

Let \(X\) be a finite subset of a Riemannian manifold \((M,g_M)\) and let \((N,g_N)\) be a chosen target Riemannian manifold. Denote by \(\phi, \psi: \mathbb{R}_{\geq 0} \to [0,1]\) two proximity functions, i.e. functions that map distances \(d_M(x_i,x_j)\) and \(d_N(y_i,y_j)\) to proximities/similarities.¹

gNE (/ˈdʒiːni/, like Genie from Aladdin) then finds an embedding

\[f: X \hookrightarrow (N,g_N)\]

that tries to preserve neighbour proximities, \(\phi(d_M(x_i, x_j)) \simeq \psi(d_N(y_i,y_j))\), by minimizing a specified loss function \(\mathcal{L}(\phi(X), \psi(Y))\), where \(Y=f(X)\). By default this is the cross entropy (or, equivalently, the Kullback-Leibler divergence: since \(\phi(d_M(x_i,x_j))\) is fixed, the two differ only by a constant):

\[\mathcal L_{\mathrm{KL}}(X,Y) = - \sum_{i,j} \phi(d_M(x_i,x_j)) \log \psi(d_N(y_i,y_j))\]
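
As a minimal illustration (plain NumPy, not the package API; the function name and the \(\varepsilon\)-clipping are my own choices), this loss only needs the two precomputed proximity matrices:

```python
import numpy as np

def kl_loss(P, Q, eps=1e-12):
    """Cross entropy between source proximities P = phi(d_M(x_i, x_j))
    and target proximities Q = psi(d_N(y_i, y_j)). Since P is fixed,
    minimizing this is equivalent to minimizing KL(P || Q)."""
    return -np.sum(P * np.log(Q + eps))
```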

\((X,\phi)\) can be viewed as a complete weighted graph. Instead, as is typical in neighbourhood embeddings, gNE by default considers only the kNN-Graph weighted by the proximities determined by \(\phi\).²
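
A toy version of this default construction, assuming a Euclidean source and a Gaussian \(\phi\) (all names and defaults here are illustrative, not the package API):

```python
import numpy as np

def knn_proximity_graph(X, k=15, sigma=1.0):
    """Weighted kNN graph: keep only the k nearest neighbours of each
    point and weight the edges by a Gaussian proximity phi."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros_like(D)
    for i in range(len(X)):
        nbrs = np.argsort(D[i])[1:k + 1]  # skip index 0: the point itself
        W[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / (2 * sigma ** 2))
    return W  # adjacency of the kNN graph, weighted by phi
```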

Features

  • Model geometries (Euclidean, Hyperbolic, Sphere, SPD, and their products; a toy distance sketch follows this list)
  • Custom geometries via a user-specified Riemannian metric \(g_M\) on some high-dimensional \(\mathbb{R}^n\), from which geodesic distances are computed (available: Sasaki metric on the tangent bundle)
  • Finite metric spaces \((X,d_X)\)
  • Weighted graphs \((G,w_G)\)
  • Standard maps from a finite metric space \((X,d_X)\) to e.g. a weighted kNN-Graph, VR-Graph, …
  • Loss functions for graphs and complexes (available: Cross Entropy, Kullback-Leibler, their symmetrization, …)
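
For the product geometries above, the geodesic distance is the \(\ell^2\)-combination of the factor distances, \(d((x_1,x_2),(y_1,y_2)) = \sqrt{d_1(x_1,y_1)^2 + d_2(x_2,y_2)^2}\). A toy sketch with a Poincaré-ball factor (illustrative helper names, not the package API):

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincaré ball model of hyperbolic space
    (cf. Nickel & Kiela, 2017)."""
    sq = np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - u @ u) * (1 - v @ v)))

def product_distance(x, y, factor_distances):
    """Geodesic distance on a product manifold M_1 x ... x M_k, given the
    component points of x and y and one distance function per factor."""
    return np.sqrt(sum(d(xi, yi) ** 2
                       for d, xi, yi in zip(factor_distances, x, y)))
```

For a hyperbolic × Euclidean product one would pass, e.g., `[poincare_distance, lambda a, b: np.linalg.norm(a - b)]` as the factor distances.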

  • Choice of proximity function \(\phi\) (available: Gaussian, Laplace, t-distribution, …; see the kernel sketch below)
  • Choice of point on the attraction-repulsion spectrum (available: t-SNE-like and UMAP-like settings)
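
A sketch of the kernel choices, including the Cauchy kernel from footnote 1 (parameter defaults are illustrative):

```python
import numpy as np

def gaussian(d, sigma=1.0):
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def laplace(d, sigma=1.0):
    return np.exp(-d / sigma)

def cauchy(d, a=1.0, b=1.0):
    # t-distribution/Cauchy kernel of footnote 1; a = b = 1 recovers
    # the t-SNE similarity 1 / (1 + d^2)
    return 1.0 / (1.0 + a * d ** (2 * b))
```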

Standard pipeline of gNE

model(source_geometry, target_geometry)
model()
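
A hedged sketch of what this might look like in code; every name here (the gNE class, the geometry strings, n_neighbors, fit_transform) is a guess at the eventual interface, not the actual API:

```python
from gne import gNE  # hypothetical import

# 1. model(source_geometry, target_geometry): fix the two geometries
model = gNE(source="euclidean", target="hyperbolic", n_neighbors=15)

# 2. model(): fit the embedding Y = f(X) by minimizing the loss
Y = model.fit_transform(X)  # X: (n_points, n_features) array
```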

Further Ideas

  • Provide an implementation for weighted complexes! In the loss function one can, e.g., generalize the sum over edges \(\sum_{i,j} = \sum_{e}\) to a sum over all simplices. Minimizing the loss then not only keeps pairwise proximities similar, but also preserves n-ary proximities. Note that, in analogy to the loss function for edges, one can ask about geometric properties of higher simplices, e.g. the area or volume of a simplex in the target space (a toy sketch follows after this list). This needs implementations of
    • Weighted complexes \((C,w_C)\)
    • Standard maps to produce kNN-Complex, VR-Complex, … from \((X,d_X)\)
    • Loss function for weighted complexes
  • For future purposes I want to implement Hamiltonian dynamics on a given geometry. This requires tensors and differential forms.

  • Parametric gNE, i.e. use \((X, d_X) \subset (M,g_M)\) to learn a map \(f:M \to N\) such that \(f(x)\) can be evaluated for any \(x \in M\) that was not originally in \(X\). As far as I understand it, this is usually done with neural nets and relies on a completely different ansatz. Currently I do not plan to follow up on this idea.
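
As a toy illustration of the simplex-level loss idea sketched above (hypothetical throughout: Euclidean target in \(\mathbb{R}^3\), triangle area as the geometric quantity, Gaussian \(\psi\)):

```python
import numpy as np

def triangle_area(y_i, y_j, y_k):
    """Area of the 2-simplex spanned by three embedded points in R^3."""
    return 0.5 * np.linalg.norm(np.cross(y_j - y_i, y_k - y_i))

def complex_loss(weights, Y, sigma=1.0, eps=1e-12):
    """Toy loss over weighted 2-simplices: weights maps (i, j, k) to the
    source weight w_ijk; the target proximity psi is a Gaussian in the
    area of the embedded triangle."""
    loss = 0.0
    for (i, j, k), w in weights.items():
        psi = np.exp(-triangle_area(Y[i], Y[j], Y[k]) ** 2 / (2 * sigma ** 2))
        loss -= w * np.log(psi + eps)
    return loss
```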

  1. According to Böhm, Berens, and Kobak (2022), the choice doesn’t really matter. One could probably hard-code it to the Cauchy similarity kernel \(\phi(d(x_i,x_j)) = \frac{1}{1+a\, d(x_i, x_j)^{2b}}\).

  2. Again, the choice of proximity kernel seems to be ultimately irrelevant. Typically one uses either a Gaussian kernel or binary adjacencies (in either case normalized so as to obtain a probability distribution).

References

2022

  1. Attraction-Repulsion Spectrum in Neighbor Embeddings
    Jan Niklas Böhm, Philipp Berens, and Dmitry Kobak
    2022

2021

  1. Symmetric Spaces for Graph Embeddings: A Finsler-Riemannian Approach
    Federico López, Beatrice Pozzetti, Steve Trettel, and 2 more authors
    In International Conference on Machine Learning, 2021

2018

  1. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
    Leland McInnes, John Healy, and James Melville
    2018

2017

  1. Poincaré Embeddings for Learning Hierarchical Representations
    Maximillian Nickel, and Douwe Kiela
    In Advances in Neural Information Processing Systems 30, 2017

2008

  1. Visualizing Data Using t-SNE
    Laurens van der Maaten, and Geoffrey Hinton
    2008

2002

  1. Stochastic Neighbor Embedding
    Geoffrey E Hinton, and Sam Roweis
    In Advances in Neural Information Processing Systems, 2002