A tensor method for semi-supervised learning

Data Seminario: 
Ora Inizio: 
Ora Fine: 
Francesco Tudisco

Semi-supervised learning is the problem of finding clusters in a graph
or a point-clould dataset where we are given "few" initial input labels.
Label Spreading (LS) is a standard technique for this problem, which can
be interpreted as a diffusion process of the labels on the graph. While
there are many variants of LS, nearly all of them are linear models
which, for every node, only account for the information incoming from
its direct neighbors.

Recent work in network science has shown that in many graph algorithms a
great advantage can be obtained when accounting directly for
higher-order features. Such features may be built from the point-cloud
data or the adjacency matrix, for example by considering a triangle
involving nodes i, j and k. In other contexts, higher-order information
appears explicitly, for example, in a coauthorship network, a document
with three authors forms a natural triangle.

After reviewing the original LS algorithm I will present our proposed
variation of it, which takes directly advantage of higher-order
information. A key point of the proposed method is that we replace the
standard Laplacian matrix with a nonlinear Laplacian-inspired map which
is defined in terms of a order-three tensor.

Just like standard LS, we can show convergence of the new nonlinear
diffusion process to the gloabal minimum of a constrained
semi-supervised loss function that enforces local and global consistency
with the input labels. We demonstrate the efficiency and efficacy of our
approach on a variety of point cloud and network datasets, where the
proposed model outperforms classical label spreading, hypergraph
clustering methods, and graph neural networks. I will also point out
some open problems related to both the modeling part and the
computational one.