cherubino

A tensor method for semi-supervised learning – Francesco Tudisco (GSSI)

Abstract

Semi-supervised learning is the problem of finding clusters in a graph or a point-clould dataset where we are given “few” initial input labels. Label Spreading (LS) is a standard technique for this problem, which can be interpreted as a diffusion process of the labels on the graph. While there are many variants of LS, nearly all of them are linear models which, for every node, only account for the information incoming from its direct neighbors. Recent work in network science has shown that in many graph algorithms a great advantage can be obtained when accounting directly for higher-order features. Such features may be built from the point-cloud data or the adjacency matrix, for example by considering a triangle involving nodes i, j and k. In other contexts, higher-order information appears explicitly, for example, in a coauthorship network, a document with three authors forms a natural triangle. After reviewing the original LS algorithm I will present our proposed variation of it, which takes directly advantage of higher-order information. A key point of the proposed method is that we replace the standard Laplacian matrix with a nonlinear Laplacian-inspired map which is defined in terms of a order-three tensor. Just like standard LS, we can show convergence of the new nonlinear diffusion process to the gloabal minimum of a constrained semi-supervised loss function that enforces local and global consistency with the input labels. We demonstrate the efficiency and efficacy of our approach on a variety of point cloud and network datasets, where the proposed model outperforms classical label spreading, hypergraph clustering methods, and graph neural networks. I will also point out some open problems related to both the modeling part and the computational one.

Torna in cima