Non-affine parametric dependencies, nonlinearities, and advection-dominated regimes of the model of interest can result in a slow decay of the Kolmogorov n-width, which precludes the realization of efficient reduced-order models based on Proper Orthogonal Decomposition. Among the possible remedies are purely data-driven methods that leverage nonlinear approximation techniques, such as autoencoders and their variants, to learn a latent representation of the dynamical system and then evolve it in time with another architecture. Despite their success in many applications where standard linear techniques fail, further work is needed to improve the interpretability of the results, especially outside the training range and in regimes where data are scarce. Moreover, none of the available knowledge of the model's physics is exploited during the predictive phase. In this talk, to overcome these weaknesses, I present a variant of the nonlinear manifold method developed in previous works, with hyper-reduction achieved through reduced over-collocation and teacher-student training of a reduced decoder. I test the methodology on problems of increasing complexity.
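To fix ideas, a minimal numpy sketch of the generic data-driven pipeline mentioned above (not the specific method of the talk): a nonlinear encoder maps the full-order state to a latent vector, a separate network advances the latent state in time, and a decoder reconstructs the full-order state. All dimensions and weights below are hypothetical placeholders standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_full, n_latent = 64, 4  # full-order and latent dimensions (illustrative)

# Random weights stand in for trained network parameters.
W_enc = rng.standard_normal((n_latent, n_full)) * 0.1
W_dec = rng.standard_normal((n_full, n_latent)) * 0.1
W_dyn = rng.standard_normal((n_latent, n_latent)) * 0.1

def encode(u):
    # Nonlinear encoder: full state -> latent representation
    return np.tanh(W_enc @ u)

def decode(z):
    # Decoder: latent representation -> approximate full state
    return W_dec @ np.tanh(z)

def step_latent(z):
    # Separate architecture evolving the latent state one step in time
    return z + np.tanh(W_dyn @ z)

# Rollout: encode once, evolve entirely in the latent space,
# decode back to the full-order state on demand.
u0 = rng.standard_normal(n_full)
z = encode(u0)
trajectory = []
for _ in range(10):
    z = step_latent(z)
    trajectory.append(decode(z))

print(len(trajectory), trajectory[0].shape)
```

The point of the sketch is the structure: once trained, the online phase never touches the full-order model, which is what makes such ROMs efficient but also opaque outside the training regime.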