Federico Landi

I am a Ph.D. candidate at AImageLab, University of Modena and Reggio Emilia, Italy.

My research interests lie at the intersection of Computer Vision, Deep Learning, and Robotics, in the fascinating topic of Embodied AI. I work under the supervision of Prof. Rita Cucchiara.

As part of my Master's thesis, I was a visiting student at the University of Amsterdam (UvA), where I worked under the supervision of Prof. Cees Snoek.

Email  /  CV  /  Google Scholar  /  Github  /  LinkedIn


In the first part of my Ph.D., I tackled the recently introduced task of Vision-and-Language Navigation. More recently, I have developed a strong interest in embodied exploration and navigation, as well as in Recurrent Neural Networks.

Focus on Impact: Indoor Exploration with Intrinsic Motivation
Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
Under review, 2021
arXiv  /  bibtex

We devise an impact-based intrinsic reward to train embodied exploration agents in photorealistic indoor environments.

Working Memory Connections for LSTM
Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
NEUNET, 2021
arXiv  /  bibtex

A simple modification to the LSTM cell boosts its performance on a variety of tasks. Our approach shows more stable training dynamics and faster convergence than both vanilla LSTM and peephole LSTM.

Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara
CVIU, 2021
arXiv  /  bibtex  /  code

The first fully-attentive approach to Vision-and-Language Navigation (VLN). We achieve state-of-the-art performance on low-level VLN, a setting that is rarely considered in the literature because of its additional difficulty.

Out of the Box: Embodied Navigation in the Real World
Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
CAIP, 2021
arXiv  /  bibtex  /  code  /  slides

We detail how to deploy a navigation policy trained in the Habitat simulator on a LoCoBot robot. Additionally, we study its performance on five different PointGoal navigation episodes in a challenging real-world setting.

Transform, Warp, and Dress: A New Transformation-Guided Model for Virtual Try-On
Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, Rita Cucchiara
TOMM, 2021

We present a new dataset of high-resolution images of upper-body clothes for virtual try-on. We also propose a new virtual try-on model that generates high-quality images through a three-stage pipeline.

VITON-GT: An Image-based Virtual Try-On Model with Geometric Transformations
Matteo Fincato, Federico Landi, Marcella Cornia, Fabio Cesari, Rita Cucchiara
ICPR, 2020
paper  /  bibtex  /  poster  /  slides  /  presentation (video)

We propose a new image-based virtual try-on model that can generate high-quality images. We exploit learnable affine and thin-plate spline transformations, combined with a generative network, to create new images of a person wearing different clothes.

Explore and Explain: Self-supervised Navigation and Recounting
Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara
ICPR, 2020 (Oral presentation)
arXiv  /  bibtex  /  poster  /  slides  /  presentation (video)

A novel setting for Embodied AI in which an agent needs to explore a previously unknown environment while describing what it sees. The proposed model employs a self-supervised exploration module with a penalty, and a fully-attentive captioning model for explanation.

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
BMVC, 2019 (Oral presentation)
arXiv  /  bibtex  /  code  /  poster  /  slides  /  talk (video)

In Vision-and-Language Navigation, an agent needs to reach a target destination guided only by a natural language instruction. We exploit dynamic convolutional filters to ground the linguistic description into the visual observation in an elegant and efficient way.

Anomaly locality in video surveillance
Federico Landi, Cees Snoek, Rita Cucchiara
arXiv, 2019
arXiv  /  bibtex  /  dataset

We explore the impact of considering spatiotemporal tubes instead of whole-frame video segments for anomaly detection in video surveillance. We create UCFCrime2Local: the first dataset for anomaly detection with bounding box supervision in both its train and test set.

Reviewing Service


  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  • IEEE Robotics and Automation Letters (RAL)

  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)

  • Pattern Recognition Letters (PRL)


  • ACM International Conference on Multimedia (ACMMM)

  • IEEE International Conference on Robotics and Automation (ICRA)

  • IEEE International Conference on Pattern Recognition (ICPR)

Teaching Activities
  • Computer Architecture - Prof. Rita Cucchiara, Prof. Simone Calderara, 2020-2021

  • Machine Learning and Deep Learning - IFOA, 2020

  • Deep Learning - Nuova Didactica, 2020

Courses and Summer Schools
  • Advanced Course on Data Science and Machine Learning - ACDL 2020, Remote (certificate)

  • International Computer Vision Summer School - ICVSS 2019, Scicli (RG), Italy (certificate)

  • In 2019, I carried out the 3D acquisition of the Galleria Estense museum in Modena. One year later, the virtual spaces created for research purposes made it possible to offer free guided tours to schools and young students during the Covid-19 lockdown in Italy.

  • I like practicing Shuai Jiao (摔跤) and lifting weights.
