What can Street View images tell us about how we can make our cities more sustainable? Quite a lot, explains HPI PhD student Marco Cipriano.
In a joint research project between HPI and the MIT Morningside Academy of Design (MAD), a research team is working to understand what makes some streets busier than others. The team consists of Marco, MIT doctoral student Liu Liu, and HPI doctoral student Alexandra Kudaeva, and is led by Prof. Andres Sevtsuk (MIT) and Prof. Gerard de Melo (HPI). HPI's Artificial Intelligence and Intelligent Systems department and the Massachusetts Institute of Technology's Department of Urban Studies and Planning are involved in the project.
"We want to understand what factors and aspects lead people to have social interactions," says Marco, explaining the focus of the research. "Understanding how people gather, linger, and interact in public spaces provides critical insights into urban vitality. The objectives of this study will provide critical insights into how to design more sustainable cities."
The team is building on an approach to urban planning pioneered primarily by sociologist William H. Whyte in New York in the 1970s and 1980s. At the time, researchers like Whyte observed on site and by hand how people use public spaces, and drew conclusions for better urban design from these observations.
Whyte's work continues to shape New York today, explains Liu Liu, PhD student in Computational Urban Design at MIT: "His research revealed that small design details make a big difference – sunlight, the sound of water, street vendors, and even how benches are arranged. Over a decade of observation, his work directly shaped New York City’s urban space guidelines. For example, public spaces are now required to include a proportion of ‘movable’ seating, reflecting his finding that people value flexibility in arranging seats for socializing."
The team wants to revolutionize this type of urban research. They are working on AI-powered tools that analyze freely accessible Street View images on a large scale and draw conclusions from them not only about the number of social activities but also about their type. They are currently pursuing three main goals, which Liu explains in more detail:
- Define the technical boundaries: "Determine what can currently be detected (e.g., activity recognition) and what can be extended (e.g., detecting social groups) through State-of-the-Art methods."
- Define the data boundaries: "Understand what information can be captured from street-level images and when those images were taken, recognizing that each one is only a snapshot of a specific place and time."
- Prepare results and make them available to researchers: "Develop a process to map all ‘detectable’ and meaningful elements – not just simple metrics like human counts, which are easy but not very insightful. This map will be shared openly with urban design researchers to support their studies."
HPI and MIT are working together on solutions centered on "Designing for Sustainability"
The interdisciplinary composition of the team, consisting of computer scientists from HPI and urban researchers from MIT, was made possible by the research collaboration between the two institutes, which promotes joint projects on sustainable design and digital innovation. This allows the team to benefit from diverse expertise, even across an ocean.
"The Department of Urban Studies and Planning at MIT has urban planners with excellent experience with computer vision and artificial intelligence," Marco reports. "The collaboration was great. Both parties were engaged in the project and committed to delivering results. Of course, personal meetings made a difference. The short research stay at MIT that I had in April was a great bonding experience, which also translated into a more intense and efficient workflow."
The research stays are supplemented by regular joint workshops, which are held alternately at HPI and MIT.
Current AI models still have difficulty recognizing groups of people who belong together
The research team recently submitted a paper on their project to the Association for the Advancement of Artificial Intelligence. In it, they tested existing analysis models for their ability to recognize areas in Street View images where socially connected people can be seen. In other words: Does AI recognize which people in the image belong together and form a group?
The results were sobering at first. Alexandra Kudaeva explains: "Object detection and image segmentation have existed for a very long time and have reached extremely high accuracy. Our problem of defining socially affiliated groups may sound easy, but it is really not. Our tests showed that none of the existing models are able to detect regions in the image that are specified by complex and sometimes abstract criteria. The main takeaway was that there is still a lot of room for improvement building multimodal algorithms that require understanding semantically complex concepts and visual grounding. We really hope that the methods we proposed in the paper will lead to interesting insights for urban planning."
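For readers curious how a "detected region" might be compared with a hand-annotated one, a common measure is intersection over union (IoU). The sketch below is only an illustration under stated assumptions: the article does not say which metric the paper uses, regions are simplified here to axis-aligned boxes, and the names `Box`, `area`, and `iou` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned region in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # Clamp at zero so an "inverted" box (no overlap) has zero area.
    width = max(0.0, box.x_max - box.x_min)
    height = max(0.0, box.y_max - box.y_min)
    return width * height


def iou(predicted: Box, annotated: Box) -> float:
    """Intersection over union: 1.0 is a perfect match, 0.0 is no overlap."""
    overlap = Box(
        max(predicted.x_min, annotated.x_min),
        max(predicted.y_min, annotated.y_min),
        min(predicted.x_max, annotated.x_max),
        min(predicted.y_max, annotated.y_max),
    )
    overlap_area = area(overlap)
    union_area = area(predicted) + area(annotated) - overlap_area
    return overlap_area / union_area if union_area > 0 else 0.0


# Assumed example: a model's guess for a group region vs. a human annotation.
score = iou(Box(0, 0, 2, 2), Box(1, 1, 3, 3))
```

A score like this only says how well two regions overlap. Deciding which people in an image actually form a social group is the harder problem the team describes.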
Looking to the future, the team says that improving these algorithms could make it possible to explore which physical environments promote social group interactions. However, this goes beyond the scope of their project at this stage.
The paper is available to read here: https://arxiv.org/pdf/2509.13484v2
We wish the project team continued success in their work and are keeping our fingers crossed for their paper!
Last change: 11/06/2026, Patrick Lenz