SMERF

Katie Z. Luo¹, Xinshuo Weng², Yan Wang², Shuang Wu², Jie Li², Kilian Q. Weinberger¹, Yue Wang²,³, Marco Pavone²,⁴

¹Cornell University, ²NVIDIA, ³University of Southern California, ⁴Stanford University


For self-driving vehicles to plan safe routes, they need to understand the topology of the road. In this work, we tackle the lane-topology task, which aims to predict lane centerlines together with their relationships to one another and to traffic elements. We introduce a novel framework that leverages Standard Definition (SD) maps for real-time lane-topology prediction in autonomous driving. By integrating SD maps into online map prediction with our Transformer-based embedding, we improve lane detection and topology prediction by up to 60%. To the best of our knowledge, SMERF (SD Map Encoder Representations from transFormers) is the first work to leverage SD maps for map prediction.

Motivation

SD maps contain crucial information about road topology that can complement onboard cameras for lane-topology reasoning. For example, intersections that are not visible in the camera images due to occlusion can be seen in an SD map instead. In the SD map, orange lines represent roads and teal lines represent walkways.

Method

Our task is to detect the lane centerlines of the road and the traffic elements in the scene, such as traffic lights and stop signs, and to infer both the connectivity of the lane centerlines and how they relate to each traffic element. In contrast with prior work, we additionally assume access to the SD map of the region, queried using an onboard GPS system.
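To make the task outputs concrete, here is a minimal sketch of how a prediction could be stored. All names (LaneTopologyPrediction, TrafficElement, lane_lane, lane_traffic) and the choice of 2D points and image-space boxes are illustrative assumptions, not the format used by the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # assumption: (x, y) in the ego frame


@dataclass
class TrafficElement:
    kind: str                                # e.g. "traffic_light" or "stop_sign"
    box: Tuple[float, float, float, float]   # assumption: image-space box (x1, y1, x2, y2)


@dataclass
class LaneTopologyPrediction:
    centerlines: List[List[Point]]           # one ordered point list per lane centerline
    traffic_elements: List[TrafficElement]
    lane_lane: List[List[int]]               # lane_lane[i][j] == 1 if lane i leads into lane j
    lane_traffic: List[List[int]]            # lane_traffic[i][k] == 1 if element k applies to lane i

    def validate(self) -> None:
        """Check that both relation matrices match the number of detections."""
        n, t = len(self.centerlines), len(self.traffic_elements)
        assert len(self.lane_lane) == n and all(len(r) == n for r in self.lane_lane)
        assert len(self.lane_traffic) == n and all(len(r) == t for r in self.lane_traffic)

    def successors(self, lane: int) -> List[int]:
        """Indices of the lanes that lane `lane` connects to."""
        return [j for j, flag in enumerate(self.lane_lane[lane]) if flag]


# Toy example: lane 0 feeds into lane 1, and one traffic light governs lane 0.
pred = LaneTopologyPrediction(
    centerlines=[[(0.0, 0.0), (10.0, 0.0)], [(10.0, 0.0), (20.0, 2.0)]],
    traffic_elements=[TrafficElement("traffic_light", (100.0, 40.0, 120.0, 90.0))],
    lane_lane=[[0, 1], [0, 0]],
    lane_traffic=[[1], [0]],
)
pred.validate()
```

Storing the two relations as separate matrices keeps lane-to-lane connectivity and lane-to-traffic-element assignment independent, which mirrors the two kinds of relationship described above.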

The proposed SMERF (lower half) augments an existing lane-topology model (upper half) with priors from SD maps to improve lane centerline detection and relational reasoning. There are three key components, as shown in the figure above.

SD Map inputs, containing road-level topology and type-of-road information, are extracted. The map segments are transformed into \( M \) polylines relative to the ego vehicle's coordinates.
Sinusoidal embeddings encode the polyline point locations, which are then passed to a Transformer encoder to extract global geometric and semantic information from the SD map input.
Finally, the SD map representation is fused with the intermediate representation of the lane-topology model using multi-head cross-attention.
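
The three components above can be illustrated with a small, dependency-free sketch. It is a simplification under stated assumptions, not the authors' implementation: the Transformer encoder is omitted, the fusion step uses a single attention head with no learned projections (the actual model uses multi-head cross-attention), road-type features are left out, and every function name, dimension, and input value is hypothetical.

```python
import math


def to_ego_frame(points, ego_xy, ego_yaw):
    """Map world-frame (x, y) points into the ego frame (x forward, y left)."""
    cos_y, sin_y = math.cos(ego_yaw), math.sin(ego_yaw)
    out = []
    for x, y in points:
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        # Apply the inverse rotation R(-yaw) to the translated point.
        out.append((cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy))
    return out


def sinusoidal_embedding(value, dim, max_period=10000.0):
    """Encode one scalar coordinate as `dim` interleaved sin/cos features."""
    assert dim % 2 == 0
    feats = []
    for i in range(dim // 2):
        freq = 1.0 / (max_period ** (2 * i / dim))
        feats.extend([math.sin(value * freq), math.cos(value * freq)])
    return feats


def embed_polyline(points, dim_per_coord=8):
    """Concatenate the x and y embeddings of every point into one token."""
    token = []
    for x, y in points:
        token.extend(sinusoidal_embedding(x, dim_per_coord))
        token.extend(sinusoidal_embedding(y, dim_per_coord))
    return token


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Hypothetical toy inputs: M = 2 SD-map road segments in world coordinates.
world_polylines = [[(10.0, 15.0), (10.0, 25.0)], [(20.0, 5.0), (30.0, 5.0)]]
ego_polylines = [to_ego_frame(p, ego_xy=(10.0, 5.0), ego_yaw=math.pi / 2)
                 for p in world_polylines]
map_tokens = [embed_polyline(p, dim_per_coord=4) for p in ego_polylines]

# Stand-ins for intermediate features of the lane-topology model.
model_queries = [[0.1] * 16, [-0.1] * 16]
attended = cross_attention(model_queries, map_tokens, map_tokens)
# Residual fusion: add the attended map information back onto each query.
fused = [[q + a for q, a in zip(qv, av)] for qv, av in zip(model_queries, attended)]
```

In this sketch each polyline becomes one token whose length depends on the number of points, so all polylines must be resampled to the same number of points before attention; that resampling step is assumed rather than shown.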

Qualitative Results

We compare the OpenLane-V2 baseline model with and without SMERF applied. Observe that adding the SD map as a prior improves recognition of faraway lanes, since such information is present in the map but not in the images. The areas marked by red arrows show where adding SMERF improves lane-topology performance.

This effect is observed across models; we also visualize the comparison between TopoNet with and without SMERF.

Reference

If this work is helpful for your research, please consider citing us!

@article{luo2023augmenting,
    title={Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps},
    author={Luo, Katie Z and Weng, Xinshuo and Wang, Yan and Wu, Shuang and Li, Jie and Weinberger, Kilian Q and Wang, Yue and Pavone, Marco},
    journal={arXiv preprint arXiv:2311.04079},
    year={2023}
}