Interactive 3D Annotation of Objects in Moving Videos from Sparse Multi-view Frames (ACM ISS 2023 - Papers)

Who

Kotaro Oomori, Wataru Kawabe, Fabrice Matulic, Takeo Igarashi, Keita Higuchi

Track

ACM ISS 2023 Papers

When

Wed 8 Nov 2023 11:50 - 12:15 at Schenley Ballroom - Session 6: Immersion, Audio, and Multimedia Chair(s): Tigmanshu Bhatnagar

Abstract

Segmenting and determining the 3D bounding boxes of objects of interest in RGB videos is an important task for a variety of applications such as augmented reality, navigation, and robotics. Supervised machine learning techniques are commonly used for this, but they need training datasets: sets of images with associated 3D bounding boxes manually defined by human annotators using a labelling tool. However, precisely placing 3D bounding boxes can be difficult using conventional 3D manipulation tools on a 2D interface. To alleviate that burden, we propose a novel technique with which 3D bounding boxes can be created by simply drawing 2D bounding rectangles on multiple frames of a video sequence showing the object from different angles. The method uses reconstructed dense 3D point clouds from the video and computes tightly fitting 3D bounding boxes of desired objects selected by back-projecting the 2D rectangles. We show concrete application scenarios of our interface, including training dataset creation and editing 3D spaces and videos. An evaluation comparing our technique with a conventional 3D annotation tool shows that our method results in higher accuracy. We also confirm that the bounding boxes created with our interface have a lower variance, likely yielding more consistent labels and datasets.
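The back-projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the reconstructed point cloud is an (N, 3) NumPy array, that each annotated frame comes with a known 3x4 camera projection matrix, and that each rectangle is given as (x_min, y_min, x_max, y_max) in pixels. The function names `select_by_rectangles` and `axis_aligned_box` are hypothetical, and the box is axis-aligned for simplicity, whereas the paper's tightly fitting boxes may be computed differently.

```python
import numpy as np

def select_by_rectangles(points, cameras, rects):
    """Keep the 3D points whose projections fall inside every 2D rectangle.

    points  : (N, 3) array of world-space points (assumed reconstruction output)
    cameras : list of 3x4 projection matrices, one per annotated frame
    rects   : list of (x_min, y_min, x_max, y_max) pixel rectangles
    """
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    keep = np.ones(len(points), dtype=bool)
    for P, (x0, y0, x1, y1) in zip(cameras, rects):
        cam = homogeneous @ np.asarray(P).T  # rows are (u*d, v*d, d)
        u, v, d = cam[:, 0], cam[:, 1], cam[:, 2]
        # Compare against the rectangle scaled by depth to avoid dividing by d.
        inside = (
            (d > 0)
            & (u >= x0 * d) & (u <= x1 * d)
            & (v >= y0 * d) & (v <= y1 * d)
        )
        keep &= inside
    return points[keep]

def axis_aligned_box(points):
    """Return (min_corner, max_corner) of the selected points."""
    if len(points) == 0:
        raise ValueError("no points selected; check rectangles and cameras")
    return points.min(axis=0), points.max(axis=0)
```

Each rectangle back-projects to a viewing frustum, so intersecting the selections from frames taken at different angles discards points that lie in front of or behind the object along any single view.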

DOI

https://doi.org/10.1145/3626476

Kotaro Oomori (Author)

The University of Tokyo

Japan

Wataru Kawabe (Author)

The University of Tokyo

Japan

Fabrice Matulic (Author)

Preferred Networks

Japan

Takeo Igarashi (Author)

The University of Tokyo

Japan

Keita Higuchi (Author)

Preferred Networks

Japan

Session Program

Wed 8 Nov
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:15	Session 6: Immersion, Audio, and Multimedia (Papers) at Schenley Ballroom. Chair(s): Tigmanshu Bhatnagar

11:00 25m Talk		Embodied Provenance for Immersive Sensemaking (Papers). Yidan Zhang (Monash University), Barrett Ens (Monash University), Kadek Satriadi (Monash University), Ying Yang (Monash University), Sarah Goodwin (Monash)
11:25 25m Talk		Hum-ble Beginnings: Developing Touch- and Proximity-Input-Based Interfaces for Zoo-Housed Giraffes’ Audio Enrichment (Papers, Honorable Mention). Alana Grant (University of Glasgow), Vilma Kankaanpää (University of Glasgow), Ilyena Hirskyj-Douglas (University of Glasgow)
11:50 25m Talk		Interactive 3D Annotation of Objects in Moving Videos from Sparse Multi-view Frames (Papers). Kotaro Oomori (The University of Tokyo), Wataru Kawabe (The University of Tokyo), Fabrice Matulic (Preferred Networks), Takeo Igarashi (The University of Tokyo), Keita Higuchi (Preferred Networks)