Scene Classification of Images and Video via Semantic Segmentation

Project Details

Background: Scene classification is used to categorize images into different classes, such as urban, mountain, beach, or indoor. This project tackled scene classification of television shows and feature films. These types of media bring unique challenges that are not present in photographs, as many shots are close-ups in which few characteristics of the scene are visible.

Solution: The video was first segmented into shots and scenes, and key frames from each shot were analyzed before the results were aggregated. Each key frame was classified as indoor or outdoor. Outdoor frames were then analyzed further with a semantic segmentation that assigned a label to each pixel. These labels were used to classify the scene type by describing the spatial arrangement of scene components with a spatial pyramid. The predicted key frame labels were then summarized for each shot and scene.
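As a rough illustration of the last two steps, the sketch below builds a spatial pyramid descriptor from a per-pixel label map and then summarizes key frame predictions for a shot. It is a minimal sketch under stated assumptions, not the project's implementation: the function names, the number of pyramid levels, the use of normalized label histograms per cell, and the majority vote used for aggregation are all illustrative choices that the summary above does not specify.

```python
from collections import Counter

import numpy as np


def spatial_pyramid_histogram(label_map, num_classes, levels=2):
    """Concatenate per-cell label histograms over a pyramid of grids.

    label_map: 2-D integer array of per-pixel class labels in [0, num_classes).
    levels: hypothetical parameter; level k splits the image into 2^k x 2^k cells.
    """
    height, width = label_map.shape
    features = []
    for level in range(levels + 1):
        cells = 2 ** level
        row_edges = np.linspace(0, height, cells + 1).astype(int)
        col_edges = np.linspace(0, width, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                cell = label_map[row_edges[r]:row_edges[r + 1],
                                 col_edges[c]:col_edges[c + 1]]
                counts = np.bincount(cell.ravel(), minlength=num_classes)
                counts = counts.astype(float)
                total = counts.sum()
                # Normalize so each cell holds the proportion of each label.
                features.append(counts / total if total > 0 else counts)
    return np.concatenate(features)


def aggregate_labels(keyframe_labels):
    """Summarize key frame predictions by majority vote (an assumed rule)."""
    return Counter(keyframe_labels).most_common(1)[0][0]


# Toy 4x4 label map: top half labeled 0, bottom half labeled 1.
toy_map = np.zeros((4, 4), dtype=int)
toy_map[2:, :] = 1
toy_descriptor = spatial_pyramid_histogram(toy_map, num_classes=3, levels=1)

# Hypothetical scene labels predicted for the key frames of one shot.
shot_label = aggregate_labels(["beach", "beach", "mountain"])
```

The descriptor keeps both the overall proportion of each label (the coarsest level) and where those labels occur in the frame (the finer levels), which is what lets a classifier distinguish, for example, sky above water from water above sky. Any standard classifier could then be trained on such descriptors.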

Results: We tested our method on a large database of videos and compared it with prior work on photographs. The semantic segmentation was validated on a set of hand-labeled images. Our work improved both the semantic segmentation and the scene classification of images and, to the best of our knowledge, was the first working system for video.

Additional Applications: The methods developed in this project are applicable when classification of an image depends on the proportion of its components. Semantic segmentation is now commonly used for self-driving cars and other autonomous navigation applications. Classifying the overall content of an image or video sequence can provide keywords for search and retrieval.

Published Article
Presentation