
Object Detection on the Android

A. Mohan, S. Kumar (2010), journal unavailable

The focus is primarily on improving the performance of the joint object detection and pose estimation framework outlined in [1] by using multiple frames, in the form of short video sequences of the object of interest.

Abstract

Object detection has been a formidable challenge in computer vision. Although there has been considerable work that demonstrates good results when either the object was observed in training or when there is limited intra-class variability, a universal inter-class, intra-class solution remains unseen. The case for further work to refine and improve object detection is clear. Object detection is a necessary step towards performing a myriad of other vision and machine automation tasks. In vision, applications include accurately understanding scenes, serving as a prior for interactions between objects/subjects (for example, a person holding a microphone might be giving a speech), and others. In the broader problem of machine automation, object detection enables machines to understand and thereby interact (or not) with their environment. Our object detection framework is largely based on the frameworks proposed in [1] and [4], which are both based on the idea of using the Generalized Hough Transform to allow patches in the image to vote for possible locations of the object. In [4], the authors propose to use Random Forests to learn a discriminative codebook of part appearances that maps image patches to probabilistic votes about the object location. [1] extends this idea to incorporate scene depth into the framework by using depth maps in training to build a mapping between patch scale and scene depth. This allows them to simultaneously detect objects and recover object shape. The advantages of deriving these two quantities at once are numerous. Depth information for a detected object serves as a basis for applications in augmented reality, motion planning, scene understanding, and others. However, our immediate goal is focused on object detection alone. The challenges of incorporating depth extraction have been tackled for single-frame queries in [1].
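The patch-voting idea underlying [1] and [4] can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-patch offset votes are hand-constructed stand-ins for what a learned codebook (such as the Random Forest leaves of [4]) would emit, and `hough_vote_detection` is a hypothetical helper name.

```python
import numpy as np

def hough_vote_detection(patch_centers, patch_votes, image_shape):
    """Accumulate probabilistic votes from image patches into a Hough
    space over candidate object-center locations; return the peak.

    patch_centers : list of (y, x) patch locations in the image
    patch_votes   : per patch, a list of ((dy, dx), weight) votes for
                    the object center relative to the patch, as a
                    learned codebook might emit
    """
    acc = np.zeros(image_shape)
    for (py, px), votes in zip(patch_centers, patch_votes):
        for (dy, dx), w in votes:
            cy, cx = py + dy, px + dx
            if 0 <= cy < image_shape[0] and 0 <= cx < image_shape[1]:
                acc[cy, cx] += w
    # The detection hypothesis is the accumulator maximum.
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return peak, acc[peak]
```

For example, three patches whose votes all point at pixel (50, 50) yield a peak there with score 3.0; in the real framework the accumulator would also be smoothed and searched across scales.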
In this project, the focus is primarily on improving the performance of the joint object detection and pose estimation framework outlined in [1] using multiple frames. Towards this end, we propose the use of short video sequences (1-2 s) of the object of interest. In addition, we will also implement a client-server framework between a mobile platform and a computationally more powerful fixed system.
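The client-server split described above could take the following minimal form: the mobile client uploads a length-prefixed frame, and the fixed system runs the (here, placeholder) detector and replies. This is a sketch under assumed names (`serve_one`, `send_frame`); the actual project targets Android, where the client side would be written against the Android networking APIs rather than Python sockets.

```python
import socket
import struct
import threading

def serve_one(host="127.0.0.1", port=0):
    """Server sketch for the fixed system: accept one connection,
    receive a length-prefixed frame, reply with a result.
    Returns the port the OS actually bound."""
    srv = socket.socket()
    srv.bind((host, port))
    srv.listen(1)
    actual_port = srv.getsockname()[1]

    def handle():
        conn, _ = srv.accept()
        (n,) = struct.unpack("!I", conn.recv(4))  # 4-byte big-endian length
        frame = b""
        while len(frame) < n:
            frame += conn.recv(n - len(frame))
        # Placeholder result: echo the frame size. The real server would
        # run the Hough-voting detector here and return detections.
        conn.sendall(struct.pack("!I", len(frame)))
        conn.close()
        srv.close()

    threading.Thread(target=handle, daemon=True).start()
    return actual_port

def send_frame(frame_bytes, host, port):
    """Client sketch (the mobile side): upload one frame, read the reply."""
    c = socket.socket()
    c.connect((host, port))
    c.sendall(struct.pack("!I", len(frame_bytes)) + frame_bytes)
    reply = c.recv(4)
    c.close()
    return struct.unpack("!I", reply)[0]
```

A short video sequence would simply repeat `send_frame` for each frame, letting the server accumulate votes across frames before committing to a detection.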