Image parsing using ontology and data semantics (IPODS)
Nwogu, Ifeoma O.
Image parsing continues to be a challenging research task in the field of computer vision. In this dissertation, we have developed a hybrid image parser that accounts for several vision-related phenomena: (i) the perception of objects, their parts, and the relationships between them; (ii) the use of semantic, spoken language to describe attributes of objects in images to be parsed, together with a heterogeneous computational model for object/part recognition; (iii) an image segmentation process that uses multiple visual cues; and (iv) an optimization technique that reduces the solution space for scene identification. The parser is built on an image grammar-based framework. Because the patterns to be analyzed are often multifarious, with one element comprising numerous diverse parts, we have developed a general symbol-based ontology paradigm that describes complex image patterns as hierarchical compositions of simpler subpatterns. The relationships between objects, and between objects and their parts, are represented using first-order formal logic. To perform rapid scene parsing and identification, the input image is over-segmented to yield "superpixels": locally coherent groupings of pixels that preserve the structure necessary for image parsing. Their use greatly reduces the computational complexity of the parser. In the thesis, we also present an algorithm for performing a (near) global Markov Random Field (MRF) optimization for labeling segmented images. Our earlier results, which label images using pairwise and local interactions only, are also presented. The labeling results are produced primarily from the semantic attributes of objects, such as blue skies and green vegetation.
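To make the pairwise-interaction labeling idea concrete, the sketch below runs iterated conditional modes (ICM) on a 4-connected grid MRF with a Potts smoothness term. ICM is a standard local optimizer shown here purely for exposition; the function name, cost layout, and parameters are illustrative assumptions and do not reproduce the thesis's (near) global optimization algorithm.

```python
import numpy as np

def icm_labeling(unary, beta=1.0, iters=5):
    """Iterated conditional modes on a 4-connected grid MRF.

    unary: (H, W, K) array of per-pixel costs for each of K labels.
    beta:  Potts smoothness weight (penalty per disagreeing neighbor).
    Returns an (H, W) integer label map.
    """
    H, W, K = unary.shape
    labels = unary.argmin(axis=2)          # start from the unary minimum
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                cost = unary[y, x].copy()
                # Add beta for every neighbor carrying a different label
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        cost += beta * (np.arange(K) != labels[ny, nx])
                labels[y, x] = cost.argmin()
    return labels
```

For example, if the unary costs favor label 0 on the left half of a grid and label 1 on the right, a single noisy pixel is flipped back to agree with its neighbors, which is the smoothing effect the pairwise term is meant to provide.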
Because semantic attributes alone are not sufficient to fully describe objects and their parts, we develop or adapt several computational models, such as a human skin color model, a probabilistic classifier for detecting the presence of buildings, and a face detector. These models give the image parser its hybrid nature. Going forward, we intend to incorporate additional visual perception cues, such as depth and motion, to further constrain the labeling process. Also, although the MRF optimization algorithm greatly reduces the search space for a (near) global solution, its running time remains exponential in the number of nodes in the graph; we can further investigate methods to reduce this computational complexity. Lastly, this method should be useful for labeling 3-dimensional data, where many spatial constraints can be enforced more effectively than in 2-dimensional images; we therefore intend to apply the labeling technique to 3-D medical data in the future. The symbol-based ontology was developed for the natural-images domain, specifically for outdoor images. Several segmentation algorithms, including our in-house technique, were compared using benchmark data from the Berkeley Database System (BDS). The image parser was tested on natural images from the BDS and the Lotus Hill dataset, and on natural photographs from flickr.com.
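As a minimal sketch of what a rule-based human skin color model can look like, the function below applies a widely used RGB heuristic (explicit thresholds on the red, green, and blue channels and on their spread). The specific thresholds and the function name are illustrative assumptions; they are not the skin model developed in the thesis.

```python
import numpy as np

def skin_mask(rgb):
    """Rule-based RGB skin detector over an (H, W, 3) uint8 image.

    Returns a boolean (H, W) mask where the heuristic fires:
    bright enough in R, sufficient channel spread, and R dominant.
    """
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    return ((r > 95) & (g > 40) & (b > 20) &
            (mx - mn > 15) &            # enough color spread (not gray)
            (np.abs(r - g) > 15) &      # red clearly separated from green
            (r > g) & (r > b))          # red is the dominant channel
```

In a parser of this kind, such a mask would serve as one more evidence channel alongside semantic attributes, to be fused with the classifier and detector outputs during labeling.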