Graph analytic techniques in uncertain environments: Graph matching and link analysis
MetadataShow full item record
Complex environments such as counterinsurgency (COIN), air traffic control and disaster relief have historically been settings in which a domain-wide understanding of the current state of the world has proven difficult to obtain. Traditionally, information fusion approaches have been implemented in an attempt to provide situational awareness to the overseeing individual(s) (e.g., intelligence analysts, air traffic controllers, crisis center decision makers). There are many difficulties in these domains including: a wide variety of disparate data sources, many layers of uncertainty within the raw information and higher level inferences, high rate of change of the environment, necessity for rapid understanding and sheer volume of accumulated knowledge. This dissertation focuses on graph analytic techniques which aid the achievement of situational awareness within these uncertain environments. Observations in these environments are converted to a graphical format representing entities, events and relationships between them. The accumulated knowledge forms a cumulative data graph. The uncertainties, size and complexity of the connections within this graph make accurate assessments difficult for the intelligence analyst. To overcome this difficulty an automated stochastic graph matching approach was developed to identify situations of interest within the accumulated data. This approach consists of three main processes: uncertainty alignment, graph matching result initialization and graph matching result maintenance. Uncertainty alignment associates with raw incoming observations a bias adjusted uncertainty representation representing the true value containing spread of the observation. The graph matching initialization step provides template graph to data graph matches for a newly initialized situation of interest (template graph). Finally, the graph matching result maintenance algorithm continuously updates graph matching results as incoming observations augment the cumulative associated data graph. Throughout these processes the uncertainties present in the original observations and the template to data graph matches are preserved, ultimately providing an indication of the uncertainties present in the current situation assessment. The stochastic graph matching techniques developed in the first section consider a single situation of interest (template graph). In an applied environment, multiple (potentially overlapping) situations of interest must be considered simultaneously. The knowledge of overlapping template sections forms the motivation for precedence tree guided search and AND/OR template graphs. Through the recognition of the overlapping sections, a single AND/OR template can be created to answer many information requirements. By intelligently traversing this AND/OR template with the help of evidentiary graph characteristics, increased algorithmic efficiency can be attained. These results are demonstrated through a numerical study. The final section considers the scalability of a query for the identification of potentially distant connections within a very large graph (millions-billions of nodes and edges). The parallel link analysis algorithm developed is designed for an operational environment in which graph traversal queries are time consuming, but can be performed in parallel. Scalability tests are performed to evaluate the algorithms ability to scale to very large graphs and identify ideal cluster sizes based on query response times.