ITR: Unapparent Information Revelation - Creation, Visualization and Mining of Concept Chain Graphs
Rohini Srihari, Principal Investigator
There are potentially valuable nuggets of information hidden in document collections generated by multiple authors working independently at various times. Such information is not explicit, but it can be inferred by following chains of concepts and associations. Users surfing the web may need to be monitored with the goal of deriving their true information need, which could be motivated by malicious intent. The problem of unapparent information revelation (UIR) is a special case of text mining in which the documents represent some pre-selected subset of interest to a user, generated through purposeful querying or surfing. The goal is to quantify the information revealed by this subset and to detect significant chains of concepts and associations.

This effort focuses on the development of a UIR framework and toolkit that covers the following areas: (i) probabilistic frameworks for concept chain graphs (CCG), a new information representation conducive to text mining; (ii) automatic construction of CCGs from representative document collections using pre-existing ontologies and machine learning techniques in information extraction; (iii) discovery tools that quantify the information revealed and uncover hidden, information-rich paths within the CCG; and (iv) interactive visualization tools for the CCG. This new framework facilitates better visualization and analysis of information than existing information retrieval (IR) representations.

This project should impact several applications, most notably homeland defense applications. The UIR toolkit has the potential to expose sensitive information available on unclassified websites. It can also be used to ascertain whether that information is benign or safe to disseminate. Applications in discovery from scientific documents are also enabled.
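As a rough illustration of the path discovery described in area (iii), the following Python sketch treats a concept chain graph as a directed graph whose edges carry association probabilities and searches for the most probable chain between two concepts. This is a minimal sketch under stated assumptions, not the project's method: the dictionary representation, the function name `best_chain`, the scoring of a chain as the product of its edge probabilities, and the toy concept names and weights are all hypothetical and do not come from the abstract.

```python
import heapq
import math


def best_chain(graph, source, target):
    """Return (probability, path) for the most probable chain from source to target.

    `graph` maps each concept to a dict of {neighbor: association probability}.
    This representation is an assumption made for illustration only.
    """
    # Dijkstra's algorithm on -log(p) edge costs: minimizing the summed cost
    # is equivalent to maximizing the product of the edge probabilities.
    heap = [(0.0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost), path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, p in graph.get(node, {}).items():
            # Skip edges with invalid probabilities and already settled concepts.
            if neighbor not in settled and 0.0 < p <= 1.0:
                heapq.heappush(
                    heap, (cost - math.log(p), neighbor, path + [neighbor])
                )
    # No chain connects the two concepts.
    return 0.0, []


# Toy graph with made-up concepts and association strengths.
ccg = {
    "concept_a": {"concept_b": 0.9, "concept_c": 0.2},
    "concept_b": {"concept_d": 0.8},
    "concept_c": {"concept_d": 0.9},
    "concept_d": {},
}

prob, chain = best_chain(ccg, "concept_a", "concept_d")
print(chain, round(prob, 2))
```

In this toy graph the chain through `concept_b` scores 0.9 × 0.8 = 0.72, which beats the chain through `concept_c` at 0.2 × 0.9 = 0.18. A real system would need to derive the edge weights from the document collection and would likely rank many candidate chains instead of returning a single one.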