Workflow pattern mining using e-mail communications
The objective of this work is to convert abundant "unstructured" information into a logical, structured representation. Such a representation helps uncover hidden patterns and supports precise decision making and workflow optimization. The problem today is not a lack of data but a lack of structured information combined with data overload. In this study, we use organizational emails as the data source, since they are recognized as a good source for inter-organizational communication and workflows. Emails capture people's communication history, which provides valuable insight into the infrastructure of an organization. We treat threaded emails as the basic entity and the basis for our pattern recognition algorithm. After exploring many classical graph matching approaches, we developed a method to measure the similarity among threaded emails; this similarity measure is built on the edge matching distance. The measure is then used to cluster isomorphic and sub-isomorphic email representations efficiently. We validated the clustering quality by implementing and analyzing the Silhouette index. Workflow and communication patterns are derived by combining the graphs contained in each distinct cluster. The software is developed in Java using JUNG (a Java API for graph representation). The freely available Pajek software is used to collect the network statistics. The graphs are stored in the Pajek format, so the results can be visualized with any Pajek reader software and explored in great detail. Users can visualize the patterns at each stage: the consolidated communication pattern, the threaded email communication, and the identified workflow or communication patterns. The visualization gives the user a better sense of the email archive and its social networks.
These patterns also represent distinct networks within the organization, based on communication interactions and irrespective of organizational or functional responsibilities.
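
The abstract does not spell out the similarity measure or the validation step, so the following is only a minimal sketch of the two ideas it names, written in Python for brevity rather than in the Java used by the actual software. It assumes each email thread is reduced to a list of labelled (sender, recipient) edges, and it uses a simplified 0/1 edge cost (identical edges match for free, every other pairing or unmatched edge costs 1), under which the minimum-cost edge matching has a closed form. The function names, the sample threads, and the cluster labels are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def edge_matching_distance(edges_a, edges_b):
    """Edge matching distance under an assumed 0/1 edge cost.

    The smaller edge set is conceptually padded with dummy edges, so the
    matching covers max(|A|, |B|) pairs. Each pair of identical edges costs
    0 and every other pair costs 1, so the minimum total cost is the number
    of pairs minus the size of the multiset intersection.
    """
    matched = sum((Counter(edges_a) & Counter(edges_b)).values())
    return max(len(edges_a), len(edges_b)) - matched


def silhouette_index(items, labels, distance):
    """Mean silhouette width over all items for a given cluster labelling."""
    scores = []
    for i, item in enumerate(items):
        # Collect distances from item i to every other item, per cluster.
        by_cluster = {}
        for j, other in enumerate(items):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(distance(item, other))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Singleton cluster or a single cluster overall: score is 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical threads: each edge is (sender, recipient).
threads = [
    [("A", "B"), ("B", "A"), ("B", "C")],
    [("A", "B"), ("B", "A"), ("B", "C")],
    [("A", "B"), ("B", "A")],
    [("X", "Y"), ("Y", "Z"), ("Z", "X"), ("X", "Z")],
    [("X", "Y"), ("Y", "Z"), ("Z", "X")],
]
labels = [0, 0, 0, 1, 1]  # assumed cluster assignment

score = silhouette_index(threads, labels, edge_matching_distance)
print(f"silhouette index: {score:.3f}")
```

A silhouette index close to 1 indicates that threads sit much closer to their own cluster than to the nearest other cluster, while values near 0 or below suggest overlapping or misassigned clusters. A fuller treatment would replace the 0/1 cost with a graded edge cost and solve the resulting assignment problem explicitly.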