Optimal Strategies for Controlling Cascades in Social Networks: An Influence Maximization Approach
MetadataShow full item record
Influence maximization is a target set selection problem that aims to select such a set of nodes in a social network that will maximally influence the network. This dissertation manuscript focuses on gaining an understanding about the nature of social influence, through mathematical modeling, and exploring the optimal strategies to control influence (information) cascades over time. Chapter 1 presents a survey of the literature on influence maximization in social networks, paying special attention to the practical applications of this problem. In particular, the literature focuses on the identification of new and recently proposed applications that may have received little attention so far in the broad research circles. More than 3200 papers in the field are inspected and classified from the application perspective. The results allow us to identify and describe a number of valuable applications, which suggests that the influence maximization work has a true practical appeal and motivates the work in the subsequent chapters of this dissertation. Additionally, the extracted citation network of the studied papers is studied closely using the social network analysis toolbox: clustering of the citation network allows for distinguishing the distinct research communities that exploit the influence maximization methodologies; the most impactful papers in the field are recognized through centrality analyses and discussed separately. In Chapter 2, the ideas of Bayesian inference are employed to model the human decision-making in social networks as a binary hypothesis testing. A new time dependent diffusion model – the Parallel Cascade model – is developed, which allows for modeling the influence maximization problem as a mixed-integer program. Due to the NP-hardness of the resulting seed selection problem, the commercial solvers turn out inefficient at finding the optimal/near-optimal solutions for the problem instances over medium and large-sized networks. To tackle this issue, an efficient heuristic methodology is presented, using Lagrangian relaxation, that also returns guaranteed bounds on quality of the attained solutions. The heuristic algorithm is successfully applied with the publicly available Facebook datasets and shown to outperform commercial solvers. The computational investigations throughout this chapter showcase that when the seed selection budget is limited, the network density and the presence of the opposite opinions in the network significantly affect the optimal seed selection strategies. Chapter 3 explores the temporal aspects of the seeded cascade propagation over social networks. First, the notion of mid-term and long-term cascade stability is introduced, and then, the long-term behavior of cascades is theoretically investigated. In the presence of the evidence discount factors, the cascade time horizon to be used for a given problem turns out to be the parameter that has to be strategically set, since it significantly affects the corresponding solution. A Cplex-based algorithm is designed to find the stable time horizons of cascades, and then, the insights about the impact of the time horizon setting are extracted through a fractional factorial experiment. Chapter 4 begins by offering a critique of the original influence maximization formulation that requires that all the seeding budget be spent at the initial time period for the optimal cascade ignition. The benefits of scheduling the seed activations over the time horizon of the cascade are explained, calling for a systematic methodology for controlling cascades over time. A partial activation model is presented for the spread of evidence in a network, and the seed activation scheduling problem is solved in two-level social networks: in the context of blogger-centric marketing campaigns, the first network level contains the influential bloggers with large sets of followers, and the second level contains the regular bloggers which are the main targets of a planned campaign. The problem aims to optimally spend the limited seed activation budget over time for activating sponsored bloggers to inject the positive evidence into the second-level network. An efficient heuristic algorithm, using the column generation method, is presented for the problem, and its scalability is experimentally validated over the real social media user networks. The computational analyses in this chapter show what impact the target influence time setting has on the optimal seed activation strategies. Finally, Chapter 5 summarizes the dissertation and presents the areas for future research.