Application-Specific, Agile and Private (ASAP) platforms for federated computing services over WDM networks
The aim of this dissertation is to develop innovative network-centric approaches to the problem of jointly allocating computing and networking resources under the emerging paradigm of federated computing services (FCS). In the FCS paradigm, users submit distributed computing "jobs" to a federated computing service provider (FCSP), and the FCSP tries to complete the jobs using its own computing and optical networking resources (or resources that belong to a third party for which the FCSP acts as a broker). FCS may be considered the next generation of Cloud Computing, as FCS is capable of integrating more computing and networking resources and of providing stronger Service Level Agreements (SLAs) than existing ones. In this dissertation, we focus on the case where each job requires both distributed computing facilities for concurrent processing in a networked computing environment, and high-bandwidth networking resources to support communications or data exchanges among these distributed computing facilities. The FCS paradigm is suitable for two types of distributed computing jobs or applications: virtual infrastructure (VI) and workflow (WF). A VI job is usually represented by an undirected graph, which specifies a set of computing resources and the connectivity among them required to run a set of computing tasks for a specific period of time. A WF job, on the other hand, can be represented by a directed acyclic graph (DAG), in which directed edges also imply precedence among the tasks. Depending on the bandwidth requirements of the communications, computing clusters are connected either by dedicated circuits with wavelength or sub-wavelength granularity, or by opportunistically reserving bandwidth between them. Both the distributed computing facilities (for the execution of tasks) and the high-bandwidth networking resources (for the communication between tasks) are reserved in advance for a limited period of time specified by the job description.
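The two job models described above can be illustrated with a minimal Python sketch. All task and node names (`vm_a`, `ingest`, and so on) and the helper `execution_order` are hypothetical and chosen only for illustration; they do not come from the dissertation. The VI job is stored as an undirected edge set, and the WF job as a DAG whose precedence constraints are resolved with the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical VI job: an undirected graph. Each link is a frozenset,
# so the order of its two endpoints does not matter.
vi_nodes = {"vm_a", "vm_b", "vm_c"}
vi_links = {frozenset(("vm_a", "vm_b")), frozenset(("vm_b", "vm_c"))}

# Hypothetical WF job: a DAG given as task -> set of predecessor tasks.
# A directed edge (u, v) means u must finish before v starts.
wf_predecessors = {
    "ingest": set(),
    "filter": {"ingest"},
    "render": {"ingest"},
    "merge": {"filter", "render"},
}


def execution_order(predecessors):
    """Return one task order that respects every precedence edge.

    Raises graphlib.CycleError if the input is not acyclic.
    """
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(wf_predecessors)
print(order)
```

Any order returned places `ingest` first and `merge` last; the relative order of `filter` and `render` is unconstrained, which is exactly the freedom a scheduler could exploit for concurrent execution.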
Subject to the above constraints, the primary challenge in supporting these applications is to find an optimal mapping from the task graph (an overlay) to the substrate network, such as a WDM network connecting many computing clusters. For each accepted job request, the set of clusters chosen to execute the tasks, and the lightpaths established among them during the task execution, together form what we call an Application-Specific, Agile and Private (ASAP) network. The main research issues addressed in this dissertation include the following: (1) Design and analyze SLA-driven, cost-effective task assignment and scheduling algorithms; (2) Develop survivable approaches based on joint optimization of computing and networking resources; (3) Study advanced topics related to the impact and tradeoffs of traffic grooming, optical circuit switching (OCS) vs. optical burst switching (OBS), and programmable nodes; (4) Use analysis and simulations to evaluate the proposed solutions.
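To make the mapping problem concrete, the following is a toy Python sketch and not the algorithms developed in the dissertation. It assumes a small, connected substrate of four hypothetical clusters (`c1` to `c4`), a job with three hypothetical tasks, at most one task per cluster, and a cost equal to the total hop count of the paths between communicating tasks. Wavelength availability, scheduling in time, and SLA terms are deliberately ignored, and the exhaustive search is only practical for very small instances.

```python
from collections import deque
from itertools import permutations

# Hypothetical substrate: clusters and the links between them.
# Assumed connected, so every cluster can reach every other one.
substrate = {
    "c1": {"c2"},
    "c2": {"c1", "c3", "c4"},
    "c3": {"c2", "c4"},
    "c4": {"c2", "c3"},
}

# Hypothetical job: tasks and the task pairs that must communicate.
tasks = ["t1", "t2", "t3"]
task_edges = [("t1", "t2"), ("t2", "t3")]


def hop_counts(graph, source):
    """Breadth-first search: hops from source to each reachable cluster."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def best_mapping(graph, tasks, task_edges):
    """Try every one-task-per-cluster assignment; keep the fewest total hops."""
    dist = {c: hop_counts(graph, c) for c in graph}
    best, best_cost = None, float("inf")
    for clusters in permutations(sorted(graph), len(tasks)):
        mapping = dict(zip(tasks, clusters))
        cost = sum(dist[mapping[u]][mapping[v]] for u, v in task_edges)
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best, best_cost


mapping, cost = best_mapping(substrate, tasks, task_edges)
print(mapping, cost)
```

The chosen clusters together with the paths between them play the role of the ASAP network for this job. A realistic formulation would replace the hop-count cost and brute-force search with the resource, time, and survivability constraints listed above.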