2016 IEEE International Conference on Cloud Engineering Workshops
Software-Defined Networking-Based Enhancements to Data Quality and QoS in Multi-Tenanted Data Center Clouds
Pradeeban Kathiravelu
Supervised by: Prof. Luís Veiga
INESC-ID Lisboa / Instituto Superior Técnico, Universidade de Lisboa, Portugal
Abstract
Tenants assume various roles in enterprise data center networks, requiring differentiated Quality of Service (QoS), data quality, and isolation guarantees among them. Traditionally, data storage and processing are handled in either a distributed or a centralized manner. While distributed execution offers higher horizontal scalability, it often comes at the cost of centralized control, and hence often with decreased accuracy and management efficiency. Software-Defined Networking (SDN) offers a global view of the entire data center network to a logically centralized controller. Hence, it provides the best of both worlds with minimal compromises: (i) the scalability of large-scale distributed systems, and (ii) the unified management capabilities of traditional centralized systems. By deploying an extended SDN controller architecture, we attempt to enhance the data quality of stored and processed data and increase the QoS of multi-tenanted data center network clouds.

I. Introduction
SDN [1] has been exploited in developing various networking systems since its inception. The SDN paradigm has further been adopted into other aspects of distributed computing, such as Software-Defined Storage [2] and Software-Defined Environments [3], where the control layer is separated from the data layer. Data centers are leveraging SDN for better QoS and management.

We research and construct an extended Software-Defined Data Center (SDDC) architecture for data isolation and QoS guarantees in data center clouds. The extended SDN-based solution developed as part of this PhD dissertation is composed of a set of projects and components. These components consume and interact with each other to enhance the QoS and data quality of data center clouds in a multi-tenanted environment. This paper discusses the development efforts of the proposed projects, our current achievements, and the future research directions leading to the completion of the proposed research.

Figure 1 shows a generic Software-Defined Systems deployment. The network is controlled by a logically centralized and physically distributed SDN controller, with switches routing traffic across the switches and hosts based on policies. Data flows are routed between the switches in the data plane. Control flows are exploited by the OpenFlow [4] southbound API of the controller to provide a global view of the network to the controller. Upon violation of the policies set by the controller in the routing table, the violating packets are forwarded to the controller. Control flows are independent of the data flows and hence do not deteriorate the throughput or the performance of the data flows. The controller may modify the routing table, policies, and rules dynamically. Software-defined systems and Software-Defined Cloud Network (SDCN) applications are deployed on top of the controller, communicating with the controller and the underlying network through the controller northbound API.

Fig. 1. SDN Systems Deployment

Data is transformed and stored across multiple distributed nodes in a data center. Moreover, data is consumed by multiple tenants consecutively. Various distributed controller architectures are proposed for scalable SDDC networks. Distributed file systems and SDN complement each other. HyperFlow [5] is a distributed control plane for the OpenFlow SDN southbound protocol. HyperFlow is event-based and leverages the WheelFS [6] distributed file system for its distributed control plane. An SDN controller with optical switching has been exploited to enhance BigData applications [7].

© 2016 IEEE. DOI 10.1109/IC2EW.2016.19
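The interaction between switches and the controller described for Figure 1 can be pictured with a minimal sketch. It assumes a heavily simplified model in which a flow is identified only by its source and destination hosts; the class names, the policy dictionary, and the default port are hypothetical and are not taken from OpenFlow or from any controller implementation. A packet that matches an installed rule stays in the data plane, while a packet with no matching rule is handed to the controller, which decides and installs a rule for the following packets of that flow.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Simplifying assumption: a flow is identified by (source host, destination host).
# Real OpenFlow matches use many more header fields.
FlowKey = Tuple[str, str]


@dataclass
class Controller:
    """Logically centralized controller holding a global policy (hypothetical)."""
    policy: Dict[FlowKey, int]
    default_port: int = 0
    packet_in_count: int = 0

    def packet_in(self, switch: "Switch", key: FlowKey) -> int:
        # Control flow: the switch asks the controller how to handle the packet.
        self.packet_in_count += 1
        port = self.policy.get(key, self.default_port)
        # Install a rule so later packets of this flow bypass the controller.
        switch.flow_table[key] = port
        return port


@dataclass
class Switch:
    """Data-plane element with a flow table mapping flow key to output port."""
    name: str
    flow_table: Dict[FlowKey, int] = field(default_factory=dict)

    def forward(self, key: FlowKey, controller: Controller) -> int:
        # Table hit: the packet is forwarded without involving the controller.
        if key in self.flow_table:
            return self.flow_table[key]
        # Table miss: hand the packet to the controller.
        return controller.packet_in(self, key)
```

In this sketch only the first packet of a flow reaches the controller, which is one way to read the statement that control flows do not deteriorate the throughput of the data flows.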
II. Research Approach
This dissertation has been structured as multiple interrelated lines of work that together lead towards the goal of my PhD. Figure 2 gives a higher-level view of the work, indicating the works that are published as well as the ongoing research activities. The core PhD research has been structured around the networking and data domains, and their intersection. On top of the network domain, controller extensions are built for scalability, multi-tenancy, and data management. Further, multiple use cases are considered on top of the core research. Messaging Oriented Systems and Biomedical Informatics are two of the currently implemented research use cases. Along with the data and networking research, data center network simulations and emulations are researched and implemented in parallel to efficiently evaluate the architectures and algorithms. At the higher level, SDN-based Data Quality and QoS enhancements to data center networks are built on all this core research and these implementations.
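Using simulation and emulation side by side implies choosing between them for a given experiment. The following is a minimal sketch of such a choice, assuming that emulation is preferred whenever the host can afford it and that the only resource considered is free memory. The function name, the HostResources type, and the per-node memory cost of 50 MB are hypothetical values chosen for illustration, not measurements or parts of any tool discussed in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostResources:
    """Resources available on the evaluation host (hypothetical model)."""
    free_memory_mb: int


def choose_backend(num_nodes: int, host: HostResources,
                   mb_per_emulated_node: int = 50) -> str:
    """Pick emulation when the host can hold every emulated node, else simulation.

    mb_per_emulated_node is an assumed, illustrative cost per emulated node.
    """
    required_mb = num_nodes * mb_per_emulated_node
    return "emulation" if required_mb <= host.free_memory_mb else "simulation"
```

For example, with 8000 MB free, a 100-node topology would be emulated under these assumptions, while a 10,000-node topology would fall back to simulation.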
Fig. 2. Projects and Components of the Dissertation

A. SDN Control Plane Extensions
Controller extensions are considered for tenant-awareness in SDN controllers. Alternative paths for various data flows are proposed through adaptive strategies. RMPTCP extends Multi-Path TCP (MPTCP) [8] with SDN for data-aware adaptive flow routing with selective redundancy in routing. SMART [9] selectively tags flows to prioritize them adaptively.

The OpenDaylight SDN controller [10] has been extended with message-oriented middleware protocol implementations for a scalable and multi-tenanted execution in a large data center network. Multiple use cases of SDN with messaging oriented systems are built. Cassowary [11] is a middleware platform for context-aware smart buildings with Software-Defined Sensor Networks. CHIEF [12] is a controller farm enabled by an orchestrated distributed deployment of SDN controllers that have limited protected access to each other. As an extended use case, CHIEF has been proposed as a controller for community network clouds.

B. Simulations and Emulations
The early research decisions and data center network implementations were evaluated in simulation and emulation environments. Mininet [13] has been leveraged as the network emulator to evaluate the network algorithms and architectures, while OpenDaylight has been leveraged as the core SDN controller. For larger architectures, simulations were used, as emulations require more resources to run tens of thousands of nodes. CloudSim [14] and NS-3 [15] were leveraged as the simulators at the early stages of research. CloudSim does not simulate SDN with a focus at the network level, while NS-3 is not suited to simulating higher-level data centers. Hence, following the API and design model of CloudSim, xSDN [16], a network flow simulator, was built with the capability to integrate with SDN controllers. Moreover, as the evaluated system is scaled in size, the emulators often fail to execute due to resource constraints. We propose and build SENDIM as an adaptive simulation, emulation, and deployment integration middleware. SENDIM [17] extends xSDN and integrates with emulators such as Mininet to offer an adaptive transition and migration between simulations and emulations based on the scale of the evaluated system and the available resources in the host. The integrated simulation and emulation capabilities offer a means of visualizing SDDC networks, architectures, and algorithms.

C. Federated Data Sources in Data Centers
Near duplicate detection plays a major role in data quality in federated data sources. ∂u∂u [18] is a distributed near duplicate detection framework that leverages in-memory data grids as a distributed shared memory. Data Café [19] offers an integration solution for warehouse creation from heterogeneous data sources.

Based on the core data research, a biomedical informatics use case has been built as multiple projects. MEDIator [20] is a data sharing synchronization platform for heterogeneous medical image archives. MediCurator [21] extends Data Café and ∂u∂u to offer near duplicate detection for medical imaging archives and data warehouses. Further extension points leveraging SDN as well as data partitioning and multi-tenancy are researched for different use cases, including messaging oriented systems and biomedical informatics.

On top of the data, network, and simulation research implementations, efficient SDN-based data center architectures are built. SDN-based QoS and Data Quality Enhancements to Data Center Clouds (SDQ) integrates a multi-domain distributed SDN controller for data quality and QoS in data centers. FIRM [22] offers Software-Defined Service Composition by extending SDN for a large-scale service composition workflow. I am currently researching and building SDQ and FIRM on top of our published work.

III. Development Strategy
Development efforts on the PhD dissertation are based on both the core research and the application of the research findings to common use case scenarios. As a result, we have published research and application papers. The code is developed following a modular architecture, such that different bundles can be deployed into an OSGi container, providing various functionality. We develop the core of the message-oriented middleware protocol implementation as an incubation project in OpenDaylight, named Messaging4Transport¹. The other bundles are often open sourced as development progresses².
¹ https://wiki.opendaylight.org/view/Messaging4Transport:Main
² https://sourceforge.net/projects/s2dn/
It was shown in our previous work [18] that enterprise data applications can scale well by leveraging in-memory data grids to function in a distributed environment. Further, we showed that SDN combined with distributed shared memory and message-oriented middleware applications can be leveraged to create a ubiquitous computing environment [11]. Specific use cases [20], [21] indicated the applicability of our research to real-world application scenarios.
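To make the near duplicate detection setting concrete, the sketch below compares a global pass over all records with a pass that only compares records inside each partition. It is an illustration under stated assumptions, not the algorithm of the cited framework: similarity is a plain Jaccard index over lowercase word tokens, the threshold of 0.6 is arbitrary, and the function names and sample records are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(records: Dict[str, str],
                    threshold: float = 0.6) -> Set[Tuple[str, str]]:
    """Return id pairs whose token sets are at least `threshold` similar."""
    tokens = {rid: set(text.lower().split()) for rid, text in records.items()}
    return {
        (x, y)
        for x, y in combinations(sorted(tokens), 2)
        if jaccard(tokens[x], tokens[y]) >= threshold
    }


def partitioned_near_duplicates(partitions: List[Dict[str, str]],
                                threshold: float = 0.6) -> Set[Tuple[str, str]]:
    """Compare records only within each partition, with no global coordination."""
    found: Set[Tuple[str, str]] = set()
    for part in partitions:
        found |= near_duplicates(part, threshold)
    return found
```

With four records where the two near-duplicate pairs are split across two partitions, the global pass finds both pairs while the uncoordinated partitioned pass finds none. This is the kind of cross-partition comparison that a distributed deployment has to coordinate.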
While enterprise systems, including near duplicate detection in integrated data sources, scale seamlessly in distributed environments, they often fall short in accuracy when centralized control is sacrificed in favor of horizontal scalability. We propose SDN-based enhancements to the data solutions such that, while the application is distributed, the network control layer remains logically centralized in order to retain the accuracy that centralized management can offer.

IV. Conclusion
We propose a scalable SDN-based architecture for enterprise data center solutions for BigData. The preliminary focus and research targets have been developed, and early experiments showed promising results on data isolation guarantees in a multi-tenanted network, with increased data quality and QoS guarantees. Leveraging SDN, the dissertation proposes scalable data center clouds without compromising management capabilities. The findings are currently being implemented iteratively, with more research under way to improve QoS in data center clouds.
Acknowledgements: The PhD dissertation started in September 2014, and the defense is tentatively planned for September 2018. The work is carried out at the distributed systems group at INESC-ID Lisboa. The biomedical informatics use cases have been implemented in collaboration with Emory University School of Medicine, Department of Biomedical Informatics. Part of the application scenario development was supported by OpenDaylight and Google Summer of Code internships.
This work is supported by national funds through Fundação para a Ciência e a Tecnologia with references UID/CEC/50021/2013, PTDC/EEI-SCR/6945/2014, and a PhD grant offered by the Erasmus Mundus Joint Doctorate in Distributed Computing (EMJD-DC).
References
[1] N. McKeown, “Software-defined networking,” INFOCOM keynote talk, vol. 17, no. 2, pp. 30–32, 2009.
[2] E. Thereska, H. Ballani, G. O’Shea, T. Karagiannis, A. Rowstron, T. Talpey, R. Black, and T. Zhu, “IOFlow: A software-defined storage architecture,” in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013, pp. 182–196.
[3] C. Li, B. Brech, S. Crowder, D. M. Dias, H. Franke, M. Hogstrom, D. Lindquist, G. Pacifici, S. Pappe, B. Rajaraman et al., “Software defined environments: An introduction,” IBM Journal of Research and Development, vol. 58, no. 2/3, pp. 1–1, 2014.
[4] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling innovation in campus networks,” ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, 2008.
[5] A. Tootoonchian and Y. Ganjali, “HyperFlow: A distributed control plane for OpenFlow,” in Proceedings of the 2010 Internet Network Management Conference on Research on Enterprise Networking. USENIX Association, 2010, pp. 3–3.
[6] J. Stribling, Y. Sovran, I. Zhang, X. Pretzer, J. Li, M. F. Kaashoek, and R. Morris, “Flexible, wide-area storage for distributed systems with WheelFS,” in NSDI, vol. 9, 2009, pp. 43–58.
[7] G. Wang, T. Ng, and A. Shaikh, “Programming your network at run-time for big data applications,” in Proceedings of the First Workshop on Hot Topics in Software Defined Networks. ACM, 2012, pp. 103–108.
[8] S. Barré, O. Bonaventure, C. Raiciu, and M. Handley, “Experimenting with multipath TCP,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 443–444, 2011.
[9] P. Kathiravelu and L. Veiga, “Not every flow is equal: SMART discrimination in redundancy,” arXiv preprint arXiv:1512.08646, December 2015.
[10] J. Medved, A. Tkacik, R. Varga, and K. Gray, “OpenDaylight: Towards a model-driven SDN controller architecture,” in A World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2014 IEEE 15th International Symposium on. IEEE, 2014, pp. 1–6.
[11] P. Kathiravelu, L. Sharifi, and L. Veiga, “Cassowary: Middleware platform for context-aware smart buildings with software-defined sensor networks,” in Proceedings of the 2nd Workshop on Middleware for Context-Aware Applications in the IoT. ACM, 2015, pp. 1–6.
[12] P. Kathiravelu and L. Veiga, “CHIEF: Controller farm for software-defined community clouds,” in Cloud Engineering (IC2E), 2016 IEEE International Conference on, April 2016, to appear.
[13] B. Lantz, B. Heller, and N. McKeown, “A network in a laptop: Rapid prototyping for software-defined networks,” in Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks. ACM, 2010, p. 19.
[14] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. De Rose, and R. Buyya, “CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms,” Software: Practice and Experience, vol. 41, no. 1, pp. 23–50, 2011.
[15] T. R. Henderson, M. Lacage, G. F. Riley, C. Dowell, and J. Kopena, “Network simulations with the ns-3 simulator,” SIGCOMM demonstration, vol. 14, 2008.
[16] P. Kathiravelu and L. Veiga, “An expressive simulator for dynamic network flows,” in Cloud Engineering (IC2E), 2015 IEEE International Conference on, March 2015, pp. 311–316.
[17] P. Kathiravelu and L. Veiga, “SENDIM for incremental development of cloud networks: Simulation, emulation & deployment integration middleware,” in Cloud Engineering (IC2E), 2016 IEEE International Conference on, April 2016, to appear.
[18] P. Kathiravelu, H. Galhardas, and L. Veiga, “∂u∂u multi-tenanted framework: Distributed near duplicate detection for big data,” in On the Move to Meaningful Internet Systems: OTM 2015 Conferences. Springer International Publishing, 2015, pp. 237–256.
[19] P. Kathiravelu and A. Sharma, “Data Café: A platform for creating biomedical data lakes,” AMIA Summits on Translational Science Proceedings, vol. 2016, March 2016, to appear.
[20] P. Kathiravelu and A. Sharma, “MEDIator: A data sharing synchronization platform for heterogeneous medical image archives,” in Workshop on Connected Health at Big Data Era (BigCHat’15), co-located with the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015). ACM, 2015, 6 pages.
[21] P. Kathiravelu and A. Sharma, “Near duplicate detection for medical data warehouse construction,” AMIA Summits on Translational Science Proceedings, vol. 2016, March 2016, to appear.
[22] P. Kathiravelu, T. G. Grbac, and L. Veiga, “A FIRM approach to software-defined service composition,” arXiv preprint arXiv:1601.02131, January 2016.