International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com
Dependent Reliability of Distributed Cloud Computing Network (DCCN) 1 1&4
Department of Mechatronics Engineering, (Computer System & Software Development), Federal University of Technology, Owerri, Imo State, Nigeria. 2
5
K.C. Okafor*, 2C.C. Udeze, 3A.A.Obayi, 4G.C.Ononiwu, 5J. A. Okoye
Department of Electronic Engineering, University of Nigeria, Nsukka, Nigeria. 3 Department of Computer Science, University of Nigeria, Nsukka, Nigeria.
Department of Electrical and Electronics Engineering, Chukwuemeka Odumegwu, Ojukwu University, ULI, Anambra State, Nigeria. (*Corresponding Author:
[email protected])
Abstract
hosted on the cloud. They include [11]: Amazon Web Service's Lambda, IBM Bluemix's OpenWhisk, Google Cloud Platform's Cloud Functions, and Microsoft Azure functions.
To achieve reliability, scalability, fault tolerance, and high network capacity for remote cloud application users, data center correlation failures must be prevented. Without system reliability of data center components and systems, an incremental traffic growth and high bandwidth requirements will constantly crash the system. In this paper, a new DCCN reliability ontology is presented for Non-recursive OpenFlow datacenter (NROD) network structure. The design exhibits acceptable performance enhancement with network component isolation and hazard function or failure rates. This behavior is of special significance for UNN shippingcontainer data centers, since once the container is encased and operational, troubleshooting or replacement of components becomes difficult. Using the ICT innovation center at the University of Nigeria-Nsukka, recursive BCube and DCell datacenters are identified and analyzed in the context of system reliability. Mathematical characterization on failure rate time curves/bathtub, reliability hazard function, failure rate modeling and mean time before failure (MTBF) are presented with respect to NROD proposal. Monte-Carlo sampling approach was employed in an experimental study with event scripts of Riverbed Modeller 17.5. In this case, recursive UNN DCell and BCube are compared with the proposed reliable NROD. These experiments covered several scenario-based cases to validate the achievement of reliability in DCCN. Using established network metrics, the work observed satisfactory QoS metrics for efficient deployment in any cloud based data center network.
Failure of the cloud hosted data center network normally results in huge loss of revenue, about USD 5,600 per minute [12]. The various types of failures that pose threat to the reliability of a cloud based network and its services include [13]: overflow, timeout, data resource missing, computing resource missing, software failure, database failure, hardware failure, and network failure. Besides, there various reasons for most network downtime. Primarily, change management processes could grossly affect uptime. This is because every change applied to the network operating environment has risks components. Other reasons are single point of hardware failure, power systems collapse, network routing challenges, and human error factors (e.g., network configuration errors during upgrades) [14]. Excellent planning with impact analysis, training, testing, and contingency plans will assist immensely to mitigate implosive and explosive network collapse. With the enormous online services, end-to-end solutions and increasing presence of online users, the need for highly reliable systems have continued to raise concerns. Using components that are ten times reliable will reduce downtime to less than 0.01% [CCIT-recommendation]. Also, by deploying redundant components and systems whose capacity can sustain the network running 247 even when the primary component fails, this can solve the downtime problems. Reliability is the probability of successful and satisfactory performance of a functional system at any time t. The system may be a traffic generator, vehicle, network, etc, which is expected to start and stop, maybe run for years without failure; or a robot which is expected to stay in a standby position for a long period of time and then respond just once without failure. The successful performance of any system outlined above will depend on the extent to which its reliability is designed.
Keywords: DCCN, Reliability, Correlated Failure, MTBF. INTRODUCTION Everything in life is bound to eventually fail including computer networks and its systems. Over 95% of today’s computing activities run on the cloud data center network infrastructure. The common cloud computing platforms readily available include [1]: Microsoft Azure, Amazon, Oracle, Google, HP, IBM, Salesforce, and Kaavo, etc. Obviously, website domain hosting, navigation systems/maps, social media platforms [2], industry 4.0, streaming media (e.g., Netfix [3], Spotify [4], Youtube [5]), file storage (e.g., google drive [6], Dropbox [7], iCloud [8], Microsoft’s Skydrive [9], Hulu [10] etc), are all provisioned from the federated cloud networks. There are other services that are
A reliable network design facilitates efficient QoS metrics which could be used to benchmark the performance of any consolidated datacenter network [14]. In this regard, recursive data center architectures have lower latency but often unreliable. Enterprise applications and services will hardly survive in such networks. This is as a result of zero dynamics
6551
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com in unreliable systems in general. In the case of practical DCCN [14], it was observed that its seemingly identical server systems under heavy traffic/load conditions fail at different times if not for a back-to-back redundancy introduced. The failure of such system is probabilistically described. Since like most engineering systems, the DCCN comprises several components, the relationship between its component reliability, and system reliability must be understood. Google reported that 37% of its storage failures could be traced to correlated failure. A proactive model can prevent such failures. Existing mechanisms for dealing with failures has been outlined in [15], [16], [17], [18], [19], [20], and [21].
DCCN NROD in relation to service reliability, availability, queuing delays, throughput and resource utilization are discussed while showing why the proposed model offers best reliability function. The remainder of this paper is organized as fellows. Section II focused on the related research efforts in literature. Section III outlined the system model. Section IV discussed the DCCN validation model from the experimental setup. Section V concluded the work with highlighted future work.
RELATED WORK Significant research has been carried out on the design of datacenter network topologies with reliability as the key focus. Such efforts are usually focused on how to improve their performance so that enterprises can derive investment gains.
To the best of our experience, the work in [22] and [23] are the first state-of-the art that made effort to proactively prevent/combat correlated failures of cloud data center network applications. While INDaaS [22] compares the reliability of an application’s given deployment plans, and selects the most reliable plan for deployment. Each deployment plan highlights the positions for deploying the application instances in the cloud. The issues with this model include lack of QoS auditing, weak reliability profiling, focuses on monolithic components with no attention on the internal features, and poor scalability in large scaled domains. ReCloud [23] is another satisfactory model that focuses on reliability assessment and deployment plan for cloud applications. The limitations of this model include: recursive optimization search overhead, network transformation convergence and implosive quality of service conditioning.
In [15], a scalable, graphical, web-based cloud simulation software (ReliaCloud-NS) was presented while using a nonsequential Monte Carlo Simulation (MCS) to evaluate the reliability of cloud computing system. In [16], the authors proposed an elastic design of the message bus and a hierarchical fault management as two cloud-based solutions for increasing cloud framework availability and reliability. INDaaS [22] and ReCloud [23] are very close systems to Non-recursive OpenFlow datacenter network structure in terms of reliability framework. They compare the reliability of given deployment plans for a cloud data center application, and selects a reliable plan for deployment. But, it does not quantitatively assess the reliability of a deployment plan, cannot search for deployment plans to fulfill the developers’ requirements, cannot consider an application’s internal structures, and scales poorly in a large cloud infrastructure.
Hence, this paper focused on reliability analysis of NROD structure (components and systems), including the challenge of reliability allocation in multi-component design. NROD is a novel system developed for cloud service data center users and providers to address the issues of legacy models. The proposed system can easy be deployed in a large cloud environment to satisfy expected QoS metrics. It addresses the problems of common dependencies causing correlated failures and hazard functions. The proposed NROD is implemented leveraging the capabilities of INDaaS and ReCloud while making the following contributions: 1.
This is the first system built for SGEMS [14] which proactively locates a cloud application’s reliable deployment map that satisfies the application reliability requirements, and quantitatively uses QoS to access the system reliability with hazard functions.
2.
Takes cognizance of frequent dependences that causes correlated failures and introduces available dependency in the internal structures of the DCCN for reliability assessment.
3.
Addresses application service deployment scheme that satisfies between reliability and other QoS metrics.
4.
The system supports large scale computing which enhances QoS metrics.
The work in [24] presented a placement optimization for cloud services that uses primary and backup VMs to minimize the consumed network resources during recovery (i.e., when a backup replaces a primary). In the work, there is no provision for quantitative reliability assessment. System and component failures caused by dependencies are not considered. Recently, the authors in [25] focused on computing the approximate reliability of a service or a job in a data center as well as schedule the job with reliability constraints. The work focused only on power outages and network component failures. The failure domains used for job scheduling are non-trivial to identify in the presence of common dependencies, and it does not consider internal structures of applications [23]. Bonvin et al. [26] propose a fault-tolerant key value store using finegrained geographical tags for components (e.g., region, center, rack, and host), where components with dissimilar tags are assumed to fail independently. Shared dependencies across these tags, however, were not considered. NetPilot [27] had the objective of minimizing the effect of disruptions in the network by differentiating failure types to take different actions. Net-Pilot mainly considered network component failures whereas ReCloud integrates any available information about the cloud infrastructure to assess the reliability of application deployments.
The work investigates the influence of hazard function, logical isolation on Non-recursive OpenFlow datacenter network structure. To validate the work, a comprehensive analysis on
In [28], recent perspectives on distributed system reliability
6552
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com were highlighted, especially the reliability of individual components for producing a generic model, as well as for predicting or measuring reliability of such systems. In [29], the work focused on the impact of interdependencies and common cause failures (CCFs) in large repairable networks with possibility of “re-configuration” after a fault and the consequent disconnection of the faulted equipment. Similarly, the work in [30] proposed the β-factor model for analyzing CCF of the transmission system while establishing a new Bayesian network fused CCF for reliability analysis.
DCell scales doubly exponentially as the node degree increases. It has fault tolerant feature and provides higher network capacity than the traditional tree-based structure for various types of services. Figure 1 shows a DCell network comprising of 5 DCell0 networks. By considering each DCell0 as a virtual node, these virtual nodes then form a complete graph. The design goals include infrastructural scalability for incremental expansion, fault tolerant against various types of system failures and high network capacity to support bandwidth intensive services. The advantages of DCell recursive model include [37]: Doubly exponential scaling, High network capacity, Large bisection width, Small diameter, Fault-tolerance, requires only commodity network components, Supports an efficient and scalable routing algorithm. The challenges includes: load balancing issues among the links in all-to-all communication, and large latency presence owing to its recursive structure [40].
An effort was made in [31] to study server failures and hardware repairs for large datacenters while carrying out detailed analysis of failure characteristics as well as a preliminary analysis on failure predictors. The work in [32] focused on achieving web service reliability via a middleware approach (MA). The work proposed a reliable service architecture using middleware (RSAM) that achieves the reliable web services consumption. Similar works on distributed systems reliability including hardware reliability are detailed in [33], [34] and [35].
NON-RECURSIVE OPENFLOW NETWORK STRUCTURE
DATACENTER
In this section, we briefly describe a cloud data center’s architecture to aid the understanding of reliability concepts in practical context. Then, we propose the hazard function characterization for reliable deployment in the cloud while using QoS metrics to quantitatively carry out reliability assessment of the system.
Data Center Systems In general, there are a large number of data center architectures (server-centric or switch-centric models) [36], and [37]. The focus of the reliability study is on DCell [38] and the BCube [39] as well as other scalable architectures.
Figure 1: Recursive DCell1 network when n=4 [38].
Recursive BCube Architecture Similar to DCell, BCube is a recursively defined structure shown in Figure 2. It offers a server-centric structure with multiple ports having a switch that connects a constant number of servers. A BCube1 is constructed from n BCube0s and n n-port switches.
Recursive DCell Architecture The DCell model [38] is a recursively defined structure, in which a high-level DCell is constructed from many low-level DCells at the same level fully connected with one another.
Figure 2: BCube is a leveled structure [39].
6553
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com The work explains that BCube offers fault tolerant, load balancing and significantly improves bandwidth-intensive applications but suffers from memory, speed and redundant overhead. Similarly, these characteristics are also found with DCell. Some major demerits of BCube include: graceful performance degradation as the server and/or switch failure rate increases; path adaptation overhead, complex wiring during deployment, overhead resulting from the introduction or more load balancer and aggregators.
tolerant architecture be re-engineered to prevent continuous component failures
CLASSICAL THEORIES DCCN Failure-Rate Time Curve The failure characteristics of previous DCCN follow the bathtub curve shown in Figure 3. The component constituent of the DCCN include: load balancer, OpenFlow gateway, multilayer switches, type-1 cloud servers, gigabyte Ethernet, and configuration logical files. This curve is very significant in the study of reliability of NROD design. From Figure 3, the initial failures during the time to are caused due to reasons such as defective network elements, poor design configuration, poor quality control, poor resource allocation and segmentation. This period is referred to as the DCCN infant mortality/bun-in period, and is characterized by a quick declining failure rate.
Owing to reliability issues in the above networks, the traditional tree structure used for connecting servers in datacenters creates correlation vulnerability failures for distributed cloud computing applications. Hence, a novel Non-recursive design is needed to address scalability, availability and QOS requirements in a large network. The reliability issues is also applicable to other data center designs such FAT-tree[41],VL2[42],Monsoon[43],Dipillar [44], Ficonn [45], DRWeb [46], SVLAN [47], Scafida[48] and MDCube [49]. Therefore, it is imperative that a reliable fault
Figure 3: DCCN Bathtub Curve.
By replacing the defective components with newer ones, the system reliability improves. The next phase of failure in the DCCN occurs in the normal (operating) period to . The failures are known as random failures and occur due to random overloads and other extreme operating conditions. The operating period is characterized by a relatively constant failure rate. After , ageing or wear-out failures occur due to excessive wear after the expected design life has been exceeded. The wear-out period is characterized by a rapidly increasing failure rate. From the DCCN view point, the operating period is the focal point for reliability study.
DCCN Reliability Hazard Function Consider a fixed number of DCCN identified components tested simultaneously under similar operating conditions j. Let
= Number of identical components tested.
( ) = Number of components surviving at time t ( ) = Number of components failed at time t As the verification test continues, the number of surviving components gets smaller and the number of failed components
6554
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com gets larger. Therefore, the reliability of the component at time t is the given by ( )
( )= Since ( )
( )
=
( )
=
This implies that the hazard function or hazard rate is given by Equ .11
(1)
ℎ( ) =
( ) ( )
( )
=
(11)
( )
is fixed, Equ 2.1 becomes ( )
= −
Modeling DCCN Failure Rates
(2)
Consider the portion of the DCCN failure-rate curve in Figure 3, over the systems component life between times to . During this time, chance or random failures occurs and failures cannot be attributed to deterioration of the strength of the DCNN component over time t.
The rate at which component fail can be found as ( )
( )
= −
(3)
The failure rate during this period can be assumed as a constant( ).
The failure rate also known as the instantaneous rate of failure, hazard function, or hazard rate h(t) is designated as ℎ( ) =
( )
Divding both sides of Equ 3 by rate as ℎ( ) =
( )
.
( )
Using the relation ∫ ℎ( )
= −∫
ℎ( ) = ( ) gives failures per unit time.
(4)
( )
=−
.
( )
Now, from Equ 11, we now have
( ), this gives the failure
( ) = ℎ( )[1 − =
( ) ( )
( )
= − ln ( )
=( )
The notation ( ) and ( ) are used in place of ( ) and ( ) for simplicity to denote the probability density and distribution function of T. (6)
∫ ℎ( )
− ∫ ℎ( )
(13)
Now, ( ) =
Thus, the reliability of the component at time t can be expressed as ( )=
( )] = ℎ( )
(5)
(0) = 1, integration of Equ. 5 gives ( )
(12)
− ∫ ℎ( )
=
(14)
Equ. 13 denote an exponential density function for failures per unit time. Again, let’s consider the wear-out region of the failure-rate curve, (after time ). In general, this is a nonlinear curve. If the variation is assumed for simplicity, we can have,
(7)
Also, the failure rate can be expressed in terms of the probability distribution function of the systems components life as follows.
Equ. 13 denote an exponential density function such that
The distribution function of the time to failure T of DCCN components is given by Equ.8
if and denote the failure rates at respectively, we have
( )=∫
( )
ℎ( ) =
(15) , and
=
(8)
= (16)
=
( )= (9)
(17)
The probability density function of the failure time can be determined using Equ. 11 and 9 as
This gives Equ.10 ( ) = ℎ( ) In [1 −
( )] = − ∫ ℎ( )
( ) (
− ∫ ℎ( )
=(
(10)
+
)
−
+ (18)
and
From Equ 10, upon differentiation, this yields −
,
and
The reliability of the DCCN components is then given by ( ) = ( > ) = 1 − ( ≤ )1 − − ∫ ℎ( )
+
( )=
= −ℎ( ) )
6555
( ) ( )
=
−
−
(19)
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com The density function gives by Equ.18 degenerates to Rayleigh density function if = 0.
Modeling DCCN Mean Time Before Failure (MTBF) The MTBF i.e., the expected life of the DCCN component and systems is defined as the average value of T.
Similarly, the early portion of the failure-rate function, corresponding to failures due to vendor defects, and traffic issues, can be approximated for similarity by a linear relation as: ℎ( ) =
+
Now, the MTBF is derived as ∞
MTBF= ∫ − ( )∏
(20)
+ ∫
( ) = 0 at = ∞. Also, reduces to
(21)
∞
and
MTBF= ∫
=
∞
= −∫ ()
= (23)
Since all DCCN system fail after a finite time t under classified conditions (vendor device failure mode, huge traffic workload, resource allocation wreck, etc), we have ( ) → 0 as → ∞, and hence,
Where and are constants. If and denote the failure rates at times and respectively, the constants and can be evaluated as =
( )
∞
()
( ) = 0 at = 0 . Thus, Equ. 23
()
(24)
Table 1 shows the computed parameters for DCCN system characteristics. It is feasible to obtain their failure rate of a DCCN component for which the time to failure is described by an exponential density function.
(22)
The probability density and reliability functions of the failure time can be expressed as in Equ 18 &19 with and determined from Equ 21 and 22.
Table 1. DCCN Components Failure Rate. Time of Observation t(hours) 0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1000
No of components operating at time t 10,000 8,880 8,300 7,918 7,585 7,274 6,968 6,668 6,375 6,088 5,808 5,535 5,269 5,011 4,760 4,517 4,237 3,864 3,396 2,819 2,219
Reliability ( ) =
1.0000 0.8880 0.8300 0.7918 0.7585 0.7274 0.6968 0.6668 0.6375 0.6088 0.5808 0.5535 0.5269 0.5011 0.4760 0.4517 0.4237 0.3864 0.3396 0.2819 0.2219
6556
( )
Rate of Change of ( ) Reliability − = 0,000000 0.002240 0.001160 0.000764 0.000666 0.000622 0.000612 0.000600 0.000586 0.000574 0.000560 0.000546 0.000532 0.000516 0.000502 0.000486 0.000560 0.000746 0.000936 0.001154 0.001200
∆ ∆
Failure rate ℎ( ) () − . () 0.000000 0.002240 0.001306 0.000920 0.000842 0.000820 0.000842 0.000862 0.000878 0.000900 0.000920 0.000940 0.000962 0.000980 0.001002 0.001022 0.001240 0.001760 0.002422 0.003398 0.004256
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com ( )=
;
≥ 0;
access point functions. It has other features for interconnecting networks. Figure 4 serves as the native hotspot system for the wireless link connectivity. The RouterBoard OS is licensed in service level-1 for X86 servers used in the network. Essentially, the network have three types of infrastructure components: hardware (e.g., servers, switches, power supplies, and cooling systems), software (e.g., software and firmware deployed at hardware components), and network (e.g., network connectivity across hardware components). Failures in the network components (IaaS) lead to failures in the cloud. Also, the Infrastructure components could share common dependencies which can lead to correlated failures. In effect, once the common dependencies of the network fail due to implosive factors, all components relying on these dependencies will naturally fail also. A cloud management tool- VMware, is used to manage the infrastructure. It offers a rich set of management features, such as monitoring the states and dependencies of infrastructure components, and how they are connected in the cloud. As shown in Figure 4, 5 and 6, the UNN Wi-fi connectivity to datacenter network is presented.
>0
From Equ. 9, the reliability function can found as ( ) =1−
( )= ∫
=
;
≥0
From Equ. 11, the failure given as ℎ( ) =
( ) ( )
=
=
(25)
DATA CENTER CASE STUDY UNN ICT Innovation Center In this Section, we present the case of a very large University network model in Nigeria, i.e., UNN. The reliability study in UNN datacenter in Figure 4 focused on the server-centric model driven by MikroTik (a Linux-based operating system i.e., MikroTik RouterOS) deployed on a proprietary hardware (Router-Board). This turn the system into a network router with supported features, such as firewalling, virtual private network (VPN) service and client, bandwidth shaping and quality of service, wireless
Figure 4: UNN Wi-fi Connectivity to Datacenter Network (Authors Field survey with permission, 2017).
6557
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com
Reliability Point of Failure
Figure 5: UNN Wi-fi Connectivity to DataCenter Network (Authors Field survey with permission, 2017)
Figure 6: UNN Wi-fi Connectivity to DataCenter Network (Authors Field survey with permission, 2017).
Figure 5 shows the points of failure viz: Main-one domain, MTN and UNN domains (Nsukka and Enugu campuses). Without integrating reliability in the network components and systems, the system will not be viable in the long run. From Figure 6, Microsoft windows application called Winbox provides a graphical user interface for the RouterOS configuration and monitoring. The RouterOS also allows access via FTP, telnet, and secure shell (SSH). The Winbox was used as an Application Programming Interface (API) for direct access from the pilot monitor applications into the server for management and monitoring. The Wireshack protocol analyzer filter provided an excellent tracking tool on the UNN DCN server. The filter as shown in Figure 6 was used to filter TCP and HTTP traffic pattern on real time basis. The filter enabled extraction of different data flows that occur during an HTTP connection. Though the DCN support a wide range of applications such as search, video streaming, instant messaging, map-reduce, and web
applications, the network analyzer too provided aggregate traffic statistics from all devices. As indicated in Figure 6, the wireshack tool is configured to poll a device every 5 minutes (for daily data collection) and record the captured data-points for an active interface. These data-points include: inoctect (the number of bytes received), outoctect (the number of bytes sent), indiscards (the number of input packets discarded), in errors (the number of input error packets), outerrors (the number of output error packets), and outdiscards (the number of output packets discarded). Packet discards usually indicate congestion. Data was collected from all of the devices in the various VLAN interfaces of the server centric network.
Macroscopic Traffic Analysis From the field study, the UNN ICT network satisfies the requirements of DCell and BCube models. The tie-three architecture was used to capture network frames tabularized in
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com Table 2. The operational network has: frame number, frame data capture time, source address, destination address, throughput and latency. From the captured data, network evaluation was carried out. It was observed that the traffic trend in the data center was quite bursty with undulating oscillations, which accounts for the unreliability, unpredictability of idle links, low bandwidth scalability and
makes network less applicable and insufficient to serve the users of web services. This throughput plot shows a high level of fluctuation and irregularity as frame data capture time increases. Such fluctuations and irregularities together with the sharp fall in the throughput at 160seconds are totally unacceptable for efficient web application integration in enterprise organizations.
Table 2: Operational Network Frame Data Capture (Authors Field survey, 2017). Frame Number
Time (s)
Source
Destination
Latency (seconds)
Throughput
1
10.000000
192.168.0.1
239.255.255.250
0.000000000
304.8
2
20.000004
192.168.0.1
239.255.255.250
0.000004000
296.8
3
30.000006
192.168.0.1
239.255.255.250
0.000002000
252.8
4
40.000315
192.168.0.1
239.255.255.250
0.000309000
295.2
5
50.000318
192.168.0.1
239.255.255.250
0.000003000
296.8
6
60.000319
192.168.0.1
239.255.255.250
0.000001000
304.8
7
70.000745
192.168.0.1
239.255.255.250
0.000426000
300.0
8
80.000747
192.168.0.1
239.255.255.250
0.000002000
252.8
9
90.000749
192.168.0.1
239.255.255.250
0.000002000
309.6
10
100.000905
192.168.0.1
239.255.255.250
0.000156000
284.0
11
110.000974
192.168.0.1
239.255.255.250
0.000069000
252.8
12
120.001041
192.168.0.1
239.255.255.250
0.000067000
296.8
13
130.001397
192.168.0.1
239.255.255.250
0.000356000
303.2
14
140.001399
192.168.0.1
239.255.255.250
0.000002000
252.8
15
150.001401
192.168.0.1
239.255.255.250
0.000002000
245.6
Galaxy Backbone Traffic Analysis
result of these distortions, the average received throughput was very low with 93.2824Mbps. It was therefore concluded that for an aggregate throughput T of 535.7003 Mbps, the transmitted throughput Tx of 442.4179Mbps represents 82.59%, while the received throughput Rx of 93.2824Mbps only represented 17.41%.
Another DCN model study was carried out at the galaxy backbone Plc, Abuja-Nigeria. From their network operating center (NOC) as a DCell/BCube three tier topology, the plots of Figure 7 (throughput behavior) and Figure 8 (bandwidth utilization) were obtained. The analysis was carried out on the server SolarWinds Orion Network performance monitor 9.1 SP4. Considering Figure 7, between the 6am to 9am in the morning, the transmitted throughput was appreciably satisfactorily, but suffered heavy traffic distortion between 12.30pm to 3:00pm before dropping and stabilizing from 6:00pm to 4am. At the 95 th percentile, the transmitted throughput was 442.4179Mbps. Also, from the graph, as a
As a result, the bandwidth utilization for incoming traffic into the server is much higher than for the outgoing traffic from the server as shown in Figure 8. This type of network cannot scale efficiently for any distributed cloud application. Hence, need for a reliable failure proof scheme at the server end. This reliability should also be built in a load balancer at the server gateway and firewalls respectively.
6559
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com
Figure 7: Throughput response in Galaxy three-tier DCN, (Authors testbed with permission, 2014).
Figure 8: Bandwidth Utilization in Galaxy three-tier DCN, (Author’s testbed with permission, 2014).
Figure 8 shows a traffic reliability trend. Evidently, the bandwidth utilization for incoming traffic into the server is much higher than for the outgoing traffic from the server. This is as a result of none-spreading of the traffic workload by a reliable load balancer in a recursive two layered architecture. The response plot of the graph in Figure 7 and 8 can result in a very unstable network. This is as a result absence of a localized fault tolerance feature in the network.
heterogeneous expansion, while enabling high capacity, short paths, and resilience to failures.
From the discussions so far, this work summarized the following observations from the analysis of UNN and galaxy backbone DCNs as the case study under the following [14]:
Three tier DCN models is highly inflexible and unsuitable for handling complex web applications as the plots shows lots of service instability and high utilization of resources. The problems with this type of model will lead to the significant incremental and
The study on the predictability of a large number of packet traces of different traffic classes collected on DCNs shows that the network behavior can change considerably over time and space. Enterprise cloud application services, Big data services, etc., cannot scale well in traditional DCNs since these models lack efficient load balancing, poor resource allocation and scheduling as well as poor server consolidation.
Non Recursive DCCN In this paper, we focus on deploying a NROD algorithm in the DCCN. The system first absorbs the reliability requirement
6560
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com for QoS performance. After obtaining the reliability requirement, the DCCN provider the available dependency components are accessed for deployment reliability. All the component dependencies are tested for correlated failures. Table 3 shows the algorithm for NROD which supports endto-end communication among the different components of the
DCCN. Also, it isolates correlated failures, and provisions services on-demand. Google approach is used for external connectivity similar to ReCloud model. Six power supplies are added into the DCCN as additional dependencies. Logical integration was used for other components and systems in the DCCN (NROD).
Table 3. A robust Non-Recursive Algorithm Algorithm 1: NoN-Recursive OpenFlow DCCN Construction Algorithm. /* l stands for the level of NROD.DCCNs subnet links, n is the number of nodes in a cluster DCCNlb , pref is the network prefix of DCCNlb ; s is the number of servers in a Non-Rescursive.DCCNlb cluster*/ Build Non_recursive.DCCNs (l, n, s) Section I: /* build DCCNs */ If (l = = 0) do For (int i = 0; i < n; i++) /* where n is=4*/ Connect node [pref, i] to its switch; Call Non-recursive OpenFlow load balancer ( )//logical Isolation Return; Section II: /*build Non_recursive.DCCNs servers*/ Call Hazard function ( ); For (int i = 0; i < s; i++) Build Non_recursive.DCCNs ([pref, i], s) Connect DCCNs (s) to its switch; Check for reachability ( ); Call reliability auditing ( ) Return; End if End
SYSTEM VALIDATIONS
prototyping, and fault tolerance schemes in its design approach.
Reliable NROD Experimental Setup
Evaluation Results
For validation of the NROD, this work used the design parameters obtained from selected production testbed (UNN DCN and Galaxy Plc. Nigeria) to develop a generic template in Riverbed Modeller version 17.5. Based on the experimental measurement carried out on the testbed, the work obtained the metrics used for performance validations of DCCN in comparison with two closely related datacenter network architectures, (i.e., recursive_DCell and recursive_BCube). A #C-code event trace file in Riverbed Modeler was used to configure the three instance scenarios which were mapped to Non-recursive DCCN, recursive DCell, and BCube respectively. For each of the scenarios, the attributes of the three architectures were configured on the template before the simulation execution. Afterwards, the Riverbed Modeller engine generates the respective data for each of the QoS investigated in this work. On the application configuration, the hazard function, MTBF, etc was implemented. The NROD, recursive_BCube and recursive_DCell simulated in Server built on Intel Xeon quad-core 2.26Hz and 48GHz RAM share. NROD DCCN uses only a low-end OpenFlow load balancer and provides better one-to-x support at the cost of multi-ports per-server. The system executed job dispatch on the wings of the server clusters. The proposed Nonrecursive OpenFlow DCCN uses the discrete algorithm in Table 3. It features logical isolations, OpenFlow load balancer, server virtualization, and cluster server convergence to enhance its performance. On the other hand, recursive_BCube uses active probing for load-balancing. Recursive_DCell employed neighbor maintenance, link state management,
Three experiments were carried out to investigate the impact of logical isolation, and reliability hazard functions on the NROD DCCN. To access the reliability of the respective models, there is need to produce the failure status for the network components in all cases and perform a route- andcheck process in the scenarios. Monte-Carlo sampling approach [51] was used in the validation run metrics such as queuing delays, throughput, fault-tolerance, network capacity, resource utilization, service availability, resource availability, delay, scalability were analyzed. The results between the recursive models and the Non-recursive architecture using the above metrics were compared. The simulation completed successfully and the results were collated in the global and object statistics reports. The validation plots of the three architectures under study are discussed below. Figure 9 shows the reliability dynamics of Non-recursive queuing delays on the DCCN. By introducing logical isolation, lower queuing delay was observed in the NROD. In this experiment, five servers were attached to the Nonrecursive clusters for instance; servers are 0000, 0001, 0010, 0100, and 1000. Four TCP connections were enabled viz: A B, BC, CD DE, EF. On the network, all the connections send data at a great speed. Server A needs to forward packets for all the other four servers. By varying the MTU size of the NICs and measure the network delay and CPU overhead at server A, lower mean delay was observed for NROD.
6561
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com
Non-Recursive Queuing Delays (Sec)
1.20E-07
Non-Recursive OpenFlow DCCN with Logical Isolation: Avg. Queuing Delay (sec)
y = -5E-13x + 1E-07 R² = 0.0532
1.00E-07
Non-Recursive OpenFlow DCCN without Logical Isolation: Avg. Queuing Delay (sec)
8.00E-08
Linear (Non-Recursive OpenFlow DCCN with Logical Isolation: Avg. Queuing Delay (sec))
6.00E-08 y = 5E-12x + 4E-08 R² = 0.5599
4.00E-08
Linear (Non-Recursive OpenFlow DCCN without Logical Isolation: Avg. Queuing Delay (sec))
2.00E-08 0.00E+00 0
2000
4000
6000
8000
Observation Time T (secs) Figure 9: Reliability constraints of Non-recursive queuing delays.
Non-Recursive Utilization
3.50E-06
Non-Recursive OpenFlow DCCN with Logical Isolation: Avg. Utilization Non-Recursive OpenFlow DCCN without Logical Isolation: Avg. Utilization
3.00E-06 2.50E-06 2.00E-06 1.50E-06 1.00E-06 5.00E-07
0.00E+00 0
500
1000
1500
2000 2500 3000 Observation Time T (secs
3500
4000
4500
Figure 10: Reliability constraints of Non-recursive resource utilization.
Figure 10 demonstrates the result of reliability behavior of Non-recursive resource utilization (NRU) with Monte-Carlo sampling. With NRU, an efficient packet forwarding engine predicts the next hop of a packet by only one lookup-table. The NRU forwarding engine comprises two system components: a neighbor status table (NST) and a packet forwarding procedure (PFP). The NST is maintained by the neighbor maintenance protocol. Every entry in the table is logically isolated. Without reliability by logical isolation, the
Non-recursive OpenFlow model will suffer from unnecessary high resource utilization as shown in Figure 10. This could lead to correlated failure. Figure 11 shows the reliability trend line of NROD with Monte-Carlo sampling. This is very acceptable given the elastic nature of the cloud environment. An increase in the number of components creates redundancy and does not affect the reliability of the network.
6562
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com 1.2 y = 1E-04x + 3E-15 R² = 1
Reliability R(t)
1 0.8
Non-Recursive OpenFlow DCCN Reliability
0.6 0.4 0.2 0 0
2000
4000 6000 8000 No of components operating at time t
10000
12000
Figure 11: Reliability trend line of Non-recursive OpenFlow DCCN.
Non-Recursive Fault Tolerance
1.02 1
Non-Recursive OpenFlow DCCN with Logical Isolation: Fault Tolerance Non-Recursive OpenFlow DCCN without Logical Isolation: Fault Tolerance
0.98 0.96 0.94 0.92 0.9 0.88 0.86 0.84 0
2000
4000 6000 Observation Time t
8000
10000
Figure 12: Reliability performance of NROD with Fault Tolerance.
In Figure 12, Monte-Carlo sampling incurs a significant cumulative overhead for the reliable system deployment over recursive DCell and BCube deployment respectively. Hence, the absence of fault tolerance in NROD can lead to failure. However, the plot shows fault tolerance impact on the entire architecture leading to high reliability.
Non-recursive structure, 10 servers are divided into 5 groups and servers in each group are connected to a reliable firstlevel mini-multilayer-switch. Similar to BCube model, the reliable first-level multilayer-switch are connected to a second-level mini-multilayer-switch. Figure 13 summarizes the per-server throughput of the experiments. Compared with the recursive structure, NROD achieves satisfactory speedup/throughput compared with the other models for oneto-one, one-to-several, one-to-all, and all-to-all traffic patterns. Now, under Monte-Carlo sampling, the NROD maintained a reliable throughput of 38.30%, while the recursive BCube and DCell had 36.37% and 25.53% respectively. This result is similar to [14] which introduced a two-tier architecture, OpenFlow load balancer, and virtualization as reliability components. Hence, a dependable network that can withstand correlated failures.
Figure 13 shows an interesting throughput observation considering the reliability hazard function. The experiment demonstrate NROD support for one-to-one, one-to-several, one-to-all, and all-to-all traffic patterns. By putting maximum translation Unit (MTU) to 10KB, reliable server throughput is studied. In the one-to-i experiments, a source server sends 10GB data to all the node servers in a cluster. In the all-to-all experiment, each server sends a total 10GB data to all the other servers. We compare NROD with a two-level tree. In the
6563
Throughput Reliability Harzard Function (Bytes/Sec)
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com 0.00002 0.000018 0.000016 0.000014 0.000012 0.00001 0.000008 0.000006 0.000004 0.000002 0
Non-Recursive OpenFlow DCCN with Reliability Hazard Function: Avg. Throughput (Bytes/Secs) Recursive_BCube_DCCN without Reliability Hazard Function: Avg.Throughput (Bytes/Secs) Recursive_DCell__DCCN without Reliability Hazard Function: Avg.Throughput (Bytes/Secs)
0
10
20 Obseravtion tiime t
30
40
Figure 13. Reliability hazard function validation with Throughput Metric.
40000 Non-Recursive OpenFlow DCCN with Reliability Hazard Function: Avg.Scalability
35000 30000
Recursive_BCube_DCCN without Reliability Hazard Function: Avg.Scalability
DCCN Scalability
25000 20000
Recursive_DCell__DCCN without Reliability Hazard Function: Avg.Scalability
15000 10000 5000 0 0
2000
4000
6000
8000
10000
12000
Observation Time t Figure 14. Reliability hazard function validation with Scalability.
Figure 14 illustrates the scalability comparison plots. Usually, the recursive traditional designs deliberately incorporate excess capacity upfront since future expansion of power and cooling capacity is somewhat difficult and costly. These models are inefficient and unreliable. The proposed NROD leveraged reliability hazard function to achieve optimal performance. In this case, the work deployed facility modules that remove wasteful over-sizing inclination of the legacy models. The proposed NROD is standardized, modular architecture that makes scale-up or scale-down/reducing capacity to meet real-world demand much easier. With efficient, integrated power and cooling components, it offers significant cost compared to a typical oversized recursive data center models. From Figure 14, we adopted closed-couple cooling, package chiller with economizer, factory assembled integrated components/controls, smaller compact core modules, efficient UPS, and cooling controls in the production environment. Besides, OpenFlow load balancer and full scale
virtualization were employed to achieve a relative scale compared with the recursive DCell and BCube models as shown in Figure 14. Figure 15 illustrates DCCN resource availability estimation, NROD continuous-time markov chains (CTMC) and reliability determined the system availability. This offers Tier4 availability which supports traffic workload without failures. NROD-CTMC takes the snapshot of the interaction between DCCN Utility, backup diesel generator (BDG), UPS, Automatic Transfer Switch (ATS), Power Distribution Units) (PDU) units and creates component redundancy for failure tolerance. By introducing power-feed-topology, and component capacity as well as virtualization, reliability is realized. Also, component redundancy, and UPS placement leads to availability of the network utility, BDG, UPS subsystems. The tradeoff is that NROD configuration with redundancy has significant cost among other centralized
6564
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com configurations. As shown in Figure 15, NROD-CTMC with virtualization improves the resource availability by 40%. Recursive DCell and BCube had 33.33% and 26.67
respectively. With reliability hardware function, traffic workloads could be processed with near Tier-4 availability (0.999999). Non-Recursive OpenFlow DCCN with Reliability Hazard Function: System Availability Recursive_BCube_DCCN without Reliability Hazard Function: System Availability
0.05 0.045 System AVailability
0.04 0.035
Recursive_DCell__DCCN without Reliability Hazard Function: System Availability
0.03 0.025 0.02 0.015 0.01 0.005 0 0
2000
4000
6000 8000 Obseravtion time t
10000
12000
Figure 15. Reliability hazard function validation with System Availability.
0.00011
Non-Recursive OpenFlow DCCN with Reliability Hazard Function: Avg. Queuing Delay (Secs) Recursive_BCube_DCCN without Reliability Hazard Function: Avg. Queuing Delay(Secs) Recursive_DCell__DCCN without Reliability Hazard Function: Avg. Queuing Delay(Secs)
Queuing Delay (Secs)
0.000108 0.000106 0.000104 0.000102 0.0001 0.000098 0
2000
4000 6000 8000 Obseravtion time t
10000
12000
Resource Utilization
Figure 16. Reliability hazard function validation with Queuing Delay.
500 450 400 350 300 250 200 150 100 50 0
Non-Recursive OpenFlow DCCN with Reliability Hazard Function: Avg. Resource Utilization Recursive_BCube_DCCN without Reliability Hazard Function: Avg. Resource Utilization Recursive_DCell__DCCN without Reliability Hazard Function: Avg.Resource Utilization
0
0.0001
0.0002
0.0003 0.0004 0.0005 Obseravtion Time t
0.0006
0.0007
Figure 17. Reliability hazard function validation with Resource Utilization.
6565
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com From figure 16, the results of NROD-DCCN queuing delay is described. The simulation is executed with K = 55 servers connected using two-tier network topology. An oversubscription ratio of 1:5 is used at each level. The arrival of remote tenant requests is a Poisson process with a delay constraint. Usually, by varying the average arrival rate under Monte-Carlo sampling, the average datacenter delay is analyzed. Each tenant site runs a job that transfers a given amount of data between its VMs. From the tenants, every assigned job has an assigned computational time. Under a reliable network, a transactional job completes when its entire job flows finishes and the compute time exhausted. In this scenario, the queuing delay focuses on predictable message latency for DCCN traffic. The network needs guarantees for reliable network bandwidth, packet delay and flexibility. But from the plot, NROD offers a guaranteed network bandwidth which makes it easier to guarantee packet queuing delay. From the plot, reliability and locality aware-placement for multi-tenanted traffic makes NROD reasonable for handling time constrained tracks. The reliability plot shows lower latency for NROD.
capacity for on-demand traffic. The NRI routing algorithm improved reliability in the DCCN topology unlike the recursive switch based designs. Various validation cases, showed NROD as a reliable re-engineered model. Future work will investigate traffic workload and migration policy-aware availability model while comparing NROD with ReCloud and INDaaS using Monte-Carlo sampling approach in CloudSim [52]. ACKNOWLEDGEMENT This research is an extended work from SGEMS project supported/partially supported by Federal Government of Nigeria via TETFund (FUT/DVC (Acad.)/GEN 92/II/90). We thank our colleagues from Swift Networks, UNN Data Center Network Team, Galaxy backbone, and Cisco System Inc., who provided insight and expertise that greatly assisted this research. Special appreciation goes to Prof. O. U.Oparaku, for his guidance and encouragement in course of this research work. Finally, Dr. Rajkumar Buyya; a Redmond Barry Distinguished Professor and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia, deserve special recognition for all the extended mentorship and assistance via resourceful materials that greatly improved the manuscript
Figure 17 shows the reliability hazard function validation on resource utilization. Recall, that a datacenter network topology as shown in Figure 4, 5 and 6 fosters significant role in determining the level of failure resiliency, ease of incremental expansion, communication bandwidth and latency. For resource utilization, it was remarked that the traffic be uniformly distributed across all the links in the DCCN clusters. With a very low utilization for spitting the sending/receving and forwarding throughput into two links, this shows that NROD play a good role in balancing traffic. From the result, the propoed NROD with reliability harzard function counts the number of path probings and path switchings to retain redundancy. In this case, all the TCP connections probed the network with little delay. The result shows that NROD is highly efficient and robust in utilizing resources (minimal resource utilization).
REFERENCES
CONCLUSION This paper presents the design and implementation of NROD as novel network architecture for UNN enterprise scaled shipping-container-based modular data center as well as galaxy backbone models. It deploys components such as network ports at each server with mini-multilayer switches as crossbars. With the embedded Non-recursive routing intelligence (NRI) at the DCCN server side, NROD creates an intelligent server-centric model. NROD utilizes available dependency details (e.g., hardware, software and/or network dependencies) and monitors correlated failures to perform quantitative QoS reliability assessment for DCCN applications. This is done with careful error elimination even in a complex deployment context. NROD balances reliability and application performance in general. Experimental results derived from simulation implementation reveals that NROD with any number of components and hosts, offer excellent efficiency and reliability considering the Bathtub curve during service time. This work showed that NROD significantly accelerates one-to-i traffic patterns and provides high network
6566
[1].
Vladimir O. Safonov, “Trustworthy Cloud Computing” Wiley-IEEE Computer Society Pr; 1st edition ISBN-13: 978-1119113508, 2016.
[2].
Patricia Moloney Figliola, Eric A. Fischer, “Overview and Issues for Implementation of the Federal Cloud Computing Initiative: Implications for Federal Information Technology Reform Management, A report-Congressional Research Service, 2015, www.crs.gov
[3].
“Netflix”, online] Available:(http://www.netflix.com, Retrieved on 25th, March, 2018.
[4].
“Spotify”, online] Available: (http://www.spotify.com/us. Retrieved on 25th, March, 2018.
[5].
“YouTube”, online Available: http://www.youtube.com, Retrieved on 25th , March, 2018.
[6].
“Google Drive” [online] Available: https://drive.google.com), Retrieved on 25th, March, 2018.
[7].
“Dropbox”, [online] Available: (https://www.dropbox.com)
[8].
“Apple’s iCloud online] Available: http://www.apple.com/icloud, Retrieved on 25th, March, 2018.
[9].
“Microsoft’s Skydrive” [online] Available:(http://windows.microsoft.com/enUS/skydrive/home, Retrieved on 25th, March, 2018.
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com [10].
“Hulu” [online] Available: http://www.hulu.com, Retrieved on 25th, March, 2018.
[11].
G. McGrath, J .Short, S.Ennis, B. Judson, P.Brenner,“Cloud Event Programming Paradigms: Applications and Analysis, In Proc. 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), 2016: DOI:10.1109/CLOUD.2016.0060.
[12].
[13].
[14].
[15].
[16].
[17].
[18].
[19].
[24].
Phillipa Gill, N. Jain and N. Nagappan. “Understanding Network Failures in Data Centers: Measurement, Analysis and Implications”, In Proc. SIGCOMM 2011,Toronto, Aug. 18, 2011.
Ao Zhou, Shangguang Wang, Bo Cheng, Zibin Zheng, Fangchun Yang, Rong Chang, Michael Lyu, and Rajkumar Buyya, “Cloud Service Reliability Enhancement via Virtual Machine Placement Optimization”, In IEEE Transactions on Services Computing 2016.
[25].
Yuan-Shun Dai , Bo Yang, Jack Dongarra, Gewei Zhang, “Cloud Service Reliability: Modeling and Analysis”, [online] Available: www.netlib.org/utk/people/JackDongarra/PAPERS/C loud-Shaun-Jack.pdf.
Mina Sedaghat, Eddie Wadbro, John Wilkes, Sara de Luna, Oleg Seleznjev, and Erik Elmroth. 2016. DieHard: Reliable Scheduling to Survive Correlated Failures in Cloud Data Centers. In Proc. CCGrid. 52–59.
[26].
K. C. Okafor “Development of a Model for Smart Green Energy Management Using Distributed Cloud Computing Network”, Ph.D. Thesis, University of Nigeria Nsukka, 2017.
Nicolas Bonvin, Thanasis G. Papaioannou, and Karl Aberer. 2010. A self-organized,fault-tolerant and scalable replication scheme for cloud storage. In Proc. SoCC. 205–216..
[27].
Xin Wu, Daniel Turner, Chao-Chih Chen, David A. Maltz, Xiaowei Yang, Lihua Yuan, and Ming Zhang. 2012. NetPilot: automating datacenter network failure mitigation. In Proc.SIGCOMM. 419–430.
[28].
W. Ahmed , Y.W. Wu, “ A survey on reliability in distributed systems”, Journal of Computer and System Sciences 79 (2013) 1243–1255
[29].
G. Guenzi, “Reliability evaluation of common-cause failures and other interdependencies in large reconfigurable networks. Ph.D thesis, University of Maryland, College Park, 2010.
[30].
Srikanth Kandula, Ratul Mahajan, Patrick Verkaik, Sharad Agarwal, Jitendra Padhye, and Paramvir Bahl. 2009. Detailed diagnosis in enterprise networks. In SIGCOMM. 243–254.
Mi J, Li Y, Huang H-Z, Liu Y, Zha ng X-L. Reliability analysis of multi-state system with common cause failure based on bayesian networks. Eksploatacja i Niezawodnosc – Maintenance and Reliability 2013; 15 (2): 169–175.
[31].
Vincent Liu, Daniel Halperin, Arvind Krishnamurthy, and Thomas E. Anderson.2013. F10: A FaultTolerant Engineered Network. In NSDI. 399–412.
K.V. Vishwanath and N. Nagappan, “Characterizing Cloud Computing Hardware Reliability”, SoCC’10, June 10–11, 2010, Indianapolis, Indiana, USA.
[32].
Amr S. Abdelfattah, Tamer Abdelkader, EI-Sayed M. EI-Horbaty, “RSAM: An enhanced architecture for achieving web services reliability in mobile cloud computing, In Journal of King Saud University – Computer and Information Sciences , 2017.
[33].
W. Ahmed, Y. W.Wu, A survey on reliability in distributed systems”, Journal of Computer and systems sciences, 2013, Pp. 1234-1255.
[34].
K.V.Vishwanath, N. Nagappan,”Characterizing cloud computing hardware reliability”, In Proc. SoCC’10, Proc of the 1st ACM Symposium on Cloud Computing, Pp.193-204, 2010.
[35].
M.Sipos, F.H.P. Fitzek, D.E.Lucani, M.V Pedersen, “ Distributed Cloud Storage Using network codeing”, In Proc. 11th IEEE, CCNC’ 2014, USA..
Paramvir Bahl, Ranveer Chandra, Albert G. Greenberg, Srikanth Kandula,David A. Maltz, and Ming Zhang. Towards highly reliable enterprise network services via inference of multi-level dependencies. In SIGCOMM. 13–24. 2007 Xu Chen, Ming Zhang, Zhuoqing Morley Mao, and Paramvir Bahl.” Automating Network Application Dependency Discovery: Experiences, Limitations, and New Solutions”. In OSDI. 117–130. 2008 Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, and Peter Druschel.. Accountable Virtual Machines. In OSDI. 119–134, 2010.
[20].
Barry W. Peddycord III, Peng Ning, and Sushil Jajodia. 2012. On the Accurate Identi#cation of Network Service Dependencies in Distributed Systems. In LISA.181–194.
[21].
Xin Wu, Daniel Turner, Chao-Chih Chen, David A. Maltz, Xiaowei Yang, Lihua Yuan, and Ming Zhang. 2012. NetPilot: automating datacenter network failure mitigation. In SIGCOMM. 419–430.
[22].
[23].
Reliable Application Deployment in the Cloud. In Proceedings of CoNEXT ’17 . ACM, New York, NY, USA, 14 pages . 2017: Available [Online] https://doi.org/10.1145/3143361.3143388.
Ennan Zhai, Ruichuan Chen, David Isaac Wolinsky, and Bryan Ford. 2014. Heading O! Correlated Failures through Independence-as-a-Service. In OSDI. 317–334 Ruichuan Chen, Istemi Ekin Akkus, Bimal Viswanath, Ivica Rimac, Volker Hilt.”Towards
6567
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 9 (2018) pp. 6551-6568 © Research India Publications. http://www.ripublication.com [36].
[37].
[38].
[39].
[40].
[41].
[42].
K.C. Okafor, I.E Achumba, G.A Chukwudebe, G.C Ononiwu, “Leveraging Fog Computing For Scalable IoT Datacenter Using Spine-Leaf Network Topology”, Journal of Electrical and Computer Engineering, Vol.2017, Article ID 2363240, Pp.1-11.
for Efficient Web Application Integration in Enterprise Organisations”, PhD thesis, Unizik, February, 2013.
C.C. Udeze, K.C. Okafor, C.C. Okezie, I.O.Okeke, G.C. Ezekwe, “Performance Analysis of R-DCN Architecture for Next Generation Web Application Integration”, In IEEE Xplore Digital Library, 6th IEEE International Conference on Adaptive Science & Technology (ICAST 2014), Covenant University Otta, 19th -31st, Oct,2014.Pp.1-12. (ICAST 2014). Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shiy, Yongguang Zhang, Songwu Luz,“DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers”, In Proc. of SIGGCOM Seattle, Washington, USA, Aug.17–22nd , 2008. C. Guo, G. Lu, D. Li, H.Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu. “BCube: a high performance, server-centric network architecture for modular data centers”, In Proc. of the ACM SIGCOMM 2009 Conference on Data communication, (in SIGCOMM ’09), Pp.63–74, New York, NY, USA, 2009. ACM.
M. Al-Fares, A. Loukissas, and A. Vahdat. A Scalable, Commodity Data Center Network Architecture. In SIGCOMM, 2008. A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. “VL2: a scalable and flexible data center network”, In Communications of the ACM, March 2011.vol. 54,no. 3.doi :10.1145/1897852.1897877. A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. Towards a next generation data center architecture: scalability and commoditization. In PRESTO ’08: Proceedings of the ACM workshop on Programmable routers for extensible services of tomorrow, Pp. 57–62, New York, NY, USA, 2008. ACM.
[44].
Yong Liao , Jiangtao Yin, Dong Yin , LixinGao, “DPillar: Dual-port server interconnection network for large scale data centers”, Elsevier Computer Networks 56 (2012) 2132–2147.
[45].
Dan Li, ChuanxiongGuo, Haitao Wu, Kun Tan, Yongguang Zhang, Songwu Lu, “FiConn: Using Backup Port for Server Interconnection in Data Centers”, In Proceedings of INFOCOM’ 09, 2009.
[46].
Udeze C.C, “Re-Engineering DataCenter Networks
K.C. Okafor and T.A Nwaodo: “A Synthesis VLAN Approach to Congestion Management in Datacenter internet Networks”, In International Journal of Electronics and Telecommunication System Research, (IJETSR), Vol.5, No.6, May 2012, Pp.8692.
[48].
LászlóGyarmati, Tuan Anh Trinh”, “Scafida: A Scale-Free Network Inspired Data Center Architecture”, InACM SIGCOMM Computer Communication Review, Vol.40, No 5, October 2010, Pp.5-12.
[49].
H. Wu, G. Lu, D. Li, C. Guo, and Y. Zhang. MDcube: a high performance network structure for modular datacenter interconnection”, In Proc. of the 5th international conference on Emerging networking experiments and technologies,( in CoNEXT ’09), Pp. 25–36, New York, NY, USA, 2009. ACM.
[50].
Riverbed Modeller- Available Online: https://supportstaging.riverbed.com/bin/support/static //doc/opnet/17.5.A/online/itguru_17.5.PL5/Models/w whelp/wwhimpl/common/html/wwhelp.htm#href=A pplications_Model_desc.02.05.html&single=true. Retrieved, 03/05/2018.
K.C.Okafor, G.N.Ezeh, I.E. Achumba, O.U. Oparaku, U.H. Diala, “DCCN: A Non-Recursively Built Data Center Architecture for Enterprise Energy Tracking Analytic Cloud Portal, (EEATCP), In AASCIT Computational and Applied Mathematics Journal, 2015, ISSN 2381 – 1218, 2015, USA,1(3): Pp.107-121.
[43].
[47].
[51].
Min Xu,, Xiuzhuang Zhou, Qirun Huo, And Haomin Liu, “ Efficient Stochastic Approximation Monte Carlo Sampling for Heterogeneous Redundancy Allocation Problem “, IEEE Access, Vol. 4, 2016.
[52].
P.Humane. J.N.Varshapriya, “Simulation of cloud infrastructure using CloudSim simulator: A practical approach for researchers” In IEEE Proc. ICSTM, 6-8 May 2015, Chennai, India. DOI: 10.1109/ICSTM.2015.7225415.
[53]
6568
CCIT Recommendations, Network Management: Concepts and tools, Arpege Group - 2012 Technology & Engineering, https://books.google.com.ng/books?isbn=940111290 8. Retrieved, 24/05/2018.