OpenCache: A Software-defined Content Caching Platform

Matthew Broadbent∗, Daniel King∗, Sean Baildon∗, Nektarios Georgalas†, Nicholas Race∗

∗Lancaster University, United Kingdom
{m.broadbent, d.king, s.baildon, n.race}@lancaster.ac.uk

†British Telecom, United Kingdom
[email protected]

Abstract—Network operators recognise that Content Delivery Networks are essential for meeting user Internet application and content demands. The infrastructure must be tightly integrated to provide request routing, content caching, load balancing, and scalable, reliable services, whilst minimising deployment time and complexity. A major step towards achieving these goals is to embrace recent Software Defined Networking and Network Functions Virtualisation objectives and design principles. This paper outlines the motivation behind OpenCache, an experimental caching platform. It includes architectural design decisions and functional components, and highlights the feasibility of virtualising the key processes of caching content and control functions. It also demonstrates how OpenCache is a logical extension to Software Defined Networking infrastructure. We provide examples of how OpenCache may be deployed to provide a flexible CDN infrastructure, and the benefits it can provide. These are demonstrated using load-balancing and fail-over use cases. Finally, we summarise future OpenCache research challenges and opportunities.

I. INTRODUCTION

Network operators have highlighted that the delivery of content, especially video, is one of the major challenges facing existing network operations. This is due to the exponential growth of traffic and application demands, driven in part by the shift in demand away from terrestrial broadcast of television and video towards delivery via catch-up and video-on-demand services. These typically use unicast delivery over IP, which requires a unique flow to service each and every request. In parallel, the need to dramatically increase the amount of bandwidth to meet growing high-definition and ultra-high-definition consumption places additional pressure on network resources.

Deployment and integration of caching nodes, close to the user, is an effective and cost-efficient way to address the aforementioned challenges of content delivery. Content providers often use third-party Content Delivery Networks (CDNs) such as Akamai and Limelight to scale up their ability to deliver content in a timely fashion. In addition to this, Internet Service Providers (ISPs), including AT&T, BT Group and Telefonica, will often deploy additional equipment in their own networks to further satisfy the demand for content. Currently, CDN deployments use dedicated function nodes, which are typically built using physical appliances coupled with proprietary software. Often, these physical nodes provide different capabilities and subsets of content and, as such, are deployed in parallel. These networks are used to deliver specific content services, but also to offer wholesale CDN services providing Over-The-Top (OTT) delivery (e.g. via transparent caching) of content, such as video and television.

The timely and efficient delivery of content over the Internet is a long-standing area of research. Early web caching solutions became ubiquitous in many network infrastructures [5], [19]. However, since the introduction of these tools, Internet usage and applications have morphed into something quite different. User-generated content and dynamic websites now dominate much of the traffic traversing the Internet. This has rendered these caches somewhat obsolete, as they were primarily designed for a static Web. Content delivery infrastructures have now become a ubiquitous element of the Internet, with hardware elements located in a multitude of different network locations. This is evidenced by the CoralCDN research project [13], a CDN platform hosted entirely on the world-wide PlanetLab experimental facility. However, it was a fixed infrastructure, and did not provide a means to experiment with the fundamentals of caching, nor to interact directly with the behaviour of the cache itself. More recently, research has focused on how ISPs can work in conjunction with CDNs to improve the delivery of content to end-users [12]. Further research has shown the ever-shifting nature of content distribution [18], and how request patterns evolve over time. This highlights the continuing importance of improving content delivery, and demonstrates the need for a tool that is configurable and flexible enough to meet the demands of a constantly evolving research field.

In this paper, we discuss the key design objectives necessary to build an experimental content delivery platform. These are derived from existing state-of-the-art deployments, combined with consideration of emerging technologies. As a result of these requirements, we propose the OpenCache architecture in Section III. We also demonstrate its applicability using relevant industrially-driven use-cases in Section III-B. A core part of this architecture is the Application Programming Interface (API), which is described in Section IV. In order to highlight the flexibility afforded by this API, we have also developed a number of proof-of-concept applications, details of which are found in Section V. Furthermore, by utilising a large-scale cross-site testbed in conjunction with the OpenCache prototype [2], we evaluate these applications and demonstrate their effectiveness in implementing two important resiliency behaviours in Section V-C. Finally, we conclude and summarise the main contributions of this work in Section VI, and describe the future direction of research in this area in Section VII.

II. DESIGN OBJECTIVES

In the current landscape of content delivery platforms, dedicated CDNs are increasingly used to provide distribution at the scale needed to match user demand. However, this is rarely a real-time capability, and CDN platforms are over-engineered for worst-case scenarios. With the advent of Software Defined Networks (SDN) and Network Functions Virtualisation (NFV), the capability to scale with user demand becomes feasible, providing resilient and elastic CDN capability in response to both real-time and predicted demands. This section outlines the core requirements, design principles, enabling architecture and interfaces for achieving our current and future objectives.

A. Content Delivery Network Requirements

CDN is a generic term describing a set of common components, such as: Cache Controller, Cache Nodes, Surrogate Server, Load Balancer, Proxy, and Peering Gateway. Normally, the Cache Controller will select a Cache Node (or a pool of Cache Nodes) to answer the end-user request, and then redirect the end-user to the selected Cache Node. The Cache Node answers the end-user request and delivers the requested content to the end user. The CDN Controller is a centralised component, while CDN Cache Nodes are distributed within the network or situated within a Data Centre [16].

For industry, core requirements when designing and deploying a CDN include: capital cost-efficiency, flexibility of content fulfilment, performance predictability, and bandwidth or latency guarantees. A final fundamental requirement is the need for the CDN to be resilient and reliable: beyond the capability to cope with a Distributed Denial of Service (DDoS) attack, the CDN must be capable of recovering from catastrophic failure that may affect any of the aforementioned CDN components [1].

B. Software Defined Networks

Our objective is to detach the control plane from the underlying data plane, and to allow forwarding functionality to be deployed based on the application profile and user demands. This allows the service or application (retrieving, forwarding and serving cached content, for instance), and the control decisions behind the content selection and delivery process, to be tied together. The premise behind this decision is to ensure network and resource efficiency: creating data paths in response to the application, and instantiating application nodes (and functions) at optimal locations within the network (where possible). It is important to provide this network and application control via a well-defined interface (API), and to grant these flexible capabilities to developers, third-party applications and optimisation heuristics. In contrast to other techniques, SDN allows this behaviour to be implemented quickly and cohesively: automation techniques may be used to set up end-to-end services, as we will demonstrate through experimentation in Section V-C. It is important that the flexibility described is available beyond the initial deployment; it should be possible for these paths and application nodes to be modified (torn down, resized, relocated) at any time, particularly in response to rapid changes in the operational environment. This includes revised network conditions, fluctuations in resource location or availability, and partial or catastrophic failure.

C. Function Virtualisation

Virtualisation of CDN components is a core design principle, necessary to create a content network that can be deployed rapidly and in a scalable way. Naturally, the first element to be virtualised is the cache node itself. These can then be used to demonstrate the ability to expand in response to changes in demand, throughput, latency, and so on. Deploying the various elements of a CDN as a collection of virtual appliances (Virtual Network Functions, VNFs) in a standardised commodity hardware environment creates a virtual CDN (vCDN). This is demonstrated in Figure 1, and provides an architecture that addresses a number of the challenges mentioned in Section II-A. Furthermore, this virtualisation allows multiple isolated VNFs or unused resources to be allocated to other VNF-based applications during weekdays and business hours, enabling overall IT capacity to be shared by all content delivery components, or even other network function appliances. Industry, via the European Telecommunications Standards Institute (ETSI), has defined a suitable architectural framework [9], including a number of documented resiliency requirements [8] and specific objectives for virtualised CDN infrastructure [7]. Clearly, there is a need for a suitable experimental platform to drive and develop the next generation of content delivery, driven by both academia and industry.

Fig. 1: vCDNs Running on Standardised Commodity Hardware

D. Integration with Commodity Hardware

The ability to deploy functions on virtualised infrastructure hosted on Intel x86-based commodity hardware decouples the network function from the underlying hardware infrastructure. A function can then utilise a common resource pool, and exploit performance predictability, where dimensioning remains stable regardless of how the virtualised hardware resources are used. Leveraging the virtualised hardware infrastructure with an intelligent hypervisor also allows the right balance of network I/O, CPU power and storage I/O performance (e.g., RAM and HDD). It also ensures compliance of cache nodes through hypervisor instrumentation, and facilitates the use of monitoring and reporting tools. These can then be used to adjust IT resources to meet expected traffic demands and metrics, such as bandwidth throughput and latency. This architecture will also support the ability to soft-provision and scale functions elastically, based on real-time data plane traffic requirements and virtual machine or hypervisor performance metrics. To realise this capability, the necessary internal and external APIs will also need to be developed and documented, between the virtualisation infrastructure and the applications themselves.

Fig. 2: OpenCache Architecture (application layer: Load Balancer, Fail-over Monitor; control layer: OpenCache Controller, Virtual Infrastructure Manager, SDN Controller; service layer: OC Nodes hosting Services, on hypervisors or dedicated hardware; network layer: OpenFlow Switches)

E. Programmable Control

Importantly, virtualised functions need to be configurable and flexible in order to meet the changing nature of their usage. Statically defining their runtime configuration significantly restricts the ability to react to any changes in their own context, or in the resources they are managing. A clear and well-defined control mechanism grants operators the ability to interact with a function and define its behaviour according to changing requirements. Forcing this interaction through an API enables authentication, authorisation and accounting (AAA) to be provided and enforced for those utilising the API. This provides the functionality necessary to integrate existing billing mechanisms, thus ensuring the capability to monetise a content delivery platform. Programmable control also allows developers to scale applications regardless of size or location, and thus significantly reduce development timescales.

III. ARCHITECTURE

A. OpenCache

Considering the objectives described in the previous section, we present the OpenCache architecture. This consists of a number of distinct layers, separated by concern. The architecture is illustrated in Figure 2, and described in detail in the remainder of this section.

At the top of this architecture is the application layer, which is where any applications that leverage OpenCache are located. Examples of the type of application that may be present here are demonstrated in more detail in Section V. These applications interact with the underlying deployment solely through the OpenCache controller, and implement various behaviours using its API; the behaviour is defined entirely within the application itself. This ensures a separation between OpenCache and any supplementary functionality. Primarily, it allows users to define their own cache behaviour, without the necessity of building their own bespoke software or managing the entire deployment.

Located underneath the application layer is the control layer. This is where the main control elements of the architecture are found, including the OpenCache controller, which is responsible for the connected cache nodes. It also includes the controllers for companion elements, namely the SDN controller and the virtual infrastructure manager. These facilitate the control of the network and virtualised computing resources, respectively.

Further down the hierarchy is the service layer; this is where the core functionality of OpenCache lies. It typically consists of a number of services, running across multiple cache nodes. These nodes can be running on dedicated physical resources, or on virtualised resources managed by the aforementioned virtual infrastructure manager. An OpenCache node can host multiple services, each of which is responsible for serving a unique set of content. This is important to the operation of OpenCache, as it allows a single node to host the services of multiple tenants simultaneously. Furthermore, it diverges from the current situation, whereby a multitude of individual services and devices may be found in a network. As each such service has a unique configuration and control method, this significantly increases operational complexity and expenditure. OpenCache reduces this by harmonising these functionalities for any number of running services.

Finally, the network layer is used to provide the functionality necessary to redirect requests for content to OpenCache. The OpenCache architecture achieves this using SDN technology to modify the forwarding plane as necessary.

B. ETSI NFV Framework

The ETSI NFV initiative aims to fundamentally change the way networks are deployed and operated. OpenCache is applied research in which the application, in our case content delivery, has been evolved to support the ETSI NFV framework, providing an open-source experimental platform for the development and testing of content delivery platforms within the architectural view of ETSI and future NFV-based networks. The NFV architectural framework, depicted in Figure 3, identifies functional components and reference points between them. These are adapted for OpenCache, and their roles and relationships are detailed below:

• Orchestrator

• Operational Support System (OSS)

• Virtualised Network Function (VNF)

• NFV Infrastructure (NFVI), comprising the hardware resources and virtualisation layer

• Virtualised Infrastructure Manager (Vim)

• VNF Manager (Vnfm)

These functional components and functions are mapped onto interfaces within the NFV architectural framework:

• Os-Ma: the interface to the OSS; handles network service lifecycle management and other functions

• Vn-Nf: represents the execution environment provided by the Vim to a VNF (e.g. a single VNF could have multiple VMs)

• Nf-Vi: the interface to the Vim; used for VM lifecycle management

• Ve-Vnfm: the interface between a VNF and the Vnfm; handles VNF set-up and tear-down

• Vi-Ha: the interface between the virtualisation layer (e.g. the hypervisor for hardware compute servers) and the hardware resources

Fig. 3: OpenCache mapped to an adapted ETSI NFV Reference Architectural Framework

The OpenCache adoption of the ETSI NFV principles, functional architecture and interfaces would provide the basis for a future ETSI NFV proof-of-concept proposal [10] for vCDN applications.

C. Impact

Adoption of OpenCache could also fundamentally change how existing CDNs operate. Rather than offering the underlying hardware combined with a service (and all the decisions that accompany that), CDNs could specialise in optimising and deploying nodes in a more diverse set of networks. The logic behind where the content is placed, how it is served, what exactly is served, and so on, would be left for the customer to decide. To this end, caching simply becomes a service in the network: Caching as a Service (CaaS) [14].

Scaling of services and resource consumption is critical in allowing OpenCache, as a virtual network function, to coexist with other similar services in the network. By freeing this allocation, other services can utilise the now-unused resources to better effect. For example, a resource-intensive function that is not necessarily time-sensitive, such as data processing, may use the resources freed by an OpenCache scale-back during off-peak hours, when content consumption is low.

It is important to note that the above-mentioned functionalities, which are semantically built on top of OpenCache, are possible because of the OpenCache API. The architecture itself does not define any intrinsic behaviour; an operator using the API defines it. This is important in the creation of tailored policies and unique functionalities, particularly in response to events such as flash crowds, and in implementing energy-saving measures. Separating this functionality from the cache behaviour also allows cache operators to supplement decisions with additional information that may be unique to them. For example, an operator may have a Quality-of-Experience monitor in their network, and choose to move content closer to the user to improve the experience. In addition, the operator may have previous experience and knowledge that they want to use in their caching decisions; they may know that a piece of content is always popular in a particular time period. The flexibility to make these decisions is what OpenCache provides.

IV. APPLICATION PROGRAMMING INTERFACE

In Section III, we identified the need for programmable control of network functions. In this section, we present the OpenCache API, capable of controlling the behaviour of a virtualised cache deployment. This is achieved through the external API, outlined in Table I and described in more detail in Section IV-A. As the OpenCache architecture proposes a controller-node arrangement for a cache deployment, some additional internal synchronisation and state-sharing methods are necessary; these are outlined in Table II and described in more detail in Section IV-B. To complement these APIs, and to enable the creation of powerful and flexible applications, a number of additional method calls are presented. These include detailed statistical reporting facilities (Section IV-A3), programmable call-backs which enable reactive behaviours (Section IV-A4), and functionality built specifically for a virtualised environment (Section IV-A5).

All of the APIs described in this section use JSON-RPC v2 [15]. As such, all calls are bidirectional and follow the same format. A call will include a method field; in the case of the OpenCache API, this will be the name of the method to be called. Furthermore, a call will also contain a set of parameters to be used in the method call. In OpenCache, these parameters are expr and node. OpenCache will respond to a call with another JSON-RPC message containing the result field. In most cases, this will be a boolean representing the success of an operation, but it may also contain requested information, such as statistics.

The expr field contains OpenCache expressions, a concept unique to the platform. In most cases, an expression is analogous to a single service, serving content located at a single location. However, the expression can also be used to describe additional functionality. For example, the expr field can contain a list of expressions, allowing calls to affect multiple services simultaneously. The same syntax is also used for the node field, which enables functionality such as wildcarding; this makes the same method call across all connected nodes.
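To make this format concrete, the sketch below shows how an application written in Python might issue a call to the controller. It is a minimal illustration only: the endpoint URL, the use of HTTP as the transport, and the expression and wildcard values are our assumptions, not details fixed by the API description.

```python
import json
import urllib.request

# Hypothetical controller endpoint; the host, port and path are
# illustrative assumptions, not values defined by OpenCache itself.
CONTROLLER_URL = "http://opencache-controller.example:4999/"

def call(method, params):
    """Send a JSON-RPC 2.0 request to the OpenCache controller and
    return the contents of its result field."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": method,   # e.g. "start", "stop", "stat"
        "params": params,   # e.g. {"expr": ..., "node-id": ...}
        "id": 1,
    }).encode("utf-8")
    request = urllib.request.Request(
        CONTROLLER_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["result"]

# Start a service on every connected node; the expression value and the
# use of "*" as a node wildcard are illustrative guesses at the syntax.
if call("start", {"expr": "example.com/videos", "node-id": "*"}):
    print("service started")
```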

A. External API

The external API is a public-facing interface consisting of a number of methods. These are described in Table I. Importantly, the table also defines the direction of initiation for a call, with A → C denoting an application-to-controller initiation, and C → A denoting a controller-to-application initiation (a callback). In the following section, we describe the functionality of these methods.

TABLE I: External API

Method   | Params                                       | Result        | Direction
---------|----------------------------------------------|---------------|----------
start    | { "expr", "node-id" }                        |               | A → C
stop     | { "expr", "node-id" }                        |               | A → C
pause    | { "expr", "node-id" }                        |               | A → C
move     | { "expr", "from", "to" }                     |               | A → C
fetch    | { "url", "node-id" }                         |               | A → C
seed     | { "expr", "node-id" }                        |               | A → C
refresh  | { "expr", "node-id" }                        |               | A → C
stat     | { "expr", "node-id" }                        | [ stat, ... ] | A → C
register | { "expr", "node-id", "metric", "threshold" } |               | A → C
create   | { "expr" }                                   | ( "node-id" ) | A → C
destroy  | { "node-id" }                                |               | A → C
alert    | { "expr", "node-id", "metric", "value" }     |               | C → A

Many of the methods are simply manifestations of cache primitives, which we have identified and exposed to applications. However, the API also includes novel functionality, non-existent in current cache applications.

1) Cache Primitives: The start method is used to start a service on a particular node or nodes. This service will serve the content matching the expression (expr) given during its instantiation. From the perspective of the OpenCache controller, once a start command is issued, the internal API will be used to start the service on the required set of nodes. The controller will also simultaneously communicate with the network controller to modify the forwarding layer of the network. This ensures that traffic matching the expression is redirected to the appropriate cache nodes, which now have the service running, ready to accept requests.

The stop method achieves the opposite effect to the start method: it stops the service on the nodes specified. In the case of stop, the content that was previously stored for serving responses matching the expression is completely removed from the node. The resources once consumed by the service, whether compute, memory, storage or network, are relinquished and freed up on the node. This functionality again uses the internal API to stop the services running on the specified nodes. The OpenCache controller also removes the forwarding modifications that previously provided the redirection for this service.

The pause method is very similar to the stop method. The main difference between the two is that in the case of pause, the content objects used to serve user requests remain on the node. When a service is paused, the compute, memory and network resources are freed, but the storage usage remains. The pause method also instructs the OpenCache controller to remove the forwarding behaviour implemented to redirect requests to the cache. In order to restart the service, bringing it from a paused to a started state, the start command is reissued with identical parameters. This has the advantage that the service restarts with all of the previously fetched content available, rather than starting from an empty object cache.
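As a brief illustration of these primitives, the following sketch uses the call() helper from above to walk a service through the lifecycle just described; the expression and node identifier are placeholders.

```python
# Start a service; the controller starts it on the node and programs the
# network to redirect matching requests there.
call("start", {"expr": "example.com/videos", "node-id": "node-1"})

# Pause: compute, memory and network are freed, but cached objects remain.
call("pause", {"expr": "example.com/videos", "node-id": "node-1"})

# Re-issuing start with identical parameters resumes with a warm cache.
call("start", {"expr": "example.com/videos", "node-id": "node-1"})

# Stop: removes the service and all of its stored content from the node.
call("stop", {"expr": "example.com/videos", "node-id": "node-1"})
```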

2) Advanced Features: The move method enables interaction with the network without the application requiring knowledge of the specifics of the forwarding plane. In essence, it facilitates the modification of existing OpenCache-specific flow rules to migrate the direction of requests from one OpenCache node to another. In use, this allows applications to implement resiliency behaviours that require dynamic and timely changes to the network layer, examples of which can be found in Section V.

The fetch method is another interesting piece of behaviour made possible through the API. It allows a service to proactively fetch content and store it in the cache, ready to be served in response to a client request. Importantly, this is done without the content ever being requested by a client. This can be used in situations where an operator has advance knowledge that a content object will be requested frequently. By placing the object in the cache, they can serve the content immediately, without having to request it from the origin server (a cache-miss). This is particularly important for large objects that would take a considerable amount of time to fetch.

The seed method is used primarily in cases where identical content is located at multiple locations. By defining a list of equivalent expressions in the method call parameters, requests for content matching any of the expressions will result in the same content being served. The seed function can significantly reduce instances of duplication in the cache object store, thus maximising storage efficiency. Furthermore, seed can also be used to serve items of content that are not bit-identical, but equivalent in representation. For example, a lower or higher quality version of the same image can be served from the cache, regardless of the quality requested by the client. This functionality is achieved using both the cache implementation and the forwarding plane, where multiple flow modifications direct traffic to the same service.
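A sketch of the fetch and seed calls described above, again using the call() helper; the URLs are placeholders, and while passing the equivalent locations as a list follows the list-of-expressions behaviour described earlier, the exact parameter shapes are assumptions.

```python
# Pre-position a large object before any client asks for it, avoiding a
# costly cache-miss on the first request (URL is a placeholder).
call("fetch", {"url": "http://origin.example/launch-video.mp4",
               "node-id": "node-1"})

# Declare two content locations equivalent, so requests for either are
# served from the same stored objects, avoiding duplication in the store.
call("seed", {"expr": ["mirror-a.example/video", "mirror-b.example/video"],
              "node-id": "node-1"})
```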

3) Reporting: There are two functions associated with statistical reporting. The stat function is used to retrieve statistics for a specific set of nodes and/or services. This ability to tailor the query is important in giving applications using the API fine-grained control over the information they receive in response. In addition to this, the refresh method is often used in conjunction with the aforementioned stat method. The OpenCache controller keeps a cached copy of statistics; refresh will update these immediately by requesting an update from each node/service. The specifics of the statistics returned in a stat call are covered in more detail in Section IV-C.

4) Call-backs: The OpenCache API also has an alert and call-back subsystem. This allows an application using the API to register for certain events using the register call. In OpenCache, events are triggered when a service exceeds a certain threshold of requests per second, or of bytes of storage used. When one of these is exceeded, the node sends an alert to the controller, which forwards it to any applications that have registered an interest. This is realised through an alert call, from the controller back to the application, containing details of the metric and the amount by which it has been exceeded. These thresholds are configurable using the register call, and can also be modified at runtime.

5) Virtualization: The create function can only be used when an OpenCache instance has connectivity to a cloud platform or hypervisor. This method instantiates a new virtual machine, running as an OpenCache node, and starts the services optionally defined in the method parameters. Similarly, the destroy method removes this virtual machine instance, and all of the services running upon it. Evidently, destroy can only be called on nodes that were instantiated through the OpenCache controller and API.

B. Internal API

For the most part, the OpenCache internal API is the same as the external API (see Table II). Typically, a node will only see calls destined for itself; as such, the node-id field is not present in a controller-to-node (C → N) method call. In addition to these familiar methods, the internal API also includes a number of unique methods that are necessary for an OpenCache deployment to work cohesively. These are constrained entirely to communication from the node to the controller (N → C), and are described in more detail below.

TABLE II: Internal API

Method     | Params                                   | Result      | Direction
-----------|------------------------------------------|-------------|----------
start      | { "expr" }                               |             | C → N
stop       | { "expr" }                               |             | C → N
pause      | { "expr" }                               |             | C → N
fetch      | { "url" }                                |             | C → N
seed       | { "expr" }                               |             | C → N
refresh    | { "expr" }                               |             | C → N
register   | { "expr", "metric", "threshold" }        |             | C → N
hello      | { "host", "port" }                       | "node-id"   | N → C
goodbye    | { "node-id" }                            |             | N → C
keep-alive | { "expr", "node-id", "stats" : [ ... ] } |             | N → C
alert      | { "expr", "node-id", "metric", "value" } |             | N → C

The hello method is called on the OpenCache controller by a node that has just joined the deployment. A hello message notifies the controller that a new node is available for use, and contains the host and port number that the controller will use in future communication with the node. In response to this message, the controller allocates the node a unique ID number from a pool. Inversely, the goodbye method is used when a node leaves the OpenCache deployment. This alerts the controller to the fact that the node is no longer available to start new services on, and that all of the existing services that were running on it can be considered stopped.

In contrast to goodbye, there are situations where a node will not be able to notify the OpenCache controller that it is no longer operational; for example, the node or its underlying hardware may undergo a forced shutdown. So that the OpenCache controller can maintain the availability of nodes regardless of their participation, a keep-alive message is used. This message is sent from the node to the controller at a configurable interval. If the OpenCache controller does not observe a message within a pre-defined period of time, it assumes that the node is offline, and removes it from the list of available nodes. The keep-alive message also has an additional function: updated statistics are appended to the message, which allows the controller to update its cached version, ready to serve stat requests from the external API. The keep-alive is sent from each running service, rather than from the node itself. This allows the controller to track the status and health of individual services, rather than a more general view of just the node.
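As a rough sketch of the node side of this exchange, the loop below piggy-backs the latest statistics onto each keep-alive, as the text describes; the get_stats helper, the reuse of the earlier call() transport, and the ten-second default interval are illustrative assumptions.

```python
import time

def keepalive_loop(expr, node_id, get_stats, interval=10):
    """Per-service keep-alive loop (sketch). Appending statistics to the
    keep-alive avoids a separate reporting message; if the controller
    misses these for a pre-defined period, the node is marked offline."""
    while True:
        call("keep-alive", {"expr": expr, "node-id": node_id,
                            "stats": get_stats()})
        time.sleep(interval)  # configurable per deployment
```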

C. Statistics

As mentioned in previous sections, the OpenCache controller keeps a cache of statistics locally, in order to serve stat requests more promptly. These statistics can be leveraged by application developers to create reactive applications that adjust to cache condition and state. The keep-alive message is used by each service to update the local cache of statistics at periodic intervals. This frequency is configurable depending on the deployment scenario. The choice to append the statistics to a keep-alive message was made to intentionally reduce the amount of messaging overhead generated by an OpenCache deployment. If a developer requires results more frequently than the prescribed update interval, the refresh command can be used beforehand to pre-emptively fetch fresh statistics.

The stat command returns a number of useful metrics to the developer. These include the state of a service, the number of cache hits and misses a service has encountered, and the accompanying size of these events. The stat response will also include a count of all the objects stored for that service, along with their size. Importantly, this response also provides information as to how many services and nodes are present in the returned results, including the specific (node, service) tuples seen.

The OpenCache-specific statistics are also combined with information from both the SDN controller and the infrastructure manager. For example, the OpenCache API can be used to return information in the stat response that includes network topology and device attachment points. Likewise, the OpenCache API can also be used to provide information ascertained through the infrastructure manager, such as resource availability and performance metrics.

It is important to note the lack of a common API for interacting with an SDN controller. Without standardisation of a Northbound Interface protocol, OpenCache requires a separate module implementation for each different controller it wants to communicate with. A similar situation exists for the variety of infrastructure managers. Although work is moving towards achieving some commonality [17], at present a generic solution is not possible.
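For example, a developer wanting up-to-the-moment figures might pair the two calls as below; the field names read from each entry are assumptions about the response layout, made for illustration only.

```python
# Force the nodes to report fresh statistics, then query the controller's
# now-updated cache.
call("refresh", {"expr": "example.com/videos", "node-id": "node-1"})
entries = call("stat", {"expr": "example.com/videos", "node-id": "node-1"})

for entry in entries:  # one entry per (node, service) tuple
    print(entry.get("hits"), entry.get("misses"), entry.get("objects"))
```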

Fig. 4: Load Balancing Message Flow (stat, create, hello, start, move and stop exchanges between the Load Balancer, OpenCache Controller, Virtualisation Controller, OpenCache Nodes #1 and #2, SDN Controller and OpenFlow Switch)

V. EXAMPLE APPLICATIONS

In order to effectively demonstrate the abilities of the previously described API, we built two distinct example applications. Although both offer increased resilience to a cache deployment, the load balancing application, described in Section V-A, is concerned with pre-emptive avoidance of failure, achieved by effectively managing the load on different OpenCache nodes. The failover monitor, described in Section V-B, differs in that it is designed to detect unscheduled and unpredictable downtime in the architecture. Both applications have their own distinct logic, which operates independently of OpenCache itself and relies on the API for interaction.

It is important to note that load balancing and failover behaviours would often be individual hardware appliances in their own right. However, when implemented using the OpenCache API, these functions can also be virtualised, much like the caching node itself. As these applications are built entirely in software, there is the possibility for detailed customisation and optimisation, dependent on operator requirements. This could include utilising input from other sources, such as Network Management Systems (NMSs) or Quality of Experience (QoE) measurement frameworks.

A. Load Balancer

This application load balances requests for content between different OpenCache nodes. For demonstration purposes, we use the statistics provided through OpenCache to determine when a node or individual service is deemed to be overloaded. This is only one potential method of determining the load on a service, and could be supplemented with metrics ascertained through other means, as mentioned previously.

The process and message flow followed by our load balancer is described in Figure 4. The method calls shown in italics are part of the OpenCache API, whereas the other calls are outside of its scope (and thus differ depending on the controller or application used). In the first instance, the load balancer requests information about the specific nodes it is monitoring (using the stat command). These statistics are returned to the application, which analyses them to determine if any nodes are overloaded. If any are deemed to be consuming resources above the configured threshold, the application seeks a suitable candidate to move the load to.

In our virtualised scenario, node #1 is designated as overloaded. As this is the only node present in the OpenCache deployment, no suitable candidate will be found as the target for migration. As a result, the application creates a new node to which the service will be migrated. This is achieved by the application sending a create command to the OpenCache controller, which negotiates with the virtual infrastructure manager to bring a new OpenCache node online. Once this process is complete, the load balancer starts an identical service on the new node, node #2, in anticipation of the load being migrated between the two nodes. Once the service is started and ready to handle requests, the application uses the move command to modify the forwarding plane and change the destination of the redirected requests for content. At the completion of this process, node #2 handles all the requests that were previously destined for node #1. Consequently, the application stops the existing service on node #1, freeing the resources it previously consumed.
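The control loop below sketches this sequence with the call() helper from Section IV; the threshold value, the statistics field names, the list-valued node parameter and the shape of the create result are assumptions, since the real logic is defined entirely by the application.

```python
THRESHOLD = 1000                  # requests/s deemed overloaded (assumed)
EXPR = "example.com/videos"       # placeholder expression

def rebalance(nodes):
    """One pass of the load-balancing logic described above (sketch)."""
    call("refresh", {"expr": EXPR, "node-id": nodes})
    stats = call("stat", {"expr": EXPR, "node-id": nodes})
    # Field names below are assumptions about the stat entry layout.
    overloaded = [s["node-id"] for s in stats
                  if s.get("requests-per-second", 0) > THRESHOLD]
    for old_node in overloaded:
        spare = [n for n in nodes if n not in overloaded]
        if spare:
            new_node = spare[0]
        else:
            # No candidate exists: have the controller bring a new
            # virtualised node online via the infrastructure manager.
            new_node = call("create", {"expr": EXPR})["node-id"]
            nodes.append(new_node)
        call("start", {"expr": EXPR, "node-id": new_node})   # warm stand-by
        call("move", {"expr": EXPR, "from": old_node, "to": new_node})
        call("stop", {"expr": EXPR, "node-id": old_node})    # free resources
```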

B. Failover Monitor

The unavailability of content will undoubtedly have a negative impact on the Quality of Experience for a user requesting content from the cache. When it is considered that these caches are often provided as a service, an interruption will typically constitute a breach of a Service Level Agreement (SLA). As such, we demonstrate the ability of an application using the OpenCache API not only to detect failure, but also to react and remedy the situation quickly and effectively.

The message flow for this process is similar to that of load balancing, and so is omitted for brevity. However, the application logic is slightly altered: instead of detecting capacity (and the consumption thereof), we detect the availability and uptime of a node. This is achieved by periodically polling the service in order to elicit a response. If the service does not respond, it is deemed failed and offline. Once failure is detected, the same flow (as with the load balancer) continues: the failover monitor seeks to migrate the offline service to an alternative node. If no existing nodes are available, a new virtualised node is created. In the same way as the load balancer, the forwarding layer is then modified to match the current location and availability of services.
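A compact sketch of this monitor is given below, reusing EXPR and call() from the previous sketch; is_reachable() stands in for an application-defined liveness probe, and the node bookkeeping is simplified for illustration.

```python
import time

def failover_monitor(nodes, is_reachable, poll_interval=5):
    """Poll each node's service and migrate away from failures (sketch).
    poll_interval sets the detection resolution evaluated below."""
    while True:
        for node in list(nodes):
            if is_reachable(node):      # liveness probe: no response means
                continue                # the service is deemed offline
            nodes.remove(node)
            if nodes:
                target = nodes[0]       # migrate to an existing node
            else:
                target = call("create", {"expr": EXPR})["node-id"]
                nodes.append(target)
            call("start", {"expr": EXPR, "node-id": target})
            call("move", {"expr": EXPR, "from": node, "to": target})
        time.sleep(poll_interval)
```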

C. Experimental Results

In order to evaluate the suitability of the API, we utilised resources within the Fed4FIRE [11] testbed. The experiment involved a number of distinct facilities, interconnected to create a large-scale topology. In one facility, we located our clients, each hosting an instance of Scootplayer [3], a fully-instrumented HTTP Adaptive Streaming (HAS) video player. These clients requested content from a server running in one of the other facilities. We then deployed the OpenCache prototype [2]: a functioning implementation of the OpenCache architecture. Previously used to demonstrate the ability to cache using SDN technologies [4], the prototype enabled the realisation of the previously discussed OpenCache API, and provided the basis to evaluate the aforementioned applications. Importantly for our experiment, the testbed provided the necessary SDN capabilities required for OpenCache to operate.

The experimentation examined the impact of the two applications on QoE from the perspective of the client. In the case of the load balancer, we wanted to ensure that the load balancing process, and thus the migration of a service between two nodes, had minimal impact on the client. Five experimental runs were performed, each using a Scootplayer client to download the necessary content. Scootplayer also monitored a number of important QoE metrics, such as start-up delay, buffer occupancy and average bitrate. The impact of load balancing can be clearly seen in Figure 5, where the vertical line denotes the load balancing operation at approximately 13 seconds into playback. At this point in time, a reduction in the amount of content buffered on the client is observed. The buffer, which holds a maximum of 60 seconds worth of playback, temporarily reduces to 58 seconds. As we used 2-second segment lengths in our playback, this is equivalent to one chunk in the buffer. This reduction, and the subsequent recovery, can be attributed to the modification of the forwarding plane during a request. More specifically, the modification to the forwarding plane (necessary to implement the load balancing) breaks the existing connection between the client and the cache. This causes an application-layer request retry in Scootplayer. As the player is still consuming content (playing back), the fill of the buffer is reduced momentarily. However, once the client re-establishes the connection, it downloads the necessary content and refills the buffer back to the maximum 60 seconds. As the load balancing has taken place, it will now be downloading the content from the new cache node, rather than the overloaded one. It is important to note that although we only show buffer occupancy in this figure, other metrics were unaffected. We ascertained this from a baseline experiment (also shown), performed without the load balancing application present.

In the case of the failover monitor, we wanted to observe the impact of the time taken to respond to a failure, and how this may affect the client in a similar way to the load balancer. It became clear through our experimentation that the time taken to respond to a node failure depends on the resolution of detection. In our example application, failure is identified through periodically polling the service to detect reachability. If the service does not respond, it can be assumed that the service is offline. In our experimentation, we examine a number of different polling frequencies, at 1, 5 and 10 second intervals. These are shown in Figures 6, 7 and 8 respectively. As before, five experimental runs were performed, repeated for each of the three resolutions. Similarly, a baseline experiment was undertaken for each resolution to ensure that the application had no impact on either the buffer occupancy or other QoE metrics.

Fig. 5: Client Buffer During Load Balancing (buffer occupancy in seconds of playback against time elapsed, with the load balancer and for the baseline; the vertical line marks the load balance event)

Fig. 6: Client Buffer During Failover with 1s Resolution

Fig. 7: Client Buffer During Failover with 5s Resolution

Fig. 8: Client Buffer During Failover with 10s Resolution

In these figures, the first vertical line indicates when the initial node fails. The second vertical line indicates when the application detects the failure and remedies the situation by moving the requests across to a functioning cache node. It is evident that once failure occurs, the client continues to consume content from the buffer, reducing its size. However, as new content cannot be retrieved, the buffer becomes significantly depleted. The greater this depletion becomes, the longer it takes for the client to recover back to a fully-buffered state, as evidenced in Figure 8. During this period of time, the cache node itself will be under heavier load (more requests per second) and the client more susceptible to further interruptions. This buffer depletion continues until detection takes place and appropriate actions are taken by the fail-over monitor.

The amount of buffer depletion a client encounters is strongly linked to the detection resolution; a larger polling interval results in the service remaining in a failed state for longer. The impact on the client is that it cannot retrieve new content, and moves closer to the buffer becoming empty, at which point playback will stop. An operator must therefore consider client requirements and resources before establishing a suitable polling value. This will be driven by the amount of buffer their customers' playback clients can accommodate. The video playback use-case we have used in this work is not the only possible permutation, and many applications running over the Internet do not necessarily have a buffer at all. As a result, such applications have no inherent ability to deal with the unavailability of a service. As latency and failure can be potentially crippling to a service [6], we recommend that in these cases the resolution interval is set at the highest possible frequency that does not incur a significant messaging overhead, which would further impede the delivery of the service.

VI. CONCLUSION

In this work, we designed, developed and deployed OpenCache using key design requirements for a virtualised content delivery network (vCDN) platform, leveraging existing SDN research and embracing industrial demand for virtualising network functions. These principles directly shaped the OpenCache architecture, and enabled its use and manipulation within virtualised environments. A key facet of this architecture is the programmatic control of function, which demonstrates the potential of a full-featured OpenCache API. Using a prototype implementation of the architecture in conjunction with a large-scale testbed, we demonstrated the flexibility and power offered to an application using this API. In particular, we highlighted how vital resiliency logic can react quickly and effectively to both excess load and outright failure, when deploying and operating vCDNs.

VII. FUTURE WORK

It is envisaged that work will continue on the OpenCache prototype, enabling further research into content delivery architectures. Moreover, the research will continue to embrace ETSI NFV objectives, functional components and interfaces.


Future effort will also concentrate on the OpenCache API, developing it into a candidate for standardising the interface between virtualised CDN infrastructures and applications.

REFERENCES


[1] P. Aranda, D. King, and M. Fukushima. Virtualization of Content Distribution Network Use Case. Internet-Draft draft-aranda-vnfpool-cdn-use-case-00, IETF Secretariat, October 2014. http://www.ietf.org/internet-drafts/draft-aranda-vnfpool-cdn-use-case-00.txt.
[2] M. Broadbent. OpenCache: An Experimental Caching Platform. https://github.com/broadbent/opencache.
[3] M. Broadbent. Scootplayer: An Experimental MPEG-DASH Request Engine with Support for Accurate Logging. https://github.com/broadbent/scootplayer.
[4] M. Broadbent, P. Georgopoulos, V. Kotronis, B. Plattner, and N. J. P. Race. OpenCache: Leveraging SDN to Demonstrate a Customisable and Configurable Cache. In 2014 Proceedings IEEE INFOCOM Workshops, Toronto, ON, Canada, April 27 - May 2, 2014, pages 151-152, 2014.
[5] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell. A Hierarchical Internet Object Cache. Technical report, DTIC Document, 1995.
[6] S. Egger, T. Hossfeld, R. Schatz, and M. Fiedler. Waiting Times in Quality of Experience for Web Based Services. In Quality of Multimedia Experience (QoMEX), 2012 Fourth International Workshop on, pages 86-96. IEEE, 2012.
[7] ETSI GS NFV 001. Network Functions Virtualization (NFV); Use Cases, 2013.
[8] ETSI GS NFV 001. Network Functions Virtualization (NFV); Resiliency Requirements, 2015.
[9] ETSI GS NFV 002. Network Functions Virtualization (NFV); Architectural Framework, 2014.
[10] ETSI GS NFV PER002. Network Functions Virtualization (NFV); Proof of Concepts Framework, 2014.
[11] Fed4FIRE Consortium. Fed4FIRE Project Information. http://www.fed4fire.eu/.
[12] B. Frank, I. Poese, Y. Lin, G. Smaragdakis, A. Feldmann, B. Maggs, J. Rake, S. Uhlig, and R. Weber. Pushing CDN-ISP Collaboration to the Limit. SIGCOMM Comput. Commun. Rev., 43(3):34-44, July 2013.
[13] M. J. Freedman. Experiences with CoralCDN: A Five-Year Operational View. In NSDI, pages 95-110, 2010.
[14] P. Georgopoulos, M. Broadbent, B. Plattner, and N. Race. Cache as a Service: Leveraging SDN to Efficiently and Transparently Support Video-on-Demand on the Last Mile. In Computer Communication and Networks (ICCCN), 2014 23rd International Conference on, pages 1-9. IEEE, 2014.
[15] JSON-RPC Working Group. JSON-RPC 2.0 Specification, 2012.
[16] B. Molina Moreno, C. Palau Salvador, M. Esteve Domingo, I. Alonso Peña, and V. Ruiz Extremera. On Content Delivery Network Implementation. Computer Communications, 29(12):2396-2412, 2006.
[17] Open Networking Foundation. North Bound Interface Working Group (NBI-WG) Charter, 2013.
[18] M. Z. Shafiq, A. X. Liu, and A. R. Khakpour. Revisiting Caching in Content Delivery Networks. In The 2014 ACM International Conference on Measurement and Modeling of Computer Systems, pages 567-568. ACM, 2014.
[19] D. Wessels. The Squid Internet Object Cache, 1998.
