IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 6, NO. 1, FEBRUARY 1998


Service Control Point Overload Rules to Protect Intelligent Network Services
Bruce S. Northcote, Member, IEEE, and Donald E. Smith

Abstract: This paper analyzes implementations of automatic code gapping (ACG) in intelligent network service control points (SCP’s). The objective is to identify ACG controls that provide high call throughput while protecting the network from overloads. To achieve this, we emphasize the need for careful engineering of vendors’ control implementations (for example, the proper choice and use of internal SCP measurements) within the constraints of present ACG requirements. Using simulation, we find that algorithms that combine processor utilization measurements with measurements of excessive response times can yield reasonable call throughput. Shortening the measurement intervals, so that the SCP can adjust the rate at which traffic arrives to it more quickly, further improves call throughput. Finally, making the rate changes smaller refines the granularity of the control and leads to better results.

Index Terms: Intelligent networks, overload control, service control points.

I. INTRODUCTION

Service control points (SCP’s) are databases that contain the data and service logic that many telecommunications network services require. “800” or “freephone” number translation is an early example of an SCP service. A host of new or proposed services, including voice activated dialing, calling name delivery and other screening features, automated telephone polling, and personal communications services (PCS’s), will turn SCP’s into the cornerstone of the intelligent network. Since SCP’s will serve large populations, it is vital that SCP’s have effective mechanisms for handling the overloads that they are likely to experience from time to time. In most deployments worldwide, the intelligent network shares resources with the common channel signaling (CCS) network and uses signaling system number 7 (SS7, defined in [5]) as its communications protocol. SS7 specifies congestion control procedures that should be applied when an overload is detected in particular protocol layers of the network. As a CCS node, an SCP must apply these procedures. For example, [10] analyzes intelligent network service performance in the presence of SS7 link congestion.

Manuscript received October 7, 1996; revised September 18, 1997; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor P. O’Reilly. B. S. Northcote was with Bellcore, Red Bank, NJ 07701 USA. He is now with Fujitsu Nexion, Acton, MA 01720 USA. D. E. Smith was with Bellcore, Red Bank, NJ 07701 USA. He is now with GTE Laboratories, Waltham, MA 02254 USA. Publisher Item Identifier S 1063-6692(98)01655-0.

It has long been recognized that there should also be an overload control, specific to the intelligent network, that relieves SCP processor congestion. Various rate control procedures have been proposed and studied; see, for example, [6]–[9], [11], and [13]. Automatic code gapping (ACG) is a table-driven scheme, defined in Bellcore generic requirements [1], [3], [4], that has been chosen as the North American SCP overload control. ACG consists of two logical parts: 1) algorithms in the SCP for converting internal measurements into numerical overload levels; and 2) procedures (usually in service switching points (SSP’s), the switches capable of common channel signaling) for gapping traffic on the basis of overload levels. Under code gapping, a source applies a rate control of the form “send at most one new call per $g$ seconds,” where $g$ is called the gap interval. The Bellcore generic requirements [1], [3], [4] specify the values of $g$ that are currently implemented by equipment vendors, the measurements that should be taken to facilitate control decisions, the frequency with which the measurements should be taken, and the durations for which controls should be active. In addition, Bellcore has introduced an expanded set of values for gap interval and duration in [2] and recommends that vendors implement the expanded set in the future. Smith [12] analyzed both static and adaptive algorithms for determining gap intervals, assuming a fixed set of overload rules in the SCP. This paper complements [12] by shifting the focus onto SCP actions. We address which internal measurements in [4] should be used to define SCP congestion and how to use them to generate integer overload levels. We also discuss how to map each overload level into a pair of integers describing the gap interval and the duration of time that each SSP should gap initial queries to the SCP, emphasizing how to improve ACG implementations without adding to existing Bellcore recommendations.
Hence, we are constrained to select gap intervals from prescribed tables of values [2], to map SCP measurements into gap intervals by comparing the measurements with fixed thresholds, and to update the gap intervals on a much longer time scale than a message processing time. We recommend careful reengineering of existing control strategies, rather than rewriting ACG requirements that already provide great flexibility in designing overload controls. This paper demonstrates how, within the constraints of [1]–[4], it is possible to design ACG overload control implementations that improve upon some of the standard implementations of ACG deployed in today’s networks. Careful engineering of overload controls is essential.

1063–6692/98$10.00 © 1998 IEEE

A. Overload Control Objectives

Our primary objective for an SCP overload control is to maximize network call throughput. We emphasize call throughput, not message throughput, because our objective is to protect intelligent network applications and services. Ideally, when the load offered to the SCP is less than the SCP’s capacity (which requires definition; see Section IV), the SCP should successfully process the entire offered load and, when the load exceeds the SCP’s capacity, the load handled should equal the SCP’s capacity. Our approach to satisfying this objective is to try to find gap interval values that match the rate of traffic entering the SCP to the SCP’s capacity, and to find the right values as quickly as possible.

Another objective concerns service interaction. Different SCP application processes, which we will call “subsystems” [4], may handle different services (e.g., Advanced Intelligent Network (AIN) 0,¹ AIN 0.1,² and PCS). Calls to separate subsystems are distinguished by a subsystem number (SSN) for routing within an AIN SCP. Separating distinct service classes into different AIN subsystems promotes a modular SCP architecture that vendors favor. A good ACG implementation will be capable of discriminating between AIN subsystems that are deployed on the same SCP. Therefore, we require the ability to detect and control overloads to different subsystems independently. For example, if only AIN 0 services are overloaded, then traffic to other SCP subsystems should be relatively unaffected (i.e., it should be protected by “firewalls”). On the other hand, if two subsystems each generate excessive loads, then the ACG implementation should be capable of detecting this, and both subsystems should be actively controlled.

B. Summary of Results

Within the framework of the constraints above, we present methods to improve SCP performance during overload by better matching source traffic rates to the SCP capacity. Some solutions refine the granularity of ACG controls. One step is to add smaller gap interval values to the allowed set [2]. Another is to apply controls to smaller populations via more narrowly defined identifiers. Other solutions increase the speed with which the controls take effect. For example, the SCP can recalculate which gap intervals to use more often. We show the improvements that result from these steps in several scenarios that represent (in their essential parameters) realistic networks.

¹ AIN 0, which began technical trials during 1991, was implemented in a number of different ways, as requirements were established separately by individual companies based on suppliers’ specifications of private virtual networks (PVN’s). An example of an AIN 0 service is Area Wide Centrex Extension Dialing (similar to a PVN service).
² AIN 0.1 is an incremental step toward realizing AIN Release 1, a long-term industry goal, with clearly defined originating and terminating basic call models (BCM’s). Subsequent AIN releases (AIN 0.2 and beyond) will be based on supersets of the AIN 0.1 BCM’s. Most AIN service providers now have AIN 0.1 deployed, allowing for more sophisticated applications than PVN’s. AIN development continues. For example, some PCS architectures will rely on AIN 0.2 functionality, which is the latest AIN release.


Next, we introduce subsystem-specific controls to improve fairness while retaining the gains above. The use of overall SCP-level controls can result in inequities according to the size of sources [12]. We show that controls based on subsystem utilization can give two competing services roughly equal shares of SCP capacity. However, selecting fixed thresholds for subsystem-specific controls can be difficult. This leads us to supplement the utilization-based rule with a rule triggered by messages encountering excessive delay. This latter rule also accelerates the SCP’s search for suitable gap interval values.

The paper is organized as follows. Section II describes the ACG mechanism and the constraints under which we operate. Section III describes the simulation model, and Section IV presents our results. Section V concludes the paper.

II. SCP OVERLOAD CONTROL

This section provides an overview of the SCP overload control mechanism. The mechanism has three major components: measurements for overload detection, rules for converting measurements into overload severity levels, and code gapping at traffic sources based on severity. Within this framework, we distinguish between constraints (aspects that we cannot change) and elements we can modify to improve the effectiveness of the control.

A. Available Measurements

The measurements that can be used to define overload [4] are of two types: subsystem-specific and overall. Overall refers to the entire SCP, whereas subsystem refers to a particular portion of the SCP code that is executed when processing calls for a particular set of services. For example, there may be separate subsystems within the same SCP to process AIN 0 and AIN 0.1 calls. The measurements are taken over intervals of fixed length, which we call “measurement intervals.” Requirements [4] specify only that the measurement interval should be at most five minutes. Later, we explore different values of the interval to assess the benefits of more frequent updates.
The measurements and their types are as follows.

• Dropped messages (subsystem-specific): A subsystem will drop (discard without processing) any message that has been in the SCP for more than a certain amount of time (default in this paper: one second) before the subsystem begins working on it. Each subsystem counts the number of messages it drops during a measurement interval.
• Average response time (subsystem-specific): The response time of a message is the time from when the message enters the SCP until the subsystem finishes working on it. The average response time is obtained by averaging the response times of all messages to a subsystem over a measurement interval.
• Incoming messages (subsystem-specific): Each subsystem counts the number of messages arriving to it during each measurement interval.
• Overall utilization is the fraction of time the processor is not idle, i.e., the fraction of time it is working on programs (which may include call processing, maintenance, scheduling, and other overhead processes). Utilization is computed in successive measurement intervals.
• Subsystem utilization is the fraction of time the processor is executing sections of code that are subsystem specific. This does not include any processing required for interprocess communication (the transfer of messages between buffers in the SCP).

B. Overload Rules

SCP’s must regularly assess their overall and subsystem congestion status. To do so, an SCP refers to a set of overload rules. An overload rule is a function that determines an overload level from a given set of measurements. An overload level is an integer that increases with the severity of the overload. Separate overload level counters are maintained for the overall SCP and for each individual subsystem of the SCP.

The SCP may have more than one overload rule. We study several threshold-based rules that compute changes in overload levels from measurements. Overload rules are assessed whenever new measurements are taken, and each rule acts independently on its overload levels. Each overload level is constrained to lie between zero and some maximum value; for brevity, we omit mention of this below.

The overall utilization rule we consider increases the overall overload level when the utilization gets too high and decreases it when the utilization falls to an acceptable value. The rule is shown below in algorithmic form. In the rule, lower threshold < upper threshold, and the target utilization lies between the two thresholds. When overall utilization lies between the two thresholds, the utilization is deemed close enough to the target that no change is necessary.

In the subsystem utilization rule, also shown below, subsystem threshold represents the maximum proportion of time that the SCP processor should spend executing tasks for a given subsystem. It may vary between subsystems.

We do not use a rule based on incoming message counts. A threshold-based rule would, in effect, require a static translation of counts of incoming messages into processor load. Such a translation is difficult in the intelligent network environment because the SCP processor real times associated with different intelligent network messages can vary by up to two orders of magnitude. This contrasts sharply with freephone services, which have relatively constant real time costs per message; in freephone applications, the number of incoming messages is a good predictor of SCP load.

We use dropped messages instead of average response time because avoiding dropped messages is a clear performance objective, whereas target average response times are harder to specify. The dropped messages rule takes the form shown below, where dropped messages is the number of messages that the relevant subsystem drops in a measurement interval. In practice, the dropped messages threshold of zero should be replaced by a positive value to prevent the triggering of overload controls in the event of an erroneous count of dropped messages.

Because the SCP measurements affect only the increments in the overload level, these overload rules are search algorithms. The SCP repeatedly adjusts the overload level until the incoming load no longer violates the threshold conditions in the rules. The rule specifications do not include the initial values for the search. In Section IV, we set the initial level equal to zero, although one might explore other values. The rules set the overload level equal to zero after the overload ends, which is consistent with our choice of initial values. Finally, we mention that, in practice, SCP users will need additional rules to ensure that overload is not declared when maintenance routines are running during low traffic periods.
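The three threshold rules described above can be written as small update functions. This is a minimal Python sketch, not a vendor implementation: the `Thresholds` container, the function names, the cap `max_level = 10`, and the threshold values in the usage lines are all hypothetical. Clamping every result to the range from zero to `max_level` reflects the constraint, stated above, that each overload level lies between zero and some maximum.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical rule parameters (utilizations as fractions of time)."""
    lower: float          # lower threshold on overall utilization
    upper: float          # upper threshold on overall utilization
    subsystem: float      # subsystem utilization threshold
    dropped: int = 0      # dropped-messages threshold (the text advises > 0 in practice)
    max_level: int = 10   # hypothetical maximum overload level


def clamp(level: int, th: Thresholds) -> int:
    """Keep an overload level between zero and the maximum."""
    return max(0, min(level, th.max_level))


def overall_utilization_rule(level: int, overall_util: float, th: Thresholds) -> int:
    if overall_util > th.upper:
        level += 1
    elif overall_util < th.lower:
        level -= 1
    return clamp(level, th)


def subsystem_utilization_rule(level: int, subsystem_util: float,
                               overall_util: float, th: Thresholds) -> int:
    # Raise the level only if the subsystem is busy AND the whole SCP is busy.
    if subsystem_util > th.subsystem and overall_util > th.upper:
        level += 1
    elif overall_util < th.lower:
        level -= 1
    return clamp(level, th)


def dropped_messages_rule(level: int, dropped: int,
                          overall_util: float, th: Thresholds) -> int:
    if dropped > th.dropped:
        level += 1
    elif overall_util < th.lower:
        level -= 1
    return clamp(level, th)


def reported_level(overall_level: int, subsystem_level: int) -> int:
    """Level placed in a TCAP response: the larger of the two (Section II-D)."""
    return max(overall_level, subsystem_level)


# Usage: one overall utilization sample per measurement interval.
th = Thresholds(lower=0.80, upper=0.90, subsystem=0.45)  # illustrative values only
level, history = 0, []
for util in [0.99, 0.97, 0.92, 0.85, 0.75, 0.70]:
    level = overall_utilization_rule(level, util, th)
    history.append(level)
print(history)  # [1, 2, 3, 3, 2, 1]
```

Because each rule moves the level by at most one step per evaluation, a sustained overload climbs one level per measurement interval, which is why the length of that interval governs how quickly the search reaches a suitable gap.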
One might supplement a utilization rule with a rule requiring incoming messages to exceed a certain threshold. In other words, if utilization is high, the processor should check that a reasonable number of CCS messages are arriving to the SCP.

Overall utilization rule:
  if (overall utilization > upper threshold)
    overall overload level = overall overload level + 1
  else if (overall utilization < lower threshold)
    overall overload level = overall overload level - 1

Subsystem utilization rule:
  if (subsystem utilization > subsystem threshold AND overall utilization > upper threshold)
    subsystem overload level = subsystem overload level + 1
  else if (overall utilization < lower threshold)
    subsystem overload level = subsystem overload level - 1

Dropped messages rule:
  if (subsystem dropped messages > 0)
    subsystem overload level = subsystem overload level + 1
  else if (overall utilization < lower threshold)
    subsystem overload level = subsystem overload level - 1

C. Traffic Streams and Sources

We consider the aggregate of all calls that result in initial intelligent network queries originating from the same SSP with an identical global title address (GTA) to be one traffic stream. The GTA consists of a ten-digit NPA-NXX-XXXX (which, depending on the intelligent network service, may be the dialed digits, originating number, or billing number) and a service-dependent translation type (TT). For example, all calls to 800-123-4567 from a given local access and transport area (LATA) may be expected to result in the same population of the GTA field. All initial queries require global title translation (GTT) to be performed by a signaling transfer point (STP). During GTT, an STP maps the combination of GTA and TT to an SCP point code and SSN. These parameters ultimately specify where (which SCP and which subsystem) the application (i.e., the logic) for handling the intelligent network call actually resides. The ACG procedure allows an SCP to simultaneously control all traffic generated by traffic streams with partially matching GTA’s (matched on the $k$ most significant digits for $k$-digit gapping). We define a source to be the sum of all traffic streams aggregated at an SSP that may be controlled by a single ACG request from the SCP (for the overload rules and ACG parameters currently under discussion).

D. Gap Interval and Duration Values

When generating a transaction capabilities application part (TCAP) message in response to an initial intelligent network query, the SCP refers to its current overload level, namely, the maximum of the overall overload level and the overload level of the subsystem that processed the query. If this value is positive, then the SCP includes an ACG request in the TCAP response message, specifying a gap duration, gap interval, and target population (TP).
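How an SSP might act on these three parameters can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the Bellcore-specified procedure: the class and method names and the `"TT1"` label are hypothetical, a control is keyed by TT plus the leading GTA digits (with the longest matching prefix winning, an assumption), and the first call after a control is installed is accepted (also an assumption). It implements the behavior the text describes: at most one call per gap interval, a control that lapses after the gap duration, a refresh when a new request arrives for the same TP, and removal on cancellation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class GapControl:
    gap: float           # gap interval (s)
    expires_at: float    # install time + gap duration (s)
    next_allowed: float  # earliest time the next call may be accepted


class SSPGapTable:
    """Hypothetical SSP-side list of active ACG controls, keyed by
    (translation type, leading GTA digits), i.e., by target population."""

    def __init__(self) -> None:
        self.controls: Dict[Tuple[str, str], GapControl] = {}

    def apply_request(self, tt: str, digits: str, gap: float,
                      duration: float, now: float) -> None:
        # A new request for the same target population refreshes the control.
        self.controls[(tt, digits)] = GapControl(gap, now + duration, now)

    def cancel(self, tt: str, digits: str) -> None:
        # Congestion status zero: drop the control and stop blocking.
        self.controls.pop((tt, digits), None)

    def _match(self, tt: str, gta: str) -> Optional[Tuple[str, str]]:
        # Assumption: the longest matching digit prefix wins.
        candidates = [key for key in self.controls
                      if key[0] == tt and gta.startswith(key[1])]
        return max(candidates, key=lambda key: len(key[1]), default=None)

    def admit(self, tt: str, gta: str, now: float) -> bool:
        """Return True if a new call may send its initial query."""
        key = self._match(tt, gta)
        if key is None:
            return True
        control = self.controls[key]
        if now >= control.expires_at:      # gap duration has elapsed
            del self.controls[key]
            return True
        if now >= control.next_allowed:    # at most one call per gap interval
            control.next_allowed = now + control.gap
            return True
        return False                       # call is blocked


ssp = SSPGapTable()
ssp.apply_request("TT1", "800123", gap=0.5, duration=32.0, now=0.0)
times = [0.0, 0.1, 0.4, 0.6, 0.7, 1.2, 40.0, 40.1]
result = [ssp.admit("TT1", "8001234567", t) for t in times]
print(result)  # [True, False, False, True, False, True, True, True]
```

Calls after 32 s pass freely because the control has lapsed; this is the situation the duration choice below (at least 32 s with a 30-s measurement interval) is meant to avoid while the SCP may still be overloaded.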
For automatically implemented ACG controls, the TP parameter specifies the TT and six to ten digits (the default being six digits) of the NPA-NXX-XXXX, and indicates to the SSP which source is to be controlled. How sources are gapped is determined by the gap interval $g$ and the gap duration $d$. The gap interval specifies that the SSP shall accept at most one intelligent network call from the controlled source every $g$ seconds (the remaining calls are blocked). The gap duration specifies that gapping with interval $g$ should remain in effect for $d$ seconds or until the SCP makes a different ACG request, whichever occurs first.

An SCP maintains a list of all active ACG requests that it has issued. Similarly, an SSP maintains a list of all of its active ACG controls. Whenever an SSP initiates a query for a controlled TP, it includes the ACG control’s parameters in the message. The SCP then compares these parameters with those in its own list of active controls. If there is an inconsistency, the SCP instructs the SSP to remove or update (if overloaded) the ACG control as appropriate. Upon receipt of an ACG request for a source that is already subject to ACG, an SSP will “refresh” the control with the new ACG request parameters. Finally, an ACG control may be canceled by sending a congestion status with value zero, in which case the SSP removes the ACG control from its list of active controls and stops blocking calls from the specified source.

We map overload levels into $(g, d)$ pairs in the simplest possible way consistent with [1]: overload levels 1, 2, 3, and so on map to the allowed gap interval values in ascending order. Table I shows the mapping (for only the first ten overload levels) both for the currently allowed gap interval values and for those recommended for the future (see [2]). As for durations, we mapped the $n$th overload level to the larger of 32 s (the smallest duration that is larger than our initial choice of measurement interval, 30 s) and the $n$th allowed gap duration [1]. We made the duration larger than 30 s to prevent gapping from expiring while the SCP might still be overloaded (assuming no new gapping requests came from the SCP).

III. THE SIMULATION MODEL AND INPUTS

We created a C language simulation of a single-processor SCP serving calls from many sources. The principal definitions, modeling features, and assumptions are as follows.

• A call generates a sequence of messages to the SCP. The SCP responds to each with a message (containing its overload level, which implicitly requests gapping) to the call’s source. The number of messages in a call, their spacing in time, and their real time costs in the SCP vary with the call type. The messages of all calls of a given type go to the same subsystem, which may accept more than one call type. There are multiple subsystems.
• We model 400–10 000 sources, which correspond to the number of SSP’s in a typical LATA as the codes used for gapping range from three to six digits and as the number of subsystems varies. When we increase the number of sources in passing from Scenario 1 to Scenario 2, the aggregate load does not change.
• Sources send messages directly to the SCP. For simplicity, our model includes neither the CCS network (STP’s and links) nor the integrated services digital network (ISDN) user part (ISUP) layer of the SS7 protocol. This assumption eliminates CCS links to the SCP as a potential traffic bottleneck. CCS links are more likely to be a bottleneck for calls with small real time costs and less likely for more complex call types.
• The SCP simulation models call processing in detail, including processes dedicated to individual subsystems and interprocess communication. The model also includes a realistic scheduler that grants fixed time slices to the various processes, including overhead processes. The model aggregates all operating system and high-priority non-call-related processes into a single overhead process. The overhead process runs 10% of the time during our simulations, representing the proportion of time during which the SCP cannot process calls, even during a period of overload. Thus, at most 90% of the SCP’s total processing capacity may be used for call processing and interprocess communication.
• We assume that processing capacity, not I/O, is the limiting resource in the SCP.
• We simulate two distinct intelligent network subsystems:
  1) SS1 processes basic freephone (800) calls and is used in all scenarios.
  2) SS2 processes “complex” calls: 95% are for an areawide CENTREX-like service, 0.7% are processing-intensive database updates, and 4.3% are for a fictitious intelligent network service involving several transactions. SS2 is used only in Scenarios 3–6.
• In all cases, sources offer calls according to Poisson processes. However, calls that are blocked may retry; retries typically cause the overload to persist beyond the time when the first-offered load abates.
• The first two overload scenarios we examine use only freephone calls. Since a freephone call requires about 1 ms of call-related processing (not counting interprocess communication costs), it is easy to see whether the SCP is running at or near capacity. The remaining scenarios use a combination of freephone and complex calls, with first-offered call rates set to ensure that both subsystems receive equal first-offered processing loads. This assumption does not result in a loss of generality, as we are able to infer what would happen under asymmetric loads. Figs. 1 and 7 show the offered traffic profiles in calls per second. Time in seconds is displayed on the x-axis, and the y-axis is calls per second. Later graphs show completed calls per second, processor utilization, or overload level on the y-axis.
• Our simulation program indexes gaps as in the “future values” columns of Table I. This implies that when we are implementing the set of current gap interval values, in which the smallest usable gap is 3 s, the smallest nonzero overload level is level 6.

TABLE I CURRENT AND FUTURE GAP VALUES
Fig. 1. Offered traffic (calls/s) versus time (s) for Scenarios 1 and 2.
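The capacity check mentioned in the list above is simple arithmetic. The helper below is a hypothetical sketch (its name and parameters are ours): it converts the processor time left for call processing into freephone calls per second, assuming the roughly 1 ms of call-related processing per freephone call quoted above and treating overhead and interprocess communication as fractions of processor time.

```python
def freephone_call_capacity(busy_fraction: float, overhead_fraction: float,
                            ipc_fraction: float,
                            cost_per_call_s: float = 0.001) -> float:
    """Freephone calls/s that fit in the processor time left for call processing.

    busy_fraction     -- total processor utilization (0..1)
    overhead_fraction -- time used by the overhead process (10% in the model)
    ipc_fraction      -- time used by interprocess communication
    cost_per_call_s   -- call-related processing per freephone call (about 1 ms)
    """
    call_fraction = busy_fraction - overhead_fraction - ipc_fraction
    if call_fraction < 0:
        raise ValueError("overhead and IPC exceed the busy fraction")
    return call_fraction / cost_per_call_s


# Ceiling if all of the 90% left after overhead went to call processing.
print(round(freephone_call_capacity(1.0, 0.10, 0.0)))    # 900
# Scenario 2 figures: 90% busy, 10% overhead, about 10% interprocess communication.
print(round(freephone_call_capacity(0.90, 0.10, 0.10)))  # 700
```

The second line reproduces the arithmetic used in Section IV-B to read SCP capacity off the utilization plots.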
Both Tables I and II contain gap values up to 58 s, but Table II has five more overload levels because the five new “future values” are assigned to the first five overload levels.

IV. SIMULATION RESULTS

This section presents six scenarios in which we progressively enhance ACG to improve SCP performance during overload. Compared with Scenarios 1 and 3, Scenarios 2 and 4 aim to make ACG more responsive by speeding up its control actions. Scenarios 5 and 6 address fairness between competing services. Italics in each scenario show the changes from preceding scenarios.

1) Freephone service, 400 sources, current gaps, 30" timers, overall utilization rule.
2) Freephone service, 4000 sources, future gaps, 30" timers, overall utilization rule.
3) Freephone and complex calls, 10 000 sources, future gaps, 30" timers, overall utilization rule.
4) Freephone and complex calls, 10 000 sources, future gaps, 10" timers, overall utilization rule.
5) Freephone and complex calls, 10 000 sources, future gaps, 10" timers, subsystem utilization rule.
6) Freephone and complex calls, 10 000 sources, new gaps, 10" timers, subsystem utilization rule plus dropped messages rule.

The parameters in the utilization rules are lower threshold %, upper threshold %, and subsystem threshold % (although some smaller values are discussed in Section IV-E). The choice of the upper utilization threshold effectively defines the SCP call capacity (when we use utilization rules), because the rules attempt to reduce traffic whenever call processing plus interprocess communication plus overhead costs exceed the threshold. The graphs showing subsystem utilization do not include interprocess communication costs. Note that both subsystems use the same subsystem threshold. This is a reasonable choice because the subsystems have equal offered loads.

A. Scenario 1: On-Off Behavior

Freephone service, 400 sources, current gaps, 30" timers, overall utilization rule.

Our first scenario is a simple one in which the number of sources is small because we assume that gapping is done on the first three digits of the GTA. We take the measurement interval timer to be 30 s to represent typical values used in SCP’s. The SCP can process 800–900 simple calls per second when running at 90% call-related processor utilization (interprocess communication uses some real time). However, in Fig. 2, we see large periodic fluctuations in the completed call rate; the maximum is attained only briefly in each cycle. The average completed call rate is about 350 calls per second.

Fig. 2. Scenario 1 call completion rate (calls/s) versus time (s).

Fig. 3 helps explain the periodic behavior observed in Fig. 2. During the periods of overload, the overload level alternates between level 0 (no control at all) and level 6 (representing the smallest gap interval value from the current set of gaps; see Table I), remaining constant for 30 s at a time. A cycle begins when the overall utilization exceeds the upper threshold, setting the overload level to six for the duration of a measurement interval. Because the minimum old gap value of 3 s is too large (see the next paragraph), the utilization then drops sharply. At the end of the measurement interval, the overload level becomes zero, which removes the gapping control at all sources. Because the offered load is too high for the SCP, the overall utilization jumps to 100%, which invokes gapping at the end of the measurement interval, beginning another cycle. The measurement interval therefore determines the frequency of the oscillations in the call completion rate, utilization, and overload level.

The maximum value of the call completion rate in a cycle depends on the SCP capacity. The call completion rate while controls are applied depends upon the number of sources, their call rates, and the gap value, as we now explain. In the present case, the smallest nonzero gap is the one used. Partition sources into groups by first-offered call rate functions. That is, two sources are in the same group if and only if their first-offered traffic rates are the same at all times. In the present scenario, the groups are small, medium, and large sources. One might further refine groups according to the type of service the source demands. Index groups by the letter $i$.
Let $N_i$ be the number of sources in group $i$, $\lambda_i$ be the call rate from a source in group $i$, and $g$ be the gap applied to all sources. Assume that each source generates calls according to a Poisson process. Thanks to the memoryless property of the exponential distribution, the mean time between calls that a source in group $i$ generates when it applies a gap of size $g$ is $g + 1/\lambda_i$. Denoting by $R(g)$ the aggregate call arrival rate from all sources when they control calls with a gap of size $g$, we have, after some algebra,

$$R(g) = \sum_i \frac{N_i}{g + 1/\lambda_i} = \sum_i \frac{N_i \lambda_i}{1 + \lambda_i g}. \qquad (1)$$

Applying this formula to the numbers of sources and the call rates we used (200 small sources at 0.5 calls per second, 100 medium sources at 2 calls per second, and 100 large sources at 5 calls per second), along with $g = 3$ s, gives $R(3) = 40 + 200/7 + 31.25 \approx 100$ calls per second. This agrees well with Fig. 2, even though (1) ignores reattempts. The comparison of (1) and Fig. 2 assumes that the SCP can process calls at the rate $R$, which, in this case, is valid. The equation and figure demonstrate dramatically that, in this example, the least severe current gap value is too large with only 400 sources generating calls. The very first (least severe) overload level restricts traffic too much. The combination of a small offered call rate when gapping is in effect and the application of controls for long (30-s) intervals results in poor call throughput. In addition, the corresponding 30-s period in which traffic is not controlled at all threatens the SCP with instability. The next scenarios address these issues, first by attempting to match $R$ more closely to the SCP capacity.

Fig. 3. Scenario 1 overload level versus time (s).

B. Scenario 2: One Subsystem, Good Performance

Freephone service, 4000 sources, future gaps, 30" timers, overall utilization rule.

Two remedies are available to deal with overcontrol at the first nonzero overload level. First, one can provide smaller gaps [2]. Clearly, smaller values of $g$ in (1) make $R$ larger. Second, one can subdivide sources to decrease their rates. Under subdivision, the number of source groups increases and the rates of individual sources decrease in such a way that the total first-offered rate from each former source [the numerators $N_i \lambda_i$ of the summands in (1)] remains unchanged. However, the denominators $1 + \lambda_i g$ of the summands in (1) all decrease, making $R$ larger. Subdivision of sources is not an artifice; it corresponds precisely to gapping on longer digit strings.
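Equation (1) and both remedies can be checked numerically. In this Python sketch the group sizes and rates are the Scenario 1 values quoted above. The tenfold subdivision assumes that every source is split into ten equal sub-sources, which is our assumption rather than a detail given in the text, and the Monte Carlo helper (a hypothetical name, like the others here) ignores retries, as (1) does.

```python
import random

# (number of sources, first-offered calls/s per source) for the small,
# medium, and large groups of Scenario 1, as quoted in the text.
SCENARIO1_GROUPS = [(200, 0.5), (100, 2.0), (100, 5.0)]


def aggregate_rate(groups, gap):
    """Equation (1): R(g) = sum over groups of N_i * lam_i / (1 + lam_i * g)."""
    return sum(n * lam / (1.0 + lam * gap) for n, lam in groups)


def subdivide(groups, factor):
    """Assumption: split every source into `factor` equal sub-sources.

    Each group's total first-offered rate N_i * lam_i is unchanged."""
    return [(n * factor, lam / factor) for n, lam in groups]


def simulate_gapped_source(lam, gap, horizon, seed=1):
    """Monte Carlo rate of calls a gapped Poisson source lets through (no retries)."""
    rng = random.Random(seed)
    t, next_allowed, accepted = 0.0, 0.0, 0
    while True:
        t += rng.expovariate(lam)   # time of the next first-offered call
        if t > horizon:
            break
        if t >= next_allowed:       # gate open: accept and restart the gap
            accepted += 1
            next_allowed = t + gap
    return accepted / horizon


for label, groups in [("400 sources", SCENARIO1_GROUPS),
                      ("4000 sources", subdivide(SCENARIO1_GROUPS, 10))]:
    for gap in (3.0, 0.5):
        print(label, gap, round(aggregate_rate(groups, gap), 1))
# 400 sources 3.0 99.8
# 400 sources 0.5 322.9
# 4000 sources 3.0 412.0
# 4000 sources 0.5 679.4

# Single large source: simulated versus analytic 5 / (1 + 5 * 3) = 0.3125 calls/s.
print(simulate_gapped_source(5.0, 3.0, 50_000.0), 5.0 / (1.0 + 5.0 * 3.0))
```

Under these assumptions, each remedy alone raises $R$, and combining the tenfold subdivision with $g = 0.5$ s yields roughly 680 calls per second. These printed values are illustrations of (1), not the entries of Table II.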
Applying ACG controls to traffic streams by specifying one extra GTA digit accounts for the tenfold increase in the number of sources modeled in this scenario. Fig. 4 shows a vast improvement over Fig. 2. The throughput oscillations have disappeared, and the throughput settles to an average rate of about 700 calls per second during the overload. Similar improvement is visible in Fig. 5, where the total utilization becomes roughly 90% (after a transient at 100%) during the overload. The SCP drops no messages here, in contrast to Scenario 1, where it dropped up to 1000 messages per second when the overload level became zero. Fig. 6 explains the better performance observed in Scenario 2. Here, the overload level settles at level 3 ($g = 0.5$ s), a gap that was not available before. Evaluating (1) with $g = 0.5$ s gives a value of $\Lambda(g)$ close to what we observe in Fig. 4. Since the observed 90% total utilization includes 10% fixed overhead and about 10% interprocess communication, the processor spends about 70% of its time on call processing (Fig. 5). At 1 ms per call, this equates to 700 calls per second, so we see that the gap of 0.5 s matches the SCP capacity well. However, it is fortuitous that the right gap level is small and is therefore attained quickly.

Fig. 4. Scenario 2 call completion rate (calls/s) versus time (s).

Fig. 5. Scenario 2 total SCP utilization (top) and call-processing utilization (bottom) versus time (s).

Fig. 6. Scenario 2 overload level versus time (s).

TABLE II
CALL RATES TO SCP UNDER GAPPING

We made two changes in going from Scenario 1 to Scenario 2; it is of interest to explore the effects of the changes individually. The results appear in Table II. The combinations "current gaps with 4000 sources" and "future gaps with 400 sources" both assume that the gap used is the smallest nonzero gap available. This assumption maximizes the resulting rate in (1), hence the use of the inequality sign in the table. An important distinction between the Future/4000 entry of Table II and all the other entries is that Future/4000 corresponds to the third nonzero gap, while all the other entries correspond to the first nonzero gap. At the smallest nonzero overload level, the overload level can drop to zero at the next measurement, which removes controls entirely. The result can be the on–off behavior observed in Scenario 1. On the other hand, fluctuations of size 1 around level 3 do not remove controls entirely, thereby leaving the SCP protected from uncontrolled traffic.

Fig. 7. Offered traffic (calls/s) versus time (s) for Scenarios 3–6.

C. Scenario 3: Two Subsystems, Slow Convergence to “Right” Gap Intervals

Freephone and complex calls, 10 000 sources, future gaps, 30" timers, overall utilization rule.

We now change the service mix to include both freephone calls and more complex calls. Subsystem 1 handles freephone calls and Subsystem 2 handles the complex calls. The offered utilizations of both subsystems are equal, but the higher real-time cost of the complex calls (5.6 ms per call versus 1.0 ms per freephone call) implies that the first-offered call rate of complex calls is about 5.6 times smaller than that of freephone calls. The number of sources is now 10 000, with 5000 sources mapping to each subsystem. Since the total call rate for a given offered load is now smaller, the interprocess communication costs are a smaller percentage of the total utilization. Table III applies (1) to the two subsystems during the overload period. The columns labeled “call rate” give the value of $\Lambda(g)$ for each subsystem that would result if the gap interval value in the row were applied to all sources sending traffic to that subsystem. The columns labeled “utilization” multiply the call rate by the processing time per call to arrive at the load offered to the subsystem in units of utilization. Recall that (1) does not account for call retries. One use of Table III is to determine what overload level is required to reduce the subsystem load to a desired level. This information will help in the interpretation of the results in the remainder of Section IV.

TABLE III
CALL RATES AND UTILIZATIONS TO SUBSYSTEMS AS FUNCTIONS OF GAPS

Fig. 8. Scenario 3 SCP processor utilization versus time (s).

Fig. 9. Scenario 3 overload level versus time (s).

Fig. 10. Scenario 4 SCP processor utilization versus time (s).

Fig. 8 illustrates the essential behavior. The total utilization remains at 100% for over 200 s in each overload episode. Moreover, the utilizations of the two subsystems do not stabilize. There are also inequities between the two subsystems, whose discussion we defer until the next section. Fig. 9 shows the overload level as a function of time. The level rises slowly, attaining level 8 some 200 s into the overload, and then drops. The slow rise explains why the utilization stays at 100% for over 200 s. An interesting difference between this and the last scenario is that the subdivision of sources decreases the individual source rates. As a result, even the sources generating freephone traffic require larger gaps than before to reduce their traffic rates by a given percentage. This and the presence of the complex call sources, which require yet larger gaps, explain why the overload level reaches eight. The present scenario underscores the accidental nature of the good results in Scenario 2. Moreover, it contrasts with Scenario 1, where the problem was that the very first available gap was too large. Here, the eighth gap is the “right” one.
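The Table III procedure can be sketched the same way: compute $\Lambda(g)$ per subsystem from (1), multiply by the per-call processing time, and find the first overload level whose gap brings the offered utilization down to a target. Apart from the 5000 sources per subsystem and the 1.0 ms and 5.6 ms per-call costs, every number here is hypothetical: the sources within each subsystem are assumed homogeneous, the per-source rates are chosen so that both subsystems offer 70% utilization, and `GAPS` is a made-up gap list, not the ACG gap table used in the paper.

```python
from typing import List, Optional, Sequence, Tuple

# (number of sources, first-offered calls/s per source)
SourceGroup = Tuple[int, float]

# Hypothetical gap list in seconds; index = overload level, level 0 = no gapping.
GAPS = (0.0, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0)


def gapped_rate(groups: Sequence[SourceGroup], gap: float) -> float:
    """Lambda(g) from (1), ignoring retries."""
    return sum(n * lam / (1.0 + lam * gap) for n, lam in groups)


def subsystem_row(groups: Sequence[SourceGroup], gap: float,
                  ms_per_call: float) -> Tuple[float, float]:
    """One Table III-style entry: (call rate in calls/s, offered utilization)."""
    rate = gapped_rate(groups, gap)
    return rate, rate * ms_per_call / 1000.0


def level_needed(groups: Sequence[SourceGroup], ms_per_call: float,
                 target_util: float) -> Optional[int]:
    """Smallest overload level whose gap brings offered utilization to target."""
    for level, gap in enumerate(GAPS):
        if subsystem_row(groups, gap, ms_per_call)[1] <= target_util:
            return level
    return None  # no gap in the list is large enough


# Hypothetical homogeneous subsystems, 5000 sources each, 70% offered load each.
freephone: List[SourceGroup] = [(5000, 0.14)]       # 700 calls/s at 1.0 ms
complex_calls: List[SourceGroup] = [(5000, 0.025)]  # 125 calls/s at 5.6 ms

for level, gap in enumerate(GAPS):
    r1, u1 = subsystem_row(freephone, gap, 1.0)
    r2, u2 = subsystem_row(complex_calls, gap, 5.6)
    print(f"level {level:2d} gap {gap:5.2f} s | "
          f"sub1 {r1:6.1f} calls/s {u1:5.1%} | sub2 {r2:6.1f} calls/s {u2:5.1%}")

print(level_needed(freephone, 1.0, 0.35), level_needed(complex_calls, 5.6, 0.35))
```

With these assumed inputs, the freephone subsystem reaches a 35% target at level 7, while the complex-call subsystem, whose sources send at lower rates, needs level 10. Applying one common level to both, as the overall utilization rule does, therefore overcontrols the higher-rate subsystem, which is the kind of unfairness the next scenario exhibits.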

D. Scenario 4: Unfairness

Freephone and complex calls, 10 000 sources, future gaps, 10" timers, overall utilization rule.

Using one rule for all subsystems is clearly inconsistent with the objective of building firewalls between them: if only one subsystem were to overload the SCP, a single rule would still reduce traffic from all subsystems. This scenario illustrates that inequities may exist between subsystems even if both subsystems contribute equally to the overload. Shortening the timers from 30 to 10 s allows the overload level to reach a suitable value more quickly. In Fig. 10, we see the subsystem utilizations stabilize somewhat, with Subsystem 1’s utilization oscillating around 27% and Subsystem 2’s around 42%. As the overload level climbs from zero and then oscillates between levels 7 and 8, the utilization figures agree well with Table III. The utilization target range of 80%–90%, together with the fixed 10% overhead and the interprocess communication cost, would suggest that each service should get roughly 35% of the SCP capacity. However, Fig. 10 shows that the two subsystems do not get equal shares. The different treatments result from using one measurement (overall utilization) to assign the same overload level (and, therefore, the same gap) to each subsystem. Since sources to Subsystem 1 send at higher rates than sources to Subsystem 2, they need correspondingly smaller gaps to yield the same percentage reductions in traffic. Table III makes this plain, since both subsystems’ offered loads are equal. Unfairness by source rate was discussed extensively in [12], which proposed using an adaptive calculation to determine gaps for sources of different rates so as to reduce traffic from all of them by the same percentage. However, one constraint of the present paper is that this adaptive control is not available. The sources in our model are of different sizes, they receive unequal treatment, and it is generally not possible to remedy unfairness by source size within the present framework. It is possible, however, to alleviate unfairness by subsystem, because subsystem-specific measurements can be used to set subsystem overload levels. In particular, the subsystem utilization rule can set overload levels for each subsystem to allocate equal shares of processor utilization to each subsystem. When both subsystems have equal offered utilizations (as is the case here), this policy is equivalent to equal percentage reductions in calls.

E. Scenario 5: Sensitivity to Thresholds

Freephone and complex calls, 10 000 sources, future gaps, 10" timers, subsystem utilization rule.

In this section, we replace the overall utilization rule with the subsystem utilization rule. It turns out to be difficult to select fixed subsystem thresholds satisfactorily: there are risks both in choosing values that are too large and in choosing values that are too small. Our first results use values that are on the large side. In Fig. 11, the overload level for Subsystem 1 (overload level 1) settles down to a neighborhood of 5, while overload level 2 settles down near 11. Table III indicates that these overload levels correspond to utilizations of about 35% for each subsystem. Figs. 11–13 show overload levels for three values of the subsystem utilization threshold: 40%, 38.5%, and 35%. In all cases, Subsystem 1’s utilization is much closer to Subsystem 2’s, so the subsystem rules do rectify the unfairness that we observed above. Note, however, that in Fig. 11 the increase in overload level 1 lags the increase in overload level 2 by about 100 s in both overload episodes. During each such 100-s period, the total utilization increases to 100% and remains there for an additional 100 s. While overload level 1 is zero, the rate at which Subsystem 1 drops messages increases to over 400 per second for over 150 s. The real-time cost of dropping a message is small, allowing Subsystem 1’s utilization to decrease at the beginning of each overload episode. However, the more messages the SCP processes per unit time, the higher the interprocess communication cost, which contributes to the 100% utilization. With the utilization threshold 35%, overload level 1 increases when overload level 2 does. In this case, the total utilization remains at 100% for only about 80 s, and Subsystem 1 drops messages at a rate of less than 250 per second for only 50 s. The utilization threshold 38.5% gives results similar to the 40% threshold during the first overload episode and results similar to the 35% threshold during the second episode, as the overload level graphs would suggest.

The sensitivity of the results to the subsystem utilization thresholds arises because the different thresholds are close to the percentages of processing capacity that the process scheduler allocates to the subsystems. The threshold of 40% is greater than Subsystem 1’s allocation at the onset of overload, so overload controls are not triggered for Subsystem 1. However, once traffic to Subsystem 2 is reduced enough, Subsystem 1 gets a larger real-time allocation; its utilization rises above 40%, triggering controls for Subsystem 1. Evidently, 38.5% is close to Subsystem 1’s real-time allocation, as the different behavior in the two overload episodes shows, while 35% is less than that allocation. Making a subsystem utilization threshold less than the process scheduler’s allocation to that subsystem can result in the type of unfairness we saw in Section IV-D. For example, if both subsystem thresholds were 30%, the overload levels of both subsystems would increase to level 7, at which point Subsystem 1’s utilization would go below the threshold (according to Table III). Overload level 2 would continue to increase. Assuming that the sum of the subsystem utilizations could be 70%, a level of nine for Subsystem 2 would bring the SCP out of overload (see the definition of the subsystem rule), making further increases in overload levels unnecessary. Subsystem 1 would then get a 30% processor allocation and Subsystem 2 a 40% allocation, contrary to what equal thresholds for both subsystems would suggest. The bias against the subsystem with the higher call rate stems in part from the fact that each subsystem’s overload level starts at zero and then becomes 1, 2, and so on. Other schemes are possible, but each fixed pattern would likely change, not remove, the biases. The risk that a subsystem threshold might be too large motivates us to introduce a rule based on dropped messages in the next section.

Fig. 11. Scenario 5 overload level versus time (s), 40% subsystem threshold.

Fig. 12. Scenario 5 overload level versus time (s), 38.5% subsystem threshold.

Fig. 13. Scenario 5 overload level versus time (s), 35% subsystem threshold.

F. Scenario 6: Extra Rule for Protection

Freephone and complex calls, 10 000 sources, new gaps, 10" timers, subsystem utilization rule + dropped messages rule.

Adding the dropped messages rule has two benefits. First, it acts as a safety net when the subsystem utilization rule does not take action (e.g., when some overhead processes take more real time than usual, causing the subsystem utilization to remain below the subsystem’s threshold). When the SCP drops messages, the overload levels of the corresponding subsystems will increase. Second, the presence of two independent rules permits overload levels to increase twice as fast as with one rule. In Fig. 14, we see both effects. Comparing with Fig. 11, note first that in the present scenario overload level 1 and overload level 2 increase in tandem. Moreover, each curve rises twice as fast as in Fig. 11.

Fig. 14. Scenario 6 overload level versus time (s), 40% subsystem threshold.
The utilizations of the two subsystems in Fig. 15 are similar, although both fluctuate with changes in the overload levels (which are larger than in the preceding scenario, thanks to the additional rule). Reacting explicitly to dropped messages sharply reduces the volume of messages dropped; with any of the three subsystem utilization thresholds used in Scenario 5, the SCP drops messages only for brief intervals.
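The speed-up from a second rule, and from shorter timers, can be illustrated with a deliberately simplified model of how overload levels move. The sketch assumes, rather than reproduces the paper's rule definitions, that at the end of each measurement interval every rule that fires raises the subsystem's overload level by one, and that the measurements stay fixed during a sustained overload. `Rule`, `next_level`, `seconds_to_reach`, and the measurement names are hypothetical; only the 40% threshold and the 30 s and 10 s timers come from the scenarios above.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical threshold rule: fires when its measurement exceeds threshold."""
    measurement: str
    threshold: float


def rules_fired(rules: Sequence[Rule], measurements: Dict[str, float]) -> int:
    """Number of rules whose measurement exceeds its threshold this interval."""
    return sum(1 for r in rules if measurements[r.measurement] > r.threshold)


def next_level(level: int, fired: int, max_level: int, underloaded: bool) -> int:
    """Assumed update at the end of a measurement interval: each firing rule
    raises the level by one; with no rule firing and the subsystem
    underloaded, the level falls by one; otherwise it is unchanged."""
    if fired > 0:
        return min(max_level, level + fired)
    if underloaded:
        return max(0, level - 1)
    return level


def seconds_to_reach(target: int, interval_s: float, rules: Sequence[Rule],
                     measurements: Dict[str, float], max_level: int = 15) -> float:
    """Time for a sustained overload (fixed measurements) to move the level
    from 0 to at least `target`."""
    if target > max_level:
        raise ValueError("target exceeds the highest overload level")
    fired = rules_fired(rules, measurements)
    if fired == 0:
        raise ValueError("no rule fires, so the level never rises")
    level, elapsed = 0, 0.0
    while level < target:
        level = next_level(level, fired, max_level, underloaded=False)
        elapsed += interval_s
    return elapsed


util_rule = Rule("subsystem_utilization", 0.40)
drop_rule = Rule("dropped_msgs_per_s", 0.0)
overload = {"subsystem_utilization": 0.45, "dropped_msgs_per_s": 400.0}

print(seconds_to_reach(8, 30.0, [util_rule], overload))             # 240.0
print(seconds_to_reach(8, 10.0, [util_rule], overload))             # 80.0
print(seconds_to_reach(8, 10.0, [util_rule, drop_rule], overload))  # 40.0
```

Under these assumptions, cutting the timers from 30 s to 10 s reduces the time to reach level 8 threefold, and adding the dropped messages rule halves it again, in line with the qualitative behavior of Figs. 11 and 14. The model ignores measurement lag, reattempts, and the feedback by which gapping itself changes the measurements, so it says nothing about where the level finally settles.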

Fig. 15. Scenario 6 SCP processor utilization versus time (s), 40% subsystem threshold.

Fig. 16. Scenario 6 overload level versus time (s), 38.5% subsystem threshold.

As in Scenario 5, the results are still sensitive to the subsystem utilization thresholds. For example, the overload level curves corresponding to the threshold 38.5% jump up and down much more than those for the threshold 40%; see Fig. 16. However, the effects on call throughput are smaller than the overload curves would suggest. Note, in addition, that the changes in overload level generally are synchronized between the two subsystems.

V. SUMMARY AND CONCLUSIONS

This paper investigated improvements to ACG within the constraints of existing requirements and implementations. The most important constraints are:
• the need to select gap intervals from a prescribed set of values;
• the need to map SCP measurements into overload levels by comparing them to preset thresholds;
• the length of the SCP measurement interval.

A common theme in all of our findings is that it is important to control traffic quickly with the right gap interval values. Gapping on longer digit strings (tantamount to subdividing sources into greater numbers of “sub”-sources with smaller call rates) and enlarging the set of gap intervals were two steps toward providing the “right” gaps. Shortening the measurement interval to a small value (10 s, which is not so small that it unacceptably increases the real-time cost of operating the overload controls) sped up the rate at which the right gap values could be attained.


With these improvements and an overall utilization rule, the problem of inequity between sources of different rates remained: an overall rule penalizes sources with larger rates. We next proposed to set gap intervals individually for subsystems with subsystem utilization rules. While this step made the subsystem utilizations more equal, the manual setting of the subsystem utilization thresholds turned out to be difficult. If a subsystem’s threshold was too high, the subsystem would not invoke controls and would end up dropping messages. If a subsystem’s threshold was too low, inequities again resulted. To rectify the problem of mistakenly large thresholds, we introduced a rule based on dropped messages. An additional benefit of the dropped messages rule was that the presence of a second rule caused the SCP to adjust overload levels twice as fast as with one rule.

Despite the difficulties in assigning thresholds to subsystems, the control scheme of Scenarios 5 or 6 should often be able to cope with overloads that are focused on only one subsystem. In such cases, when the overloaded subsystem offers a load greater than its threshold, only its traffic will be reduced; the SCP will serve all the traffic from other subsystems. If historical data are available to help determine the length of the digit strings on which to gap, then the SCP operator can speed “convergence” to the right gap intervals, thereby improving performance.

REFERENCES

[1] “Advanced intelligent network (AIN) 0.1 switching system generic requirements,” Bellcore, Tech. Rep. TR-NWT-001284, issue 1, Aug. 1992.
[2] “Advanced intelligent network (AIN) 0.1 switching system issues list report,” Bellcore, Rep. GR-1298-ILR, issue 2A, Apr. 1995.
[3] “Advanced intelligent network (AIN) 0.2 switching systems generic requirements,” Bellcore, Rep. GR-1298-CORE, issue 2, Dec. 1994.
[4] “AIN SCP generic requirements,” Bellcore, Rep. GR-1280-CORE, issue 1, Aug. 1993.
[5] “Bell Communications Research specification of signalling system number 7,” Bellcore, Rep. GR-246-CORE, issue 1, Dec. 1994.
[6] A. Berger, “Comparison of call gapping and percent blocking for overload control in distributed switching systems and telecommunications systems,” IEEE Trans. Commun., vol. 39, pp. 574–580, Apr. 1991.
[7] R. A. Farel and M. Gawande, “Design and analysis of overload control strategies for transaction network databases,” in Proc. 13th Int. Teletraffic Congress, Copenhagen, Denmark, June 1991, pp. 115–120.
[8] G. Hébuterne, L. Romoeuf, and R. Kung, “Load regulation schemes for the intelligent network,” in Proc. XIII Int. Switching Symp., vol. V, Stockholm, Sweden, 1990, pp. 159–164.
[9] R. Kawahara and T. Asaka, “Overload control for intelligent networks based on an estimation of maximum number of calls in a node,” in Proc. IEEE Intelligent Network Workshop IN’96, Melbourne, Australia, Apr. 1996.
[10] D. McMillan and M. Rumsewicz, “Analysis of congestion control for SCCP traffic and the impact on intelligent network services,” in Proc. IEEE Intelligent Network Workshop IN’96, Melbourne, Australia, Apr. 1996.
[11] X. H. Pham and R. Betts, “Congestion control for intelligent networks,” in Proc. Int. Zurich Seminar on Intelligent Networks, Zurich, Switzerland, 1992, pp. 511–524.
[12] D. E. Smith, “Ensuring robust call throughput and fairness for SCP overload controls,” IEEE/ACM Trans. Networking, vol. 3, pp. 538–548, Oct. 1995.
[13] P. M. D. Turner and P. B. Key, “A new call gapping algorithm for network traffic management,” in Proc. 13th Int. Teletraffic Congress, Copenhagen, Denmark, June 1991.

Bruce S. Northcote (M’95) received the B.Sc. (Hons.) and Ph.D. degrees from The University of Adelaide, Australia. He is with Fujitsu Nexion, Acton, MA, where he is a Senior Performance Engineer responsible for the performance analysis and benchmark testing of the NEXEN 8000 ATM switching system. Prior to this, he was a Senior Consultant with the Department of Engineering, Performance, and Control, Bellcore, Red Bank, NJ, where he developed his research interest in the modeling and analysis of congestion controls and capacity limitations in the CCS network, with emphasis on the performance analysis of methodologies for the automatic detection and control of overloads in AIN.

Donald E. Smith received the B.A. degree in mathematics from Princeton University, Princeton, NJ, and the Ph.D. degree in mathematics from the University of Chicago, Chicago, IL. He is a Principal Member of Technical Staff at GTE Laboratories, Waltham, MA, where he works on traffic and performance analysis. His recent interests include congestion and overload control in intelligent networks and the Internet, and ATM and Internet traffic engineering. In June 1995, he received the Leonard G. Abraham Prize Paper Award from the IEEE Communications Society for his paper, "Effects of Feedback Delay on the Performance of the Transfer-Controlled Procedure in Controlling CCS Network Overloads," published in the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS in April 1994.
