Apostolos Papageorgiou, Manuel Zahn, and Ernö Kovacs. Auto-configuration System and Algorithms for Big Data-enabled Internet-of-Things Platforms. 3rd IEEE International Congress on Big Data (BigData `14), pages 490-497. IEEE, July 2014. DOI: 10.1109/BigData.Congress.2014.78
Auto-configuration System and Algorithms for Big Data-enabled Internet-of-Things Platforms
Apostolos Papageorgiou, Manuel Zahn, Ernö Kovacs
NEC Laboratories Europe, Heidelberg, Germany
{apostolos.papageorgiou, manuel.zahn, ernoe.kovacs}@neclab.eu
Abstract—Internet of Things (IoT) platforms that handle Big Data might perform poorly, or not according to the goals of their operator (in terms of costs, database utilization, data quality, energy efficiency, or throughput), if they are not configured properly. This configuration refers mainly to system parameters of the data-collecting gateways, e.g., polling intervals, capture intervals, encryption schemes, used protocols, etc. However, re-configuring the platform appropriately upon changes of the system context or the operator targets currently does not take place. This is due to the complexity, or unawareness, of the synergies between system configurations and various aspects of the Big Data-handling IoT platform, but also due to the human resources that an efficient re-configuration would require. This paper presents an auto-configuration solution based on interpretable configuration suggestions, focusing on the algorithms for computing the mentioned suggested configurations. Five such algorithms are contributed, while a thorough evaluation reveals which of these algorithms should be used in different operation scenarios in order to achieve high fulfillment of the operator's targets.
Keywords: M2M; IoT; configuration; gateway; autonomic; self-management
I. INTRODUCTION
The Internet of Things (IoT) is expected to be one of the most important sources of Big Data [1], [2]. This is because it is practically realized through large-scale platforms of the so-called Machine-to-Machine (M2M) technologies, which generate, report, and store data in an automated manner, often periodically. The expectations for M2M growth, foreseeing the interconnection of billions of devices [3], make it obvious that the IoT will require special mechanisms for its Big Data. Big Data challenges do not appear only at the level of querying and processing. Prominent studies of Big Data challenges, such as [4], include data acquisition as the first step of the Big Data process. As stated in [4], "Big Data does not arise out of a vacuum". Indeed, new platforms which handle Big Data face new challenges at this step, because, as explained in the following, unsophisticated configurations of the data acquisition system can reduce the value of the Big Data and the performance of the Big Data platform. This work deals with exactly these configuration-related challenges, focusing on M2M systems, which, as explained before, are expected to be a typical "collector" of Big Data.
Figure 1. M2M systems as Big Data collectors
More concretely, the paper identifies obstacles towards self-configuration solutions for M2M systems and proposes a system extension including five possible algorithms, which are developed and evaluated based on the insights of our related study. To this end, Section II discusses the most important issues and some related works with regard to achieving efficient self-configuration in M2M platforms that handle Big Data, Section III describes the proposed solution and the developed algorithms, while the evaluation of Section IV shows how well the different algorithms can help a platform operator achieve their goals.
II. THE PROBLEM
A. Configuration complexity of Big Data-handling platforms
Figure 1 shows a standard M2M system architecture, indicating some variables that can help explain the need for self-configuration when Big Data handling is involved. This is explained in the following with some numerical examples:
• If n reaches billions (as forecasted by related surveys [3]), then m will correspond to many thousands, even in moderately big (e.g., city-scale) systems. Considering that reconfiguration might be needed upon changes of operator requirements, application preferences and
statistics, etc., the frequency of reconfiguration might need to correspond to multiple times per day. If performed t times per day, assuming that each gateway has tens or hundreds of configurable parameters (z), then the number of parameter resettings per day (m × z × t) can grow to many millions, even for city-scale platforms.
• The size (r) of the handled Big Data (which can refer, for example, to database table rows for standard relational DBs, or to log entries or similar for NoSQL databases) depends heavily on the GW configurations, because the latter include parameters such as capture intervals, polling intervals, aggregation properties, data resolution, filtering thresholds, etc. Thus, efficient reconfiguration upon changes of the system context can reduce r while maintaining the desired data quality. In [5], we explain how this can lead to better performance in Big Data scenarios.
Despite the aforementioned benefits of dynamic reconfiguration, there are three main reasons why it is not performed:
• If manual reconfiguration is involved, then m × z × t parameter resettings per day are out of the question, because the costs outweigh the benefits.
• Even if automation is possible, self-configuration might be avoided because of the complexity of the synergies between configuration and result, i.e., because it is difficult to predict whether certain reconfigurations will have the desired impact. This means that there are no known ways of performing efficient self-configuration of an M2M system.
• Finally, gateways often differ from one another. They have different parameters and implement different protocols. This often makes it impossible for the platform to directly set the final configuration values.
Thus, solutions are needed that can (i) fully automate the parameter resetting process, (ii) take the system synergies into account, and (iii) "live with" the heterogeneity of gateways.
B. Related work
There are various works suggesting generic architectures or basic structures for autonomous (or self-adaptive, incl. self-configuring) systems. In this section we limit the discussion to approaches that handle the issue of (network) parameter tuning, because this is practically what is needed for the problem described above. Given that the issue has not been handled explicitly for M2M systems and gateway parameters before, we look into approaches from related networking areas and explain why we approach the issue in a different way. In the area of LTE, [6] and [7] provide flows with which an eNodeB can be self-configured by applying parameter values that are provided by central configuration servers, but
no concrete algorithms for computing configuration values are examined. [8] goes in that direction by providing an optimization algorithm for tuning the 3GPP-defined eNodeB parameters "hysteresis" and "time-to-trigger" in order to perform better handovers. However, this is possible because the central servers know exactly the (standardized) value ranges of these two parameters and the impact they have on handover performance. In the Big Data field, self-configuration approaches have appeared only for self-tuning the "more than 190 parameters of Hadoop" [9]. However, the techniques used there rely on Hadoop-specific "rules of thumb" which are often empirical. For example, a rule recommends setting the number of reduce tasks in a job to roughly 0.9 times the total number of reduce slots in a cluster. Approaches with similar focus and similar restrictions to the above have appeared for MAC protocol self-parametrization [10], TCP active queue systems [11], and more. The above clearly reveals the dependence of the algorithms on the domain-specific synergies between system parameters and performance metrics, as well as the necessity for constructs that can help us deal with the heterogeneity of the configurable modules (i.e., the gateways, in our case). Such constructs can be description languages or standard protocols which respect the heterogeneity of the gateways. Therefore, the rest of the paper contributes both an enabling system and various possible algorithms for its parameter tuning logic, which are based on different core concepts.
III. IOT PLATFORM AUTO-CONFIGURATION SYSTEM AND ALGORITHMS
A. Enabling system based on new description technologies
Figure 2 describes how the system extension for auto-configuration works. Machine-readable descriptions of (a) the operator commands (i.e., the things that she/he intends to achieve via reconfiguration) and (b) the system parameter synergies (i.e., the impact that certain reconfiguration actions have on various aspects) are provided as input. As shown in Figure 2, the Commands Descriptor contains, among others, weights and target levels for various goals and can be auto-generated from a GUI through which the operator provides the corresponding preferences. In contrast, the Synergies Descriptor is expected to be quite static and will normally be preset, though it can also be edited at any time. It contains values that indicate how each parameter impacts each possible operator target. The version we are currently working with (and which has been used in the evaluation of this paper) contains 19 commands and 26 parameters, with synergies that were determined during the related study which we performed for [12]. The mentioned inputs are processed by a Configuration Computation Algorithm. There are practically infinite possibilities for the logic of this algorithm. The next subsection and the evaluation of this paper focus on various algorithms
that can be used for this task. However, the output must be in a specific, well-defined format. More concretely, the output of the algorithm is a GW-interpretable Configuration, i.e., a directive of the backend to the gateways in the form of suggested (non-final) parameter values. As shown in Figure 2, the respective machine-readable descriptor contains one tag for each configurable gateway parameter, indicating what the current suggestion for this parameter is. This suggestion must be one of the values that appear in the synergies descriptor for the respective parameter. Finally, each gateway uses its internal Gateway Stylesheet in order to translate the suggestions into concrete configuration parameter values. Similarly to the way HTML stylesheets can contain one entry for each HTML tag (in order to specify the formatting of this tag in the Web content), a Gateway Stylesheet can contain an entry for each tag of a GW-interpretable Configuration. The example of Figure 2 helps to understand this better: if the suggestion is medium and the stylesheet indicates that this gateway supports for this parameter the (ordered) values no, debug, and info, then debug should be selected as the final value. The rest of this paper focuses on the (gateway-independent) process until the generation of the GW-interpretable configuration and not on the final interpretation.
B. Self-configuration Algorithms
As already mentioned, there are countless possibilities for the self-configuration algorithm ("Config. Computation Algorithms" in Figure 2). Three basic algorithms and two extensions of them, i.e., five algorithms in total, have been developed. All of them take the command descriptor and the synergies descriptor as input and provide suggested configurations, i.e., a "GW-interpretable configuration", as output.
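The stylesheet-based translation described above admits a simple illustration. The following sketch assumes a proportional mapping from the suggestion's position on an abstract ordered scale onto the gateway's own ordered value list; the function name and the mapping rule are illustrative assumptions, since the concrete translation logic is gateway-specific and out of scope for the paper:

```python
def translate_suggestion(suggestion, scale, supported):
    """Illustrative (hypothetical) stylesheet lookup: map an abstract
    suggested level onto the nearest of the gateway's ordered values."""
    pos = scale.index(suggestion) / (len(scale) - 1)  # relative position in [0, 1]
    idx = round(pos * (len(supported) - 1))           # nearest slot on the gateway's scale
    return supported[idx]

# The worked example of Figure 2: the backend suggests "medium" on a
# low/medium/high scale; this gateway supports "no", "debug", "info".
print(translate_suggestion("medium", ["low", "medium", "high"], ["no", "debug", "info"]))  # debug
```

Under such a scheme, each gateway stays free to expose any ordered value list in its stylesheet, while the backend only ever reasons about abstract suggestion levels.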
All algorithms have a first optional step in common:
Step 0: Dependencies of gateway parameters on external factors lead to a recalculation of the knowledge base, i.e., the synergies descriptor. External factors refer to any extra information that might dictate adaptation of some values of the synergies descriptor table (top right in Figure 2). For example, extra information might be the compressibility of the data used in the system, actual platform utilization degrees, or feedback from energy consumption measurements.
The rest of the steps are summarized in the following, while examples based on the command and synergies descriptors shown in Figure 2 are provided in brackets. Further, the last paragraph of this section discusses further possibilities (e.g., based on optimization problems) and explains what the development of these algorithms (or heuristics) contributes to the field.
CCA / ECCA
The Configuration-Comparing Algorithm (CCA), as well
Figure 2. Auto-configuration system extension overview with examples
as its variation ECCA (Extremal Configuration-Comparing Algorithm), calculate gateway-interpretable configurations based on the idea of separately optimizing each gateway parameter. This is done with maxed-out values (ECCA) or aligned target values (CCA). The following elementary steps are performed for each gateway parameter:
Step 1: All high-level commands from the operator targets are associated with the corresponding knowledge base entries (e.g., capture rate: low (Costs, Energy Efficiency), high (Big Data quality)).
Step 2: The classified commands are examined:
Step 2a: Based on the weights of the commands, the most valuable entry (MVE) is calculated. Most valuable means the entry with the highest accumulated weight (e.g., for capture rate: wlow (= 0.4 + 0.3) < whigh (= 0.8), thus MVE = high).
Step 2b: The parameter characteristics are considered. For qualitative values or ECCA, the MVE is directly suggested (e.g., capture rate: high). In the case of quantitative values and CCA, a target value negotiation is realized: the combination of Target Type and Target Value yields a specific desired interval for the MVE, a so-called scale. A scale is calculated for each command. The scales are weighted in order to reflect the importance of the commands. The resulting aligned target value represents the suggestion for this gateway parameter (e.g., capture rate: MVE = high, but due to a Target Level of 50% for "Big Data quality" the capture rate is set to medium).
WRA
The Weight Ranking Algorithm (WRA) calculates its output based on the idea of satisfying the commands in descending order of their weights. The following two elementary steps are performed:
Step 1: All commands from the operator targets are sorted in descending order of their weights (e.g., Big Data quality (0.8) > Energy Efficiency (0.4) > Costs (0.3)).
Step 2: The interpretable configuration is filled in a weight-ranking-based manner: the commands are considered in the previously calculated order for the purpose of copying their entries, as long as these entries are not yet set (e.g., first copy the entries of Big Data quality, then copy the entries of Energy Efficiency, if not already set, and so on).
Since the algorithm strongly depends on weights and follows a "first things first" principle, WRA leads to valuable results only if the commands have large differences in importance (which is reflected by their weights).
BADA / BigBadaBoom
The Balanced Achievement Degree Algorithm (BADA) is based on the idea of a fair, dynamic leveling of the target achievement degrees for all commands. BigBadaBoom is an extension of BADA which also considers the command weights for calculating the suggested values. Figure 3 clarifies this basic concept: whereas ECCA, CCA, and WRA attempt to reach a maximum total fulfillment, the BADA algorithms attempt to reach a fairly distributed fulfillment. BADA performs the following fundamental steps (because of its complexity and higher degree of detail, its pseudocode is also provided in Algorithm 1):
Step 1: The number of "desirably fulfilled parameters" is calculated based on the Target Level and the applicable gateway parameters for each command (i.e., if the Target Level is less than 100%, then not all parameters must be satisfied for a command).
Step 2: The interpretable configuration is initialized with possibly existing distinct entries, e.g., if all given commands require a "high" value for a given parameter, then this value is directly copied into the suggested configuration.
Step 3: BADA/BigBadaBoom iterations are executed for all remaining (unset) entries of the interpretable configuration. In the following, these iterations are described for BigBadaBoom (BADA works identically, with the difference that it uses simple degrees of target achievement, i.e., DTA, instead of weighted degrees of target achievement, i.e., wDTA):
Step 3a: Weighted degrees of target achievement (wDTA) are calculated for all commands. The weights enter the calculation of wDTA in a non-linear way depending on the
Figure 3. Visualization of BADA's fairness concept
gap to a degree of 100% target achievement, i.e., on "how far from achieving an absolute fulfillment the command currently is". Thus, the closer wDTA is to 100%, the smaller the effect of the weight of a command, so that the command with the lowest wDTA is chosen for setting a suggested entry in the next step.
Step 3b: Among the gateway parameter values that satisfy the command selected in 3a, the one which maximizes the aggregated benefit for the other commands is chosen. This benefit depends on whether the specific entry correlates with the remaining entries, taking their wDTAs into account as well. This is because it is better to select a gateway parameter value that helps to satisfy most other commands too, while it is less damaging to set a non-matching entry for a command with an already high wDTA.
Step 3c: The entry of the "poorest" command (in terms of current wDTA) that is most beneficial for the other commands (cf. 3b) is written into the suggested GW-interpretable configuration. The algorithm exits when there are no more entries to fill into the latter.
Algorithm 1. Pseudocode of the BADA / BigBadaBoom algorithms
DEFINITIONS
DP:        Map of commands; each command is associated with its number of desired fulfilled parameters
wDTA:      Map of commands; each command is associated with its weighted degree of target achievement
BlackList: Set of commands that are completed and need no further handling
-------------------------------------------------------------
// Input:  CP (table of commands and parameters)
// Input:  OT (set of operator targets)
// Output: IC (interpretable configuration)

// Step 1: Calculate desired number of parameters
for all commands c of OT
    DP.put(c, (50% + c.getTargetLevel/2) * c.getNumberOfSuggestedEntries)
end for

// Step 2: Set distinct suggestions
for all parameters p of CP
    if entrySet of p is distinct
        IC.getParameter(p).setSuggestion(p.getDistinctEntry())
    end if
end for

// Step 3: BADA
minWeight = getMinWeight(OT)
e = 0.000001
while IC is not complete
    // Step 3a: Calculate wDTAs and find the poorest command
    for all commands c of OT
        if BlackList does not contain c
            pro    = c.getNumberOfProEntries(IC)
            contra = c.getNumberOfContraEntries(IC)
            all    = c.getNumberOfSuggestedEntries()
            if pro + contra == all
                BlackList.add(c)
            else
                DTA_i = (pro - contra) / all
                if BADA
                    wDTA_i = DTA_i
                end if
                if BigBadaBoom
                    wDTA_i = (minWeight / (1 - c.weight + minWeight))^(1 - DTA_i) * DTA_i   if DTA_i > 1
                             (minWeight / c.weight)^(1 - DTA_i) * DTA_i                     if 0 <= DTA_i <= 1
                             (minWeight / c.weight)^(DTA_i) * DTA_i                         otherwise
                end if
                wDTA.put(c, wDTA_i)
            end if
        end if
    end for
    poorestCmd  = wDTA.getCmdWithLowestWDTA()
    lowestWDTA  = wDTA.getLowestWDTA()
    highestWDTA = wDTA.getHighestWDTA()

    // Step 3b: Calculate the entry benefits
    highestBenefit      = Double.MinValue
    highestBenefitParam = null
    for all parameters p of CP
        if IC.getParameter(p).getSuggestion == null && CP[poorestCmd][p] != null
            benefit = 0
            for all commands c of OT except poorestCmd
                if CP[c][p] != null
                    b = 2 - |wDTA.get(c) - lowestWDTA| / (|highestWDTA - lowestWDTA| + e)
                    benefit += (CP[c][p] == CP[poorestCmd][p]) ? b : -b
                end if
            end for
            if benefit > highestBenefit
                highestBenefit      = benefit
                highestBenefitParam = p
            end if
        end if
    end for

    // Step 3c: Choose the highest-benefit parameter
    IC.getParameter(highestBenefitParam).setSuggestion(CP[poorestCmd][highestBenefitParam])
end while
return IC

C. Further aspects and discussion
One of the main reasons why these algorithms have been developed is that the logic with which operators want to fulfill their commands varies from operator to operator and cannot be known a priori. This means that there is no standard fulfillment metric which the "configuration computation algorithms" could try to maximize (potentially by setting up an optimization problem). For each of the presented algorithms, there could be an equivalent objective function whose maximization would give exactly the same results as the algorithm itself, only that the optimization-based solution would be much more time-consuming and complex, thus error-prone. For example, the calculation of an optimal configuration based on our knowledge base (26 parameters with an average of 4.6 entries each) involves more than 19 quadrillion (19 × 10^15) combinations of parameter entries, which could lead to prohibitive times for the evaluation of the next section even if the objective function were linear. However, if an operator comes up with a concrete and different fulfillment metric that she/he wants to maximize, setting up an optimization problem instead of using any of these algorithms might be a feasible solution (if the nature of the objective function and the size of the problem allow it). Instead of trying to enforce a universally applicable fulfillment metric, the set of developed algorithms provides different kinds of solutions to the problem, which can be examined and evaluated with regard to their behavior against different fulfillment metrics. This is exactly what the evaluation of the next section will do, in order to help operators choose one of them or decide what kind of logic they prefer for the computation of the suggested configurations. Finally, there are various possibilities for extending these algorithms. For example, a further algorithm has been developed which can be combined with all previously presented algorithms. By analyzing the command descriptor, it determines the command set which maximizes the total complementary correlation and gives only this set as input to ECCA, CCA, WRA, BADA, or BigBadaBoom. Simply said, it filters out the commands that have different requirements than the others. Further details are out of scope for this paper, but additional extensions or different solutions are possible.
IV. EVALUATION
A. Scope and setup
As implied in the previous section, the present evaluation examines the behavior of the presented auto-configuration algorithms, i.e., the levels of "fulfillment of operator goals" that they achieve. The used fulfillment metrics are described later. In the following, an OT (Operator Target) corresponds to the assignment of a Target Level (0-100%) and a weight (0-1) to a command, i.e., to a row of the top right table in Figure 2, while a combination of multiple OTs is hereinafter referred to as an OTC.
Compared approaches: The five algorithms presented in the previous section, namely CCA, ECCA, WRA, BADA, and BigBadaBoom, are compared for three different metrics and two different classes of OTC datasets (more details below). The point is that a "perfect algorithm" does not exist, because all algorithms rely on subjective assumptions about how the operator wants her/his commands to be fulfilled. However, it is possible to reach conclusions about which algorithm is recommended under which circumstances.
Metrics: The comparison is done based on the fulfillment of the OTCs achieved by each algorithm, denoted as F(OTC). However, this fulfillment is calculated for three different metrics. The exact functions for calculating F(OTC) for all three metrics are provided in the Appendix, while the metrics and a short summary of their core logic are listed here:
• Metric-1 (Beneficial overfulfillment): The fulfillment of target levels for all commands of the OTC is computed (see Appendix) while their overfulfillment is rewarded, i.e., if the operator sets a Target Level of 70% for a given command and the algorithm manages to satisfy this command perfectly (i.e., 100% matching between
suggested entries and command requirements), then this is rewarded.
• Metric-2 (No overfulfillment): As above, but overfulfillment is not rewarded, i.e., if the operator sets a Target Level of 70% for a given command, then satisfying this command by 70% or perfectly leads to the same result.
• Metric-3 (Minimum fulfillment): As Metric-1, but what counts here is not the average fulfillment over all given commands, but the fulfillment achieved for the command with the "worst" result.
Further, statements about the complexity and the configuration computation time of each algorithm will be provided based on the number of gateway parameters (n1) and the number of OTs in each OTC (n2).
Scenarios and controlled variables: The tests have been run for 100,000 OTCs, i.e., for 100,000 subsequent reconfigurations. However, the following variables have been varied in a controlled way:
• The experiments were repeated for different numbers of OTs in each OTC, namely 2, 4, and 8.
• The experiments were repeated for two different classes of randomly generated OTCs, namely once with positive and once with negative correlations between the OTs. Positively correlated OTs consist of commands that are easier to satisfy "at the same time", while commands of negatively correlated OTs have competing requirements. "Correlation degrees" between these two classes lead to results inside the spectrum defined by the two classes, while complete randomness is closer to the positively correlated OTs, because most real OTs tend to have positive correlations (for example, "Costs" and "Database Utilization" are satisfied by similar configurations).
B. Results and discussion
Figure 4 shows the fulfillments F(OTC) of the compared approaches for the three different metrics, the two different types of OTC datasets, and the three different numbers of OTs per OTC. Table I summarizes the most important results and provides additional information for our comparisons.
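To make the difference between the three metrics concrete before discussing the results, the following toy computation illustrates their core logic. The formulas and values are illustrative assumptions (the paper's exact F(OTC) functions are given in its Appendix and are not reproduced here), but they capture the three core ideas: rewarding overfulfillment, capping at the target level, and taking the worst-served command.

```python
def metric1(achieved, targets):
    """Beneficial overfulfillment: exceeding a target level raises the score
    (illustrative formula, not the paper's exact F(OTC))."""
    return sum(a / t for a, t in zip(achieved, targets)) / len(targets)

def metric2(achieved, targets):
    """No overfulfillment: achievement is capped at the target level."""
    return sum(min(a, t) / t for a, t in zip(achieved, targets)) / len(targets)

def metric3(achieved, targets):
    """Minimum fulfillment: only the worst-served command counts."""
    return min(a / t for a, t in zip(achieved, targets))

# Toy OTC: two commands with Target Levels 70% and 50%; the algorithm
# satisfies the first perfectly and the second exactly at its target.
achieved, targets = [1.0, 0.5], [0.7, 0.5]
print(round(metric1(achieved, targets), 3))  # > 1: overfulfilling the 70% target is rewarded
print(metric2(achieved, targets))            # 1.0: capping removes that reward
print(metric3(achieved, targets))            # 1.0: the worst command still meets its target
```

With negatively correlated commands, per-command achievement can turn negative, which is consistent with the negative Metric-3 values that appear in the results below.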
For all these results, the confidence intervals were so small (please refer to the Appendix) that they are not depicted and can be safely ignored in the rest of the discussion. The following paragraphs summarize the most important conclusions that can be drawn from these results. Note that the practical value of the observations is high also because the experiments have been performed with an educated knowledge base, i.e., with real synergies between commands and parameters. As mentioned earlier, these synergies have been revealed by our related analysis.
One of the most interesting observations relates to the effect of the fairness concept employed by BADA/BigBadaBoom. It has turned out that these algorithms achieve their goal of avoiding the neglect of low-priority commands (this is reflected in BADA/BigBadaBoom's dominance for Metric-3, achieving >40% for positively correlated commands and >20% for negatively correlated commands, contrary to the big "minuses" of all other approaches) while retaining very high results for the other metrics as well (best results for Metric-2 with over 60%, and not much worse than CCA/ECCA for Metric-1). Overall, it seems that if overfulfillment of targets is not of extreme importance, BigBadaBoom should be seriously considered as the preferred auto-configuration algorithm.
However, if a small number of commands per OTC is expected, then the operator might prefer ECCA (or CCA). This is because ECCA gives (by definition) the best results for high-priority commands if overfulfillment is important. Indeed, ECCA performs best for Metric-1 with results between 2 and 4 (or 200%-400%), while its Metric-2 and Metric-3 results remain among the highest when only 2 commands exist in each OTC. This happens because the fulfillment of low-priority commands cannot drop too low if not too many commands need to be satisfied (also because the positively correlated synergies often make it easy to satisfy different commands at the same time).
More fine-grained comparisons in very concrete cases can be done based on Table I, where further differences between the algorithms are provided as well. For example, WRA has the lowest complexity (Θ(n × log n)) and can calculate many suggested re-configurations per second, although there seem to be no obvious use cases of real M2M platform operation where this would make a significant difference. Finally, it must be noted that the experiments were done with random target levels in the entire spectrum 1%-100%.
If this had been restricted to higher values, i.e., if we expected the operator to provide target levels that are on average much higher than 50%, then the BADA algorithms would not dominate the fulfillment of Metric-2 anymore. This is because the higher the average target levels, the smaller the span that BADA can "play with" in order to fairly "distribute" the fulfillment degrees among the commands (refer to Figure 3).
V. CONCLUSION
To address the difficulties of dynamically configuring Big Data-handling IoT platforms in order to achieve various possible operator goals (e.g., performance-, costs-, or data-related ones), the presented work contributed the following:
• Some details of a solution for central computation of suggested configurations, which are then interpreted (or not) and applied (or not) on the gateways that communicate directly with the data sources.
• Five new algorithms for performing the computation of the mentioned suggestions, which are all based on different ideas and might be preferred by different operators or in different situations.
Figure 4. Fulfillment of the different algorithms for different metrics, OTC datasets, and numbers of OTs per OTC: (a) Metric-1, positively correlated commands; (b) Metric-1, negatively correlated commands; (c) Metric-2, positively correlated commands; (d) Metric-2, negatively correlated commands; (e) Metric-3, positively correlated commands; (f) Metric-3, negatively correlated commands. (Each panel plots the fulfillment of CCA, ECCA, WRA, BADA, and BigBadaBoom against 2, 4, and 8 Operator Targets.)
Table I
COMPARISON OF AUTO-CONFIGURATION ALGORITHMS
Features (√ = considered) and results (AVG fulfillment of 100,000 random OTCs with 8 OTs each):

Algorithm   | Weight | Target level      | Scaled fairness | Positively correlated OTs      | Negatively correlated OTs      | Complexity     | AVG speed
            |        | (simple/advanced) | values          | Metric-1 / Metric-2 / Metric-3 | Metric-1 / Metric-2 / Metric-3 | class          | (re-config./s)
CCA         | √      | advanced          | –               | 251.8% / 68.3% / -107.8%       | 179.9% / 42.4% / -205.9%       | Θ(n1 × n2)     | ≈ 20k
ECCA        | √      | –                 | –               | 273.8% / 67.5% / -116.4%       | 197.5% / 41.9% / -207.9%       | Θ(n1 × n2)     | ≈ 29k
WRA         | √      | –                 | –               | 240.6% / 51.6% / -180.4%       | 157.9% / 10.1% / -336.4%       | Θ(n2 × log n1) | ≈ 75k
BADA        | –      | simple            | √               | 235.4% / 80.4% / 45.0%         | 153.3% / 63.9% / 23.1%         | Θ(n1^2 × n2)   | ≈ 2.5k
BigBadaBoom | √      | simple            | √               | 243.5% / 81.7% / 43.2%         | 159.2% / 65.3% / 21.5%         | Θ(n1^2 × n2)   | ≈ 2.4k
• A comparison of the degrees of fulfillment of operator targets that the algorithms can achieve under different circumstances.

More concretely, the evaluation (which is based on IoT-specific system synergies) has shown that the algorithms that try not to ignore lower-priority targets (i.e., the so-called BADA algorithms) tend to fulfill the operator targets better in most of the scenarios. However, due to the subjectivity of fulfillment, other algorithms are suggested in special cases; for example, algorithms that focus more on overfulfilling the requirements of high-priority targets are suggested when the operator is not expected to set many targets at the same time.
ACKNOWLEDGMENT

The presented research work has been partly funded by the European Commission within the Seventh Framework Programme FP7 (FP7-ICT) as part of the OrPHEuS project under grant agreement No. 608930.

REFERENCES

[1] G. Ding, L. Wang, and Q. Wu, "Big Data Analytics in Future Internet of Things," The Computing Research Repository (CoRR), vol. abs/1311.4112, 2013.
[2] C. Tsai, C. Lai, M. Chiang, and L. Yang, "Data Mining for Internet of Things: A Survey," IEEE Communications Surveys & Tutorials, vol. PP, no. 99, pp. 1–21, 2013.
[3] Ericsson, "More than 50 Billion Connected Devices," Ericsson White Paper, no. 284 23-3149, pp. 1–12, 2011.
[4] D. Agrawal, P. Bernstein, E. Bertino, S. Davidson, U. Dayal, M. Franklin, J. Gehrke, L. Haas, A. Halevy, J. Han, H. V. Jagadish, A. Labrinidis, S. Madden, Y. Papakonstantinou, J. M. Patel, R. Ramakrishnan, K. Ross, C. Shahabi, D. Suciu, S. Vaithyanathan, and J. Widom, "Challenges and Opportunities with Big Data," 2012, a community white paper developed by leading researchers across the United States. [Online]. Available: http://cra.org/ccc/docs/init/bigdatawhitepaper.pdf
[5] A. Papageorgiou, M. Schmidt, J. Song, and N. Kami, "Smart M2M Data Filtering Using Domain-Specific Thresholds in Domain-Agnostic Platforms," in IEEE International Congress on Big Data, June 2013, pp. 286–293.
[6] H. Hu, J. Zhang, X. Zheng, Y. Yang, and P. Wu, "Self-configuration and Self-optimization for LTE Networks," IEEE Communications Magazine, vol. 48, no. 2, pp. 94–100, 2010.
[7] NEC Corporation, "Self-Organizing Networks: NEC's Proposals For Next-Generation Radio Network Management," February 2009. [Online]. Available: http://www.nec.com/global/solutions/nsp/mwc2009/images/SON_whitePaper_V19_clean.pdf
[8] T. Jansen, I. Balan, J. Turk, I. Moerman, and T. Kürner, "Handover Parameter Optimization in LTE Self-Organizing Networks," in IEEE Vehicular Technology Conference (VTC '10). IEEE, 2010, pp. 1–5.
[9] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu, "Starfish: A Self-tuning System for Big Data Analytics," in 5th Biennial Conference on Innovative Data Systems Research (CIDR '11), 2011, pp. 261–272.
[10] P. Hurni and T. Braun, "MaxMAC: A Maximally Traffic-Adaptive MAC Protocol for Wireless Sensor Networks," in 7th European Conference on Wireless Sensor Networks. Springer, 2010, pp. 289–305.
[11] H. Zhang, C. V. Hollot, D. Towsley, and V. Misra, "A Self-Tuning Structure for Adaptation in TCP/AQM Networks," in IEEE GLOBECOM '03. IEEE, 2003, pp. 3641–3646.
[12] A. Papageorgiou, M. Zahn, and E. Kovacs, "A Mechanism for Efficient Auto-configuration of Energy-related Parameters in IoT Systems," in International Conference on Future Energy Systems (ACM e-Energy), 2014, under submission.

APPENDIX

The fulfillment F(OTC) (for all three examined metrics) is computed based on the knowledge base according to the following steps, whereby |E_i| stands for the number of entries of OT_i, DNC_e stands for the degree of non-conformity of an entry, SNC_i stands for the summed non-conformity of OT_i, baseEntry is the per-parameter optimal configuration value extracted from the knowledge base, vsEntry is the suggested value, F(OT_i) stands for the fulfillment of OT_i, TL(OT_i) stands for the Target Level of OT_i, SF(OT_i) stands for the scaled and averaged fulfillment of OT_i, mSF(OT_i) stands for the metric-dependent scaled fulfillment, w_i stands for the weight of OT_i, Case1 is true when baseEntry and vsEntry both contain entries but with different values, and Case2 is true when baseEntry has a suggestion but vsEntry does not:
DNC_e =
  1.0,                                              if Case1 && baseEntry is qualitative
  |baseEntry.value() − vsEntry.value()|,            if Case1 && baseEntry is quantitative
  1.0 − 1.0 / (num. of distinct param. entries),    if Case2 && baseEntry is qualitative
  0.5,                                              if Case2 && baseEntry is quantitative
  0.0,                                              otherwise

SNC_i = Σ DNC_e   (sum over all entries e of OT_i)

F(OT_i) = (|E_i| − SNC_i) / |E_i|

SF(OT_i) = ((F(OT_i) − 0.5) × 2) / TL(OT_i)

mSF(OT_i) =
  1,          if SF(OT_i) > 1 && Metric-2
  SF(OT_i),   otherwise

F(OTC) =
  min_i SF(OT_i),                                     if Metric-3
  (Σ_{i=1..n} w_i × mSF(OT_i)) / (Σ_{i=1..n} w_i),    otherwise
Note that a scaled fulfillment with overfulfillment (Metric-1) implies the possibility of fulfillments higher than 100%. For example, this can happen if the operator aims for an OT with a 50% Target Level and the suggested configuration exceeds this Target Level. Further, negative fulfillments are possible due to the average reference.

Note about the confidence intervals of the experiments: assuming a normal distribution and a confidence level of 99%, the confidence intervals were in the range of 0–0.05, i.e., too small to affect our conclusions. For example, the confidence interval for CCA and Metric-1 was [2.50; 2.53], while for BADA and Metric-2 it was [0.803; 0.805].