Enhanced multi-version data broadcast schemes for time-constrained mobile computing systems*

Hei-Wing Leung, Joe Yuen, Kam-Yiu Lam and Edward Chan
Department of Computer Science, City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong
{cskylam|csedchan}@cityu.edu.hk

Abstract

In this paper, we study the data dissemination problem in time-constrained mobile computing systems (TCMCS), in which maximizing data currency (minimizing staleness) and meeting transaction deadlines are as important as providing consistent data items to transactions. We first investigate the performance problems of the multi-version (MV) data broadcast technique when it is used in a TCMCS, and then propose various enhancements to resolve these problems. To further reduce the probability of missing deadlines, we introduce a prioritized on-demand broadcast scheme that is integrated with the data broadcast mechanism for disseminating data items to time-constrained transactions. The performance characteristics of the proposed strategies have been studied through extensive simulation.

1. Introduction

Many new mobile computing applications have real-time properties and requirements [3,4]. The values of the data items are highly dynamic, and the transactions generated from mobile clients may be associated with a deadline constraint on their completion times. We call this kind of system a time-constrained mobile computing system (TCMCS); its prime performance objectives are to minimize the probability of missing the deadlines of time-constrained transactions and to provide consistent and valid data to transactions. In recent years, many efficient data dissemination methods have been proposed, particularly for read-only mobile transactions [1,2,3,5,6,7,8]. Many of them are based on data broadcast or on-demand transmission. However, most of the previous studies are not designed for TCMCS; they aim to improve the response time rather than to meet deadline requirements. The issue of how to provide consistent data items to mobile transactions, in addition to minimizing data access delay, has received growing interest in recent years [5,6,7]. An efficient and pioneering method is multi-version data broadcast (MV) [5,6], in which the broadcast server broadcasts not only the latest version of a data item but also its previous versions. However, MV may not be suitable for TCMCS.

Some of the requirements of MV may constrain the design of the broadcast schedule and may increase the deadline missing probability of time-constrained transactions. At the same time, MV is not able to maximize the currency (and minimize the staleness) of the data items provided to a transaction. This is highly undesirable for many TCMCS, since reading out-dated data could seriously affect the usefulness of transaction results. In this paper, we study the problems of disseminating consistent data items in a TCMCS, in which the transactions are called real-time mobile transactions (RTM-trans). The performance objectives are to: (1) minimize the deadline missing probability of RTM-trans; (2) provide consistent data items to RTM-trans; and (3) minimize the staleness (or maximize the currency) of the data provided to RTM-trans.

2. The System Model

A TCMCS consists of a database server, a broadcast server, a number of mobile clients and a mobile network. The database server maintains a database that records real-time information, e.g., stock tickers, news updates, traffic conditions and weather conditions. It is assumed that the database server receives updates from an external source to maintain the validity of the data items, so that their values remain consistent with the actual status of the corresponding objects in the external environment. Each update is labeled with a timestamp to indicate when the value was taken. The broadcast server connects to the database server. It selects data items from the database and broadcasts them to all the mobile clients through a mobile network, using a chosen broadcast scheduling algorithm to define the broadcast schedule, which may be based on the popularity of the data items. Each RTM-tran consists of a set of read operations, and the read operations are unordered, i.e., they can be executed in any order. Each RTM-tran is associated with a deadline on its completion time. Meeting the deadline is an important performance requirement. In addition to satisfying the deadline requirement, the usefulness of a transaction is also affected by the staleness of its accessed data items. Informally, a data item is stale if it is an out-dated version, i.e., a newer version has been created.

* The work described in this paper was partially supported by a grant from the Research Grants Council of Hong Kong SAR, China [Project No. CityU 1078/00E].

3. The Original Multi-Version Data Broadcast (MV) Method

3.1 Principles of MV

In MV, the database maintains multiple versions for each data item. Updates on data items are batched together until the end of a broadcast cycle. Each newly created data version is assigned a version number, which indicates at which cycle-end it was generated. The server broadcasts previous versions of a data item together with the version committed at the end of the last broadcast cycle. It is assumed that each transaction has a maximum life-span and that no transaction exists in the system longer than its life-span. The maximum life-span of the transactions, together with the time required to complete a broadcast cycle, determines how many and which versions of a data item are broadcast in a cycle. If a mobile transaction wants to access a data item, it gets the latest version for its first read operation from the current broadcast cycle. The subsequent read operations of the transaction read data items with the same version number as the first one. By allowing a transaction to read an older version of a data item, data consistency can be ensured at the expense of currency, since a mobile transaction is allowed to access old versions of data items. As shown in Figure 1, a transaction gets consistent data by reading the versions that are valid in the same cycle.

Figure 1. Getting consistent data in MV

MV can also be applied to accessing cached data items. The clients may maintain the previous versions of data items in their caches, and the same rule used for accessing broadcast data is used for accessing cached items. The efficiency and characteristics of the multi-version methods, as compared with other methods such as serialization graph broadcast, have been examined in [5]. The multi-version method is very useful for systems where the mobile clients are frequently disconnected from the mobile network.
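To make the rule in Section 3.1 concrete, the following sketch shows one way a server could pick the versions of a single item to carry in a cycle, given the maximum transaction life-span and the cycle length. It is an illustrative Python sketch under our reading of the scheme, not code from the paper; the names Version and versions_to_broadcast are assumptions.

import math
from dataclasses import dataclass
from typing import List

@dataclass
class Version:
    cycle_no: int     # broadcast cycle at whose end this version was committed
    value: object

def versions_to_broadcast(history: List[Version], current_cycle: int,
                          max_lifespan: float, cycle_length: float) -> List[Version]:
    """Select the versions of a single item to include in the current cycle."""
    if not history:
        return []
    # A transaction can span at most this many broadcast cycles.
    k = math.ceil(max_lifespan / cycle_length)
    oldest_needed_cycle = current_cycle - k
    kept = [v for v in history if v.cycle_no >= oldest_needed_cycle]
    # Always include the last committed version, even if it predates the window.
    latest = max(history, key=lambda v: v.cycle_no)
    if latest not in kept:
        kept.append(latest)
    return sorted(kept, key=lambda v: v.cycle_no)

With this rule, a longer transaction life-span or a shorter cycle length directly increases the number of old versions each cycle must carry, which is the overhead discussed in Section 3.2.1.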

3.2 Performance Problems of MV for TCMCS

3.2.1 Multiple Data Broadcast Overheads

By reading data versions committed at the same cycle-end, data consistency provided to a transaction can be ensured. However, the drawback is that the overhead incurred in broadcasting multiple versions of the data items greatly increases the broadcast cycle length and the waiting time of a transaction for its required data items. This is a serious problem in TCMCS since the transactions have deadlines.

3.2.2 Performance Tradeoffs: Consistency vs. Currency

Two important requirements of MV are:

(1) Each data item has to be broadcast at least once in each cycle; and
(2) The latest version of a data item to be broadcast in a cycle is the last version committed before the start of the cycle.

The first requirement makes the length of a broadcast cycle dependent on the database size and limits the design of a broadcast schedule. If the database size is very large, the waiting time for a data item will be very long. The second requirement can seriously affect the data currency provided to a transaction, especially when the broadcast cycle length is long. A simple way to satisfy the first requirement is to use a flat broadcast disk [1]. However, this makes the length of a broadcast cycle dependent on the number of data items in the database. Alternatively, in many mobile computing systems the access probabilities of the data items are uneven, so we may estimate the average access probabilities and design a broadcast scheduling algorithm that broadcasts “hot” data items more frequently than “cold” data items; a simple scheduler of this kind is sketched below. However, it is difficult to ensure that each data item is broadcast at least once in a cycle with such a hot/cold broadcast scheduler. To do so we may need to define a very long broadcast cycle, and the impact on system performance would be similar to using a long flat broadcast disk, i.e., a higher deadline missing probability and low data currency. Previous work has also suggested basing the broadcast schedule on the dynamic properties of the data items. The broadcast system may consist of several broadcast disks, with each disk handling one type of data item, where data items of the same type have similar update periods. The broadcast disk for the data type with a shorter update period is given more bandwidth. If MV is applied to such a system, the length of a broadcast cycle is limited by the “speed” of the slowest broadcast disk.
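The sketch below illustrates the kind of hot/cold scheduler mentioned above. It assumes a simple layout, one burst of hot items followed by a slice of cold items, repeated; it is not the scheduling algorithm used in the paper.

from typing import List, Sequence

def hot_cold_cycle(hot: Sequence[int], cold: Sequence[int], hot_repeats: int) -> List[int]:
    """Build one broadcast cycle: each hot item appears hot_repeats times, each cold item once."""
    schedule: List[int] = []
    chunk = max(1, -(-len(cold) // hot_repeats))          # ceiling division
    for i in range(hot_repeats):
        schedule.extend(hot)                              # a burst of all hot items
        schedule.extend(cold[i * chunk:(i + 1) * chunk])  # a slice of the cold items
    return schedule

# Example: three hot items repeated three times, six cold items spread over the cycle.
cycle = hot_cold_cycle(hot=[1, 2, 3], cold=[10, 11, 12, 13, 14, 15], hot_repeats=3)
# cycle == [1, 2, 3, 10, 11, 1, 2, 3, 12, 13, 1, 2, 3, 14, 15]

Note how a cycle built this way only covers every cold item if the cycle is made long enough, which is exactly the tension with requirement (1) discussed above.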

4. Extensions of MV for TCMCS

4.1 Using a Smaller Broadcast Cycle

An efficient way to improve the performance of MV for TCMCS is to relax the requirement that each data item has to be broadcast at least once in each cycle. We may use a smaller broadcast cycle that contains only a subset of the data items in the database. Another benefit of using a shorter cycle is that the currency of the data may be higher, since the delays in installing the updates are shorter.

4.1.1 Using Time-intervals for Validation

If a broadcast cycle contains only a subset of the data items, a data version may remain valid for several cycles. One way to check the validity period of the versions accessed by an RTM-tran is to use an up-to-cycle approach, in which each data version is associated with two cycle numbers: a lower cycle number and an up-to-cycle number. The lower cycle number is the cycle number in which the version was created. Initially, the up-to-cycle number is set equal to the lower cycle number. After each cycle, the up-to-cycle number of the latest version is increased by one. Therefore, the up-to-cycle number indicates at least up to which cycle the version is valid. However, for cached data items, the up-to-cycle numbers cannot be adjusted immediately. To resolve this problem, we may need to use a multi-version validation report, which includes the current validity cycle numbers of all the data items that have changed in the last report period. Under the up-to-cycle approach, if a transaction finds all its required data items with overlapping cycle numbers in the cache, it can commit immediately without validating against the invalidation report.
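A minimal sketch of the up-to-cycle check described above: a transaction can commit from its cached versions alone if their validity intervals share at least one cycle. The CachedVersion class and the function name are illustrative assumptions, not identifiers from the paper.

from dataclasses import dataclass
from typing import Iterable

@dataclass
class CachedVersion:
    item_id: int
    lower: int    # cycle number in which the version was created
    up_to: int    # latest cycle the version is known to still be valid for

def can_commit_from_cache(versions: Iterable[CachedVersion]) -> bool:
    """True if the [lower, up_to] intervals of all required versions overlap."""
    versions = list(versions)
    if not versions:
        return False
    latest_lower = max(v.lower for v in versions)
    earliest_up_to = min(v.up_to for v in versions)
    return latest_lower <= earliest_up_to   # a common valid cycle exists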

4.1.2 Multi-version Validation Report Generation

To ensure the consistency of cached data items, a multi-version validation report is prepared and periodically broadcast by the broadcast server. A validation report contains update information for the data items changed within the report duration, defined as the time interval from (current time − report duration) to the current time. The report duration is a tunable parameter: an update older than the report duration is considered too “old” to be useful. Each validation report is shifted from the last report time by a period called the report period, and the report duration is chosen to be greater than the report period. Using such a sliding-window approach for generating the reports helps to resolve the problem of validation under disconnection: an RTM-tran can still validate its accessed data items if it has been disconnected from the network for a period not longer than the report duration.
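The following sketch shows one way such a sliding-window report could be built; the record layout and function name are assumptions for illustration. Every report period (50 s in Table 1) the server would emit a report covering the last report duration (1000 s in Table 1), so consecutive reports overlap and a client disconnected for less than the report duration can still validate.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class UpdateRecord:
    item_id: int
    commit_time: float    # when the new version was installed
    new_version: int      # cycle number of the new version

def build_validation_report(update_log: List[UpdateRecord], now: float,
                            report_duration: float) -> Dict[int, int]:
    """Map each item updated within [now - report_duration, now] to its newest version."""
    report: Dict[int, int] = {}
    for rec in update_log:
        if now - report_duration <= rec.commit_time <= now:
            report[rec.item_id] = max(report.get(rec.item_id, -1), rec.new_version)
    return report

# Schematic broadcast loop: one report every report_period seconds.
# while True:
#     broadcast(build_validation_report(update_log, now=clock(), report_duration=1000.0))
#     sleep(report_period)   # e.g. 50 s, as in Table 1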

4.2 Two Other Schemes for Reducing Broadcast Overhead and Improving Data Currency

In the following, we present two alternatives for resolving the problems of MV in TCMCS, assuming that a broadcast cycle may contain only a subset of the data items.

(1) Latest Version Only (LVO). Broadcasting multiple versions can greatly increase the broadcast overhead, especially when the update rates of the data items are high. A simple way to improve the data currency provided to RTM-trans is to broadcast only the latest version of a data item, i.e., the version committed at the last cycle-end. This trades off deadline missing probability against data freshness. The length of a broadcast cycle and the average waiting time of a transaction for its required data items can both be significantly reduced.

(2) MV with Auto-refreshment (AR). Under data auto-refreshment, whenever a newer version of a data item that is in the client cache, or that has been accessed by the client, is broadcast, the newer version is captured and stored in the client cache, as sketched below.
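A minimal sketch of the AR rule under the assumptions above; the ClientCache class and its fields are illustrative, not taken from the paper.

from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class ClientCache:
    entries: Dict[int, Tuple[int, object]] = field(default_factory=dict)  # item_id -> (version, value)
    accessed: Set[int] = field(default_factory=set)   # items this client has accessed before

    def on_broadcast(self, item_id: int, version: int, value: object) -> None:
        """Auto-refreshment: capture newer versions of cached or previously accessed items."""
        if item_id not in self.entries and item_id not in self.accessed:
            return                                    # not of interest to this client
        cached = self.entries.get(item_id)
        if cached is None or version > cached[0]:
            self.entries[item_id] = (version, value)  # refresh with the newer version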

4.3 On-demand Broadcast for Minimizing Missing Deadlines

For many data broadcast systems, the data access patterns of transactions are not uniform. Some data items are hot and have a higher probability of being needed by the mobile transactions. For this kind of access pattern, it is better to use a hot/cold scheduler to broadcast the data items. However, if our objective is to minimize the probability of transactions missing their deadlines, using a hot/cold scheduler may actually increase the number of missed deadlines if even a small number of the required data items are cold. Let the length of a broadcast cycle be N data items. The first (N − m) data items in a broadcast cycle are determined by a broadcast scheduling algorithm; the selected data items are called on-schedule data items. After the broadcast of the on-schedule data items, the system examines the on-demand buffer, which contains the data requests received during the broadcast cycle. The data requests are sorted according to their deadlines, which are the deadlines of the originating transactions. The amount of broadcast bandwidth allocated for on-schedule broadcast versus on-demand broadcast is a system tuning parameter. Its value can be determined using a feedback control mechanism based on the percentage of deadlines missed due to requests for cold data items. Once an RTM-tran is generated, its required data items are examined. The client cache is searched first. If any of the required data items is found in the cache, its validity is checked, i.e., using the up-to-cycle approach. If the validity interval of the latest version of a data item is within the currency requirement of the transaction, the data item is accessed. An RTM-tran generates on-demand data requests if it cannot find its remaining data items in the current broadcast cycle.
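The server-side loop below sketches how the two phases of a cycle could fit together, with (N − m) on-schedule slots followed by m on-demand slots served in earliest-deadline-first order. Function and parameter names are assumptions for illustration, not the paper's implementation.

import heapq
from typing import Callable, List, Tuple

def run_cycle(n_slots: int, m_on_demand: int,
              schedule_next: Callable[[], int],            # broadcast scheduling algorithm
              demand_buffer: List[Tuple[float, int]],      # heap of (deadline, item_id)
              broadcast: Callable[[int], None]) -> None:
    # Phase 1: on-schedule items chosen by the scheduler (e.g. a hot/cold scheme).
    for _ in range(n_slots - m_on_demand):
        broadcast(schedule_next())
    # Phase 2: on-demand requests, most urgent deadline first.
    for _ in range(m_on_demand):
        if not demand_buffer:
            break
        _deadline, item_id = heapq.heappop(demand_buffer)
        broadcast(item_id)

The split between the two phases (m versus N − m) is the tuning parameter mentioned above; a feedback controller could grow m when many deadlines are missed on cold-item requests and shrink it otherwise.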

5. Performance Studies

5.1 Experimental Model

A simulation model based on the system model introduced in Section 2 has been implemented. To simplify the parameter set, it is assumed that all the data items in the database have the same size. The data items are classified into two classes with different mean update periods. A mobile client generates an RTM-tran after completing the previous one and after a think time whose value is exponentially distributed. The deadline of an RTM-tran is its arrival time plus an offset, and an RTM-tran is aborted after its deadline. The data access pattern of the operations in an RTM-tran follows a Zipf distribution. When an RTM-tran has completed all its operations, it commits. Table 1 lists the parameters and their baseline values.

Database size: 10000 data items
Broadcast rate: 20 data items / sec
Cache size: 100 data items
Cache replacement scheme: LRU
Data access distribution (for both mobile and update transactions): Zipf distribution
Degree of skewness in accessing data: 1.0
Number of operations in a mobile transaction: 1 to 4
Number of write operations in an update transaction: 1 to 2
Invalidation report generation period: 50 sec
Report duration: 1000 sec
Life-span of a mobile transaction: 200 sec
Mean think time of a mobile client: 10 sec
Mean inter-arrival time of update transactions: Class 1 (7500 items): 10 sec; Class 2 (2500 items): 100 sec

Table 1. Model parameters and baseline values
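As an aside, the access pattern in Table 1 (Zipf distribution with skew 1.0 over the database) can be reproduced with a small sampler like the one below; this is an illustrative sketch, not the authors' simulator.

import bisect
import random
from typing import Callable, List

def zipf_sampler(n_items: int, theta: float) -> Callable[[], int]:
    """Return a function that draws item ids 1..n_items with Zipf(theta) probabilities."""
    weights = [1.0 / (rank ** theta) for rank in range(1, n_items + 1)]
    total = sum(weights)
    cumulative: List[float] = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    return lambda: min(bisect.bisect_left(cumulative, random.random()) + 1, n_items)

draw = zipf_sampler(n_items=10000, theta=1.0)     # baseline values from Table 1
transaction_reads = [draw() for _ in range(4)]    # one RTM-tran with four read operations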

The primary performance measures are the miss rate, the stale access rate and the broadcast overhead. The miss rate is defined as the number of RTM-trans that miss their deadlines (and are aborted) divided by the total number of RTM-trans generated. The stale access rate is defined as the number of accesses to stale data divided by the total number of data accesses. The broadcast overhead captures the amount of broadcast bandwidth consumed by the protocols; in MV, it is the percentage of broadcast bandwidth used for broadcasting all versions of data items other than the newest version.
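For clarity, the three measures reduce to simple ratios over counters that a simulator would collect; the function names below are illustrative.

def miss_rate(missed_transactions: int, generated_transactions: int) -> float:
    """RTM-trans that missed their deadlines (and were aborted) over all RTM-trans generated."""
    return missed_transactions / generated_transactions if generated_transactions else 0.0

def stale_access_rate(stale_accesses: int, total_accesses: int) -> float:
    """Accesses that returned a stale (out-dated) version over all data accesses."""
    return stale_accesses / total_accesses if total_accesses else 0.0

def broadcast_overhead(old_version_slots: int, total_broadcast_slots: int) -> float:
    """In MV: bandwidth spent on versions other than the newest over total broadcast bandwidth."""
    return old_version_slots / total_broadcast_slots if total_broadcast_slots else 0.0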

5.2 Results and Discussions

5.2.1 Performance Problems of MV

Figures 2 to 4 show the performance of MV when the database size is varied. Consistent with our expectation, the miss rate increases with the database size, since the waiting times for data items are longer when the database is larger. Another serious problem of MV is that its stale access rate is very high even when the database size is small, as shown in Figure 3. This implies that most of the data items observed by the transactions are quite “old”. An important factor in the poor performance of MV is its high broadcast overhead, which is particularly serious when the database size is small and the deadlines (life-spans) of the transactions are large, as shown in Figure 4.

5.2.2 Performance Studies on Various Methods to Improve MV

We first investigate the performance of adding LVO and AR to MV; the integrated method is labeled MV+LVO+AR in the figures. As shown in Figure 5, the miss rate curves are U-shaped, and the miss rate of MV is significantly higher than that of MV+LVO+AR for all database sizes. When the broadcast cycle length is small, i.e., smaller than 2000 data items, an increase in broadcast length decreases the miss rate for both schemes. However, when the broadcast cycle length is long, the miss rate increases with the cycle length due to the long waiting time for data items. The better performance of MV+LVO+AR is due to its lower rebroadcast overhead: as shown in Figure 6, the broadcast overhead of MV+LVO+AR remains close to zero for different broadcast cycle lengths, since it includes only the latest version of a data item in each broadcast. As shown in Figure 7, MV+LVO+AR also provides higher data currency to the transactions: its stale access rate is much lower than that of MV. Since MV+LVO+AR adopts the auto-refresh and single-version broadcasting schemes, the version observed by a transaction is always close to the latest version.

Figures 8 and 9 show the results of adding the on-demand broadcast scheme to MV+LVO+AR. The on-demand requests are served at the end of a broadcast cycle. In our experiments, a virtual client (VC) submits requests to the server to simulate the impact of other mobile clients on performance. When the VC generates more requests, the total number of requests at the server is higher and the performance of the normal clients can be seriously affected. Figure 8 depicts the miss rate when we vary the broadcast length. It shows that introducing the on-demand broadcast scheme into MV+LVO+AR gives better performance than MV+LVO+AR using broadcast only. The improvement is particularly significant when more bandwidth is allocated for serving the on-demand requests. One interesting point is that the impact of the virtual client on the performance is less significant when the broadcast length is large. Also, as shown in Figure 9, the stale access rate of the on-demand scheme is higher than that of pure MV+LVO+AR. The reason is that the database server serves on-demand requests at the end of a broadcast cycle, when the data items have a higher probability of being stale. However, it is interesting to see that this effect is only mildly affected by the percentage of bandwidth allocated for on-demand data transmission and by the workload from the VC.

6. Conclusions

In this paper, we have studied the problem of providing consistent data to time-constrained transactions in mobile computing systems. We chose multi-version data broadcast (MV) as the basis and discussed its performance problems when it is used in such time-constrained mobile computing systems (TCMCS). Based on these problems, we proposed several strategies to extend MV for TCMCS, with the objectives of meeting deadlines and maximizing the freshness of the data items provided to the real-time transactions from mobile clients. Extensive simulation experiments have been performed to investigate the performance characteristics and tradeoffs of the proposed extensions.

References

[1] Acharya, S., Franklin, M. and Zdonik, S., “Balancing Push and Pull for Data Broadcast”, in Proceedings of ACM SIGMOD, Tucson, Arizona, May 1997.
[2] Datta, A., Celik, A., Kim, J. and VanderMeer, D.E., “Adaptive Broadcast Protocol to Support Power Conservant Retrieval by Mobile Users”, in Proceedings of the International Conference on Data Engineering, 1997.
[3] Fernandez, J. and Ramamritham, K., “Adaptive Dissemination of Data in Real-Time Asymmetric Communication Environments”, in Proceedings of the Euromicro Conference on Real-Time Systems, June 1998.
[4] Lam, K.Y., Chan, Edward and Au, Mei-Wai, “Broadcast of Consistent Data to Read-Only Transactions from Mobile Clients”, in Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, New Orleans, Feb. 1999.
[5] Pitoura, E. and Chrysanthis, P.K., “Scalable Processing of Read-Only Transactions in Broadcast Push”, in Proceedings of the International Conference on Distributed Systems, May 1999.
[6] Pitoura, E. and Chrysanthis, P.K., “Exploiting Versions for Handling Updates in Broadcast Disks”, in Proceedings of the Very Large Data Base Conference, Sept. 1999.
[7] Pitoura, E., “Supporting Read-Only Transactions in Wireless Broadcasting”, in Proceedings of the DEXA’98 Workshop on Mobility in Databases and Distributed Systems, August 1998.
[8] Shanmugasundaram, J., Nithrakashyap, A., Sivasankaran, R. and Ramamritham, K., “Efficient Concurrency Control for Broadcast Environments”, in Proceedings of the ACM International Conference on Management of Data, Philadelphia, 1999.

Figure 2. Miss rate at different database sizes (miss rate (%) vs. database size; curves for deadline = 200, 400 and 600)

Figure 3. Stale access rate at different database sizes (stale access rate (%) vs. database size; curves for deadline = 200, 400 and 600)

Figure 4. Broadcast overhead at different database sizes (broadcast overhead (%) vs. database size; curves for deadline = 200, 400 and 600)

Figure 5. Miss rate at different broadcast lengths (miss rate (%) vs. broadcast length; curves for MV and MV+LVO+AR with db = 5000, 7500 and 10000)

Figure 6. Stale access rate at different broadcast lengths (stale access rate (%) vs. broadcast length; curves for MV and MV+LVO+AR with db = 5000, 7500 and 10000)

Figure 7. Broadcast overhead at different broadcast lengths (broadcast overhead (%) vs. broadcast length; curves for MV and MV+LVO+AR with db = 5000, 7500 and 10000)

Figure 8. Miss rate at different on-demand broadcast settings (miss rate (%) vs. broadcast length; no on-demand broadcast vs. 10% and 30% on-demand broadcast, with the VC generating 5 or 10 requests per sec)

Figure 9. Stale access rate at different on-demand broadcast settings (stale access rate (%) vs. broadcast length; no on-demand broadcast vs. 10% and 30% on-demand broadcast, with the VC generating 5 or 10 requests per sec)
