A Cloud-based Recommendation Model - ACM Digital Library

A Cloud-based Recommendation Model

Ricardo B. Rodrigues

Frederico A. Durão

Vinicius C. Garcia

Informatics Center - Federal University of Pernambuco Recife, Pernambuco, Brazil

Carlo M. R. Silva

Institute of Mathematics - Federal University of Bahia Salvador, Bahia, Brazil

Rafael R. Souza

Informatics Center - Federal University of Pernambuco Recife, Pernambuco, Brazil

Informatics Center - Federal University of Pernambuco Recife, Pernambuco, Brazil

Informatics Center - Federal University of Pernambuco Recife, Pernambuco, Brazil

Usto.re Recife, Pernambuco, Brazil

Rodrigo E. Assad

ABSTRACT Recommendation systems aim to minimize information overload by helping users search for the information they want. In this scenario, we investigate cloud factors that can positively influence the generation of recommendations. We present a new, simple model based on cloud features combined with the content-based recommendation technique. Applied to cloud data storage environments, the model aims to make the best use of cloud resources while meeting users' preferences.

Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Information filtering.

Keywords Recommendation System, Cloud-based, Cloud Storage.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. EATIS’14, April 02 - 04 2014, Valparaiso, Chile. Copyright 2014 ACM 978-1-4503-2435-9/14/04 ...$15.00. http://dx.doi.org/10.1145/2590651.2590673

1. INTRODUCTION

With the advent of cloud computing, cloud storage systems have emerged that allow users to store files in the cloud. As these systems are used more and more, the mass of data stored in the cloud has become impossible for people to process manually. Relevant information is hidden from users, who fail to discover new content because they lack effective means of filtering the data for knowledge that is relevant and meets their expectations. In this scenario, recommendation systems are an alternative that helps users decide which files to choose and filter relevant information from a multitude of data. Recommender systems (RS) are software programs and techniques that provide suggestions of items to users [5]. These systems are part of our lives: we receive recommendations daily by email or on web pages, and many stores and platforms, such as Amazon and Barnes & Noble, provide recommendation services.

There are two predominant approaches to building an RS: Collaborative Filtering (CF) and Content-Based Filtering (CB). CF systems recommend items based on the preferences of users whose characteristics are similar to the user's, for example, users with similar profiles on a social network. CB systems recommend items similar to those in which the user has expressed interest in the past. To do so, the system analyzes the content descriptions of the items the user has evaluated to build a user profile, which is then used to filter the remaining items in the collection [2] [6]. Recommendation systems thus reduce information overload by filtering items according to user interests. Of the existing techniques for this task, this paper uses Content-Based Filtering, which relies on the files the user has shown interest in [6].

This paper presents a recommendation model for a cloud data storage environment, in which recommendations are generated by combining the CB technique with factors of the cloud. The purpose of the proposed model is to recommend files that are similar to the user's preferences and that satisfy the cloud factors, so that a recommended file is always available and accessible in the cloud storage. The model also aims to reduce the time spent downloading recommended files and to filter relevant content from the vast amount of data available in the cloud. This paper is organized as follows: Section 2 presents related work; Section 3 presents the proposed model; Section 4 presents results; and Section 5 presents the conclusions and future work of this study.
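The content-based filtering loop described above (build a profile from the files a user liked, then rank the remaining files against it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each file is reduced to a bag-of-words term-frequency vector and that the profile is the sum of the vectors of liked files. The tokenizer and the names `term_vector`, `build_profile`, and `rank_candidates` are hypothetical; the comparison uses cosine similarity, the measure the model adopts in Equation 1.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector (hypothetical tokenizer: lowercase alphanumerics)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity (A . B) / (||A|| * ||B||); returns 0.0 if either vector is empty."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(liked_texts: list[str]) -> Counter:
    """Hypothetical user profile: summed term counts of the files the user showed interest in."""
    profile: Counter = Counter()
    for text in liked_texts:
        profile.update(term_vector(text))
    return profile


def rank_candidates(profile: Counter, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Score each remaining file against the profile and sort, most similar first."""
    scored = [(name, cosine(profile, term_vector(text))) for name, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    profile = build_profile(["cloud storage recommendation", "content based filtering"])
    print(rank_candidates(profile, {
        "a.pdf": "recommendation in cloud storage systems",
        "b.pdf": "protein folding simulation",
    }))
```

Because term counts are non-negative, every score falls between 0 and 1, and a file sharing no terms with the profile scores exactly 0.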

2. RELATED WORK

Few studies in the literature address recommendation systems in the cloud. In this section we introduce some of them, emphasizing their similarities to and differences from the recommendation model proposed in this research. Lee et al. (2010) [4] propose an RS that uses data stored in the cloud to recommend files stored in the cloud; this differs from the scope of this work, which uses cloud factors to ensure the availability of recommended files to users. Lai et al. (2011) [3] present the work that comes closest to the purpose of this research: they propose a cloud-based RS for TV programs, aiming to offer a scalable system with a high level of availability. By contrast, the model proposed here uses cloud factors in generating the recommendations themselves, with the aim of ensuring that recommended files are available, saving the time spent downloading them, and making sure the recommendations meet the user's preferences.

3. THE RECOMMENDATION MODEL

The recommendation process modeled in this paper combines the content-based recommendation technique with characteristics of the cloud. The proposed model is composed of five factors that come from the cloud and use the metadata of files stored in the cloud. The factors were defined by observing cloud storage environments and the priorities of users in these environments. The factors are:

• Similarity
• Availability
• Download Rate
• File Size
• File Relevance

Below we detail each factor and its calculation.

Similarity Factor: This factor addresses user preferences by calculating the similarity between the contents of a candidate file and those of a file in which the user has demonstrated interest. Public files stored in the cloud are the candidates for recommendation. The similarity between contents is obtained by cosine similarity, which returns a value between 0 (zero) and 1 (one) for non-negative content vectors [1]. Cosine similarity is given by Equation 1:

$$S_t = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} \qquad (1)$$

The similarity between two vectors A and B is obtained by computing the scalar product of A and B and dividing it by the product of their magnitudes. Files similar to the file that represents the user's preferences are ranked by their degree of similarity: the higher the similarity score, the better the file is ranked. For example, if the RS contains two files A and B with similarity scores of 0.8 and 0.5, respectively, file A will be ranked better than file B on the similarity factor. The similarity between the candidate files and a file in which the user has demonstrated interest is essential to this model, which aims to meet user preferences while filtering relevant content from a large mass of data.

Availability Factor: The availability factor refers to the time during which a file is available to the user. In this model, availability is measured as the number of hours during which the peer that stores a candidate file is available on the network. A file should only be recommended if the peer that stores it is online, so that the user can download it. The availability factor is calculated by Equation 2:

$$D_p = h \cdot \frac{1}{n} \qquad (2)$$

Here h is the number of hours during which the machine that stores the file is available on the network, and n is the maximum number of hours a machine can be available, that is, n = 24 for a machine that stays online all day. The hours of availability are thus normalized to a value between 0 and 1. As an example of how this factor affects a recommendation, consider two similar files A and B. File A is stored on peer 1, which is available on the network from 14:00 to 16:00, a total of 2 hours; file B is stored on another peer that is available from 14:00 to 18:00, a total of 4 hours. File B will be ranked better because it is available on the network longer than file A, giving the user a wider time window in which to download it. The main objective is to reduce the risk that the user cannot download a recommended file, so that a recommended file is always accessible to the user.

Download Rate Factor: This factor refers to the rate available for downloading a file from the cloud. Its goal is to rank files that take less time to download above other files. It reduces the time spent downloading a recommended file in conjunction with the File Size factor, explained next. For example, if two files are similar to the user's preferences, file A has a size of 10 gigabytes, file B has a size of 2 gigabytes, and the download rate is the same for both, file B will be ranked better than file A because it saves more download time. The download rate can change the ranking depending on when the recommendation is calculated, especially in environments where the rate fluctuates. The download rate ranges from 0 to 12 megabits per second (Mbps), the upper value representing the overall average download rate. The factor is calculated by Equation 3:

$$T_d = n_s \cdot \frac{1}{n} \qquad (3)$$

In Equation 3, Td is the Download Rate Factor and ns is the available download rate in Mbps; multiplying ns by 1/n, with n = 12 Mbps, the overall average download rate, normalizes the factor to a value between 0 and 1.

File Size Factor: This factor corresponds to the size of the candidate file and aims to reduce the time spent downloading a recommended file. As explained for the Download Rate Factor, the File Size Factor is directly related to the available download rate, and the ranking changes with that rate: when the download rate is low, smaller files should be ranked better than larger ones; likewise, when the download rate is high, larger files can be ranked better. For example, file A is similar to file B; file A has a size of 9 gigabytes and file B has a size of 2 gigabytes. When the download rate is low, file B will be ranked better, because its smaller size offers better conditions for completing the download. The factor is calculated by Equation 4:

$$S = T \cdot \frac{1}{n} \qquad (4)$$

The File Size Factor is represented by S, where T is the size of the candidate file. File size is measured in gigabytes (GB) because most cloud storage systems express both the maximum size of a file that can be saved and the space available to each user in gigabytes. The file size is multiplied by 1/n, where n is the maximum size of a file that can be stored in the storage system, which normalizes the factor to a value between 0 and 1.

File Relevancy Factor: This factor represents the social importance of a file in the cloud, determined by the number of times the file has been downloaded. The more downloads a file has, the more popular it is on the network and the better it is ranked. For example, file A is similar to file B; file A has been downloaded 10 times and file B 16 times. File B will therefore be ranked better than file A because it has more downloads in the system. The factor is calculated by Equation 5:

$$R = Q_d \cdot \frac{1}{n} \qquad (5)$$

The file relevancy is represented by R. Each download of a file increments its download count Qd by 1. This count ranges from 0 to n, where n is the largest number of downloads of any single file on the network, obtained by observing the file download history of the system. Multiplying Qd by 1/n normalizes the factor to a value between 0 and 1.
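Equations 2 to 5 share one pattern: a raw quantity divided by a bound n. The sketch below implements them under stated assumptions. The 24-hour and 12 Mbps bounds come from the text; the maximum file size and the largest download count depend on the deployment, so `MAX_FILE_SIZE_GB` and `MAX_DOWNLOADS` are hypothetical values, and clamping to [0, 1] is an added safeguard the paper does not describe. The usage lines reproduce the paper's examples, in which file B wins each comparison.

```python
# n for each equation. 24 h and 12 Mbps come from the text; the file-size and
# download-count bounds are deployment-specific, so these values are hypothetical.
HOURS_PER_DAY = 24.0          # n in Equation 2
REFERENCE_RATE_MBPS = 12.0    # n in Equation 3
MAX_FILE_SIZE_GB = 10.0       # n in Equation 4 (hypothetical storage limit)
MAX_DOWNLOADS = 20            # n in Equation 5 (hypothetical, from download history)


def _clamp01(x: float) -> float:
    """Keep a factor in [0, 1]; an added safeguard, not part of the paper."""
    return max(0.0, min(1.0, x))


def availability(hours_online: float) -> float:
    """Dp = h * (1/n): hours the storing peer is online, over 24."""
    return _clamp01(hours_online / HOURS_PER_DAY)


def download_rate(rate_mbps: float) -> float:
    """Td = ns * (1/n): available download rate, over 12 Mbps."""
    return _clamp01(rate_mbps / REFERENCE_RATE_MBPS)


def file_size(size_gb: float) -> float:
    """S = T * (1/n): file size in GB, over the maximum storable size."""
    return _clamp01(size_gb / MAX_FILE_SIZE_GB)


def relevance(download_count: int) -> float:
    """R = Qd * (1/n): download count, over the largest count on the network."""
    return _clamp01(download_count / MAX_DOWNLOADS)


if __name__ == "__main__":
    # Examples from the text; in each pair, file B should come out ahead.
    print("Dp  A=2h, B=4h:", availability(2), availability(4))
    print("S   A=10GB, B=2GB (lower is better):", file_size(10), file_size(2))
    print("R   A=10, B=16 downloads:", relevance(10), relevance(16))
```

Note that a higher S is worse: the file size factor is later subtracted in the recommendation score, so a smaller normalized size helps a file.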

3.1 Factor Weights

In a recommendation engine, the factors must be balanced by weights to compose the recommendation score, which produces a ranking of the items to be recommended to the user. In the model proposed in this study, the weight of each factor was defined according to its relevance to the objective of the model. Table 1 presents the weights proposed for each factor in RecCloud.

Table 1: Factor Weights

| Factor | Weight |
|---|---|
| Similarity | 4 |
| Availability | 2 |
| Download Rate | 2 |
| File Size | 1 |
| File Relevance | 1 |

Below we detail how the weight of each factor was determined.

• The Similarity factor has a weight of 4 and represents 40% of the recommendation score, ensuring that the content of a recommended file is similar to the user's preferences. Another motivation for setting similarity at 40% of the score is to alleviate a major problem of content-based recommendation cited by [8] [7] [6]: the tendency to suggest items that are always very similar, which limits users in discovering new content. With this weight, the model meets user preferences while also recommending new content related to them. A file whose similarity to the user's preferences equals 0 should not be recommended.
• The Availability factor has a weight of 2, representing 20% of the recommendation score. It represents the time during which the server storing a file is available on the network, allowing the recommended file to be downloaded. This factor is essential for recommendations based on cloud features and represents one of the main features and advantages of using cloud file storage systems. A file can only be recommended to the user if it is stored on an available server.
• The Download Rate factor has a weight of 2, representing 20% of the recommendation score. If a file has a low download rate and is larger than other similar files, its recommendation score will be lower, so it will be ranked below similar files whose downloads require less time and processing. A file with a low download rate may still appear at the top of the ranking if its size is small enough to compensate for the low rate. For a file in the cloud to be recommended, this factor must be greater than 0, so that the file can actually be downloaded.
• The File Size factor has a weight of 1, representing 10% of the recommendation score. Being less critical, it has less weight than the other factors. Thus, a file whose size equals the maximum accepted by the environment may still be recommended if its download rate is proportionally high, ensuring good download performance.
• The File Relevancy factor has a weight of 1, representing 10% of the recommendation score. Being less critical, it also has less weight than the other factors. Thus, a file that is not popular in the cloud, such as a file newly added to the network, can still be recommended to the user if it ranks well on the other factors.
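The percentages in the list above follow from reading each weight as its share of the total weight in Table 1. The arithmetic is shown below; note that this share interpretation fits a weighted sum, and the text does not spell out how the weights enter the multiplicative score of Section 3.2.

```latex
\[
\text{share}_i = \frac{w_i}{\sum_j w_j}, \qquad \sum_j w_j = 4 + 2 + 2 + 1 + 1 = 10
\]
\[
\text{Similarity: } \tfrac{4}{10} = 40\%, \quad
\text{Availability, Download Rate: } \tfrac{2}{10} = 20\%, \quad
\text{File Size, File Relevance: } \tfrac{1}{10} = 10\%
\]
```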

3.2 Calculation of Recommendations

In this section we present how the recommendation score is calculated, which consists of weighting each factor by its respective weight. The calculation is represented by the formula in Figure 1.

Figure 1: Recommendation Calculation.

The RS multiplies the Availability factor Dp by the Download Rate factor Td; the File Size factor S is subtracted from this product; and the result is multiplied by the sum of the Similarity factor St and the Relevancy factor R. The result is then normalized to a value between 0 (zero) and 1 (one), so the score always lies between 0 and 1. The Similarity factor is added to the Relevancy factor so that the top-ranked recommendations are files that are both similar to the user's preferred files and relevant in the cloud. The File Size factor penalizes the product of the Availability and Download Rate factors, aiming to give the user better conditions for downloading the recommended file: it reduces the time spent downloading and favors files with higher availability in the cloud. A file is recommended only if its recommendation score is greater than 0 (zero); files with a score equal to or less than 0 are not recommended to the user who requested the recommendations.
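Because Figure 1 itself is not reproduced, the sketch below reconstructs the score only from the prose description: (Dp · Td − S) · (St + R), followed by normalization. Two parts are assumptions: attaching each Table 1 weight directly to its factor, and normalizing by dividing by the best attainable raw score (all factors 1, file size 0). That normalization keeps the sign of the raw score, so the "score > 0" rule stays meaningful, and files with zero availability or zero download rate fall out automatically. The extra `st > 0` check follows the Similarity bullet in Section 3.1. Function names are hypothetical.

```python
from collections.abc import Mapping

# Weights from Table 1.
WEIGHTS: Mapping[str, float] = {"St": 4, "Dp": 2, "Td": 2, "S": 1, "R": 1}


def raw_score(st: float, dp: float, td: float, s: float, r: float,
              w: Mapping[str, float] = WEIGHTS) -> float:
    """(w_Dp*Dp * w_Td*Td - w_S*S) * (w_St*St + w_R*R).

    Follows the prose description of Figure 1; attaching each weight to
    its factor in this way is an assumption.
    """
    return (w["Dp"] * dp * w["Td"] * td - w["S"] * s) * (w["St"] * st + w["R"] * r)


def recommendation_score(st: float, dp: float, td: float, s: float, r: float) -> float:
    """Divide by the best attainable raw score (hypothetical normalization).

    Positive scores land in (0, 1]; non-positive scores stay <= 0.
    """
    best = raw_score(st=1.0, dp=1.0, td=1.0, s=0.0, r=1.0)
    return raw_score(st, dp, td, s, r) / best


def recommend(candidates: Mapping[str, Mapping[str, float]]) -> list[tuple[str, float]]:
    """Keep files with score > 0 and similarity > 0, best first."""
    kept = []
    for name, f in candidates.items():
        score = recommendation_score(f["st"], f["dp"], f["td"], f["s"], f["r"])
        if score > 0 and f["st"] > 0:
            kept.append((name, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(recommend({
        "A": {"st": 0.8, "dp": 4 / 24, "td": 0.5, "s": 0.2, "r": 0.8},
        "B": {"st": 0.5, "dp": 2 / 24, "td": 0.5, "s": 1.0, "r": 0.5},  # large file: penalized
        "C": {"st": 0.9, "dp": 1.0, "td": 0.0, "s": 0.1, "r": 1.0},     # no download rate
    }))
```

A min-max rescaling would also map scores into [0, 1], but it would turn negative raw scores into positive ones and defeat the rule that files scoring 0 or less are not recommended; dividing by the maximum avoids that.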

4. RESULTS

The experiment reported in this paper was performed in a real cloud data storage environment. It provides partial results of this research, generated by simulating users of the system. Its main goal is to analyze the relevance of the recommended files with respect to the preferences elicited by the user. The experiment evaluated the recommendations made by the system using a database of 100 public-domain scientific articles stored in the cloud, from which recommendations were requested for files with distinct content. In total, 50 recommendations were evaluated, each rated as Like or Dislike: a recommendation received Dislike if it did not meet the evaluator's preferences and expectations, and Like if it suited them. Figure 2 shows the results of the evaluations.

Figure 2: Evaluation.

From the analysis of the values shown in Figure 2, we infer that 85% of the recommendations received positive reviews, meaning that most of the generated recommendations met the evaluator's expectations. These partial results support the validity of the recommendations generated by the proposed recommendation model.

5. CONCLUSION AND FUTURE WORK

This paper investigates the impact of factors derived from the cloud on generating recommendations in a cloud storage environment. We presented the proposed mathematical model and the factors that make up the proposed cloud-based model. The system was developed, and initial experiments were deployed and executed in a real cloud data storage environment. As future work, we consider it important to repeat and improve the experiments presented in this article with real users of the cloud environment, and to conduct new experiments comparing the results of this model with other models available in the literature. In particular, we intend to propose new cloud-based factors that may improve the proposed model.

6. ACKNOWLEDGMENTS

This work was supported in part by the National Institute of Science and Technology for Software Engineering (INES¹), funded by Facepe and CNPq, grants 573964/2008-4 and APQ-1037-1.03/08.

7. REFERENCES

[1] R. A. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[2] Y. Blanco-Fernández, J. J. P. Arias, A. Gil-Solla, M. R. Cabrer, and M. L. Nores. Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender systems. IEEE Trans. Consumer Electronics, 54(2):727–735, 2008.

[3] C.-F. Lai, J.-H. Chang, C.-C. Hu, Y.-M. Huang, and H.-C. Chao. CPRS: A cloud-based program recommendation system for digital TV platforms. Future Gener. Comput. Syst., 27(6):823–835, June 2011.

[4] S. Lee, D. Lee, and S. Lee. Personalized DTV program recommendation system under a cloud computing environment. IEEE Trans. on Consum. Electron., 56(2):1034–1042, May 2010.

[5] M. J. Pazzani and D. Billsus. Learning and revising user profiles: The identification of interesting web sites. Machine Learning, 27(3):313–331, 1997.

[6] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.

[7] H. Stormer, N. Werro, and D. Risch. Recommending products with a fuzzy classification. CollECTeR Europe, 2006.

[8] D. M. Vieira. Sobre a interdependência da recomendação de conteúdo e do desempenho da rede [On the interdependence of content recommendation and network performance]. Master's thesis, Universidade Federal do Rio de Janeiro (UFRJ), 2013.

¹ http://www.ines.org.br