
Protocol for E-Commerce Data Harvesting

Dani Gunawan
Department of Information Technology, University of Sumatera Utara (USU), Medan, Indonesia
[email protected]

Abstract—Many retailers operate their business through e-commerce. Some of them provide an Application Program Interface (API) to their affiliate users to help sell their products. As most affiliate users are members of several retailers' programs, this leads to two problems. First, an affiliate user must implement different code to suit the requirements of each data provider. Second, the response from each data provider is different, so affiliate users need to understand the data structure of each data provider. To solve these problems, we propose a new protocol called E-Commerce Data Harvesting (ECDH). Using the same protocol, affiliate users can use the same Uniform Resource Identifier (URI) format with every data provider. In addition, we suggest the GoodRelations ontology to support data interoperability; combining GoodRelations with other ontologies lets it address more industrial segments. By combining protocol and ontology, we can establish e-commerce data harvesting that provides smooth data interoperability.

Keywords—protocol; e-commerce; goodrelations; harvesting; data interoperability; ontology

I. INTRODUCTION

Life’s better when we’re connected. As the slogan of one of the biggest banks in North America suggests, this will be our future. Over the next few years, everything will be connected as we move closer to the scenario of “smart living”. For instance, someone who plans to watch a live concert in any city will be able to obtain information related to his or her travel arrangements, such as nearby accommodation, local restaurants and access to public transportation. Furthermore, he or she will be able to purchase all travel needs from different merchants online. As each data provider uses different procurement systems and protocols, suppliers face difficulties in supporting a large number of protocols. To interoperate with various procurement systems and private marketplaces, communication between different systems has to be established using the same protocols. Semantic interoperability among them can then be achieved by applying the concept of ontology [1].

The Internet of Things (IoT), a recent paradigm about the future, envisions that every object in our lives will be equipped with a microcontroller and will interact with other objects using a suitable protocol stack via the Internet [2]. However, barriers to implementing this concept lie in technical issues such as data interoperability, communication models and suitable devices [3]. As different hardware and software vendors mostly use their own data formats, data exchange becomes a problem. Furthermore, the challenge is how to deal with the non-interoperable, heterogeneous technologies used in city and urban development [4].

A well-known example of IoT is a fridge that orders groceries using its own recommendation system, based on the food and beverages available inside it. A few years ago, this type of fridge was still a dream. Now, one of the biggest home appliance vendors has taken a step toward realizing it: the company has demonstrated a smart refrigerator that sends its owner a text message when a supply has run out [5]. The next questions follow naturally: How does the fridge communicate with each retailer? Does the way it communicates with one retailer differ from the way it communicates with the others? These questions require solutions that yield smooth data interoperability among all parties, which this paper terms E-Commerce Data Harvesting (ECDH).

II. E-COMMERCE PRODUCT API

By the end of 2014, 3 billion people were using the Internet, with growth of 40.4 per cent per year [6]. E-commerce sales worldwide reached nearly $1.5 trillion in the same year, an increase of nearly 20% over sales in 2013 [7]. The Internet and online sales are therefore believed to play important roles in most businesses [8]. As e-commerce grows, many opportunities emerge, especially in online marketing. Nowadays, people can act as affiliate users even though they do not own products or services. An affiliate user is a member of an e-commerce company's affiliate program who helps sell and promote its products online to earn a commission. Affiliate users can advertise products or services offered by a retailer, as well as products from other data providers, to consumers.
As a result, affiliate users earn a commission after consumers complete transactions through the affiliate users' advertisements [9]. There are various ways for affiliate users to obtain the data of a particular product. For example, Amazon, a major online retailer that started its business as an online bookstore, offers widgets to integrate its business with affiliate users, who only need to embed a widget into their website. The widget is usually coded in HTML or JavaScript. Some widgets have more sophisticated features, such as product search, carousels, MP3 clips, slideshows and so forth. However, widgets have drawbacks. They are rigid and hard to customize. They are also limited to web-based applications: because a widget is coded in HTML or JavaScript, it cannot be embedded directly into a mobile application. The solution to this problem is a suitable Application Program Interface (API) developed by the respective data provider. An API is an intermediary tool that provides product properties such as product name, SKU, color or price to affiliate users, who can then customize the provided data to suit their needs.

eBay, one of the largest online marketplaces, is a good example of another data provider. eBay provides an API that allows affiliate users to retrieve data on products available on its website, and the API comes with a set of rules to follow. For example, to list available “The Little Mermaid” DVDs, one needs to call the API as follows:

http://open.api.ebay.com/shopping?callname=FindProducts&responseencoding=XML&appid=YourAppId&siteid=0&QueryKeywords=the%20little%20mermaid%20dvd&version=713

This call invokes the function named FindProducts, selected by the query string callname=FindProducts. eBay looks up the requested product using the query string QueryKeywords. After finding it, eBay wraps the response in XML, as requested by the query string responseencoding=XML; some data providers also support other formats such as JSON and name-value pairs. eBay identifies the affiliate user calling the API by the appid query string.

BestBuy is another e-commerce company that provides an API for product data retrieval by affiliate users. However, BestBuy uses a different method for calling its API.
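To make the role of each parameter concrete, here is a minimal Python sketch that assembles the FindProducts call quoted above from its query-string parameters. The endpoint and parameter names are copied from that example; the helper name build_ebay_find_products_url is hypothetical, YourAppId remains a placeholder rather than a real application ID, and the sketch only builds the URI without contacting eBay.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example call in the text.
EBAY_SHOPPING_ENDPOINT = "http://open.api.ebay.com/shopping"


def build_ebay_find_products_url(keywords: str, app_id: str,
                                 encoding: str = "XML",
                                 site_id: int = 0,
                                 version: int = 713) -> str:
    """Assemble an eBay FindProducts request URI (hypothetical helper)."""
    params = {
        "callname": "FindProducts",    # API function to invoke
        "responseencoding": encoding,  # requested response format, e.g. XML
        "appid": app_id,               # identifies the affiliate user
        "siteid": site_id,
        "QueryKeywords": keywords,     # search terms for the product
        "version": version,
    }
    # quote_via=quote encodes spaces as %20 (as in the example) instead of '+'.
    return EBAY_SHOPPING_ENDPOINT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_ebay_find_products_url("the little mermaid dvd", "YourAppId"))
```

Running it prints the same URI as the example above. Every parameter name in the sketch (callname, QueryKeywords, appid) is specific to eBay, so none of this code carries over to another provider.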
Below is an example of how to find a specific product using the BestBuy API:

http://api.remix.bestbuy.com/v1/products((search=touchscreen&search=asus)&salePrice<500&categoryPath.id=pcmcat209000050006)?show=name,sku,salePrice&format=xml&apiKey=YourAPIKey

The responses to the eBay and BestBuy API calls are shown in Fig. 1 and Fig. 2 respectively. As shown in Fig. 1, the eBay API response wraps product details in an item tag. The eBay API provides the product name in the title tag and the price in the currentPrice tag, so it is clear that the product name is “The Little Mermaid” DVD and the price is USD 15.50. The currency is given by the currencyId attribute of currentPrice. The BestBuy API response, on the other hand, wraps product details in a product tag with the child tags name, sku and salePrice, as shown in Fig. 2. From this response, affiliate users learn that the product name is Asus – MeMO Pad 7 – 8GB – Black and the price is 89.99. However, they do not know the currency, as the BestBuy API does not provide it. Thus, the eBay API uses a different vocabulary from the BestBuy API to define product names and prices.

Fig. 1. EBay API Response

Although APIs can simplify business integration between e-commerce companies and affiliate users, they have drawbacks in implementation, as shown in Fig. 3. As each e-commerce company has a different set of APIs, the affiliate users' application must call each of them using a different protocol, and learning how to use each API is time consuming. Furthermore, each e-commerce company has its own rules for using its API, so the URIs for calling APIs vary depending on the API provider. Currently, there is no standard for developing APIs that expose e-commerce data. In addition, the response may also differ for each API provider.
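The vocabulary mismatch described above can be illustrated with a minimal Python sketch of the per-provider parsing code an affiliate user ends up writing. The XML strings are simplified, hypothetical fragments modeled only on the tag names mentioned in the text (item, title, and currentPrice with currencyId for eBay; product, name and salePrice for BestBuy); real responses contain more elements, and the Product record and parser names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: str
    price: float
    currency: Optional[str]  # None when the provider does not report a currency


# Simplified, hypothetical response fragments (not verbatim API output).
EBAY_XML = """<FindProductsResponse>
  <item>
    <title>The Little Mermaid DVD</title>
    <currentPrice currencyId="USD">15.50</currentPrice>
  </item>
</FindProductsResponse>"""

BESTBUY_XML = """<products>
  <product>
    <name>Asus - MeMO Pad 7 - 8GB - Black</name>
    <salePrice>89.99</salePrice>
  </product>
</products>"""


def parse_ebay(xml_text: str) -> List[Product]:
    """eBay style: <item> wrapper, name in <title>, currency as an attribute."""
    root = ET.fromstring(xml_text)
    products = []
    for item in root.iter("item"):
        price = item.find("currentPrice")
        products.append(Product(name=item.findtext("title"),
                                price=float(price.text),
                                currency=price.get("currencyId")))
    return products


def parse_bestbuy(xml_text: str) -> List[Product]:
    """BestBuy style: <product> wrapper, name in <name>, no currency."""
    root = ET.fromstring(xml_text)
    return [Product(name=p.findtext("name"),
                    price=float(p.findtext("salePrice")),
                    currency=None)
            for p in root.iter("product")]


if __name__ == "__main__":
    for product in parse_ebay(EBAY_XML) + parse_bestbuy(BESTBUY_XML):
        print(product)
```

Both functions return the same Product record, but each hard-codes its provider's vocabulary, so adding a provider means writing and maintaining yet another parser.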

Fig. 2. BestBuy API Response

Another problem has also arisen, as affiliate users need to understand the data structure implemented by each API provider. In the examples in Fig. 1 and Fig. 2, the eBay API uses the title tag to define the product name, whereas the BestBuy API uses name. For prices, the eBay API uses currentPrice with the currencyId attribute as currency information, while the BestBuy API only provides salePrice without currency information. As a consequence, affiliate users must implement different code to parse the provided information, and they must do so every time they add a new data provider. This is not an effective practice, as affiliate users have to develop a different approach for each API provider. A further challenge is that affiliate users must update their code whenever a data provider updates its API, which takes additional time to adapt their applications to the changes. Based on these facts, API calling and data structure are the most common problems that occur in the middleware.

III. E-COMMERCE DATA HARVESTING PROTOCOL

As middleware plays an important role in simplifying new services [2], one may propose a solution that unifies API calls and their responses. This has previously been realized for data interoperability among digital repositories: by using the same protocol, various digital repositories achieve data interoperability. That protocol, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), can be implemented by each digital repository [10] [11]. By default, OAI-PMH uses unqualified Dublin Core to provide semantic interoperability [12]. Other metadata standards that can be used include Machine-Readable Cataloging (MARC), the Institute of Electrical and Electronics Engineers Learning Object Metadata (IEEE LOM) standard and so forth. Several widely available digital repository software packages support OAI-PMH, such as DSpace [13] and EPrints [14].
In addition, the National Library of Australia (NLA) Digital Object Repository [16] and CiteSeerX [15] are other notable digital repositories that comply with OAI-PMH.

Fig. 3. API Calling Using Different Protocols

OAI-PMH provides three requests that help the harvester, a server that collects metadata, to understand OAI-PMH repositories: Identify, ListMetadataFormats and ListSets. Identify is used to obtain repository information such as the repository name, its base URL and anything related to administration. ListMetadataFormats returns the metadata formats available in a repository. ListSets returns the set structure of a repository. Beyond these requests for understanding a repository, three more requests are used to harvest metadata: ListRecords, GetRecord and ListIdentifiers. ListRecords harvests records from a repository and supports selective harvesting by set (a collection of items) and/or datestamp. GetRecord harvests a single record; it requires an identifier argument, which specifies the requested item, and a metadata format argument, which defines the format in which the record is returned. ListIdentifiers collects record headers, i.e. identifiers, datestamps and set memberships.

Based on this existing solution for data interoperability in digital repositories, we suggest formulating a new protocol for smooth e-commerce data harvesting, which we call the E-Commerce Data Harvesting (ECDH) Protocol. Its process flow is illustrated in Fig. 4. With the same protocol, users are expected to be able to use a similar URI format to obtain the same type of products from any data provider. For example, users can call verb=ListProducts on both data provider A and data provider B to list all products, changing only the API endpoint to switch providers; the rest of the URI stays the same. The API response, in turn, will be based on a standard that can maintain product information. We propose two candidates for the API response.
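The request side of ECDH can be sketched in Python, assuming it borrows the OAI-PMH convention of a single verb parameter plus verb-specific arguments. Only ListProducts comes from the proposal; GetProduct is a hypothetical verb modeled on OAI-PMH's GetRecord, ecdh_request is a hypothetical helper, and both endpoint URLs are placeholders.

```python
from typing import Dict, Set
from urllib.parse import urlencode

# Verb -> required arguments. "ListProducts" is taken from the proposal;
# "GetProduct" is hypothetical, mirroring OAI-PMH GetRecord's identifier argument.
ECDH_VERBS: Dict[str, Set[str]] = {
    "ListProducts": set(),
    "GetProduct": {"identifier"},
}

# Hypothetical endpoints for two data providers.
PROVIDER_A = "https://provider-a.example.com/ecdh"
PROVIDER_B = "https://provider-b.example.com/ecdh"


def ecdh_request(endpoint: str, verb: str, **args: str) -> str:
    """Build an ECDH request URI; only `endpoint` differs between providers."""
    if verb not in ECDH_VERBS:
        raise ValueError(f"unknown verb: {verb}")
    missing = ECDH_VERBS[verb] - set(args)
    if missing:
        raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    return endpoint + "?" + urlencode({"verb": verb, **args})


if __name__ == "__main__":
    for endpoint in (PROVIDER_A, PROVIDER_B):
        print(ecdh_request(endpoint, "ListProducts"))
```

Unlike the eBay and BestBuy calls, switching providers here changes only the endpoint; the query string verb=ListProducts is identical for both. Rejecting requests with missing arguments up front mirrors how an OAI-PMH repository answers such requests with a badArgument error.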
Our two candidates are the GoodRelations ontology and the BMEcat 2005 format. GoodRelations is an ontology for describing product and service offers on the web. It is flexible and supports value intervals plus existential quantification [17]. It is widely used and has been adopted by some of the largest search engines to display product details in search results. BMEcat 2005 is an XML standard for business-to-business catalog data exchange, developed to simplify the exchange of product catalogs between suppliers and purchasing companies. Its use is not limited to transmitting product data to purchasing companies; it can also transmit product data internally within a single company or externally between various companies [18]. Based on the extensive literature review we carried out in formulating a protocol to unify API calling, we find GoodRelations a promising ontology for realizing it. BestBuy.com, a major implementer of the GoodRelations ontology, recently reported a 30% increase in traffic to its store pages that contain GoodRelations-annotated structured data [19]. The same work [19] has also shown that GoodRelations datasets are mostly combined with other ontologies, such as Dublin Core (DC), Friend of a Friend (FOAF) and vCard. Larger, domain-focused vocabularies are supplied by the relevant business data sources; for example, O’Reilly uses frbr to annotate bibliographic data. Furthermore, by combining GoodRelations with other ontologies, one can address the needs of more industrial segments for smooth data interoperability. GoodRelations can be combined with the GeoNames Ontology, The Vehicle Sales Ontology (VSO), The Ticket Ontology (TIO), The Accommodation Ontology (ACCO), the Consumer Products Ontologies or other user-defined ontologies to support particular industry segments [20].

Fig. 4. API Calling by Using E-Commerce Data Harvesting Protocol (ECDH)

In addition, one of the planned future developments of GoodRelations is integration with the BMEcat 2005 format, which may help provide a Semantic Web vocabulary based on Business-to-Business (B2B) catalog data exchange [17]. A tool has also been developed to convert BMEcat XML data sources into an RDF-based data model anchored in the GoodRelations vocabulary [21], so existing BMEcat users can easily adopt the GoodRelations vocabulary as an alternative way to expose their data. A good ontology should be able to accommodate different kinds of data, so that things that are common can be represented together while things that are distinct can also be represented [22]. In this respect, GoodRelations performs better than the alternatives, according to results extracted from published work [23]. Mappings of the Amazon taxonomy to GoodRelations (GR) and to the Open Directory Project (ODP) have been compared using several algorithms, namely Park & Kim [24], CMAP [23] and PROMPT [25]. As shown in Table I, mapping to GR instead of ODP increases precision and accuracy by 18.05 and 11 percentage points respectively with the Park & Kim algorithm. With the CMAP algorithm, precision and accuracy increase by 38.5 and 19 percentage points respectively. The PROMPT algorithm shows a similar trend, with precision and accuracy increasing by 27.85 and 14.6 percentage points respectively. These data indicate that GoodRelations has general terms suitable for e-commerce requirements, which makes conversion from existing ontologies to GoodRelations easier.

TABLE I. MAPPING OF AMAZON TAXONOMY TO ODP AND GR

Algorithm     Mapping          Precision   Accuracy
Park & Kim    Amazon to ODP    29.10%      20.80%
Park & Kim    Amazon to GR     47.15%      31.80%
CMAP          Amazon to ODP    31.44%      25.60%
CMAP          Amazon to GR     69.94%      44.60%
PROMPT        Amazon to ODP     7.74%      14.40%
PROMPT        Amazon to GR     35.59%      29%
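To illustrate what a GoodRelations-based ECDH response entry might contain, here is a minimal Python sketch that describes one offer with GoodRelations terms serialized as JSON-LD. The choice of JSON-LD, the helper name to_goodrelations and the selection of properties are assumptions; a complete description would typically also link the offer to the product itself (for example via gr:includes) and, following the ontology combinations discussed above, to location data.

```python
import json
from typing import Optional

GR_NAMESPACE = "http://purl.org/goodrelations/v1#"


def to_goodrelations(name: str, price: float,
                     currency: Optional[str]) -> dict:
    """Describe one offer as a gr:Offering in JSON-LD (hypothetical helper)."""
    price_spec = {
        "@type": "gr:UnitPriceSpecification",
        "gr:hasCurrencyValue": price,
    }
    # Omit the currency rather than guess it when the provider does not supply one.
    if currency is not None:
        price_spec["gr:hasCurrency"] = currency  # ISO 4217 code, e.g. "USD"
    return {
        "@context": {"gr": GR_NAMESPACE},
        "@type": "gr:Offering",
        "gr:name": name,
        "gr:hasPriceSpecification": price_spec,
    }


if __name__ == "__main__":
    offers = [
        to_goodrelations("The Little Mermaid DVD", 15.50, "USD"),          # eBay data
        to_goodrelations("Asus - MeMO Pad 7 - 8GB - Black", 89.99, None),  # BestBuy data
    ]
    print(json.dumps(offers, indent=2))
```

Both offers now use the same property names, so a harvester needs only one parser for every provider. The BestBuy-derived offer simply lacks gr:hasCurrency, which makes the missing information explicit instead of hiding it behind a provider-specific tag.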

Returning to the fridge example, implementing the same protocol and ontology would allow the fridge to harvest product details from several retailers and then compare their prices. In addition, retailers can provide their locations by combining the GoodRelations and GeoNames ontologies, so that the fridge can harvest each retailer's location as well. A linked data approach to metadata management also makes it easier to combine the results of such computational analyses in a unified framework [26]. As a result, the fridge can decide which retailer offers the cheapest product at the nearest location.

IV. CONCLUSION

We propose a protocol called E-Commerce Data Harvesting (ECDH) to unify the communication model among several systems. It simplifies the many different ways of calling APIs from several retailers, so that affiliate users are expected to be able to use a similar URI to call the APIs of various data providers. By implementing a uniform response data structure based on a suitable ontology, one can shorten the development time previously spent parsing the different data structures of each data provider. We suggest the GoodRelations ontology because of its flexibility; it can be combined with other ontologies to address more industrial segments. By combining protocol and ontology, we can establish e-commerce data harvesting that provides smooth data interoperability.

ACKNOWLEDGMENT

The author would like to thank Emerson P. Sinulingga, PhD, for his invaluable support in the realization of this publication. His insight, expertise and guidance helped me throughout the research and the writing of this publication.

REFERENCES

[1] Zhengjie Fan and Sisi Zlatanova, "Exploring Ontologies for Semantic Interoperability of Data in Emergency Response," Applied Geomatics, vol. 3, no. 2, pp. 109-122, June 2011.

[2] Luigi Atzori, Antonio Iera, and Giacomo Morabito, "The Internet of Things: A survey," Computer Networks, vol. 54, no. 15, pp. 2787-2805, 2010.

[3] Mischa Dohler, Ignasi Vilajosana, Xavi Vilajosana, and Jordi Llosa, "Smart Cities: An Action Plan," in Barcelona Smart Cities Congress, Barcelona, 2011, pp. 1-6.

[4] Andrea Zanella, Nicola Bui, Angelo Castellani, Lorenzo Vangelista, and Michele Zorzi, "Internet of Things for Smart Cities," IEEE Internet of Things Journal, vol. 1, no. 1, pp. 22-32, February 2014.

[5] Nikole Kobie. (2015, March) The internet of things: convenience at a price | Technology | The Guardian. [Online]. http://www.theguardian.com/technology/2015/mar/30/internet-of-things-convenience-price-privacy-security

[6] International Telecommunication Union, "Measuring the Information Society Report 2014," International Telecommunication Union, Geneva, Report 2014.

[7] eMarketer. (2014, July) Worldwide Ecommerce Sales to Increase Nearly 20% in 2014. [Online]. http://www.emarketer.com/Article/Worldwide-Ecommerce-Sales-Increase-Nearly-20-2014/1011039

[8] Internet Society, "Global Internet Report 2014," Internet Society, Report 2014.

[9] Dennis L. Duffy, "Affiliate Marketing and Its Impact on E-commerce," Journal of Consumer Marketing, vol. 22, no. 3, pp. 161-163, 2005.

[10] Carl Lagoze, Herbert Van de Sompel, Michael Nelson, and Simeon Warner. (2015, January) The Open Archives Initiative Protocol for Metadata Harvesting. [Online]. http://www.openarchives.org/OAI/openarchivesprotocol.html

[11] Shuming Li, Zongkai Yang, Qingtang Liu, and Tao Huang, "Research of web information retrieval based on metadata and OAI," in Granular Computing, 2008. GrC 2008. IEEE International Conference on, Hangzhou, 2008, pp. 383-386.

[12] Arwen Hutt and Jenn Riley, "Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials," in Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, Denver, 2005, pp. 262-270.

[13] DSpace. (2012, October) OAI-PMH Data Provider 2.0 (Internals). [Online]. https://wiki.duraspace.org/pages/viewpage.action?pageId=45548245

[14] EPrints. (2012, May) Synchronize your repository via OAI-PMH. [Online]. http://wiki.eprints.org/w/Synchronize_your_repository_via_OAI-PMH

[15] CiteSeerX. CiteSeerX Data. [Online]. http://csxstatic.ist.psu.edu/about/data

[16] National Library of Australia. (2008, April) National Library of Australia Digital Object Repository. [Online]. http://www.nla.gov.au/digicoll/oai/

[17] Martin Hepp, "GoodRelations: An Ontology for Describing Products and Services Offers on the Web," in Proceedings of the 16th International Conference on Knowledge Engineering: Practice and Patterns, Acitrezza, 2008, pp. 329-346.

[18] Volker Schmitz, Jorg Leukel, and Oliver Kelkar. (2005) Specification BMEcat® 2005. Document.

[19] Jamshaid Ashraf, Richard Cyganiak, Sean O'Riain, and Maja Hadzic, "Open eBusiness Ontology Usage: Investigating Community Implementation of GoodRelations," in Linked Data on the Web 2011, Hyderabad, 2011.

[20] Martin Hepp. (2014, April) Extensions for GoodRelations for Specific Industries. [Online]. http://wiki.goodrelations-vocabulary.org/Vocabularies

[21] Alex Stolz, Benedicto Rodriguez-Castro, and Martin Hepp, "Using BMEcat Catalogs as a Lever for Product Master Data on the Semantic Web," in 10th Extended Semantic Web Conference, ESWC 2013, Montpellier, 2013, pp. 623-638.

[22] Dean Allemang and Jim Hendler, Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Elsevier Inc., 2011.

[23] Lennart J. Nederstigt, Steven S. Aanen, Damir Vandić, and Flavius Frăsincar, "An Automatic Approach for Mapping Product Taxonomies in e-Commerce Systems," in Proceedings of the 24th International Conference on Advanced Information Systems Engineering, 2012, pp. 334-349.

[24] Sangun Park and Wooju Kim, "Ontology Mapping Between Heterogeneous Product Taxonomies in an Electronic Commerce Environment," International Journal of Electronic Commerce, vol. 12, no. 2, pp. 69-87, December 2007.

[25] Natalya F. Noy and Mark A. Musen, "The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping," International Journal of Human-Computer Studies, vol. 59, no. 6, pp. 983-1024, December 2003.

[26] Sean Bechhofer, Kevin Page, and David De Roure, "Hello Cleveland! Linked Data Publication of Live Music Archives," in 14th International Workshop on Image and Audio Analysis for Multimedia Interactive Services, Paris, 2013, pp. 1-4.
