Multimed Tools Appl (2016) 75:12727–12747 DOI 10.1007/s11042-015-2918-5

Big-data: transformation from heterogeneous data to semantically-enriched simplified data

Kaleem Razzaq Malik 1 & Tauqir Ahmad 1 & Muhammad Farhan 1,2 & Muhammad Aslam 1 & Sohail Jabbar 2,3 & Shehzad Khalid 3 & Mucheol Kim 4

Received: 15 May 2015 / Revised: 7 August 2015 / Accepted: 24 August 2015 / Published online: 8 September 2015
© Springer Science+Business Media New York 2015

Abstract In big data, data originates in real time from many distributed and heterogeneous sources in the shape of audio, video, text, and images, which makes it too massive and complex for traditional systems to handle. Such data must therefore be represented in a semantically enriched yet simplified form for better utilization. The Resource Description Framework (RDF), introduced by the World Wide Web Consortium (W3C), makes such a representation possible. However, bringing rapidly growing data from different sources and formats into RDF form is still an open issue: it requires covering the transition of information among all applications while inducing simplicity to reduce the complexity of storing data. We address this by representing big data first in Extensible Markup Language (XML) and then as linked RDF triples in real time, making the transformation more data friendly. In this study we develop a process which translates data without any kind of information loss. Data and metadata are managed so that they do not increase complexity and keep a strong linkage between them; metadata is kept generalized so that it remains useful rather than being dedicated to specific types of data source. We present a model explaining the functionality of this process and the corresponding algorithms that implement it. A case study is used to show the transformation of relational database textual data into RDF, and the results are discussed at the end.

* Muhammad Farhan [email protected]

Kaleem Razzaq Malik [email protected]
Tauqir Ahmad [email protected]
Muhammad Aslam [email protected]
Sohail Jabbar [email protected]
Shehzad Khalid [email protected]
Mucheol Kim [email protected]

1 Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
2 Department of Computer Science, COMSATS Institute of Information Technology, Sahiwal, Pakistan
3 Department of Computer Science, Bahria University, Islamabad, Pakistan
4 Department of Multimedia, Sungkyul University, Anyang-si 430-742, Republic of Korea


Keywords Resource Description Framework Schema (RDFS) · Big data · Data representation

1 Introduction

The constant increase in the volume and detail of data captured by organizations, for example through the rise of social media, the Internet of Things (IoT), and multimedia, has produced an overwhelming stream of data in either structured or unstructured form. Data creation is happening at a record rate, referred to here as big data, and has emerged as a widely recognized trend. Big data is attracting attention from academia, government, and industry [15]. To comprehend data and to gain knowledge from it, the data must be sorted, transformed, merged, and processed both statistically and logically. The potential growth of databases in all areas makes the investigation of enormous amounts of ever more intricate data more difficult and less clear. A noteworthy challenge for researchers and practitioners is that this growth rate exceeds their capacity to design fitting data representations for data analysis and for optimizing intensive workloads [20].

Today, semantic web technology is represented by the still-evolving Web Ontology Language (OWL); "ontology" is often used to refer to any component of knowledge representation. The development of the semantic web includes OWL and its companions, such as RDF (Resource Description Framework). RDF was introduced by the W3C as a standard structure for representing semantic information, linking data in hierarchical form at the level of metadata. RDF is generally used to represent information and resources on the web, where these resources need to be interpreted by machines through reasoning and rules [25]. RDF represents data through hierarchical, classification-based relationships [19]. Such RDF classifications cannot capture the complete semantics of a relationship, yet it is genuinely hard to handle the complexity of higher-level predicates, as in OWL [26]. This makes RDF, with some updates, the most suitable and simple data representation for big data.

There are multiple strategies available for transforming data into an ontology. One is to use the intermediate standard of XML as a means of transforming into any other platform-specific data form. Another is to transform ERD graphs into ontology graphs or class diagrams [11]. Regular expressions, a mathematical approach, can also be used to convert between a relational database and OWL. Yet another approach focuses on storing the triples taken from RDF, in the resource-property-value shape defined in the W3C standard, in a database for storage purposes [24]. A minimal example of this triple shape follows.
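This example is our own, using the rdflib library rather than anything from the paper; the namespace and terms are illustrative assumptions.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Illustrative namespace and terms; any real vocabulary could be substituted.
EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.bankorder, RDF.type, EX.Order))         # resource - property - value
g.add((EX.bankorder, EX.amount, Literal(250.0)))  # a literal-valued property
print(g.serialize(format="turtle"))
```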


Under the umbrella of big data, representation plays a major role in the analysis, storage, retrieval, purification, and visualization of data at distributed and huge scale [10, 21]. This representation is done using the semantic web language for RDF. There is also the term data science, involving the transformation of data-based information into knowledge [29, 33]. Such knowledge can be achieved thanks to data linkage capable of integrating heterogeneous data structures. This hints at what big data representation means in a nutshell, namely the way data is found in semantic web ontologies in linked form [6, 40]. Linked Data, in turn, builds a web of connected data using unique identifiers, connecting linked open data through the HTTP protocol for linkage and collaboration. Advances in the Semantic Web have made ontology another helpful resource for describing multimedia semantics [16, 27]. An ontology constructs a formal and explicit representation of semantic categories for the concepts and their connections in data-intensive settings, and permits reasoning to infer implicit knowledge [9]. These days, the use of graph-oriented representations and rich semantic vocabularies is gaining momentum. On one hand, graphs are flexible models for integrating information with differing degrees of structure, and they enable such heterogeneous information to be connected in a uniform manner [5, 36]. On the other hand, vocabularies describe what the information means. The most practical trend, along this line, proposes the use of the Resource Description Framework (RDF), a standard model for information encoding, together with semantic technologies for the publication, exchange, and consumption of this big semantic data at global scale; publication, exchange, and consumption are the primary information flows performed on this Web of Data [4, 23, 35].

The rest of the paper is organized as follows. Section 2 gives the related work on big data and semantic linkage. The problem definition and the mathematical modeling are presented in Sections 3 and 4. Section 5 proposes the methodology and Section 6 the transformation algorithms. The case study is presented in Section 7, and conclusions are drawn in the last section.

2 Literature review

To understand in depth how all these approaches relate to their common goal, we study their data models and what they were built to achieve; doing so gives a much more reliable picture of which approach can surpass the others in producing better results. On the ontology side, the Resource Description Framework (RDF) model uses directed labeled graphs (DLGs) because of the similarity between the two, but differs from a plain DLG in that it provides multiple routes to a node [12, 37, 38]. Big data brings about the conjunction of the three V's, which describe complex and huge data challenges and opportunities along three dimensions. The first V is Volume, the most obvious dimension: a huge amount of information is continuously accumulated and stored in enormous data sets, which are then exposed for diverse purposes and uses. Scalability is the major task associated with big data volume, given that effective storage tools are the first essential in this situation [1]. The second V is Velocity: storage choices fundamentally affect data retrieval, whose ultimate goal for the user is execution at the fastest possible speed, particularly in the case of real-time arrangements [7, 14]. Velocity describes how data


flows, at high rates, in an increasingly distributed environment. Nowadays, velocity increases in a manner comparable to volume. Streaming data processing is the main challenge related to this dimension, because specific storage is required not only for effective volume management but also for real-time response. The third V is Variety, which refers to the differing degrees of structure (or lack thereof) within the source data. This is essentially because big data may originate from numerous sources (e.g., sciences, politics, economy, social networks, or web server logs, among others), each describing its own particular semantics, so the data follows a source-specific structural model [24, 41]. The fundamental challenge of big data variety is to achieve an effective mechanism for linking various classes of information that differ in their internal structure. Linked Data is about using the WWW to connect related data that was not previously linked, or using the WWW to lower the barriers to linking data already connected by other means. More specifically, Wikipedia characterizes Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs (Uniform Resource Identifiers) and RDF" [3]. To make data capture, sharing, search, analysis, transfer, and visualization quick, simple, and effective, a unified, linked, and less complex data representation such as RDF is needed [1]. Researchers have made efforts to transform data banks into DTD (Document Type Definition) or RDFS (Resource Description Framework Schema), either partly or completely. By observing the identifiers in an XML document, a machine can choose fitting tags for the adequate role as class or property [39]. An XML document can likewise be transformed to improve its capability of being interpreted as RDF; it is essential to keep the XML's original structure unaffected during the transformation process for improved results and handling of the data [28]. Transformation requires mapping two different data models onto a common agreement. Different transformation techniques have been introduced, among them state-of-the-art techniques including Direct Mapping, R2O, eD2R, Relational.OWL, D2RQ, Triplify, R2RML, and R3M [17]. Another representation of big data is Header Dictionary Triples (HDT), an advanced form of RDF [1]; HDT is adopted for publishing and exchanging RDF-based data translated from heterogeneous data. The transformation process of this study can prove useful in producing simplified RDF data for the fastest production of results when interpreting huge data.

3 Problem

This study looks at problems at the level of data collection and representation in big data. Big data comes mostly in the form of video, audio, and text, either linked or scattered [22, 31, 32]. This data is further used for real-time prediction or analysis, catering to issues of many kinds, chiefly geographical and biological issues and threats [2, 30]. It therefore becomes ever more important to evolve the data according to the needs of big data, so that less complexity is incurred when data arrives from many distributed sources in real time and grows simultaneously. We have proposed improvements in XML and RDF to overcome the issues of rapid updates while keeping the linkage maintained [12]. We also examine this through mathematical modeling, and finally discuss some results to evaluate the outcome.



4 Mathematical representation

Most data in big data appears in four formats: video, audio, text, and images. Table 1 defines the terminology used in the mathematical modeling. Let us first define the generalized form of XML data for a specific piece of information:

$$X = \frac{\ln(t) + t\ln(2)}{\ln(2)}, \quad t \in T$$

which, solved for $t$ using the Lambert W function, gives

$$t = e^{X\ln(2) - W(\ln(2)\,e^{X\ln(2)})}$$

where $t$ belongs to the family of XML tag sets. Now let us take the function $X_t$:

$$X_t = \left(\frac{k\ln(k)}{\ln(2)} + (n-k)\right)t, \quad k, n \in \mathbb{N} \tag{1}$$

Here $k\log_2(k)$ represents all paired tags (those with opening and closing tags), whereas $n-k$ are the remaining single tags in the XML for a specific information representation. Equation (1) can provide a complete set of tags to represent the information. If $\alpha = k\log_2(k) + (n-k)$, then

$$\alpha = \frac{k\ln(k)}{\ln(2)} + (n-k) \tag{2}$$

By putting the value of Eq. (2) into Eq. (1),

$$X_t = \alpha t \tag{3}$$

Table 1 Definitions of terms used in mathematical modeling

Notation | Description
---------|------------
V        | A set of all video contents
W        | A set of all words
I        | A set of all images
A        | A set of all sound data
T        | XML set of interlinked data using tags
t        | Tag representing an element for XML
k        | Total number of tags having opening and closing
n        | Total number of tags to represent an information
S        | Source for RDF triple
R        | Resource for RDF triple
P        | Predicate for RDF triple


For a constant increase of data over time, Eq. (3) becomes

$$X_t = \alpha t + 2\alpha t + 3\alpha t + 4\alpha t + \dots + m\alpha t$$
$$X_t = \frac{1}{2}m(m-1)\lambda\alpha t \tag{4}$$

If the change is constant, then the constant factor $\lambda$ lies between

$$0 < \lambda \le 1 \tag{5}$$

where $m \in \mathbb{N}$ and $m$ is the maximum change which can occur in an instance:

$$X_t = \begin{cases} \alpha t & \lambda > 0 \\ \frac{m(m-1)}{2}\lambda\alpha t & \lambda \le 1 \end{cases} \tag{6}$$

Big data, when translated into XML form, contains values and schema for all types of content, which must be taken care of. Eq. (6) shows the importance of the change factor, in our case $\lambda$, which remains ineffective when closest to zero. For the set of videos $V$, tags can be represented as $t_v$ for the XML function $X_{t_v}$; similarly, the set of audio $A$ as $t_a$ for $X_{t_a}$, the set of words as $t_w$ for $X_{t_w}$, and finally the set of images as $t_i$ for $X_{t_i}$:

$$X_{t_v} = \begin{cases} \alpha t & \lambda > 0 \\ \frac{m(m-1)}{2}\lambda\alpha t_v & \lambda \le 1 \end{cases} \qquad
X_{t_a} = \begin{cases} \alpha t & \lambda > 0 \\ \frac{m(m-1)}{2}\lambda\alpha t_a & \lambda \le 1 \end{cases}$$
$$X_{t_w} = \begin{cases} \alpha t & \lambda > 0 \\ \frac{m(m-1)}{2}\lambda\alpha t_w & \lambda \le 1 \end{cases} \qquad
X_{t_i} = \begin{cases} \alpha t & \lambda > 0 \\ \frac{m(m-1)}{2}\lambda\alpha t_i & \lambda \le 1 \end{cases} \tag{7}$$
$$X_{bigdata} = X_{t_v} + X_{t_a} + X_{t_w} + X_{t_i}$$

Equation (7) is the simplified form of the XML data translated for any type of incoming big data content. Here we can also say that

$$(X_{t_v}) \approx (X_{t_a} + X_{t_w} + X_{t_i}) \tag{8}$$

Equation (8) can only be true if the data comes from the same source, in which case Eq. (7) becomes

$$X_{bigdata} = X_{t_v} + X_{t_v} = 2X_{t_v}$$

Let $T$ be the set of all tag sets $\{T_1, T_2, T_3, T_4, \dots, T_n\}$, where each element $T_i$ of $T$ is a set with $T_i \subseteq T$.


In RDF, each tag set $T_i$ is transformed into a set $R_i$ of multiple linked triples of the form $(S, P, O)$:

$$R_i = \{(S_1, P_1, O_1), (S_2, P_2, O_2), (S_3, P_3, O_3), \dots, (S_k, P_k, O_k)\} \tag{9}$$

where $k$ is the number of possible triples for the corresponding tag set. The complete RDF set can be seen as

$$R = \{R_1, R_2, R_3, R_4, \dots, R_m\}, \quad m \in \mathbb{N}$$

Here $m$ is the number representing the maximum set produced for a specific XML data. According to Eqs. (5) and (9), it can be said that

$$R_T = R + \lambda R' \tag{10}$$

In Eq. (10), $\lambda$ is the same change factor defined in Eq. (4), and $R'$ is the set of new RDF triples added to the old set $R$ at instance $T$. A rise in the value of $\lambda$ appears due to conflicts, duplications, rapidly increasing linkage, and collisions in the data produced in RDF form. It can be reduced by decreasing the degree of complexity of the transformation at the level of big data representation. A controlling factor is needed which can link data positively at the time it is produced; this control can be achieved at the point of XML creation. Each datum can further use classification from its metadata to see the origin and purpose of the linked data [13, 18, 34]. A small worked illustration of the tag counting follows.
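To make the tag counting of Eqs. (1)-(3) concrete, the following sketch is our own illustration; the element-counting heuristic and the sample fragment are assumptions, not definitions from the paper.

```python
import math
import xml.etree.ElementTree as ET

def alpha(k: int, n: int) -> float:
    """alpha = k*log2(k) + (n - k), as in Eq. (2)."""
    return k * math.log2(k) + (n - k)

def tag_counts(xml_text: str):
    """n: all elements; k: elements serialized with separate opening and
    closing tags (those with children or text), matching the paper's k and n."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())
    n = len(elements)
    k = sum(1 for e in elements if len(e) or (e.text and e.text.strip()))
    return k, n

k, n = tag_counts("<bankorder><title>t</title><city>Lahore</city><branch/></bankorder>")
print(alpha(k, n))      # alpha for this fragment
print(alpha(k, n) * 1)  # X_t = alpha * t for a single tag-set instance t (Eq. 3)
```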

5 Methodology to overcome issues

5.1 Implementation and process

When trying to keep more control over the transformation (as shown in Fig. 1), both routes, RDB schema to DTD and RDB schema to XML Schema, will work, as each comes with an interface rich enough to transform up to the need. Between these semi-structured forms, the richer one is the better choice, being better positioned to give improved results and to cover future needs [8]. Figure 2 then shows the complete process of how the bidirectional transformation is performed, as input-process-output, for our research work. In Fig. 2, heterogeneous data is sent to the XML transformation system, which builds an XML document from the incoming data. This document is further transformed into RDFS through another transformation system. The information takes the form of linked data, which is ready for inference. When the linked data is passed to the inference engine, new results are generated against the rules and predicates. This newly generated data can be of any form and size.

Fig. 1 Two approaches with a structural-level difference in the transformation process: RDBS → Transformation (RDBS-DTD) → DTD, basic (semi-structured); RDBS → Transformation (RDBS-XML Schema) → XML Schema, richer (near to structured)

Fig. 2 Complete process of big data from heterogeneous data getting transformed and inferred (flow: heterogeneous data → Transformation (data-XML Schema) → XML Schema → Transformation (XML Schema-RDFS) → RDFS → Linked Data → Inference Engine, with data linkage in the knowledge base for big data)

It can be passed through this system again to obtain more results, or be used by the inference engine directly, depending on the nature of the data. In Fig. 3, a big data source produces data which is collected by the XML transformation process. This process generates the XML-equivalent form of the data along with its metadata (data about data), which is necessary for the data representation. This step matters because it keeps track of the data without introducing complexity. To resolve the transformation issues through simplicity and standardization of the XML-centered data model, either form of XML document definition, i.e., DTD or XML Schema, is used. In the following section, transformation Algorithms 1 and 2 cover this XML transformation process using DTD and XML Schema, respectively. A toy schematic of the Fig. 2 flow is sketched below.
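In this sketch every function is a runnable stand-in of our own devising, not the paper's implementation:

```python
import re

def transform_to_xml(data: dict) -> str:
    """Stand-in for the data-to-XML-Schema transformation system."""
    items = "".join(f"<{k}>{v}</{k}>" for k, v in data.items())
    return f"<record>{items}</record>"

def transform_xml_to_rdfs(xml_doc: str) -> list:
    """Stand-in for the XML-Schema-to-RDFS system: one triple per element."""
    return [("record", "rdf:Property", tag)
            for tag in re.findall(r"<(\w+)>", xml_doc)[1:]]

def infer(linked: list) -> list:
    """Stand-in inference engine: a single toy rule over the linked triples."""
    return [(o, "rdfs:Class", s) for s, p, o in linked]

linked = transform_xml_to_rdfs(transform_to_xml({"city": "Lahore", "amount": 500}))
print(infer(linked))  # per Fig. 2, results may be fed back through the pipeline
```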

6 Algorithms

The following algorithms simplify and transform heterogeneous data to reduce the issues discussed in the sections above. Each algorithm in this manuscript can further be divided into two steps: i) a preprocessing part and ii) a transformation part. The preprocessing part performs the activities that put the incoming data into a form ready for the transformation process to execute. The transformation part, the main portion, is where the mapping and transformation between the different data models happen.

Fig. 3 Data representation of big data at the level of XML-based transformation: a big data source feeds data into the XML transformation, which produces XML linked with its metadata to form the data representation


Algorithm 1: Big data to DTD

Transforming big data into DTD has a trick of its own, lying in the mapping of how these technologies relate to each other. The following algorithm provides the detailed implementation of the transformation from a given RDBS into DTD.

Fig. 4 Schema of the RDB taken as an example to show the results of the transformation process: bankorder(ordered text, title text, branch text, city text, amount float)

Algorithm 1 makes the first element a representative of the data file and adds to this element, as parameters, all the files contained by the big data source. Then, with an ATTLIST declaration defined against each element, all fields of the table are stored as attributes. An attribute with type ID represents a primary key, and similarly IDREF represents a reference. The remaining attributes get the PCDATA type, which is equivalent to a string data type. Algorithm 1 takes the data file of big data as input and transforms it into the DTD of an XML document as output. The algorithm contains two nested loops. The outer loop deals with the main representative fields of the file and transforms them into corresponding elements of the DTD document. The inner loop transforms every nested field of the representative into attributes of the corresponding elements. Then, based on the test cases discussed below, each field is classified as an identifier, a reference to another resource, or an attribute of the resource itself, using the ID, IDREF, and PCDATA keywords of DTD schema syntax. There are multiple test cases in Algorithm 1, as follows:

Case 1 Primary and reference indexed data. These indexes concern the identification and allocation of resource dependencies. When a file is transformed into DTD, ID is used for within-file identifiers and IDREF for resource-dependency identifiers of the data file.

Case 2 Simple field data. Apart from the indexes, all data is transformed into simple attributes of string type. A minimal sketch of this transformation follows.
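Since the algorithm listing itself appears as a figure in the original, the following Python sketch is our hedged reconstruction of the described loop structure; the input format (table name mapped to (field, kind) pairs) is an assumption, not the paper's data structure.

```python
# Hedged sketch of Algorithm 1 (big data file -> DTD), reconstructed from the
# prose description; kinds are "pk" (primary key), "fk" (reference), "data".
def to_dtd(file_name: str, tables: dict) -> str:
    lines = [f"<!ELEMENT {file_name} ({' , '.join(tables)})*>"]  # root element
    for table, fields in tables.items():       # outer loop: one element per table
        lines.append(f"<!ELEMENT {table} EMPTY>")
        attrs = []
        for field, kind in fields:             # inner loop: fields -> attributes
            if kind == "pk":                   # Case 1: within-file identifier
                attrs.append(f"  {field} ID #REQUIRED")
            elif kind == "fk":                 # Case 1: resource-dependency reference
                attrs.append(f"  {field} IDREF #REQUIRED")
            else:                              # Case 2: plain string field
                attrs.append(f"  {field} CDATA #IMPLIED")  # Fig. 6 shows CDATA
        lines.append(f"<!ATTLIST {table}\n" + "\n".join(attrs) + ">")
    return "\n".join(lines)

print(to_dtd("BankDB", {"bankorder": [("ordered", "pk"), ("title", "data"),
                                      ("branch", "data"), ("city", "data"),
                                      ("amount", "data")]}))
```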


Algorithm 2: Big data to XML Schema

Fig. 5 bankorder schema transformed using the RDBS-to-XML-Schema transformation algorithm (excerpt): <xs:element name="branch" type="xs:string"/>, <xs:element name="city" type="xs:string"/>, <xs:element name="amount" type="xs:decimal"/>

This second algorithm transforms big data for the case where the XML document is to be generated using XML Schema. Here we consider only complex elements to represent a file, with its primary and foreign indexing relationships as attributes. Algorithm 2 takes the data file of big data as input and transforms it into the XML Schema of an XML document as output. Again there are two nested loops: the outer loop deals with the main representative fields of the file and transforms them into corresponding elements of the XML Schema document, while the inner loop transforms every nested field of the representative into attributes of the corresponding XML elements. Then, based on the test cases discussed below, each field is classified as an identifier, a reference to another resource, or an attribute of the resource itself, using attributes marked as required and an array collecting all referenced resources in XML Schema syntax. There are multiple test cases in Algorithm 2:

Case 1 Primary and reference indexed data. These indexes concern the identification and allocation of resource dependencies. When a file is transformed into XML tags, within-file identifiers become required attributes, while attributes carrying extra information serve as resource-dependency identifiers for the data file.

Case 2 Simple field data. Apart from the indexes, all data is transformed into a complex element containing elements typed like the resource file. A sketch of this variant follows.
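As with Algorithm 1, this Python sketch is our hedged reconstruction under the same assumed input format; the XSD type mapping follows Fig. 5 (text to xs:string, float to xs:decimal).

```python
# Hedged sketch of Algorithm 2 (big data file -> XML Schema); input format assumed.
XSD_TYPES = {"text": "xs:string", "float": "xs:decimal"}

def to_xml_schema(tables: dict) -> str:
    out = ['<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">']
    for table, fields in tables.items():        # outer loop: complex element per file
        elems, attrs = [], []
        for field, kind in fields:              # inner loop: nested fields
            if kind in ("pk", "fk"):            # Case 1: required key/reference attribute
                attrs.append(f'      <xs:attribute name="{field}" '
                             f'type="xs:string" use="required"/>')
            else:                               # Case 2: simple typed child element
                elems.append(f'        <xs:element name="{field}" '
                             f'type="{XSD_TYPES[kind]}"/>')
        out += [f'  <xs:element name="{table}">', '    <xs:complexType>',
                '      <xs:sequence>'] + elems + ['      </xs:sequence>']
        out += attrs + ['    </xs:complexType>', '  </xs:element>']
    out.append('</xs:schema>')
    return "\n".join(out)

print(to_xml_schema({"bankorder": [("ordered", "pk"), ("title", "text"),
                                   ("branch", "text"), ("city", "text"),
                                   ("amount", "float")]}))
```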


Fig. 6 bankorder schema transformed using the RDBS-to-DTD transformation algorithm (excerpt): an attribute-list declaration typing the fields as ID #REQUIRED, IDREF #REQUIRED, and CDATA #FIXED for the remaining four

Fig. 7 A sample "bankorder" triples list generated using a transformation algorithm between DTD to RDFS / XML Schema to RDFS:

Subject   | Predicate     | Object
----------|---------------|-------------
BankDB    | rdfs:Class    | rdf:resource
BankOrder | rdfs:Class    | BankDB
BankOrder | rdf:Property  | ID
ID        | rdfs:DataType | String
ID        | rdfs:range    | BankOrder

Algorithm 3: DTD to RDFS

In the above algorithm, the root element is made the root class, and then, through a looping mechanism, a class is created for each table found in the root element's attribute list. Each element found in the list becomes a property of that class, and a data type is assigned to each property from the type found, which in this case will only be strings. In this way we can generate triples as a representative of the DTD and, indirectly, of our source files. Furthermore, this list can be translated into a directed graph of semantic web resources. There are multiple test cases in Algorithm 3, as follows:

Case 1 Range and domain of triples. In hierarchical triples, the range represents an identifier, whereas the domain is a reference-based identifier corresponding to the domain/area to which the information belongs.

Case 2 Simple triples. The remaining information is translated into triples matching the corresponding XML information of the data. A minimal sketch of this step follows.
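Since the listing is again a figure in the original, this Python sketch is our hedged reconstruction of the described steps; the parsed-DTD input structure is an assumption.

```python
# Hedged sketch of Algorithm 3 (DTD -> RDFS triples), reconstructed from the prose.
def dtd_to_rdfs(root: str, attlists: dict) -> list:
    """attlists maps each element name to its attribute list of
    (attribute_name, dtd_type) pairs, e.g. ("ID", "ID")."""
    triples = [(root, "rdfs:Class", "rdf:resource")]       # root element -> root class
    for element, attrs in attlists.items():
        triples.append((element, "rdfs:Class", root))      # a class per element
        for name, dtd_type in attrs:
            triples.append((element, "rdf:Property", name))    # attribute -> property
            triples.append((name, "rdfs:DataType", "String"))  # DTD values are strings
            if dtd_type in ("ID", "IDREF"):                    # Case 1: range triple
                triples.append((name, "rdfs:range", element))
    return triples

# Reproduces the shape of the Fig. 7 triples list:
for t in dtd_to_rdfs("BankDB", {"BankOrder": [("ID", "ID")]}):
    print(t)
```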

Algorithm 4: XML Schema to RDFS

This is the last algorithm, in which the document name is used to represent a root class and then, through a looping mechanism, a class is created for each table found as a complex element. Each element found becomes a property of that class, and a data type is assigned against each type of that property.

Fig. 8 "bankorder" triples list translated into a directed graph (nodes BankDB, BankOrder, and ID, with edges labeled rdfs:Class, rdfs:range, and rdf:Property)

In this way we can generate triples as a representative of the XML Schema and, indirectly, of our source files. Furthermore, this list can also be translated into a directed graph of semantic web resources, as was done in the DTD case. The test cases for this algorithm are the same as for Algorithm 3. Through these algorithms it has been shown that our solution rests on two possible paths. Both start from the raw data state and move into the common standard form, XML; after performing the transformation we end at the RDFS state, useful for analysis and inference on the given data, while the intermediate state can be either a DTD or an XML Schema for the XML document. The four related algorithms presented in this section cover the complete transformation mechanism discussed in Fig. 2. A sketch of the XML Schema variant follows.
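As before, this Python sketch is our hedged reconstruction; the parsed-schema input (complex element name mapped to typed fields) is an assumption, not the paper's structure.

```python
# Hedged sketch of Algorithm 4 (XML Schema -> RDFS triples).
def xsd_to_rdfs(doc_name: str, complex_elements: dict) -> list:
    """complex_elements maps each complex element to (field, xsd_type) pairs."""
    triples = [(doc_name, "rdfs:Class", "rdf:resource")]    # document name -> root class
    for element, fields in complex_elements.items():
        triples.append((element, "rdfs:Class", doc_name))   # class per complex element
        for name, xsd_type in fields:
            triples.append((element, "rdf:Property", name))    # each field -> property
            triples.append((name, "rdfs:DataType", xsd_type))  # its assigned data type
    return triples

print(xsd_to_rdfs("BankDB", {"bankorder": [("branch", "xs:string"),
                                           ("amount", "xs:decimal")]}))
```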

7 Case study

To show the working of the elaborated transformation process for big data on small samples, let us first look at the data format, the RDBS of our example relation named "bankorder", given in Fig. 4. Using Algorithms 1 and 2 from the implementation section, the RDBS shown in Fig. 4 is transformed into the results of Figs. 5 and 6. Through this we gain an intermediate format understood by web-based technologies in the shape of DTD and XML Schema. The remaining task is to transform these results into RDFS format, which in our case is done using Algorithms 3 and 4 from the implementation section; the resulting RDFS (not the complete list of triples obtained) is shown in Fig. 7. Figure 8 then shows a directed graph of the resources obtained from the complete triple list. Finally, Fig. 9 shows how Algorithm 3 could be further improved to handle constraints using simple-type tags. A worked run of the algorithm sketches on this example is given below.
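This worked run reuses the hedged sketches from Section 6 (to_dtd, to_xml_schema, and dtd_to_rdfs must be in scope); they are our reconstructions, not the paper's code.

```python
# bankorder from Fig. 4, in the input formats assumed by the Section 6 sketches.
tables_dtd = {"bankorder": [("ordered", "pk"), ("title", "data"), ("branch", "data"),
                            ("city", "data"), ("amount", "data")]}
tables_xsd = {"bankorder": [("ordered", "pk"), ("title", "text"), ("branch", "text"),
                            ("city", "text"), ("amount", "float")]}
print(to_dtd("BankDB", tables_dtd))       # intermediate DTD, cf. Fig. 6
print(to_xml_schema(tables_xsd))          # intermediate XML Schema, cf. Fig. 5
for triple in dtd_to_rdfs("BankDB", {"BankOrder": [("ID", "ID")]}):
    print(triple)                         # resulting RDFS triples, cf. Fig. 7
```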

Fig. 9 An example of implementing constraints using XML Schema




8 Conclusion and future concerns

Recent research demonstrates that data assets in the wild are growing at an astonishing rate. This rapidly increasing number of data assets has created a critical need to develop intelligent techniques to organize and process them. In this paper, a semantic representation model is used for organizing incoming data assets. Mathematical modeling is designed to establish relations among different assets (e.g., web pages or documents in a digital library), aiming at expanding the loosely associated network with no semantics (e.g., the Web) into an association-rich network. Bringing real-time data assets into a system requires fast response times, achieved by removing any features that only delay processing. Big data must be translated into a form that can be analyzed with the lowest complexity induced in the data model, for performance improvement; this data model in our case is RDF in its basic form. We have shown throughout the process and the mathematical modeling that a factor of simplicity can improve the response of the analysis process by reducing the delay factor. For the implementation, four algorithms are given which perform the basic-level transformation of data assets into RDF form. Improvements to these algorithms could usefully extend them to large-scale data models covering audio, video, image, and textual formats. Compiling the main idea, we have shown comparatively how newly developed technologies such as RDF for the semantic web can be made compatible with the older, traditional technologies of the web and of databases, namely XML and RDBMS respectively. It is important for new researchers to gain a proper understanding of how these work, and of the capability to make suitable improvements to upcoming web technologies. Issues left open, such as composite key handling and improving DTD to the point where it can resolve RDB constraints and data types, can be considered future concerns of this research work.

References

1. Akerkar R (2013) Big data computing. CRC Press
2. Antezana E, Kuiper M, Mironov V (2009) Biological knowledge management: the emerging role of the semantic web technologies. Brief Bioinform 10:392–407
3. Auer S, Ngomo A-CN, Frischmuth P, Klimek J (2013) Linked data in enterprise integration. Big Data Computing, p 169
4. Souza BFdF, Salgado ACO, Batista MdCMC (2013) Information quality criteria analysis in query reformulation in dynamic distributed environments
5. Bizer C, Boncz P, Brodie ML, Erling O (2012) The meaningful use of big data: four perspectives–four challenges. ACM SIGMOD Record 40:56–60
6. Broekstra J, Klein M, Decker S, Fensel D, Van F, Horrocks I (2001) Enabling knowledge representation on the web by extending RDF schema. WWW01, May 1–5, 2001, Hong Kong
7. Christodoulou S, Karacapilidis N, Tzagarakis M, Dimitrova V, de la Calle G (2014) Data intensiveness and cognitive complexity in contemporary collaboration and decision making settings. In: Mastering Data-Intensive Collaboration and Decision Making. Springer, pp 17–48
8. Cuzzocrea A, Diamantini C, Genga L, Potena D, Storti E (2014) A composite methodology for supporting collaboration pattern discovery via semantic enrichment and multidimensional analysis. In: Soft Computing and Pattern Recognition (SoCPaR), 2014 6th Int Conf, pp 459–464
9. de Diego R, Martínez J-F, Rodríguez-Molina J, Cuerva A (2014) A semantic middleware architecture focused on data and heterogeneity management within the smart grid. Energies 7:5953–5994
10. Dörk M (2012) Visualization for search: exploring complex and dynamic information spaces. Citeseer
11. Eberhart A (2003) Ontology-based infrastructure for intelligent applications. Universitätsbibliothek
12. Frasincar F, Houben G, Vdovjak R, Barna P (2002) RAL: an algebra for querying RDF. In: Proc 3rd Int Conf Web Information Systems Engineering, IEEE
13. Frey JG, Bird CL (2013) Cheminformatics and the semantic web: adding value with linked data and enhanced provenance. Wiley Interdisciplinary Rev: Computational Mol Sci 3:465–481
14. Gentner D, van Harmelen F, Hitzler P, Janowicz K, Kuhnberger K-U (2012) Cognitive approaches for the semantic web
15. Haav H-M, Küngas P (2013) Semantic data interoperability: the key problem of big data. Big Data Computing, p 245
16. Herrmann-Krotz G, Kohlmetz D, Müller-Rowold B (2011) Publikationen. New Rev Hypermedia Multimedia 20:53–77
17. Hert M, Reif G, Gall HC (2011) A comparison of RDB-to-RDF mapping languages. In: Proc 7th Int Conf Semantic Systems, ACM, pp 25–32
18. Hitzler P, Janowicz K (2013) Linked data, big data, and the 4th paradigm. Semantic Web 4:233–235
19. Hsu PL, Hsieh HS, Liang JH, Chen YS (2015) Mining various semantic relationships from unstructured user-generated web data. Web Semant Sci Serv Agents World Wide Web 31:27–38
20. Hu C, Xu Z, Liu Y, Mei L, Chen L, Luo X (2014) Semantic link network-based model for organizing multimedia big data. IEEE Trans Emerging Topics Comput 2(3):376–387
21. Jamil HM (2014) Mapping abstract queries to big data web resources for on-the-fly data integration and information retrieval. In: Data Engineering Workshops (ICDEW), IEEE 30th Int Conf, pp 62–67
22. Khalili A, Auer S (2013) User interfaces for semantic authoring of textual content: a systematic literature review. Web Semant Sci Serv Agents World Wide Web 22:1–18
23. Kim H, Kim K (2014) Semantic levels of information hierarchy for urban street navigation. In: Int Conf Big Data and Smart Computing (BIGCOMP), pp 235–240
24. Kim Y, Kim B, Lim H (2006) The index organizations for RDF and RDF schema. ICACT
25. Manola F, Miller E, McBride B (2004) RDF primer. W3C Recommendation 10:1–107
26. Manuja M, Garg D (2011) Semantic web mining of un-structured data: challenges and opportunities. Int J Eng (IJE) 5(3):268
27. Margara A, Urbani J, van Harmelen F, Bal H (2014) Streaming the web: reasoning over dynamic data. Web Semant Sci Serv Agents World Wide Web 25:24–44
28. Martens W, Neven F, Schwentick T, Bex GJ (2006) Expressiveness and complexity of XML schema. ACM Trans Database Syst (TODS) 31(3):770–813
29. Noori SRH (2011) A large scale distributed knowledge organization system. University of Trento
30. Pileggi SF, Amor R (2015) Semantic geographic space: from big data to ecosystems of data. In: Big Data in Complex Systems. Springer, pp 351–374
31. Riemer D, Stojanovic L, Stojanovic N (2014) SEPP: semantics-based management of fast data streams. In: Service-Oriented Computing and Applications (SOCA), 2014 IEEE 7th International Conf, pp 113–118
32. Rocha OR (2014) Context-aware service creation on the semantic web. Politecnico di Torino
33. Sakka MA, Defude B (2012) Towards a scalable semantic provenance management system. In: Transactions on Large-Scale Data- and Knowledge-Centered Systems VII. Springer, pp 96–127
34. Serrano-Alvarado P, Desmontils E (2013) Personal linked data: a solution to manage user's privacy on the web. In: Atelier sur la Protection de la Vie Privée (APVP)
35. Sicari S, Cappiello C, De Pellegrini F, Miorandi D, Coen-Porisini A (2014) A security- and quality-aware system architecture for Internet of Things. Information Systems Frontiers, pp 1–13
36. Soussi R (2012) Querying and extracting heterogeneous graphs from structured data and unstructured content. Ecole Centrale Paris
37. Spaniol M (2014) A framework for temporal web analytics. Université de Caen
38. Strohbach M, Ziekow H, Gazis V, Akiva N (2015) Towards a big data analytics framework for IoT and smart city applications. In: Modeling and Processing for Next-Generation Big-Data Technologies. Springer, pp 257–282
39. Thuy PTT, Lee Y-K, Lee S, Jeong B-S (2007) Transforming valid XML documents into RDF via RDF schema, pp 35–40
40. Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. IEEE Trans Knowledge Data Eng 26:97–107
41. Zhao J, Corcho O, Missier P, Belhajjame K, Newmann D, De Roure D et al (2011) eScience. In: Handbook of Semantic Web Technologies, pp 701–736


Kaleem Razzaq Malik is a PhD student at the University of Engineering and Technology (UET), Lahore, Pakistan, and also works as an instructor of computer science at the Virtual University of Pakistan. He has been a Lecturer in the Department of Software Engineering, Government College University Faisalabad, Pakistan, since June 2013, performing teaching duties. He has been pursuing his Doctor of Philosophy (PhD) in Computer Science at UET, Lahore, since 2011. His interests include computer programming, the Semantic Web, and databases.

Tauqir Ahmad has been working as an Associate Professor in the Department of Computer Science & Engineering, University of Engineering and Technology (UET), Lahore, since January 1999, performing teaching and research duties. He completed his Doctor of Philosophy (PhD) in Computer Science at UET, Lahore, in 2012.


Muhammad Farhan is an Assistant Professor at COMSATS Institute of Information Technology, Sahiwal Campus, Pakistan, a PhD scholar at the Department of Computer Science and Engineering, University of Engineering and Technology (UET), Pakistan, and an instructor of computer science at the Virtual University of Pakistan (VU). He obtained his MSCS from the University of Management and Technology (UMT), Pakistan, and his BSCS from the Virtual University of Pakistan. He currently has over 11 years of teaching experience. His interests include e-Learning, computer programming, the Semantic Web, and databases.

Muhammad Aslam is an Associate Professor at the University of Engineering and Technology, Lahore, Pakistan. He has six years' experience of software architecture design, team leading, team building, and project work; five years' experience of research and development; and nine years' experience of research and development as well as teaching at the postgraduate level, supervising PhD and MSc theses. He completed his PhD in Computer Science (CGPA 8.9/10, 2001–2005) at CINVESTAV-IPN, Mexico, under the cultural exchange scholarship between Pakistan and Mexico. His research and teaching interests include knowledge-based systems, expert systems, intelligent agents, human-computer interaction, computer-supported cooperative work, cooperative writing and authoring, communication, coordination, awareness, cooperative learning, and modern operating systems. Distinctions: merit scholarship from the Board of Intermediate and Secondary Education, Sargodha Division, Pakistan (1984–1986); merit scholarship during BSc Agricultural Engineering, Faculty of Agricultural Engineering, University of Agriculture, Faisalabad, Pakistan (1987–1991); silver medal for winning second position in the Faculty of Agricultural Engineering, University of Agriculture, Faisalabad, Pakistan (1991); cultural exchange scholarship between Pakistan and Mexico for higher studies (2000–2004).


Sohail Jabbar completed his MS (Telecom & Networking) with the honor of magna cum laude from Bahria University, Islamabad, in 2009, and his BS (Computer Science) from Allama Iqbal Open University in 2006. He has almost 7 years of teaching and research experience and has earned many distinctions across his 25 research publications in various renowned journals and conferences. He has also been a reviewer for a number of impact-factor journals. He is an active member of the Bahria University Wireless Research Center (BUWRC). Currently, he serves as a Lecturer at the Department of Computer Science, COMSATS Institute of Information Technology, Sahiwal, Pakistan. His research interests include wireless sensor networks, machine-to-machine communication, and the Internet of Things.

Dr. Shehzad Khalid, an Associate Professor in the Department of Computer & Software Engineering, has been declared 'Bahria University's Pride', according to a report published in 'Bugle', the newsletter of Bahria University, Islamabad. He recently published a research paper titled 'Frameworks for multivariate m-Mediods based modelling and classification in Euclidean and general feature spaces' in Pattern Recognition, an ISI-indexed journal with an impact factor of approximately 3. He has also published a conference paper titled 'Image matching using feature set transformers.'


Mucheol Kim is an Assistant Professor in the group of Industry-University Cooperation at Sungkyul University in Korea. His research interests include information retrieval, web technology, social networks, and wireless sensor networks. He was a Senior Researcher at the Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea. He received the BS, MS, and PhD degrees from the School of Computer Science and Engineering at Chung-Ang University, Seoul, Korea, in 2005, 2007, and 2012, respectively.