2016 3rd International Conference on Advanced Computing and Communication Systems (ICACCS -2016), Jan. 22 & 23, 2016, Coimbatore, INDIA

Effective Implementation of Data Segregation & Extraction Using Big Data in E-Health Insurance as a Service

1K.MANOJ KUMAR

2TEJASREE.S

Assistant Professor, Department of Computer Science & Engineering, Sri Venkateswara College of Engineering (SVCE), Tirupati, Andhra Pradesh 517507 India, [email protected]

Assistant Professor, Department of Computer Science & Engineering, Sri Venkateswara College of Engineering (SVCE), Tirupati, Andhra Pradesh 517507 India, [email protected]

3S.SWARNALATHA

Assistant Professor, Department of Information Technology, Sri Venkateswara College of Engineering (SVCE), Tirupati, Andhra Pradesh 517507 India, [email protected]

Abstract - Big data is an emerging technology in many areas, such as online purchasing, e-healthcare, tweet analysis, and the banking sector. Nowadays insurance companies are showing interest in analysing their huge datasets, which consist of patients' and hospitals' information, and in extracting useful information from those datasets. They mostly concentrate on success and failure percentages and on the feedback given by patients. Patients submit their hospital bills along with the discharge summary and medical reports to the insurance company; based on the patient's procedure, the insurance company decides whether to approve the claim and makes suggestions for new patients. In this paper, patients' records, reports, symptoms, and feedback are analysed using big data technologies, namely Infinispan and map-reduce concepts, for data extraction and segregation in health insurance. Disclosure of patients' private information is prevented using a private data encoding algorithm.

Keywords - Big data; Data Extraction; Segregation; E-health insurance; Privacy

I. INTRODUCTION

Nowadays the internet plays a key role in accessing, sharing, and uploading data. To access information from the internet, databases need to be maintained; while sharing data, duplicate data is generated; and while uploading data, the database needs space to save the uploaded information. Hundreds of thousands of petabytes of data are generated daily, and a new era, big data, has begun: looking at the growth of data size, 90 percent of the data in the world was generated in the last couple of years. Social networks play a key role in data generation. On Twitter everyone tweets their own opinions and shares their interests. By collecting this information, data analysis can be performed to identify user interests. For example, two years earlier, during an open debate between Barack Obama and Mitt Romney in America on people's Medicare and their vouchers, ten million tweets were triggered within a couple of hours and revealed the public interest; this type of online debate senses public interest and gives feedback.

Health insurance companies have started working on insurance datasets consisting of patient information, hospital data, and doctors' office information, all brought into one data stream called the electronic medical record (EMR) [11]. The patient information is stored in the hospital data server; every visit of a particular patient generates different types of data elements, consisting of personal information, medical details, invoice summary, blood test results, x-ray images and billing details. The information collected in hospitals needs to be validated and combined into data pools of large size for meaningful analysis. Using a health processing system, we need to multiply all the patients' summaries and join details in the hospital with the huge number of points where data is created and stored in the database. At this stage the big data challenges begin. We collect structured data from one source and unstructured data from another type of source; together, structured and unstructured data form a semi-structured data source, so different types of structured information must be handled. Well defined structured data is seen in the Health Level 7 (HL7) messaging standards, which contain e-health information. Digital Imaging and Communications in Medicine (DICOM) provides semi-structured data; it deals with the exchange of radiology images over networks.

Fig. 1: Architecture Diagram (disease analysis)

II. RELATED WORK

The electronic health record (EHR) [2] collects all types of information from different data streams such as patients' records from the doctor's office, hospital records, and insurance companies. When a patient joins a hospital for treatment, thousands of data elements are generated, including personal information, digital images, medicine supplies, lab reports on blood tests and billing charge summaries. These records are not efficiently validated, integrated or processed into large data pools, so they are not properly analysed. Many programming frameworks exist for this kind of summarization, but all of them are inspired by the functional programming construct called map-reduce [4], which is used by the Google search engine. Google uses the map-reduce technique internally for parallel programming, and the same construct has been applied to machine learning on multicores. Google's map-reduce is specialized for clusters with unreliable communication.

In machine learning, however, a lightweight architecture [3] has been developed for multicores using map-reduce techniques. In the traditional way, data is loaded into one warehouse; normally this is not possible because data comes from different platforms such as insurance companies, labs and agencies, where each party needs particular data and none of them stores all the data related to the others. Big data breaks this traditional model. Big data flow is based on collecting nodes from within and outside the enterprise, and data federation is a way to do this, where data is collected from a layer that integrates logic and data [5]. Providing privacy and security is the main challenge in the storage of sensitive data. For e-healthcare, segmentation of data for each firm is a major concern in developing a big data solution, so data collected from the different sectors related to e-healthcare is stored into one window. Correlating and extracting the previous data for further conclusions is difficult [9]. These challenges lead to dealing with interrelated data. A structured, self-organizing, decentralized and self-healing approach is required for a large namespace, which serves millions of users and billions of directories and files and presents a big challenge for metadata services [10]. A distributed hash table based namespace architecture offers a solution to achieve more scalability, security, load balancing, avoidance of a single point of failure, higher availability and quality of service.

Big Data Technologies:

Infinispan:

Infinispan is a platform-independent data grid tool developed by Red Hat [8]. Its libraries are written in Java, and it is licensed under the Apache Software License, version 2.0. Infinispan is the successor of JBoss Cache.
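As a rough sketch of how encoded claim records might be kept in an Infinispan data grid, the following uses Infinispan's embedded Java API; the cache name "claims", the key, and the stored value are illustrative assumptions, and the exact configuration calls can differ between Infinispan versions.

import org.infinispan.Cache;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class ClaimCacheSketch {
    public static void main(String[] args) {
        // Start an embedded Infinispan cache manager with the default global configuration.
        DefaultCacheManager cacheManager = new DefaultCacheManager();
        // Define and obtain a cache that holds encoded patient claim records.
        cacheManager.defineConfiguration("claims", new ConfigurationBuilder().build());
        Cache<String, String> claims = cacheManager.getCache("claims");

        // Collecting data: store an encoded patient record keyed by a claim id.
        claims.put("CLAIM-1001", "cGF0aWVudC1yZWNvcmQ=");

        // Data analysis: the insurance company's admin module reads the record back.
        System.out.println("Fetched record: " + claims.get("CLAIM-1001"));

        cacheManager.stop();
    }
}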

Map-Reduce:

In the user program, the Map-Reduce library first divides the input into M parts of 16 MB to 64 MB each and starts up many copies of the program. The master is a special copy among them; it picks idle workers and assigns each one a map task or a reduce task out of the M map tasks and R reduce tasks [6]. The master directs the reduce workers to read the buffered data from the local disks of the map workers using remote procedure calls. The data is sorted by the intermediate keys; sorting is required so that all occurrences of the same key are grouped together. When all the map tasks and reduce tasks are completed, the master wakes up the user program, and the Map-Reduce call in the user program returns back to the user code.
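As a concrete illustration of the map and reduce phases used later for claim analysis, the following is a minimal, framework-free Java sketch; the ClaimFeedback record, its field names, and the sample data are hypothetical and not part of the paper's system. It "maps" each feedback entry to its disease key and "reduces" by counting successes per disease.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FeedbackMapReduceSketch {
    // A hypothetical feedback record extracted from hospital and insurance data.
    record ClaimFeedback(String disease, boolean treatmentSuccessful) {}

    public static void main(String[] args) {
        List<ClaimFeedback> feedback = List.of(
                new ClaimFeedback("diabetes", true),
                new ClaimFeedback("diabetes", false),
                new ClaimFeedback("cardiac", true));

        // Map phase: key each record by disease. Reduce phase: count per key.
        Map<String, Long> successes = feedback.stream()
                .filter(ClaimFeedback::treatmentSuccessful)
                .collect(Collectors.groupingBy(ClaimFeedback::disease, Collectors.counting()));
        Map<String, Long> totals = feedback.stream()
                .collect(Collectors.groupingBy(ClaimFeedback::disease, Collectors.counting()));

        // Reducer output: the success rate per disease that is later shown to patients.
        totals.forEach((disease, total) -> {
            long ok = successes.getOrDefault(disease, 0L);
            System.out.printf("%s: success rate %.0f%%%n", disease, 100.0 * ok / total);
        });
    }
}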


III. IMPLEMENTATION

In the implementation part, the information collected from hospital datasets consists of patients' information, which includes disease information, treatment procedure and invoice information [7]. After receiving the information about the patient, the hospital sends it to the insurance company. The insurance company analyses the data using the big data framework called Infinispan, and with the help of the map-reduce technique useful information is extracted using the mapping method; the resulting set of data for a particular disease is shown. After completion of the mapping step, the reducer part is performed, and in the reducer step the result is shown to the patient based on feedback and success rate. The flow of the implementation work, shown in Fig. 2, consists of the following steps.

Collecting Data: The patient information is collected in the hospital database. In the hospital, a patient account is created which consists of all the information such as the doctor's report, disease, symptoms, treatment history and discharge summary.

Data Analysis: The collected patient and hospital data is accessed by the insurance company admin, and the big data framework Infinispan together with the map-reduce technique is applied.

Presenting Result: The output of the map-reduce technique is shown to the patient. From these results the feedback is calculated and the success rate is compared.

Preserving Data: The data consisting of the results is saved for further analysis of the particular disease. Whenever a new patient wants to search for procedure and treatment information, the related information is shown.

Knowledge Extraction: When a patient searches for a particular disease, information consisting of the success rate and failure rate is extracted from the previously preserved data (a small sketch of this lookup follows Fig. 2).

Reallocation of Data: Whenever a patient claims insurance, based on previously approved methods, that information is stored or reallocation of data is done.

In the implementation, a private data encoding algorithm is used in e-health insurance for protecting the private records of patients during data extraction through big data.

Fig. 2: Flow Diagram
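A minimal sketch of the Preserving Data and Knowledge Extraction steps, assuming a simple in-memory store keyed by disease name; the DiseaseStats type, its fields, and the sample numbers are hypothetical illustrations only.

import java.util.HashMap;
import java.util.Map;

public class KnowledgeExtractionSketch {
    // Hypothetical per-disease statistics preserved after the map-reduce analysis.
    record DiseaseStats(long successes, long failures, String bestHospital) {
        double successRate() { return 100.0 * successes / (successes + failures); }
    }

    public static void main(String[] args) {
        // Preserving data: save aggregated results for later queries.
        Map<String, DiseaseStats> preserved = new HashMap<>();
        preserved.put("diabetes", new DiseaseStats(85, 15, "Hospital A"));

        // Knowledge extraction: a new patient searches for a disease.
        DiseaseStats stats = preserved.get("diabetes");
        if (stats != null) {
            System.out.printf("Success rate: %.1f%%, recommended hospital: %s%n",
                    stats.successRate(), stats.bestHospital());
        }
    }
}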

The private data encoding algorithm is used to convert any binary record, a cluster of bytes, into a stream of printable characters. The private data encoding maps the values to a character set. The private data encoding process takes place in the following steps:

Step 1 - Split the input byte stream into chunks of 3 bytes.

Step 2 - Split the 24 bits of each 3-byte chunk into 4 groups of 6 bits.

Step 3 - Map each group of 6 bits to one printable character, based on the 6-bit value, using the Base128 character-set map.

Step 4 - If the last 3-byte chunk has only 1 byte of input, pad it with 2 bytes of zero (\0). After converting it as an ordinary chunk, override the last 2 characters with 2 equal signs (==) so the decoding procedure knows that 2 bytes of zero were appended.

Step 5 - If the last 3-byte chunk has only 2 bytes of input, pad it with 1 byte of zero. After encoding it as a normal chunk, override the last character with 1 equal sign (=) so the decoding process knows that 1 byte of zero was appended.

Step 6 - Carriage return (\r) and new line (\n) characters embedded in the output are ignored by the decoding process.
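The steps above describe a Base64-style transformation (four 6-bit groups per 3-byte chunk, with '=' padding), even though the paper refers to a Base128 character-set map. A minimal sketch under that assumption, using the standard Base64 alphabet as a stand-in for the paper's unspecified character-set map, might look as follows:

public class PrivateDataEncodingSketch {
    // Stand-in alphabet; the paper's actual character-set map is not given.
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    static String encode(byte[] input) {
        StringBuilder out = new StringBuilder();
        // Step 1: walk the input in chunks of 3 bytes.
        for (int i = 0; i < input.length; i += 3) {
            int remaining = input.length - i;
            // Steps 4-5: pad a short final chunk with zero bytes.
            int b0 = input[i] & 0xFF;
            int b1 = remaining > 1 ? input[i + 1] & 0xFF : 0;
            int b2 = remaining > 2 ? input[i + 2] & 0xFF : 0;
            int chunk = (b0 << 16) | (b1 << 8) | b2;
            // Steps 2-3: split 24 bits into four 6-bit groups and map each to a character.
            for (int g = 0; g < 4; g++) {
                out.append(ALPHABET[(chunk >>> (18 - 6 * g)) & 0x3F]);
            }
            // Override trailing characters with '=' so the decoder knows how many zero bytes were added.
            if (remaining == 1) {
                out.setCharAt(out.length() - 1, '=');
                out.setCharAt(out.length() - 2, '=');
            } else if (remaining == 2) {
                out.setCharAt(out.length() - 1, '=');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("patient-id:1234".getBytes()));
    }
}

Decoding simply reverses the character mapping and drops the number of zero bytes indicated by the trailing equal signs, ignoring any \r and \n characters (Step 6).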


Algorithm (Private Data Encoding)

Input: Entire patients' data submitted through forms by hospital agents.

Output: Encoded patient information in the server; the patients' private data is not disclosed to insurance agents.

String encryptedString = null;
// Encoding
byte[] encodedBytes = Base128.encodeBase128(unencryptedString.getBytes());
encryptedString = new String(encodedBytes);

String decryptedText = null;
// Decoding
byte[] bytesToDecode = encryptedString.getBytes();
decryptedText = new String(Base128.decodeBase128(bytesToDecode));

Summary

The Private Data Encoding algorithm is simple to implement. Using this mechanism, while patients' data is being submitted to the database servers it is transformed into an encoded format, and the transformed patient data is stored in the servers. By this approach private information is not disclosed; only some non-sensitive information can be exposed.

IV. CONCLUSION

Health care management consists of patients' and doctors' information. Insurance agencies are linked with several hospitals and receive huge amounts of data from hospital databases, so they actively participate in the analysis of patients' data and the extraction of useful information. In the implementation, big data technologies, i.e. Infinispan and map-reduce techniques, are used for analysing a large number of factors such as patient information, disease type and symptoms. For a particular disease, the treatment given by the doctor is recorded, and based on that treatment the patient gives feedback to the insurance company for claims. From that feedback, success and failure rates are calculated using the map-reduce technique, and disease-based mapping is extracted; the output of the mapping is the input to the reducer, and the reducer method calculates the success and failure rates. Finally, a new patient can see the status of a particular disease, which hospital provides better treatment, and also check the cost being charged. Hence, using the Infinispan and map-reduce concepts, the analysis of insurance datasets in e-health insurance gives an effective implementation of data extraction and segregation.

REFERENCES

[1] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding, "Data Mining with Big Data", IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, January 2014.
[2] S.L. Jany Shabu and Manoj Kumar K., "Preserving User's Privacy in Personalized Search", International Journal of Applied Engineering Research (IJAER), Vol. 9, No. 22, pp. 16269-16276, 2014.
[3] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Operating Systems Design and Implementation, pp. 137-149, 2004.
[4] Cheng-Tao Chu and Sang Kyun Kim, "Map-Reduce for Machine Learning on Multicore".
[5] W. Liu and E.K. Park, "e-Healthcare Security Solution Framework", IEEE International Conference on Computer Communication Networks, MobiPST-2012, Munich, Germany, August 2012.
[6] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI '04: 6th Symposium on Operating Systems Design and Implementation, USENIX Association.
[7] Carolyn McGregor, "Big Data in Neonatal Intensive Care", University of Ontario Institute of Technology, Canada, IEEE Engineering in Medicine and Biology Magazine, June 2013.
[8] Infinispan, Wikipedia, http://en.wikipedia.org/wiki/Infinispan.
[9] D. Manivannan and R. Sujarani, "Light Weight and Secure Database Encryption Using TSFS Algorithm".
[10] Harcharan Jit Singh and V. P. Singh, "High Scalability of HDFS using Distributed Namespace", International Journal of Computer Applications (0975-8887), Vol. 52, No. 17, August 2012.
[11] K. Manoj Kumar and M. Vikram, "Disclosure of User's Profile in Personalized Search for Enhanced Privacy", International Journal of Applied Engineering Research (IJAER), Vol. 10, No. 16, pp. 36358-36363, 2015.

K.MANOJ KUMAR is working as an Assistant Professor in the Department of CSE at Sri Venkateswara College of Engineering, Tirupati. He completed his M.E. (CSE) degree at Sathyabama University, Chennai. His research interests include Data Mining, Privacy, Big Data and Web Search Engines.

TEJASREE.S is working as an Assistant Professor in the Department of CSE at Sri Venkateswara College of Engineering, Tirupati. She completed her M.Tech (CSE) degree at JNTU Ananthapur. Her research interests include Data Mining, Web Search Engines and Big Data.


S.SWARNALATHA is working as an Assistant Professor in the Department of IT at Sri Venkateswara College of Engineering, Tirupati. She completed her M.Tech (CSE) degree at JNTU Ananthapur. Her research interests include Data Mining, Web Search Engines and Networking.
