Towards Development of Health Data Warehouse ... - IEEE Xplore

2nd Int'l Conf. on Electrical Engineering and Information & Communication Technology (ICEEICT) 2015 Iahangirnagar University, Dhaka-1342, Bangladesh, 21-23 May 2015

Towards Development of Health Data Warehouse: Bangladesh Perspective Shahidul Islam Khan; Abu Sayed Md. Latiful Hoque Department of Computer Science and Engineering (CSE) Bangladesh University of Engineering and Technology (BUET) Dhaka, Bangladesh [email protected]; [email protected]

Abstract-Health informatics is one of the top most focuses of

cases with different definitions and structures in information

researchers now a days. Availability of timely and accurate data

system (structured data) [I]-[8]. For the above reason Clinical

is essential for informed medical decision making. Health care

Data Stores (CDS) with isolated information islands across

organizations face a common problem with large amount of data

various

they have in numerous systems. Such systems are unstructured

administrative processes,

and

unorganized,

integration.

requires

Researchers,

computational

medical

time

practitioners,

for

health

data care

providers and patients will not be able to utilize the knowledge stored in different repositories unless synthesize the information from disparate sources. This problem can be solved by Data warehousing. Data warehousing techniques share a common set of tasks, include requirements analysis, data design, architectural design,

implementation

and deployment.

Developing

Clinical

data warehouse is complex and time consuming but is essential to

hospitals,

departments,

laboratories

and

related

which are time consuming and

demanding reliable integration [7]. Data required to make informed medical decisions are trapped within fragmented, disparate,

and

heterogeneous

clinical

and

administrative

systems that are not properly integrated or fully utilized. Ultimately, health care suffer because medical practitioners and health care providers are unable to access and use this information to perform activities such as diagnostics, and treatment optimization to improve patient care.

deliver quality patient care. Data integration tasks of medical data store are much challenging when designing clinical data warehouse architecture. This research identifies prospects and complexities

of

Health

data

warehousing

and

Mining

in

Bangladesh perspective and proposes a data-warehousing model suitable for integrating data from different health care sources.

Keywords-Data Warehouse; Data Mining; Pathological Data; Health Data Preprocessing;

I.

INTRODUCTION

Informed decision making in health care is vital to provide timely, accurate, and relevant advice to the right person, to reduce the cost of health care and to improve the overall quality of health care services. Since medical decisions are very complex, making choices about medical decision-making processes, procedures and treatments can be overwhelming One of the major Information Technology challenge in clinical practice

is

how

to

integrate

several

disparate,

isolated

information repositories into a single logical repository. A massive amount of health records, related documents and medical images created by clinical diagnostic equipment are generated daily. Medical documents are owned by different hospitals, health professionals and patients. These valuable data are stored in various medical information systems such as HIS (Hospital Information System), PACS (Picture Archiving and Communications System), RIS (Radiology Information System) in various hospitals, departments and

diagnostic

laboratories. There are not only a tremendous volume of imaging

files (unstructured data), but also many medical

information such as medical records, diagnosis reports and This research is supported by the ICT Division, Ministry of Posts, Telecommunications and Information Technology, Government of the People's Republic of Bangladesh.

978-1-4673-6676-2/15/$31.00 ©2015 IEEE

Fig. l. Components of health data

Clinical or Health data refers to any information that is contained in a patient's medical record.

This information

may

from

be acquired

admission

from

notes

or a doctor's visit.

forms such as

text

or

derived

numbers

demographics, history,

laboratory

digital

EEG,

signals (ECG,

(histological, radiological, Furthermore, clinical

(patient data,

EMG,

ultrasound,

studies

a

hospital

This data comes in various

involve

identification,

etc),

analog

or

ENG

etc),

images

etc),

and

videos.

specimen

collection

from multiple patients. Further complicating the storage of this data

is

the

fact

that

because

patient

information cannot be publicly used.

identification

Health data are very

complex in nature. Fig. I represents the various components of health data. [3], [6], [9], [13].

The advantages and disadvantages of data warehousing are given below [8], [13]: Advantages: 1. Allow existing systems to continue in operation without

Successful healthcare data management is an important factor in developing support systems for the clinical decision making process. Traditional database systems do not satisfy the requirements for critical data analysis tasks of the clinical decision-making users. An operational database supports daily data transactional operations. Operational databases contain detailed data but they do not include important historical

any modification 2.

Consolidate unstructured data from various legacy

systems into one standard set 3. Improve quality of data and service 4. Allow users to retrieve necessary data by themselves

patient data, and since they are usually highly normalized,

Disadvantage:

they perform poorly for complex queries that need to join

I. Oevelopment cost and time

many relational tables or to aggregate large volumes of data in order to generate various clinical reports. This aggravates the data fragmentation. A clinical data warehouse is a data store that is different from the hospital's operational databases. It can be used for the analysis of consolidated historical data. Implementing a Oata Warehouse (OW) is a complex task containing two major phases. Firstly, in the configuration phase, a conceptual view of the warehouse is first specified

The main aim of this research is to identify the main obstacles for healthcare data integration and proposing a data warehousing model suitable for integrating fragmented data in respect to Bangladesh.

The result will contribute to the

advancement of knowledge in the field of medical informatics. The proposed data warehousing model will help in efficient medical decision making process.

according to user requirements (DW design). Secondly, the

The rest of this paper is organized as follows. In Section II

related data sources and the Extraction-Load- Transform

we have presented selected literature reviews on OW and

(ETL) process (data acquisition) are determined.

Finally,

Health DW. Section III describes briefly some design issues of

decisions about persistent storage of the warehouse using

National Health OW. In Section IV we have shown the

database technology and the various ways data will be

calculation of approximate size of our DW. Section V gives

accessed during analysis are made. After the initial load (the

readers ideas about how our OW will be used for knowledge

first load of the DW according to the configuration), during

discovery. Finally Section VI concludes the paper.

the

operation

phase,

warehouse

data

must

be

regularly

refreshed, i. e. , modifications of operational data since the last

II.

LITERATURE REVIEW

OW refreshment must be propagated into the warehouse such that data stored in the data warehouse reflect the state of the underlying operational systems. All the above steps must be carefully applied to fragmented healthcare data to guarantee the efficient and reliable retrieval at the point of care[5], [13]. According to Inmon[8] A data warehouse is a subject

Health informatics or healthcare informatics is an intersection of computer science and health care services. It deals with resources and methods needed to optimize the acquisition, storage, retrieval and use of information in medical research and applied to the areas of health care management, diagnosis,

oriented, integrated, non-volatile, and time-variant collection

clinical care, pharmacy, nursing and public health. Knowledge

of data in support of management's decisions.

discovery is the non-trivial process of identifying valid, novel,

The data

potentially useful and ultimately understandable patterns in

warehouse contains granular corporate data. •

Subject-oriented:

Classical operations systems are

organized around the applications of the company. Each company has its own unique set of subjects. •

Integrated:

is converted, reformatted, re-sequenced, summarized, and so forth. Non-volatile:

one of

the

steps in the process

of

learning algorithms that produce a particular enumeration of

A

Oata

Warehouse

(OW)

unifies

the

data

scattered

throughout an organization into a single centralized data structure

with

a

common

format.

It

is

a

repository

of

integrated information, available for querying and analysis [5]. Data warehouse data is loaded and

Thus, data warehousing may be considered a "proactive"

accessed, but it is not updated. Instead, when data in

approach to information integration, as compared to the more

the data warehouse is loaded, it is loaded in a

traditional

snapshot, static format. When subsequent changes

integration starts when a query arrives. [6].

occur, a new snapshot record is written. •

Data mining,

knowledge discovery, consists of applying data analysis and patterns over the data [1-4].

Oata is fed from multiple disparate

sources into the data warehouse. As the data is fed it

•

data.

Time-variant:

Every

unit

of

data

"passive"

approaches

where

processing

and

A medical data warehouse is a repository where healthcare in

the

data

warehouse is accurate as of some one moment in time. In some cases, a record is time stamped. In other cases, a record has a date of transaction.

providers can gain access to medical data gathered in the patient care process. Extracting medical domain information to a data warehouse can facilitate efficient storage, enhances timely analysis and increases the quality of real time decision making processes. Currently medical data warehouses need to

address the issues of data location, technical platforms, and

providing low-cost screening using disease models that require

data formats; organizational behaviors on processing the data

easily-obtained attributes from previous cases. It can perform

and culture across the data management population. Today's

automated analysis of pathological signals (ECG/ EMG), and

healthcare organizations require not only the quality and

medical images (CT scan, Ultrasonography etc.) [3], [15].

effectiveness of their treatment, but also reduction of waste and unnecessary costs. By effectively leveraging enterprise wide

data

on

labor

expenditures,

supply

utilization,

procedures, medications prescribed, and other costs associated with patient care, healthcare professionals can identify and correct wasteful practices and unnecessary expenditures[7].

Medical domain has certain unique data requirements such as high volumes of unstructured data (e.g. digital image files, voice

clips,

radiology

confidentiality.

information,

Data

warehousing

etc.)

and

models

data should

accommodate these unique needs. There are huge constraints and issues that limit the way the data mining is performed for

A Clinical Data Warehouse (COW) can facilitate efficient

medical datasets. Some of these issues are the way the data is

storage, enhances timely analysis and increases the quality of

collected; accuracy of the data, ethical, privacy and social

real time decision making processes.

Such methodologies

issues that comes with patient's records [15-16]. Agrawal et

share a common set of tasks, including business requirements

al.

analysis, data design, architecture design, implementation and

problems which were further divided into four classes in [18].

deployment[8].

They are classification, clustering, association and Sequences.

The

COW

is

a

place

where

healthcare

in [17] proposed three basic classes of data mining

providers can gain access to clinical data gathered in the patient care process. It is also anticipated that such data warehouse may provide information to users in areas ranging from research to management. The idea of using a

OW

for

accessing clinical data is not new, but there is no easy solution.

Clinical

information

systems,

like

any

other

enterprise information system, are diverse and individual. Even if based on a standard solution there has so to be done

Research is also done to find out impact of missing values and explore the impact of noise and how this can influence the output.

Zhu et al.

classified noises into class noise and

attributes noise. The research mainly concentrated on attribute noise it is more difficult to handle. Attribute noise include incorrect attribute values, missing or don't know attribute values and incomplete attributes or don't care values [19].

much customizing to fit the existing clinical processes and

Several researches have focused on the techniques that

documentation, so that there is no standardization yet to be

have built in mechanism to handle noise and missing values

usable for different hospitals.

In order to construct an

and

which

are

more

appropriate

to

use

for

medical

operational and effective OW it is essential to combine

applications. Few techniques that have been applied and are

process work, domain expertise and high quality database

more suited to medical data sets are studied in [20-21]. For

design [9]. This seems to correspond to concurring literature,

example decision tree, logic programs, K-nearest neighbour,

where we found either a holistic approach with theoretical

and Bayesian classifiers. Lee et al recommended that Bayesian

conceptualization but lacking practical implementation [10] or

networks and decision trees are the primary techniques applied

practical implementation

focused on one single domain, i. e.

lacking generalizability of such concepts [II].

in medical information systems [22]. Obenshain claimed that that neural networks performed better then logistic regression,

Electronic Health Records (EHRs) which describe the diseases and treatments of patients are normally stored in the

but the decision tree did better in identify active compounds most likely to have biological activity [23].

hospitals or clinics, where they are created. However, patients

Wang and Wang claimed that most process models focus

may be treated in different hospitals, clinics and, therefore,

on the results but not in gaining new knowledge. Medical data

there is a need for integrating health records from different

mining applications is expected to discover new knowledge

hospitals to enable any hospital to obtain a total overview of a

and should follow a five stage data mining development cycle:

patient's health history[12].

planning tasks, developing data mining hypotheses, preparing

The trend of adopting data warehouses for health systems in presented in [14], where the design experience in the University of Virginia Health System is reported. Here the

data, selecting data mining tools, and evaluating data mining resu Its [24]. Handling Missing Data in Pathology Databases using

data warehouse is used to provide clinicians and researchers

Multiple

with

Optimizing data collection for public health using feature

direct,

rapid

access

to

retrospective

clinical

and

Imputation

technique

is

discussed

in

[25].

administrative patients' data. In addition, they use the data

selection is studied in [26]. Cubillas et. al. proposed a model

warehouse also for educational and research aims.

for improvement in appointment scheduling in health care

Discovering patterns in medical datasets is difficult and challenging task but also rewarding [3]. Recently data mining techniques have improved significantly but it does not benefit the health professionals until the mined data is turned into

centers [27]. Hoque et. al. discussed present structure of pathological data, requirements to formulate efficient models and

the

necessity

to

reform

the

present

structure

for

predicative data mining in [28].

useful information. By using effective data mining tools and

Kumari and Singh used Neural Network{or the diagnosis

algorithms and a step by step data mining process it is possible

of diabetes [29]. Yilmaz et. at. proposed a modified K-means

to produce useful and new information from the dataset [4].

Algorithm based data preparation method for diagnosis of

With data mining the screening, diagnosis and detection of diseases may get more efficient by reducing both time and costs for the corresponding procedures. It can be effective in

heart and diabetes diseases [30]. Herland et. at. present recent research using Big Data tools and approaches for the analysis of Health Informatics [31].

III. SYSTEM DESIGN The architecture of National DW model is illustrated in

hospitals,

diagnostic

centers

and

other

sources,

different

Fig. 2. Health data from different govt. and private sources

preprocessing steps will be executed on data such as cleaning,

such as hospitals, clinics, diagnostic centers, research centers

noise

will be collected. Then data preprocessing, one of the major

Extraction-Load-Transform (ETL) process, the preprocessed

task for developing a DW from heterogeneous sources, will be

data will be integrated into a temporary repository. After that

performed.

data will be loaded into DW. Online Analytical Processing

[t

includes

data

cleaning,

missing

values

reduction

queries

and

and

normalization

imputation, normalization, transformation etc. As for National

(OLAP)

Health DW, data are coming from different public and private

performed over the pathological DW.

mining

techniques.

operations

can

be

Using

easily

Fig. 2. Brief Architecture of Data Warehouse 4D Pathological data cube that will be used for national DW development is shown in Fig. 3. Here O-D apex cube will provide highest level of summarization of national health data and 4-D base cube contains lowest level of summarization. Partial materialization is used rather than full materialization of cuboids to reduce huge space requirements.

involve multiple join operations. Most suitable Logical Data Warehousing Model is the Star Schema[13], [14], [24]. We have used

Star

requires little more space than Snowflake but Star Schema is much faster to response complex join queries.

Using the building blocks of the fact table and the various dimension tables, one has thousands of ways to aggregate the data.

For

clinical

analysis

purposes,

frequently

needed

aggregated datasets should be created in advance for the users. Having data readily and easily available is a major tenet of data warehousing. For our DW, some aggregated datasets could be: •

Patient count by Diagnosis, Gender, Age, Date

•

Count of Procedures by Provider and Date

•

Billing and discount information.

•

Count of retesting

Logical design of DW involves the definition of structures that enable an efficient access to information. There are many logical models like Flat schema, Star schema, Star Cluster schema, Snowflake schema, Fact Constellation schema etc. Among

them,

constellation

star

schema,

schema

are

snowflake mostly

schema

used

Schema in our design which is

illustrated in Fig. 4. Because of de-normalization, Star Schema

and

fact

commercially.

Efficiency is the most important factor in DW modeling because many queries access large amounts of data that may

Fig. 3. Pathological Data Cube

shown

in

Fig.

4.

Our

METADATA FOR TYP E ID GENERATION

TABLE!.

The major fact and dimension tables used in the OW development process are

sample

Colour

Sample_Count

Type_ID

the fact table and up to 1,88,000 records in each of the

Straw

9439

0

different dimension tables. Fact table records will grow very

Yellow

58

1

pathological data repository currently has 12, I 0,000 records in

fast comparing dimension tables' records and fact table records are dominant to calculate the size of the OW with time.

Reddish

32

2

L. Yellow

29

3

Milkly white

2

4

D. Yellow

2

5

Yellowish white

I

6

Reddish black

I

7

L.Reddish

I

8

Hazy

1

9

v. USE OF NATIONAL HEALTH OW

National Health OW can be used in many ways to improve national health standard, to provide better and prompt services to the patients and to facilitate health related research among the doctors, clinical researchers etc. Following is an example of the use of National Health OW.

Fraud Testing Awareness

Fig. 4. Fact Table and Dimension Tables

From OW we can easily find out support and confidence values of different attributes and generate association rules.

IV. DW SIZE-CASE STUDY According

to Directorate General

of

For example Health

Services

(DGHS) under the Ministry of Health and Family Welfare (MoHFW):

Age(X, Negative (X, T3)

[Support =67%, Confidence=96%]

Bangladesh Private Clinic and Diagnostic Owners Association

From support and confidence values it can be clearly

(BPCDOA), 8,000 diagnostic centers of the country have

identified that this test T3 has almost no impact of disease

DGHS approvals till December 2014.[32], [33], [34], [35]. So

diagnosis [36]. National awareness can be developed not to

minimum number of places where pathological tests are

perform the test at initial level for Young Males.

performed is 8717. If for simplicity of calculation we consider that average 500 patients go for diagnosis in each place every day and each patient go through 3 tests and each test has 5 attributes then:

In this way many other interesting patterns can be derived from National Health DW by using various data mining algorithms like association or clustering.

R= 8717 X 500 X 3 X 5 = 65377500 records per day. So more than 65 million records will be added in the fact tables of National Health OW. Considering 1 record takes O. IKE memory space than the DW will consume 6.23GB

memory each day. It is certainly falls under big data category [31] and Bangladesh Government should go for Cloud storage and services for this. A case of metadata generation for the test result of Urine

Colour Diagnosis is shown in Table I. This metadata is used in the normalization process of the data preprocessing step of National Health OW development.

VI. CONCLUSION This paper presented the developmental stages

of a DW

platform for the management, processing and analysis of large-scale Health data.

it summarizes the experience in

designing a national health data warehouse modeled for e health system. In this paper, widely accepted conceptual and logical

design

Considering

approaches

the

in

quality

OW

factors

design and

are

the

discussed. information

requirements, Star schema was chosen as the most suitable logical model for the purpose. Establishing a data warehouse gathering huge data from existing Health databases should give easier and better access to interesting data for both researchers,

health

service

providers and govt. authorities. In order to get maximum benefit

from

the

model

presented

in

this

research

in

be

Platt, S. Cohen, and W. a. Knaus, "Data mining and clinical data repositories: Insights from a 667,000 patient data set," Comput. BioI. Med.,vol. 36,pp. 1351-1377,2006.

There should be a document which clearly defines the

[15] K. Cios, "Uniqueness of medical data mining," Artificial intelligence in medicine, vol. 26,no. 1-2,pp. 1-24,2002.

Bangladesh,

the

conditions

mentioned

below

should

satisfied. I.

structure of the data tables currently used by the concerned

[16] A loham,"Discover Patterns in Adverse Drug Reaction," University of South Australia,B.Sc. Engineering Thesis,2010.

Pathological centers. 2. The stakeholders should clearly know what are the data retrieval operations going to be executed using the data warehouse. 3. There should be a cooperative mind among different health service providers to help the governmental bodies such as

Ministry of Health and Family Welfare and Ministry of

Posts, Telecommunications and Information Technology to enrich the health data warehouse so that the Govt. can achieve the

goal

of

Health,

Population

and

Nutrition

Sector

Development Program (HPNSDP 2011-2016). REFERENCES

[17] R. Agrawal, T. Imielinski, and A Swami, "Database Mining: A Performance Perspective," IEEE Transactions on Knowledge and Data Engineering 5(6): pp.914-925,1993. [18] A L. Houston, "Medical Data Mining on the Internet: Research on a Cancer Information System," Artificial Intelligence Review 13: pp.437466,1999. [19] X. Zhu, T. Khoshgoftaar, 1. Davidson, and S. Zhang, "Editorial: Special issue on mining low-quality data," Knowledge and Information Systems, vol. I I ,no. 2,pp. 131-136,2007. [20] M.L. Brown and J.F. Kros, "Data mining and the impact of missing data," Industrial Management and Data Systems, vol. 103, pp. 611-621, 2003. [21] N. Lavrai:, "Selected techniques for data mining in medicine, Artificial intelligence in medicine," vol. 16,no. I, pp. 3-23,1999.

[1]

U. M. Fayyad, G. P. Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview," Advances in Knowledge Discovery and Data Mining,1-36. AAAI Press/MIT Press,1996.

[22] I.N. Lee,S.C. Liao,and M. Embrechts, "Data mining techniques applied to medical information," Medical Informatics and the Internet in Medicine,vol. 25,no. 2,pp. 81-102,2000.

[2]

R. Khosla and T. Dillon, "Knowledge Discovery, Data Mining and Hybrid Systems," In Engineering Intelligent Hybrid Multi-Agent Systems,Kluwer Academic Publishers pp.143-177,1997.

[23] M.K. Obenshain, "Application of Data Mining Techniques to Healthcare Data," Infection Control and Hospital Epidemiology, vo1.25, no 8, pp. 690-695,2004

[3]

J.F.Roddick, P. Fule, and W.J. Graco, Exploratory medical knowledge discovery: experiences and issues, SIGKDD Explor. Newsl., vol. 5, no. I, pp. 94-99,2003.

[24] H. Wang, and S. Wang, "Medical knowledge acquisition through data mining," ITME 2008. IEEE International Symposium on,Xiamen,2008.

[4]

AM. Wilson, L. Thabane and A Holbrook, Application of data mining techniques in pharmacovigilance, British Journal of Clinical Pharmacology,vol. 57,no. 2,pp. 127-134,2003.

[5]

[6]

[7]

W.H. Inmon,EIS and the data warehouse: a simple approach to building an effective foundation for EIS. Database Programming and Design, 5(11): 70-73,1992. N .Stolba, M. Banek, and AM. Tjoa, "The Security Issue of Federated Data Warehouses in the Area of Evidence-Based Medicine," Proc. of the First International Conference on Availability, Reliability and Security (ARES'06,IEEE),20-22 April,2006. T. R. Sahama, and P. R. Croll, "A Data Warehouse Architecture for Clinical Data Warehousing," In Roddick, J. F. and Warren, J. R., Eds. Proceedings Australasian Workshop on Health Knowledge Management and Discovery (HKMD 2007) CRPIT, 68, pages pp. 227-232, Ballarat, Victoria.

[25] S. Faisal, "Missing Data in Pathology Databases," MSc Thesis, Australian National University, 2011. [26] S. N. Partington, V. Papakroni, and T. Menzies, "Optimizing data collection for public health decisions: a data mining approach.," BMC Public Health, vol. 14,p. 593,2014. [27] J. J. Cubillas,M. 1. Ramos, F. R. Feito, and T. Urena, "An improvement in the appointment scheduling in primary health care centers using data mining.," J. Med. Syst.,Springer,vol. 38,p. 89,2014 [28] AS.M. L Hoque,S. Galib and M. Tasnim,"Mining Pathological Data to Support Medical Diagnostics," Proc.s of Workshop on Advances on Data Management: Applications and Algorithms, Department of Computer Science and Engineering,BUET,Dhaka,pages 71-74,2013. [29] S. Kumari and A Singh, "A data mining approach for the diagnosis of diabetes mellitus," IEEE 7th International Conference on Intelligent Systems and Control (ISCO),2013.

[8]

W. Inmon, Building the Data Warehouse, 3rd edition, Wiley-New York, 2002.

[30] N. Yilmaz,O. Inan, and M. S. Uzer, "A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases," J. Med. Syst.,Springer,vol. 38,2014

[9]

J. A Lyman, K. Scully, and J.H. Harrison Jr, "The development of health care data warehouses to support data mining," Clin Lab Med. 2008 Mar;28(l):55-71,vi.

[31] M. Herland, T. M. Khoshgoftaar, and R. Wald, "A review of data mining using big data in health informatics," J. Big Data,Springer, vol. I, p. 2, 2014.

[10] R. Kush, L. Alschuler, R. Ruggeri, S. Cassells, N. Gupta, L. Bain, K. Claise, M. Shah, and M. Nahm, "Implementing Single Source: the STARBRITE proof-of-concept study," J. Am. Med. Inform. Assoc. 14: 662-673,2007.

[32] HEALTH BULLETIN 2014,2nd Edition, Directorate General of Health Services, Ministry of Health and Family Welfare, Government of the People's Republic of Bangladesh,December 2014.

[11] c.J. Brooks, J.W. Stephen, D.E. Price, D.V. Ford, R.A. Lyons, S.L. Prior, and S.c. Bain, "Use of a patient linked data warehouse to facilitate diabetes trial recruitment from primary care," Prim Care Diabetes. 2009 Nov;3(4):245-8. Epub 2009 lui 14. [12]

G. Fette, M. Ertl, A Worner, P. Kluegl, S. Stork, and F. Puppe, "Information Extraction from Unstructured Electronic Health Records and Integration into a Data Warehouse," unpublished

[13] S. Nugawela, "Data Warehousing Model For Integrating Fragmented Electronic Health Records From Disparate And Heterogeneous Clinical Data Stores," M.Sc. Thesis, Queensland University of Technology, 2013. [14] I. M. Mullins, M. S. Siadaty, J. Lyman, K. Scully, C. T. Garrett, W. Greg Miller, R. Muller, B. Robson, C. Apte, S. Weiss, 1. Rigoutsos, D.

[33] http://www.dghs.gov.bdlindex.php/enlhealth-program-progress/hpnsdp20 11-16/84-english-root/ehealth-eservice/497-hpnsdp-20 11-16-brief Accessed 20 Feb 2015 [34] http://www.bpcdoa.com/clinics_and_diagnostics.htmI Accessed 20 Feb 2015 [35] http://www.thefinancialexpress-bd.com/2014!l2!l5/71077/print Accessed 20 Feb 2015. [36] H. Jiawei, K. Micheline, and P../ian, Data Mining Concepts and Techniques 3rd Edition,2012,Elsevier.