Performance Improvements to a Large Scale Public Health Data and Analytics Platform: A Technical Perspective

A Case Study of a Large Public Health Data and Analytics Program for a State Government in India

Dr. Arun Sundararaman, Suresh Pargunarajan and Srinivasan Valady Ramanathan

Health Analytics Solution Factory, Accenture, India
{arun.sundararaman, suresh.pargunarajan, srinivas.ramanathan}@accenture.com

Abstract. Public health planning and administration comprises three broad functions: health needs assessment, policy development, and administration of services. Health informatics plays a significant role in the effectiveness of each of these functions. This paper presents practical technical learnings from implementing performance improvement measures for a large, open source based data analytics platform for public health. This paper is unique in its attempt to categorize the problem along the above lines. The authors present a study of a real-life large data warehouse and public health data mining implementation that faced significant challenges, and discuss the solutions adopted to address those challenges. With a rapidly changing technology landscape, specifically in the data and analytics space, dynamic changes in population composition, and the resulting expectations from public health, research focused on public health data integration and data mining needs greater attention. The paper concludes with a summary of the best practices discussed and future directions.

Keywords: health informatics · public health metrics · KPI · data mining · insights · challenges · solutions · implementation · data quality · public health data research · decision support systems · open source stack

1 Introduction

Public Health is the collective action taken by society to protect and promote the health of the entire population. Public Health is broad and inclusive, although it is often considered from only a narrow medical perspective [1]. Public health generates innumerable measurable indicators. Key performance indicators (KPIs) are a set of measures focusing on the organizational performance and outcomes that are most critical for the current and future success of the organization. KPIs are highly relevant in the management of public health. They act as flags drawing attention where required, bring in accountability, enhance scrutiny, and help channel public resources to areas of need [2].

This paper is organized in 4 sections. Section 1 discusses literature on the topics of health analytics and public health data collection and analysis. Section 2 introduces the context of the project discussed in detail in this paper as a case study, i.e. the State Health Data Resource Centre (SHDRC). Section 3 lists the technical challenges faced and the solutions implemented in this large scale public health data warehouse. Section 4 concludes the discussion with a summary of learnings from the case study.

Public health informatics requires collection of data from primary health care organizations through to tertiary organizations, including various federal and private bodies. This gives rise to multiple data sources and standards, adding to the complexity of data collection. Open source software (OSS) is rapidly becoming part of more public health applications [3]. The reasons are manifold: policy, lower total cost of ownership, version upgrades, auditability, flexibility and freedom. Published literature explains how Public Health Surveillance moved from mere counting of deaths to more advanced data collection and comprehensive information management as Health Informatics [4]. Another published report lists the key challenges to be addressed in future as the lack of a skilled workforce for analytics and database management, and inadequate computing resources [5]. It is against this backdrop that this paper presents a case study of a large scale data collection and analytics program in public health in India. The program aligns with the published literature on the need for such work and addresses growing domestic expectations of health intelligence while following the global recommendations described above.

2 Overview of State Health Repository

2.1 Program Background

Healthcare delivery in India follows a unique and balanced model. It is a hybrid: neither completely market driven as in the US nor entirely publicly funded as in the UK or Canada. Healthcare in India is a State subject. Each State sets up and runs its healthcare delivery infrastructure independently. This is supported by national level programs that focus on a specific disease or a section of the population. The system follows a hub and spoke model for primary, secondary, and tertiary care. This is supplemented by various other networks such as those of Medical Education, the Revised National Tuberculosis Control Program (RNTCP), Indian Medicine (AYUSH), the National AIDS Control Organization (NACO), Employees State Insurance (ESI), etc. With each of these arms providing a whole range of services, it is difficult to get a comprehensive view of population health statistics; however, each is necessary for the valuable services it provides and to cater to the country's large population. Universal access and affordable health are the key drivers currently directing the health set-up in the country [6].

While the country is focusing on providing its citizens with access to basic healthcare, the need for data and analytics is all the more pertinent. The country needs to optimize the use of resources to get the best program outcomes. As more and more people gain access, it is now encountering a situation called the Double Burden of Disease [7], where it has to handle communicable diseases as well as a growing number of non-communicable diseases and injuries.

The State of Tamil Nadu, the sixth most populous state in India with a population of 72 million [8], has always been proactive in improving its health situation. Today it is one of the healthier states in the country [9]. Its infant mortality rate is 21 against the country average of 40; similarly, its maternal mortality rate stands at 90 against India's 178 [10]. The State Government of Tamil Nadu has taken various steps to make its public sector health services more accessible and equitable to the general public and the poor alike through the "Tamil Nadu Health Systems Project" (TNHSP). In order to promote research, improve health outcomes, initiate evidence based actions at State, District, Directorate and other institutions, and enable policy formulation based on research insights, TNHSP embarked on establishing a comprehensive state-wide health data repository named the State Health Data Resource Center (SHDRC). This progressive initiative is funded by The World Bank, the State Government of Tamil Nadu, the National Rural Health Mission, and the Indian Council for Medical Research.

Accenture is the System Integration partner responsible for infrastructure planning, design, build and implementation of this large Data Warehouse. Functional and technical experts from the Accenture Health Analytics Solution Factory architected the solution and steered the end-to-end implementation. Open Source BI was the chosen platform for all layers of the technical solution (PostgreSQL for the RDBMS, Pentaho for ETL, Pentaho for dashboards, reporting and visualization, and Weka as the data mining and predictive analytics engine). The project has been successfully implemented, and extensive organization-wide adoption is currently under way, supported by training and enhancements.

2.2 Program Objectives

The primary objectives of establishing SHDRC are listed below:

• Consolidate health data at the state level from various health data sources.
• Implement, maintain and support the technical infrastructure required for the state data resource center.
• Devise and implement strategies for data quality improvement and data validation.
• Develop and maintain policies and standards related to health information management.
• Develop, implement and recommend training and capacity building strategies related to data monitoring and evaluation.
• Liaise with and support the various Directorates in ensuring that the mandated data reporting requirements (such as notifiable diseases reporting) from public and private institutions are met.
• Strengthen health system research through independent research as well as by providing mentorship to directorates and other stakeholders at various levels.

2.3 Conceptual view

Fig. 1 provides a view of the Data Warehouse (DWH) system and its layers, from source systems through staging and the DWH to the end user consumption layer.

Fig. 1. Conceptual view of the DWH System SHDRC

3 Challenges and Solutions

This section covers the challenges faced and the solutions implemented in consolidating, cleansing, integrating and reporting data in a large scale public health initiative. It is based on the implementation experience of SHDRC, the case study project introduced in Section 2. The challenges are categorized as given in Table 1.

Table 1. Category of Challenges

S. No | Category | Description
1 | Definition | Challenges related to defining the data sets or the Key Performance Indicators required to meet the program objectives.
2 | Data governance | Challenges related to collection of large scale public health data or their consolidation and integration.
3 | Design & Performance | Challenges related to application design, solution and information architecture, development and consistency across health directorates; application, database, user interface, network and OLAP cube performance, and the tuning of these.

3.1 Definition Challenges

Requirements. As this was a pioneering project in the country, the lack of precedent made it very difficult to arrive at the requirements of the project. The health directorates were traditionally used to paper work. Adapting to information technology demanded additional effort from directorate staff, leaving them very little time for requirement discussions.

Data diversity. Data was sourced from 9 different sources and from ~300 web forms in different formats. About 900 extraction, transformation and loading (ETL) routines are used for cleaning and integrating the data.

Data Volume. 5000+ indicators were rolled up into 300 KPIs. 3+ TB of data was sourced from 2200 hospitals and other institutions serving ~5 million beneficiaries.

Dataset Definition. The challenge was to decide how to group the incoming data so that it is easy to store and remains malleable for various analyses.

Fig. 2. Logical groupings of KPIs

Solution. The KPIs were organized logically into 5 categories, as in Fig. 2: input, process, output, outcome, and impact metrics. More than 10,000 indicators were analyzed along with more than 300 key reports to identify the few indicators that would provide an exact picture of the inputs, process, output and outcomes of any program or directorate. The problem of plenty was solved through an approach of "Divide and focus". SHDRC was designed to streamline the analytics offerings, which are organized into the 6 layers explained in Table 2.

Table 2. Presentation Layer by Zones

S. No | Layer | Description
1 | Dashboard Zone | Summary view of KPI performance against targets / thresholds, with color schemes to highlight under or over performance. This layer is targeted at very senior level policy decision makers.
2 | Reports Zone | Detailed reports with drill down to granular level data, available for middle level management.
3 | Indicator Zone | Micro level indicators (several thousands) available for reference, including history for comparison, targeted at operational level personnel.
4 | Analysis Zone | Pre-defined analytical templates that present insights for medical officers and research professionals.
5 | Freehand Zone | Key data elements grouped as data sets and provided as a palette for users to define their own templates and analysis views.
6 | Predictive Zone | Scientific analysis, data mining and predictive analytics leveraging the Weka predictive engine, targeted at policy research and medical outcome research.
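The Dashboard Zone compares each KPI with a target or threshold and uses color to flag under or over performance. The following is a minimal sketch of how such a status could be derived, not the SHDRC implementation: the `Kpi` class, the `tolerance` parameter, the three-color scheme and the example KPI names and values are all assumptions made for illustration. Only the five category names come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # The five logical KPI groupings named in the paper.
    INPUT = "input"
    PROCESS = "process"
    OUTPUT = "output"
    OUTCOME = "outcome"
    IMPACT = "impact"


@dataclass(frozen=True)
class Kpi:
    name: str                      # hypothetical KPI name
    category: Category
    target: float                  # must be non-zero
    tolerance: float               # assumed: relative shortfall still shown as amber
    higher_is_better: bool = True  # False for rates that should go down


def status(kpi: Kpi, value: float) -> str:
    """Return 'green', 'amber' or 'red' for a measured KPI value."""
    # Signed relative gap to target; positive means better than target.
    gap = (value - kpi.target) / kpi.target
    if not kpi.higher_is_better:
        gap = -gap
    if gap >= 0:
        return "green"
    if gap >= -kpi.tolerance:
        return "amber"
    return "red"


# Illustrative values only; they are not taken from the project.
coverage = Kpi("coverage_pct", Category.OUTPUT, target=95.0, tolerance=0.05)
mortality = Kpi("mortality_rate", Category.IMPACT, target=20.0,
                tolerance=0.10, higher_is_better=False)
print(status(coverage, 92.0), status(mortality, 30.0))
```

Keeping the direction of improvement on the KPI itself lets one rule serve both "higher is better" measures and rates that should fall.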

3.2 Data Governance Challenges

Data Availability. The first challenge was to ascertain data availability. Data had to be collected by reaching out to the grass-roots level and setting up appropriate processes there.

Data Integrity. At most locations, data was captured by public health practitioners who face a heavy patient load, so little attention was paid to capturing complete and, on many occasions, correct data. This resulted in high levels of data quality issues at the source systems themselves. Because of the vastness of the system, and because speed of response and treatment is the primary objective at the field level, it was also a challenge to implement incremental checks and balances at the data capture stage to ascertain the integrity of the data.

Data Duplication. India is still in the process of setting up a universal identification number. The chances of duplicate data were therefore high, as there were very limited means of defining unique patient identifiers. It was very likely that one patient visiting multiple times, for the same or a different illness, would be recorded as different patients. This meant potential anomalies in analyses and research insights.

Solution. Data integrity at the source was improved through training and education programs. For advanced analytics needs, integrity can additionally be maintained by applying imputation techniques such as hot deck imputation. De-duplication, which identifies a patient across visits using contact number, address and name, helped reduce duplicates by 75%.
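The two techniques named in the solution can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's ETL logic: the record fields (`name`, `phone`, `address`), the normalization rules, the exact-match grouping and the choice of stratum for donor selection are all hypothetical, and the sample records are made up.

```python
import random
import re
from collections import defaultdict


def match_key(record):
    """Build a normalized (name, phone, address) key for one visit record."""
    name = re.sub(r"[^a-z]", "", record["name"].lower())
    phone = re.sub(r"\D", "", record["phone"])[-10:]  # assumed: last 10 digits identify the number
    address = re.sub(r"[^a-z0-9]", "", record["address"].lower())
    return (name, phone, address)


def group_visits(visits):
    """Group visits whose normalized keys are identical (one group per patient)."""
    groups = defaultdict(list)
    for visit in visits:
        groups[match_key(visit)].append(visit)
    return dict(groups)


def hot_deck_impute(rows, field, stratum, rng=None):
    """Fill missing `field` values with a value drawn from a donor row
    in the same stratum; rows without any donor are left unchanged."""
    rng = rng or random.Random(0)
    donors = defaultdict(list)
    for row in rows:
        if row.get(field) is not None:
            donors[row[stratum]].append(row[field])
    imputed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if row.get(field) is None and donors[row[stratum]]:
            row[field] = rng.choice(donors[row[stratum]])
        imputed.append(row)
    return imputed


# Made-up records for illustration.
visits = [
    {"name": "R. Kumar", "phone": "+91 98765 43210", "address": "12, Main St"},
    {"name": "r kumar", "phone": "9876543210", "address": "12 Main St."},
    {"name": "S. Devi", "phone": "9123456780", "address": "4 Lake Rd"},
]
rows = [
    {"district": "A", "weight": 3.1},
    {"district": "A", "weight": None},
    {"district": "B", "weight": None},
]
print(len(group_visits(visits)), hot_deck_impute(rows, "weight", "district"))
```

Exact matching on a normalized key is the simplest option; tolerating spelling variants in names or addresses would need an approximate string comparison, which is not shown here.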

3.3 Design and Performance Challenges

Portable design. The application had to cater to 21 health directorates collecting data from multiple, heterogeneous sources. The design had to remain consistent, seamless and portable across directorates whose systems of record vary greatly. Schema objects and design components had to be mutually exclusive across directorates, yet integrate in the end. Open source software was the preferred tool stack, so tools had to be selected deliberately for security, features, and the ability to interact with legacy systems and external or third-party applications for data exchange.

Simplified user interface. Keeping the design simple was a significant challenge. Data entry points have to be within the comfort level of users; a complex UI leads users to skip entries, which results in loss of data capture. Consistency of the presentation layer across directorates is valued because it makes training across directorates much easier and spares employees transferred between directorates a steep learning curve.

The design and performance challenges and solutions are given in Fig. 3 and Table 3.

Fig. 3. Performance Optimization
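Table 3 lists partitioning, aggregate tables and indexing as the remedies for slow database response. The sketch below illustrates two of these, an index and an aggregate table, using SQLite from the Python standard library as a stand-in for the project's PostgreSQL database. The table names, columns and values are hypothetical. Partitioning is omitted because SQLite has no declarative table partitioning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table: one row per reporting unit per month.
    CREATE TABLE visit_fact (
        directorate TEXT,
        district    TEXT,
        visit_month TEXT,
        patients    INTEGER
    );
    INSERT INTO visit_fact VALUES
        ('DIR_A', 'district_1', '2015-01', 120),
        ('DIR_A', 'district_1', '2015-01', 80),
        ('DIR_A', 'district_2', '2015-01', 60),
        ('DIR_B', 'district_1', '2015-02', 40);

    -- Index the columns that report filters are assumed to use most.
    CREATE INDEX idx_visit_dir_month
        ON visit_fact (directorate, visit_month);

    -- Aggregate table: monthly totals are computed once, at load time,
    -- so reports do not scan the detailed fact table.
    CREATE TABLE visit_monthly_agg AS
        SELECT directorate, visit_month, SUM(patients) AS patients
        FROM visit_fact
        GROUP BY directorate, visit_month;
""")


def monthly_total(directorate, month):
    """Read a pre-computed total from the aggregate table (0 if absent)."""
    row = conn.execute(
        "SELECT patients FROM visit_monthly_agg "
        "WHERE directorate = ? AND visit_month = ?",
        (directorate, month),
    ).fetchone()
    return row[0] if row else 0


print(monthly_total("DIR_A", "2015-01"))
```

The trade-off is that the aggregate table must be refreshed whenever new fact rows are loaded, which fits a warehouse that is loaded in batches.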

Table 3. Performance challenges and solutions

Pre-optimization | Proposed solution | Post-optimization
BI reports refresh took ~4-5 minutes | Prompt filters | BI reports refresh in ~30-40 sec
OLAP cube failure on data volume | Dynamic schema processing to parameterize OLAP cubes | Parameterized OLAP cube was able to handle large volume of data
Slow DB response | Partitioning, aggregate tables, indexing | Data retrieval within a minute
External data load took more time | Foreign data wrapper (FDW) | FDW made external data access local to the system
Unrestricted access | Role restriction filters | Improved data security
Data retrieval across directorates took ~6 minutes to load | Virtual cubes | Consolidation of common data across directorates reduced load time to ~1 minute

4 Conclusion

This paper presented a summary view of experiences from a real-life, large scale public health data consolidation and integration project, presented as a case study. Besides data collection, integration and consolidation, this project also involved data summarization and consumption of data, i.e. insights generation and their adoption through structured analysis of key performance indicators. This section concludes by summarizing the key learnings presented in this paper:

• The approach to rank and prioritize the focus key performance indicators from the long list of indicators that public health administrators require for administration of public health services.
• The approach to categorize information requirements in the structure of Input-Process-Output-Outcome-Impact metrics and how they are mapped to the data elements.
• The approach to handle key design and performance challenges at the level of the database, user interface, OLAP cube and presentation layer.

From a technology perspective, it is interesting to note that Open Source BI technologies can be used effectively for large scale data integration and analysis in the public domain. In future, tremendous scope exists for exploring data quality assessment and measurement methods integrated into large data and analytics initiatives for public health. Predictive analytics, though still at a nascent stage in public health, is essentially the future of the field and needs to be developed systematically.

Acknowledgment

The authors would like to thank the Government of Tamil Nadu for the opportunity provided to Accenture to partner in this noble initiative. Thanks are also due to Shri. M.S. Shanmugam, IAS, for his dynamic leadership and excellent guidance in his role as Project Director, Tamil Nadu Health Systems Project. Our sincere thanks to the Heads of all the participating Directorates for their excellent cooperation in ensuring the success of the initiative.

References

1. R. Beaglehole and R. Bonita, Public Health at the Crossroads: Achievements and Prospects, 2nd ed. Cambridge University Press, pp. 34-36 (2004).
2. Developing Key Performance Indicators - A Toolkit for Health Sector Managers, USAID.
3. Erin Hahn, Sheri Lewis, and David Blazes, The Use of Open Source Software to Enhance Public Health Initiatives.
4. Bernard C. K. Choi, The Past, Present, and Future of Public Health Surveillance.
5. CDC's Vision for Public Health Surveillance in the 21st Century, pp. 36.
6. India Embarks on Universal Health Coverage during 12th Plan, Press Information Bureau, Government of India.
7. The Double Burden: Emerging Epidemics and Persistent Problems, The World Health Report (1999).
8. Census Report, Censusindia.gov.in.
9. National Health Policy, 2015, pp. 5.
10. National Health Mission, Ministry of Health and Family Welfare, Government of India, http://nrhm.gov.in/nrhm-in-state/state-wise-information/tamil-nadu.html#health_profile
