Research Interest and Statement
Hsing-Ming Chang
As a researcher in statistical methodology, I am interested in advances in statistical theory and methods for epidemiological and public health research. I am particularly interested in extending my doctoral thesis work on the finite Markov chain imbedding (FMCI) technique, which provides a tool for modeling hospital emergency wait times and engineering queueing systems. The FMCI technique also provides a simple and flexible way to model time-to-event data. I also intend to continue my postdoctoral research on spatial statistical methods for detecting spatial clusters of disease-related events. Such research has wide application in other fields, such as ecological studies of the distribution of vegetation and animals, and hot spot identification and spatial pattern analysis of crime data. Since 2007, there have been periods of financial instability followed by slow recoveries. Five major crises hit the global economy, in August 2007, September 2008, April 2009, May 2010 and August 2011, affecting interest rates and prompting reforms in the banking industry. These events have sparked my interest in learning the methods that banking institutions use to model interest rates and major indicators for risk management, and I would be interested in collaborating on methodological research in financial big data modeling. For the next 5 to 6 years, I plan to continue research in the following areas, dealing with statistical problems arising from their related disciplines.
FMCI Approach in Queueing Models
In my thesis work, I learned the finite Markov chain imbedding technique and used it to model priority queues. This technique provides a tool for studying many types of queueing systems and has important applications in modeling ambulatory healthcare wait times. My thesis research was the first to use the FMCI technique to model preemptive and non-preemptive M/M/1 priority queues with thresholds. A major part of my thesis is devoted to devising algorithms to obtain the tail distribution of the waiting time in priority queueing systems. This research has many practical applications, especially in the stochastic processes of industrial engineering and in building predictive models for hospital emergency wait times. All scripts and computer simulation programs are implemented in MATLAB. In working with MATLAB, I learned more advanced techniques for developing code that uses multi-core computers for parallel computation, reduces unnecessary swapping between physical memory and virtual memory on the hard drive, and avoids redundant computation. These optimizations allow me to better control the time needed to conduct computer experiments for scientific computation. I attempted to analyze the National Ambulatory Care Reporting System (NACRS) data set, obtained from the Canadian Institute for Health Information through the Graduate Student Data Access Program; however, the modeling was not feasible due to a lack of environmental variables and to data censorship. I also spent much time studying another data set, at the individual record level (non-aggregated), obtained from the Erlin Branch of Changhua Christian Hospital in Taiwan. These informative data analyses gave us feedback on our modeling approach. I plan to extend this research by seeking collaborations with local emergency departments, and possibly some in Taiwan, and with health researchers who are interested in quantitative and qualitative research in this area. I am also interested in priority queues under settings not discussed in my thesis, with a view to developing applications not only in ambulatory healthcare systems but also in manufacturing processes, transportation, communication systems and operational research. Priority queues subject to misclassification are a topic that has received less attention. The first paper to consider the problem of misclassification in priority assignment was that of van der Zee and Theil [1] in 1961. One chapter of my thesis is devoted to reviewing errors in priority assignment. I am also interested in methodological research in this area, to provide an approach for properly estimating parameters by estimating misclassification error rates.
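The core FMCI computation referred to in this section can be illustrated on a much smaller problem than a priority queue. The sketch below is not the thesis algorithm. It uses a textbook example from the runs-and-patterns literature [4]: the waiting time W until the first run of k consecutive successes in independent Bernoulli(p) trials. The transient states of the imbedded chain record the current run length, and P(W > n) is the probability mass remaining on the transient states after n steps. The function name and its arguments are hypothetical, chosen only for illustration.

```python
def run_waiting_time_tail(k, p, n_max):
    """Tail probabilities P(W > n), n = 0..n_max, where W is the number of
    Bernoulli(p) trials needed to see the first run of k successes.

    Illustrative finite Markov chain imbedding: transient state i (0..k-1)
    is the current run length; reaching a run of length k is absorbing.
    """
    q = 1.0 - p
    # v[i] = P(chain is in transient state i after n trials); start with no run.
    v = [1.0] + [0.0] * (k - 1)
    tail = []
    for _ in range(n_max + 1):
        # Mass still on transient states = probability the run has not occurred.
        tail.append(sum(v))
        nxt = [0.0] * k
        nxt[0] = q * sum(v)            # a failure resets the run length to 0
        for i in range(k - 1):
            nxt[i + 1] = p * v[i]      # a success extends the run by one
        # A success from state k-1 moves to the absorbing state and is dropped.
        v = nxt
    return tail


print(run_waiting_time_tail(2, 0.5, 3))  # [1.0, 1.0, 0.75, 0.625]
```

The loop is the vector-matrix product of the initial distribution with powers of the transient transition matrix, written out explicitly. The chains needed for priority queues with thresholds have far larger state spaces, but the tail distribution is read off in the same way.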
Spatial Cluster Detection
I joined the Department of Pediatrics at the University of Alberta as a postdoctoral fellow in November 2011. My main research is to extend the spatial scan test of Kulldorff and Nagarwalla [2], which assumes that each individual in the population either has a disease, and is called a case, or does not have the disease. However, multiple disease-related events arising from a single case may provide more information. For example, the utilization of emergency medical resources depends heavily on how often an emergency department (ED) is visited, which varies with how many distinct individuals visit the ED and how frequently each individual visits. Five people each visiting an ED independently is not the same as one individual visiting the ED five times, where the visits may be correlated. A natural extension of the work by Kulldorff [3] is to develop a spatial scan statistic for detecting clusters of disease-related events. This method relies on a maximum likelihood test and Monte Carlo simulations for analysis. Our initial research accomplishment [6] can be viewed in the application attachment. Currently I am developing a new model that uses a similar approach and can account for population stratification based on key characteristics. The findings will soon result in a separate manuscript for journal submission. Building on my experience with MATLAB, I continue to implement new algorithms in MATLAB and use the new results to analyze the administrative health database provided by Alberta Health, Canada. Working with large health databases gives me a real understanding of data de-identification techniques and their relevance to protecting the privacy of personal data. For large computer simulations, I have learned to use the Numerical and Statistical Server and the PBS system of the University of Alberta, adding to my skills in using virtualization technology optimally for intensive high performance computing.
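To make the maximum likelihood test and Monte Carlo step concrete, the following is a minimal sketch of the standard Poisson spatial scan statistic in the spirit of Kulldorff [3]. It is not the compound Poisson extension of [6]. All function names, the circular-zone construction and the 50% population cap are assumptions made for illustration. Each candidate zone is scored by a log likelihood ratio, the maximum over zones is the test statistic, and its p-value is estimated by redistributing the observed number of cases at random in proportion to population.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0                       # scan only for elevated rates
    llr = c * math.log(c / e)
    if total > c:                        # skip the 0 * log(0) term
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def candidate_zones(coords, pops, max_frac=0.5):
    """Circular zones: each region plus nearest neighbours, capped by population."""
    total_pop = sum(pops)
    zones = set()
    for cx, cy in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: (coords[j][0] - cx) ** 2 + (coords[j][1] - cy) ** 2)
        zone, pop = [], 0
        for j in order:
            pop += pops[j]
            if pop > max_frac * total_pop:
                break
            zone.append(j)
            zones.add(frozenset(zone))
    return zones


def max_llr(cases, pops, zones):
    """Largest log likelihood ratio over all zones, and the zone attaining it."""
    total, total_pop = sum(cases), sum(pops)
    best, best_zone = 0.0, None
    for z in zones:
        c = sum(cases[j] for j in z)
        e = total * sum(pops[j] for j in z) / total_pop
        llr = poisson_llr(c, e, total)
        if llr > best:
            best, best_zone = llr, z
    return best, best_zone


def scan_test(coords, pops, cases, n_sim=999, seed=0):
    """Return (observed statistic, most likely cluster, Monte Carlo p-value)."""
    rng = random.Random(seed)
    zones = candidate_zones(coords, pops)
    observed, zone = max_llr(cases, pops, zones)
    total, exceed = sum(cases), 0
    for _ in range(n_sim):
        # Null hypothesis: each case lands in a region with probability
        # proportional to that region's population.
        sim = [0] * len(pops)
        for j in rng.choices(range(len(pops)), weights=pops, k=total):
            sim[j] += 1
        if max_llr(sim, pops, zones)[0] >= observed:
            exceed += 1
    return observed, zone, (exceed + 1) / (n_sim + 1)
```

For example, on a 5 by 5 grid of equally populated regions with one region carrying many more cases than the rest, `scan_test` would be expected to return that region as the most likely cluster with a small p-value. Replacing the Monte Carlo step with an exact tail probability is where an FMCI approach could enter.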
My other duties include extensive use of the open source statistical language R to manage and analyze high dimensional longitudinal health data. I often assist other research assistants on the team, including with research grant and fellowship applications. I am further interested in continuing the line of work on the spatial scan by adopting the finite Markov chain imbedding approach (see [4]). Such a technique may allow us to move away from the maximum likelihood method and would provide a new approach to modeling the number of cases and the number of events over time and space, so as to detect or signal possible clusters of rare diseases or disease-related events and alert health authorities for further investigation. I am interested in using the FMCI approach to obtain the tail distribution of the underlying stochastic processes and thereby obtain the p-value directly for spatial scan analysis. The research will allow us to model disease spread, forest fires and crime rates spatially and temporally, and will generate applications in many other fields, including agriculture, ecology, the social sciences and engineering reliability theory.
I would also be excited to collaborate with colleagues whose mission is to bring research to bear on the challenges of modern data analysis, such as big data and social science (global social networking), by applying probabilistic models. I am also very interested in studying the economic growth of Canada and Taiwan, and changes in environmental factors and their effect on children’s physical and mental health. In keeping with the aims of the department, I may be able to offer new courses and co-supervise students to spark research ideas in social science and statistics.
Health Data and Biological Data
Besides the above work, a portion of my time is spent on another project that aims to develop new methods for marked point processes. Marked point processes are used for stochastic modeling of recurrent event data associated with additional information (often called marks). Data in many research disciplines may be considered to have this property: for example, precipitation data associated with location and temperature are used for weather forecasting and modeling, and death counts associated with disease infection and tumor size are used for survival analysis. I am interested in collaborating on research projects that continue in this direction, to properly formulate problems related to the modeling of time-to-event data, cancer care data and longitudinal data, and possibly to use techniques from the distribution theory of runs and patterns to solve research problems. I am also interested in joining research teams to tackle methodological research on the detection of infectious disease, as seen in [5], and the analysis of modern biological data, such as searches in DNA and RNA sequencing data, which draw much attention today.
References
[1] van der Zee SP, Theil H. Priority assignment in waiting-line problems under conditions of misclassification. Operations Research 1961; 9(6): 875–885.
[2] Kulldorff M, Nagarwalla N. Spatial disease clusters: Detection and inference. Statistics in Medicine 1995; 14: 799–810.
[3] Kulldorff M. A spatial scan statistic. Communications in Statistics - Theory and Methods 1997; 26(6): 1481–1496.
[4] Fu JC, Lou WYW. Distribution Theory of Runs and Patterns and Its Applications: A Finite Markov Chain Imbedding Approach. World Scientific, 2003.
[5] Yang Y, Longini IM, Halloran ME. A resampling-based test to detect person-to-person transmission of infectious disease. Annals of Applied Statistics 2007; 1(1): 211–228.
[6] Rosychuk RJ, Chang H-M. A spatial scan statistic for compound Poisson data. Statistics in Medicine 2013; 32(29): 5106–5118. DOI: 10.1002/sim.5891