WHITE PAPER
Using Greedy Algorithm in Observational Studies by Kiran Kumar Sandupatla, Rathnakar Bollam and Dr. A. K. Mathai
www.makrocare.com
INTRODUCTION:
A randomized controlled trial (RCT) is the gold standard for evaluating a new drug against a standard drug or placebo. However, RCTs have a few limitations: a patient receives a particular treatment regimen only if he is willing to be randomized; the enrolled patients may not be representative of the clinical population because of stringent inclusion/exclusion criteria; and treatment management may differ slightly from routine clinical practice. These limitations can be reduced to some extent through observational studies, which are mostly conducted under routine clinical practice without many restrictions. However, a proper control group is required to draw appropriate conclusions about the study group (cases).

In interventional studies, randomization protects against selection bias, whereas observational studies have no prior randomization. Therefore, the group that received the treatment (cases) and the group that did not receive the treatment (controls) can differ considerably, with many significant differences between their characteristics. These differences should be adjusted for in order to reduce bias in the selection of the treatment group and to determine the treatment effect.

Several techniques can reduce the bias arising from these differences and make the two groups more similar; matching is one of them. There are two types of matching algorithms: the greedy match algorithm and the optimal match algorithm. In this paper, we discuss the utility of the greedy algorithm for matching cases and controls in observational studies.

PROPENSITY SCORES

Propensity scores are used to reduce selection bias in observational studies. The propensity score is the predicted probability of the dependent variable, which in this setting is the treatment group rather than a clinical outcome, and it always lies between 0 and 1.
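As a rough illustration of how such a score might be estimated, the sketch below fits a logistic regression of treatment status on covariates in plain Python, as a stand-in for a statistical package. The covariates (a standardized age and a comorbidity flag), the data values, and the helper names `fit_logistic` and `propensity_scores` are all hypothetical, and the simple gradient-ascent fit is an assumption made to keep the example self-contained, not the method used in this paper.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w, x):
    """Intercept w[0] plus the weighted sum of the covariates."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, t, lr=0.5, n_iter=2000):
    """Fit logistic regression weights by gradient ascent on the mean log-likelihood.

    X: list of covariate rows, t: list of 0/1 treatment indicators.
    lr and n_iter are illustrative settings, not tuned values.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, ti in zip(X, t):
            err = ti - sigmoid(linear_predictor(w, x))
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(X, w):
    """Predicted probability of receiving the treatment for each subject."""
    return [sigmoid(linear_predictor(w, x)) for x in X]


# Hypothetical data: [standardized age, comorbidity flag] per subject.
X = [[-1.2, 0], [-0.8, 0], [-0.3, 1], [0.1, 0],
     [0.4, 1], [0.9, 1], [1.3, 1], [-0.5, 0]]
t = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = case (treated), 0 = control

weights = fit_logistic(X, t)
scores = propensity_scores(X, weights)
for ti, s in zip(t, scores):
    print(f"treated={ti}  propensity={s:.3f}")
```

Each entry of `scores` is the estimated probability that the corresponding subject receives the treatment; these are the values on which cases and controls would later be compared.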
This score summarizes the relationship between multiple characteristics and the dependent variable as a single characteristic. PROC LOGISTIC allows users to calculate and save the predicted probability of the dependent variable, the propensity score, for each observation in the data set. In an observational study, the dependent variable might be the treatment group, in which case the propensity score is the predicted probability of receiving the treatment. Generally, propensity score matching is used to compare the outcomes of a group that received the treatment with those of a control group. If only one characteristic influenced the outcome, selecting a suitable control would be easy. However, when several variables must be considered simultaneously, choosing controls becomes more difficult. Often, the choice of matching variables is mainly the investigator's decision, based on knowledge and experience.

GREEDY ALGORITHM

In the greedy algorithm, matching starts by finding the nearest control for the first treatment case in the dataset. Once the nearest control is identified, the matched pair is moved to the
matched dataset and deleted from the matching pool, so neither member of the pair is considered for further matching. Then the nearest control for the second treatment case in the dataset is identified, and so on. Thus, the set of matches depends on the order of the dataset: if the sort order changes, a different set of matched pairs may result. The greedy algorithm is the most commonly used technique for matching cases to controls on the propensity score, so it can be called case-control matching on the propensity score. In this algorithm, a group of N cases is matched to a group of M controls in a set of N decisions. Here N must be less than or equal to M (N