Perspective

Parallelism: considerations for the development, validation and implementation of PK and biomarker ligand-binding assays

Lauren F Stevenson* & Shobha Purushothama
Biogen Idec, 14 Cambridge Center, Cambridge, MA 02142, USA
*Author for correspondence: Tel.: +1 617 914 6479; Fax: +1 617 914 6531; E-mail: lauren.stevenson@biogenidec.com

Bioanalysis (2014) 6(2), 185–198; doi:10.4155/BIO.13.292

The goal of this article is to discuss the fundamental key questions around parallelism assessments: why to do it, when to do it and how to do it, with consideration for different molecule types and the scientific rationale that drives different approaches. Current practices for both PK and biomarker assays regarding which samples to pick, whether to pool or not pool samples, as well as generally accepted acceptance criteria are discussed, while also highlighting the many outstanding questions that remain. In order to reach the long-term goal of understanding and developing best practices for implementation of parallelism testing for both PK and biomarker assays, industry and regulators will need to keep the conversations going, and commit to generating and reviewing data for the purposes of our own education. Means to easily share data in open forums to facilitate and build common understandings should continue and will be necessary to expedite resolution of many of these questions.

Key Terms
Parallelism: Demonstration that the sample dilution–response curve is parallel to the standard concentration–response curve.
Dilutional linearity: An assessment where a control sample spiked with a high concentration of analyte is tested at multiple dilutions targeted to span the assay range in order to demonstrate that such samples are measured accurately.
Incurred sample: For PK assays, an in-study sample. For biomarker assays, may also refer to any sample containing endogenous analyte of interest that may be obtained within or outside the context of a study.
Parallelism has been a hot topic in ligand-binding assays (LBAs) in recent years. From the PK assay perspective, both regulatory guidance [1] and industry consensus [2–4] have weighed in with recommendations regarding the need (or not) to routinely perform parallelism assessments as part of PK assay validation. In the biomarker assay realm, parallelism assessments have long been recognized as having great utility from the early stages of method development [5,6]. In both instances, the assessment itself is similar to that employed to assess dilutional linearity. However, where dilutional linearity employs a spiked control sample to demonstrate that samples containing analyte above the assay ULOQ can be accurately measured by the assay after dilution in blank matrix, parallelism employs an incurred sample and/or samples containing endogenous analyte to demonstrate whether the sample dilution–response curve is parallel to the standard concentration–response curve [7]. For the purposes of this review, the term ‘incurred samples’ will refer to samples that contain the analyte of interest via in vivo administration of drug or samples that contain endogenous analyte. Since LBAs measure the analyte of interest via immunoreactivity rather than directly, events that affect the analyte’s interaction with the critical immunoreagents will impact results generated by the assay. In the case of PK assays, the binding of the therapeutic molecule of interest
may be affected by interferences from the presence of antidrug antibodies (ADA), soluble target proteins (endogenous binding partners) and moieties created by biotransformation of the therapeutic molecule, as occurs with complex heterogeneous therapeutics such as antibody–drug conjugates (ADCs) [8]. Such complex sample compositions cannot always be created, or necessarily predicted, outside the context of incurred samples. For biomarker assays, where the standard calibrator material is typically either recombinant or purified material and therefore not the same as the endogenous analyte being measured, there is always the additional question of whether the binding reactivity of assay critical reagents is genuinely similar for both. Understanding these relationships can be critical to data interpretation. Despite the high level of interest and heated discussions on the topic of parallelism that have occurred in recent years, this remains a relatively immature area and there remain many more questions than answers regarding the specifics of approaches to use, the setting of appropriate acceptance criteria and when/whether the assessment is necessary. In addition, the goals of the assessment may vary widely for a biomarker versus a biotherapeutic molecule, or even among different classes of biomolecules (mAbs versus ADCs, for example). Therefore, the goals of this article are to discuss the fundamental key questions around parallelism assessments: why to do it, when to
do it and how to do it, with consideration for different molecule types and the scientific rationale that drives different approaches. Current practices for both PK and biomarker assays will be discussed, while also highlighting the many outstanding questions that remain. Ideally, these questions will stimulate robust discussion and ultimately facilitate a better understanding across the industry.

Why?
The goal of a parallelism assessment, as typically articulated, is ‘to demonstrate that the sample dilution–response curve is parallel to the standard concentration–response curve’. For LBAs this is an indicator that the interaction of the assay critical reagents with the standard calibrator material and with the analyte of interest in the samples is similar, and that the assay is therefore suitable for measurement of the analyte. This provides the basic measure of why one performs parallelism. However, progressing from that basis come the questions: why does it matter? Or, why do we care to demonstrate this? In addition, the reason why one performs the assessment may depend upon when it is performed, and whether the resulting answer is a complete or accurate reflection of what is going on in the sample, or an indicator of the assay’s limits, may depend just as much on how the assessment is done. All those caveats aside, however, there are some underlying answers to the questions of why.

PK assays
For PK assays, there are essentially two different perspectives: the first being to perform parallelism assessments routinely as part of assay validation and the second being to perform them when deemed scientifically warranted. From the first perspective, the ‘why’ represents a conservative approach, which might be seen as a means to ensure that parallelism has been proven or, more colloquially, as ‘playing it safe’. Additionally, regulatory agencies are positioned to see cases where there were issues with nonparallelism. The desire to perform the assessment routinely, ‘just to be sure’, is therefore understandable and perhaps provides a generalized answer to ‘why’. It is thus perhaps not unexpected that this approach has been advocated by at least one regulatory agency [1]. Consequently, in order to comply with a regulatory expectation, the ‘why’ from a pragmatic perspective, especially for laboratories primarily conducting regulated sample analysis, may simply be to ensure that
they meet this expectation. On the other hand, some companies in the industry have adopted and advocate routine parallelism assessments as standard practice in their PK assay validations, but these practices have not been established in response to expectations by regulatory authorities. Instead, ‘why’ they perform the assessment has been driven by their scientific experiences with their therapeutic molecules. In these cases, the decision has been rooted in the composition of their portfolio, which may contain a number of molecule types that are understood, a priori, to pose a greater likelihood of nonparallelism (ADCs, for example). Across the industry, however, and particularly in reference to mAb therapeutics, routine parallelism assessments have not been universally adopted. Another perspective, as advocated by the Global Bioanalysis Consortium [101] and not dissimilar in spirit from the companies that have incorporated routine assessments based upon their portfolio’s composition, promotes a risk assessment to inform whether or not parallelism needs to be performed. They recommend that the need to perform a parallelism assessment should be driven by the characteristics of the drug, its binding partners and the specificity of the assay reagents [9]. Rather than routinely performing the assessment just in case there is an issue, this perspective argues that a scientific rationale should exist that explains why the assessment is warranted. This point of view arises from the recognition that parallelism is not an issue for some large molecule moieties, such as mAbs, for which in vivo modification of the molecule is not expected. Furthermore, there is a concern that if the assessment is performed without a specific scientific rationale behind it, then the approach taken to assess it might not be informative, and a ‘passing’ result (however that may be defined) may offer a false sense of security and potentially even cause subsequent anomalous PK data to be overlooked or dismissed because parallelism had been ‘established’. In truth, there is a scientific basis underlying both perspectives and there is clearly common ground in that both desire to ensure that the results reported from PK assays are an accurate reflection of the molecule’s concentration in the sample. Essentially, it might be agreed that both perspectives originate from a risk assessment but diverge in the subsequent translation to relative risk tolerance. Regulators, who are positioned to review enormous numbers of applications, will have witnessed parallelism issues and therefore
desire routine assessments to ensure that the question of parallelism has been considered. Companies that develop complex biotherapeutic molecules, such as ADCs, or more labile molecules such as peptides, will have adopted routine assessments because these molecules regularly reside in a high-risk category with respect to parallelism. Other companies, whose portfolios may be primarily mAb therapeutics, will not have adopted them as routine since their relative risk with respect to parallelism is low. For now the question remains as to whether the risk assessment will be left in the hands of the individual scientists or in the hands of regulators. Generally speaking, regardless of who is performing the risk assessment, the underlying ‘why’ should be to better understand the data, and parallelism assessments can aid data interpretation when confounding interferences are present in vivo [10,11]. This assumes, however, that the samples harboring these confounding factors are in fact the ones that are tested in the parallelism assessment. Data from supporting assays, such as those measuring changes in soluble ligand levels, drug–ligand complexes, and so on, may be another means to provide similar insights or may inform which samples to investigate with respect to parallelism.

Biomarker assays
In the biomarker assay realm, the opportunity to perform parallelism early in assay development and the obvious advantages of doing so make for little to no controversy around whether the assessment is necessary or not. Most industry experts agree that parallelism is always explored for biomarker assays [12]. In addition, with assays for exploratory biomarkers not being subject to regulatory guidance, and with biomarkers used to support late-stage clinical (and regulatory authority) decision-making understood to require assays that stand up to PK assay scrutiny, there are presently no significant points of divergence in opinions between regulators and industry. For biomarker assays, the ‘why’ includes not only the reasons stated above for PK assays, but also the compelling reason that the parallelism assessment enables development of a better assay. Since these assays measure endogenous analyte, incurred samples can typically be obtained during the assay development stage and leveraged in a parallelism assessment to inform many aspects of the assay. In addition to the standard parallelism goal (demonstrating that the sample dilution–response curve is parallel to the standard concentration–response curve), determination of assay minimum required dilution (MRD), estimation of assay sensitivity with respect to endogenous analyte and testing of selectivity may also be achieved [5,6]. Notably, however, biomarker assays often do not show parallel responses and, although efforts may be made to rectify this by re-optimizing the assay or looking at alternative assay formats, limited reagent options and analyte complexity may prevent success. A parallelism failure, however, does not negate the use of the assay, but rather informs how the assay and the data resulting from the assay may be used. Figure 1A is an illustrative example of a parallelism failure where measured concentrations of analyte increase with increasing dilution, which puts the accuracy of measurements at all dilutions in question. Ideally, this could be ameliorated by identifying a MRD that mitigates this apparent matrix effect. However, for many biomarker assays, additional dilution is not possible as the analyte is present at low levels and therefore assay sensitivity limits the range of dilution. Assuming (as in Figure 1A) that samples reliably rank order at all dilutions, then an assay such as this may be deployed for qualitative trend analysis, as long as samples from the same individual are tested at a single dilution and results are evaluated in reference to baseline levels. Conversely, an assay that shows reliable parallelism, as illustrated in Figure 1B, may be employed to test samples at multiple dilutions and will be able to provide relative quantitation.
[Figure 1 appears here: two panels, (A) and (B), plotting dilution-adjusted measured concentration (pg/ml × dilution factor) against 1/dilution factor for Samples 1–3 and a control sample.]
Figure 1. Examples of nonparallelism (A) and parallelism (B). (A) Nonparallelism: dilution-adjusted measured concentrations of samples increase with increasing dilution factor while the control sample shows consistent concentration measurements at all dilution factors. Dilution factors from 1–16 are represented. (B) Parallelism: samples and control return consistent dilution-adjusted concentration measurements across multiple dilutions. Dilution factors from 1–32 are represented.
When?
The question of when to do parallelism assessments has several components. The first, as discussed above, touches simply on when the assessment should be done versus when not, with answers ranging from ‘it should always be done’ to ‘it should be done only when necessary’. That basic question aside, in the event that an assessment will be performed, new questions arise. When during the assay’s life cycle should parallelism be tested – during assay development, prestudy validation, in-study validation or at multiple phases? And when, due to sample availability, is it even feasible to do so? The answers to these questions will frequently differ for PK and biomarker assays and, in each case, new sets of questions will also arise.

During assay development

PK assays
The requirement for incurred samples typically limits the opportunity to utilize parallelism assessments during the assay development phase for PK assays, unless the PK assay is being changed or improved after one or more studies have been conducted and suitably consented samples are available for the analysis. In most cases, therefore, any information on parallelism is typically obtained after assay development has completed. This highlights the need to be thinking about potential moieties that may occur in your study samples when designing the PK assay in order to ensure, as much as possible, that the assay will be able to accurately reflect your drug concentration. For example, the presence of binding partners or soluble target protein may be expected to change upon treatment, and therefore impact detection of drug in the assay [8]. Considering in advance what
limitations or potentially misleading information the assay may provide will protect against surprises in study. Development of additional assays or orthogonal methods may also be considered in order to understand, for example, concentrations of free versus total ligand, which will help provide context for the data obtained from the PK assay [13]. In the event that suitable incurred samples are available, then these may be leveraged in parallelism assessments to inform assay development, especially if the therapeutic is a complex molecule such as an ADC where multiple moieties will be expected [14]. Indeed, the decision to develop or update the existing PK assay may have been driven by limitations noted in earlier studies and therefore a parallelism assessment that utilizes the incurred samples from these prior studies will be invaluable in determining whether the new assay will overcome those limitations.

Biomarker assays
Since biomarker assays measure endogenous analyte, it is almost always possible to obtain incurred samples and employ parallelism as a valuable tool for assay development. This gives biomarker assays a distinct advantage in this arena versus PK assays. While in some cases samples may be available through Phase 0 studies in the relevant disease population, in many cases they will be obtained from commercial sources. In this latter scenario, there is less likelihood that the samples will be representative of the specific study population and quite often there will be limited or no information on the number of freeze–thaw cycles the samples have undergone or relevant detailed clinical demographics on the donors. Nevertheless, the ability to test such samples during this stage definitely provides an opportunity to inform the course of the assay’s development and enables modifications to be made in order to overcome or mitigate any issues observed with parallelism. Minimally, parallelism assessments at this stage enable a better understanding of what the assay limitations will be and how the resulting data may be used, which ultimately will define the level of decision making that may be supported by the data. As indicated previously, parallelism informs biomarker assay development at multiple points as it provides information on assay MRD, sensitivity and selectivity. Rather elegantly, although not always the case, all of these can be determined in a single experiment.
Table 1. Parallelism to set minimum required dilution. Dilution-adjusted concentration, ng/ml (% nominal).

| Sample dilution | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 | Sample 9 | Sample 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1:2 | ALQ | ALQ | ALQ | ALQ | ALQ | ALQ | ALQ | ALQ | ALQ | ALQ |
| 1:4 | 71 (73) | ALQ | ALQ | ALQ | ALQ | ALQ | ALQ | 72 (69) | ALQ | ALQ |
| 1:8 | 103 (106) | 131 (75) | ALQ | 128 (108) | ALQ | ALQ | ALQ | 110 (105) | ALQ | 108 (76) |
| 1:16† | 97 (100) | 175 (100) | 199 (100) | 119 (100) | 245 (100) | 344 (100) | 200 (100) | 105 (100) | 317 (100) | 142 (100) |
| 1:32 | 95 (98) | 165 (94) | 182 (91) | 115 (97) | 224 (91) | 332 (97) | 204 (102) | 99 (94) | 303 (96) | 136 (96) |
| 1:64 | 104 (107) | 166 (95) | 185 (93) | 119 (100) | 229 (93) | 324 (94) | 197 (99) | 95 (90) | 290 (91) | 131 (92) |
| 1:128 | 116 (120) | 182 (104) | 199 (100) | 133 (112) | 243 (99) | 346 (101) | 184 (92) | 111 (106) | 328 (103) | 152 (107) |
| 1:256 | 122 (126) | 205 (117) | 222 (112) | 139 (117) | 257 (105) | 379 (111) | 210 (105) | 115 (110) | 357 (113) | 175 (123) |
| 1:512 | BLQ | 199 (114) | 237 (119) | BLQ | 308 (126) | 423 (123) | 254 (127) | BLQ | 422 (133) | 201 (142) |
| 1:1024 | BLQ | BLQ | BLQ | BLQ | 328 (134) | 563 (164) | BLQ | BLQ | 440 (139) | BLQ |

† Indicates the chosen minimum required dilution, where all samples measure in the assay range and multiple dilutions beyond the minimum required dilution yield accurate results. ALQ: Above LOQ; BLQ: Below LOQ.
Illustrative examples are provided in Tables 1–3. In order to determine assay MRD, parallelism results are examined to assess at what dilution matrix effects appear to be ameliorated and all samples return results within the assay range. The sample concentrations determined at this provisional MRD are then set as the reference nominal concentrations against which results from subsequent dilutions are compared (Table 1). This exercise may require an iterative process in which different provisional MRDs (and their associated reference nominal concentrations) are tested [6]. The minimum dilution at which all individual samples demonstrate acceptable accuracy across multiple subsequent dilutions can be considered the assay’s MRD. As for other aspects of biomarker assay development and validation, what constitutes acceptable relative recovery should be determined on a fit-for-purpose basis.
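To make this iterative exercise concrete, a minimal sketch of the provisional-MRD search is given below. It is an illustration rather than a prescribed procedure: the data layout (per-sample dictionaries mapping dilution factor to dilution-adjusted concentration, with None for ALQ/BLQ results), the ±25% recovery window and the requirement that two subsequent dilutions recover accurately are all hypothetical, fit-for-purpose choices.

```python
# Illustrative sketch of the iterative provisional-MRD search described above.
# Assumed data layout: one dict per sample, dilution factor -> dilution-adjusted
# concentration (None where the reading was ALQ/BLQ). The +/-25% window and the
# two-follow-up-dilution requirement are hypothetical fit-for-purpose choices.

def recovers(series, dils, nominal, tol=0.25):
    """True if every listed dilution is quantifiable and within tol of nominal."""
    return all(series.get(d) is not None and
               abs(series[d] / nominal - 1) <= tol for d in dils)

def find_mrd(samples, dilutions, n_followup=2, tol=0.25):
    """Lowest provisional MRD at which every sample is quantifiable and recovers
    accurately at n_followup subsequent dilutions; None if no dilution qualifies."""
    for i, mrd in enumerate(dilutions):
        followup = dilutions[i + 1:i + 1 + n_followup]
        if len(followup) < n_followup:
            break  # not enough further dilutions left to confirm
        if all(s.get(mrd) is not None and recovers(s, followup, s[mrd], tol)
               for s in samples):
            return mrd
    return None

# Two hypothetical samples (values in the spirit of Table 1); None = ALQ
s1 = {2: None, 4: 71, 8: 103, 16: 97, 32: 95, 64: 104}
s2 = {2: None, 4: None, 8: 131, 16: 175, 32: 165, 64: 166}
print(find_mrd([s1, s2], dilutions=[2, 4, 8, 16, 32, 64]))  # -> 16
```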
In the same, or similar, experiment it may also be possible to determine assay sensitivity for the endogenous analyte. The goal of this evaluation is to understand at what level the endogenous concentration is being accurately measured, rather than simply setting assay sensitivity based upon the lowest standard point with acceptable precision and accuracy, since the calibrator material for biomarker assays differs from the endogenous analyte and its interaction with the critical reagents will also differ to a greater or lesser degree. For this analysis, samples are diluted to below the assay’s detection limit (as defined by the standard curve) and each sample dilution series is examined to identify the dilution range that produces a parallel response for each sample [15]. Setting the sensitivity, or LLOQ, may then be done in different ways and will depend upon how conservative an estimate one wants to make, which may in turn depend upon the anticipated sensitivity needs and how the data will be used. For example, a common dilution method may be employed whereby the greatest dilution at which all individual samples give a parallel response is identified. The highest concentration observed at that dilution is then set as the LLOQ (Table 2).
Key Term
Common dilution method: A means to set LLOQ for a biomarker assay by identifying the greatest dilution factor at which all individual samples in a given test set achieve a parallel response.

Table 2. Common dilution method for setting LLOQ. Measured concentration (ng/ml).

| Sample dilution | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 | Sample 9 | Sample 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1:16 | 6.06 | 10.9 | 12.4 | 7.44 | 15.3 | 21.5 | 12.5 | 6.56 | 19.8 | 8.88 |
| 1:32 | 2.97 | 5.16 | 5.69 | 3.59 | 7.00 | 10.4 | 6.38 | 3.09 | 9.47 | 4.25 |
| 1:64 | 1.63 | 2.59 | 2.89 | 1.86 | 3.58 | 5.06 | 3.08 | 1.48 | 4.53 | 2.05 |
| 1:128† | 0.91 | 1.42 | 1.55 | 1.04 | 1.90 | 2.70 | 1.44 | 0.87‡ | 2.56 | 1.19 |
| 1:256 | 0.48 | 0.80 | 0.87 | 0.54 | 1.00 | 1.48 | 0.82 | 0.45 | 1.39 | 0.68 |
| 1:512 | BLQ | 0.39 | 0.46 | BLQ | 0.60 | 0.83 | 0.50 | BLQ | 0.82 | 0.39 |
| 1:1024 | BLQ | BLQ | BLQ | BLQ | 0.32 | 0.55 | BLQ | BLQ | 0.43 | BLQ |

† Indicates the greatest dilution (1:128) at which all samples returned accurate results. Concentrations at the minimum required dilution (1:16) were used as nominal (Table 1). ‡ The lowest concentration measured at this common dilution. BLQ: Below LOQ.
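A short sketch of the common dilution method may help illustrate the logic of Table 2. The function name, data layout and ±25% recovery window are assumptions for illustration only; a real implementation would apply the laboratory’s own fit-for-purpose criteria.

```python
# Hypothetical sketch of the common dilution method: walk down the dilution
# series, keep going while every sample's dilution-adjusted result recovers
# within the window of its nominal (set at the MRD), then report the highest
# measured concentration at the greatest common dilution. None = BLQ.

def common_dilution_lloq(samples, dilutions, mrd, tol=0.25):
    best = None
    for d in dilutions:
        ok = all(s.get(d) is not None and
                 abs((s[d] * d) / (s[mrd] * mrd) - 1) <= tol  # vs nominal at MRD
                 for s in samples)
        if not ok:
            break          # parallelism lost for at least one sample; stop
        best = d           # all samples still parallel at this dilution
    if best is None:
        return None
    return max(s[best] for s in samples)  # highest measured conc at that dilution

# Two hypothetical samples (values in the spirit of Table 2), MRD = 1:16
s1 = {16: 6.06, 32: 2.97, 64: 1.63, 128: 0.91, 256: 0.48, 512: None}
s2 = {16: 10.9, 32: 5.16, 64: 2.59, 128: 1.42, 256: 0.80, 512: 0.39}
print(common_dilution_lloq([s1, s2], [16, 32, 64, 128, 256, 512], mrd=16))  # 1.42
```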
Key Term
Table 3. Common concentration method for setting LLOQ.
Common concentration method: A means to set
Sample dilution
LLOQ for a biomarker assay by identifying the concentration at which all samples in a given test set demonstrate a parallel response.
1:16 1:32 1:64 1:128 1:256 1:512 1:1024
Measured concentration (ng/ml) 1
2
3
4
5
6
7
8
9
10
6.06 2.97 1.63 0.91† 0.48 BLQ BLQ
10.9 5.16 2.59 1.42 0.80 0.39† BLQ
12.4 5.69 2.89 1.55 0.87 0.46† BLQ
7.44 3.59 1.86 1.04 0.54† BLQ BLQ
15.3 7.00 3.58 1.90 1.00† 0.60 0.32
21.5 10.4 5.06 2.70 1.48‡ 0.83 0.55
12.5 6.38 3.08 1.44 0.82† 0.50 BLQ
6.56 3.09 1.48 0.87 0.45† BLQ BLQ
19.8 9.47 4.53 2.56 1.39† 0.82 0.43
8.88 4.25 2.05 1.19† 0.68 0.39 BLQ
Indicates the greatest dilution for each individual sample that returned concentrations within ±20% of nominal. Concentrations at minimum required dilution (1:16) were used as nominal (Table 1) . ‡ All samples measured accurately at or below 1.48 ng/ml regardless of dilution factor, so the common concentration that was accurate for all samples was 1.48 ng/ml. BLQ: Below LOQ. †
Alternatively, a common concentration method may be employed. Using this method, all the individual dilution series are examined and the lowest concentration for each sample that provides a parallel response is noted. The highest concentration from that set of samples is then set as the LLOQ, since this represents a concentration at which all of the samples demonstrated parallelism (Table 3). Either of these approaches can also be used more liberally, for example, by taking the lowest concentration that achieved a parallel response at a given dilution. This may be necessary when very low levels of analyte are anticipated in-study or from a desire to capture more data in-study. However, these more liberal approaches will generally be more acceptable for exploratory assays and should be supported by fit-for-purpose application.
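A companion sketch for the common concentration method follows, under the same illustrative assumptions (hypothetical function name, data layout and ±25% recovery window).

```python
# Hypothetical sketch of the common concentration method: for each sample,
# note the lowest measured concentration that still recovers within the
# window of its nominal (set at the MRD); the LLOQ is the highest of these
# per-sample minima, i.e., a level at which all samples showed parallelism.

def common_concentration_lloq(samples, dilutions, mrd, tol=0.25):
    per_sample_lowest = []
    for s in samples:
        nominal = s[mrd] * mrd
        accurate = [s[d] for d in dilutions
                    if s.get(d) is not None and abs(s[d] * d / nominal - 1) <= tol]
        if not accurate:
            return None                      # sample never parallel: no LLOQ
        per_sample_lowest.append(min(accurate))
    return max(per_sample_lowest)

# Same two hypothetical samples as in the common dilution sketch
s1 = {16: 6.06, 32: 2.97, 64: 1.63, 128: 0.91, 256: 0.48, 512: None}
s2 = {16: 10.9, 32: 5.16, 64: 2.59, 128: 1.42, 256: 0.80, 512: 0.39}
print(common_concentration_lloq([s1, s2], [16, 32, 64, 128, 256, 512], mrd=16))  # 0.91
```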
Lastly, the same, or similar, experiment may also be used as a selectivity assessment, since a demonstration of parallelism across multiple individuals effectively demonstrates that the endogenous analyte is being selectively measured in the context of complex matrix components [6]. Furthermore, spiking exogenous purified or recombinant material into matrix samples may be of limited value, since this does not assess the performance of the assay with respect to measurement of endogenous analyte in matrix. In this manner, selectivity for biomarker assays differs from PK assays, where the spiked analyte is identical to the assay’s standard calibrator material. Indeed, for biomarker assays the interaction of spiked standard calibrator material with endogenous analyte or other matrix components can potentially be confounding, and spiked samples such as these are unlikely to represent the state of the endogenous analyte in study samples [5]. However, in cases where samples with adequately high levels of analyte cannot be sourced during the assay development stage, spike recovery may be the only recourse. In those cases, as for PK assays, an in-study assessment may be considered.

During prestudy validation

PK assays
Parallelism assessments are rarely possible during prestudy validation due to the unavailability of incurred samples at this stage. If assessment is possible at this stage, the information gained might be leveraged to inform in-study sample analysis.

Biomarker assays
Unlike for PK assays, parallelism assessment for biomarker assays is often done at the assay validation stage for many of the same reasons the assessment is informative at the assay development stage. Not only is it important to understand whether there is in fact a parallel response between samples and standard material, but the assessment also confirms and documents performance observed during method development, defines sensitivity for endogenous analyte and serves as the selectivity assessment.

During in-study validation

PK assays
As previously mentioned, PK assays are at somewhat of a disadvantage compared with biomarker assays because the first opportunity to conduct parallelism typically does not come until the in-study validation stage, after completion of assay development. So ‘during in-study validation’ seems the simplest answer to the question of when for PK assays. However, this answer is deceptively simple and, upon consideration, a multitude of questions can be imagined. Should parallelism be tested in the first single-dose study? Would the situation be expected to be different for multiple-dose studies? For a clinical assay, should parallelism wait until the final dosing paradigm has been established? Can one safely assume that an assay that ‘passes’ parallelism for a single study has passed once and for all? Does the assay pass parallelism, or does parallelism simply tell us what sample compositions are accurately measured by the assay? Will a better understanding of relative levels of confounding factors, such as the presence of ADA or PD changes in binding partners, help guide understanding of parallelism data? In order to begin to find answers to these ancillary questions we must circle back to the question of why the assessment is being done. For example, if in a given study an interfering factor is expected to be induced that would affect the detection of drug by the assay, then one might consider running parallelism in that study. It may be that a first study in healthy volunteers would not be expected to induce any interfering factors, but a later study, with multiple doses in disease-state subjects, would be expected to do so. In such a case, assessment in the later study would be more scientifically appropriate. Similarly, if an assessment has been performed in an early study, this does not mean that parallelism should necessarily be revisited in multiple subsequent studies. Rather, the scientific relevance of the assessment that was performed should be considered to ensure that ‘good’ parallelism noted in an early study is not over-interpreted. As always, and as is true for all assay performance parameters, it is prudent to look at the totality of available information and data and to investigate parallelism appropriately as needed.

Biomarker assays
For biomarker assays, many parallelism questions may be answered prior to in-study validation. Repeating the assessment at this stage will be guided by the same rationales posed above for PK assays, noting that this will be more important for late-stage decision-making biomarker assays, which are expected to adhere to many PK assay principles and characteristics.
Exploratory biomarker assays may be largely exempt from any analyses beyond those deemed necessary by the end-users of the data.

When it matters

PK assays
Despite all the questions posed above, it is fairly inarguable that performing parallelism matters when doing so is necessary to understand the sample data. However, given the questions discussed above, is ‘one and done’ an acceptable approach to parallelism assessments for PK assays? Recent discussion with regulators at the 7th Workshop on Recent Issues in Bioanalysis seemed to suggest that most were content with the notion that once parallelism had been demonstrated for an assay, it would not require repeating in subsequent studies [12]. However, such an approach may be misleading and should not be over-interpreted to convey a false sense of security that supersedes what the study data may be saying later. In some instances there may be no need to perform parallelism at all, and in others it may require attention at more than one stage. When it comes to parallelism assessments, the assay and the samples are inextricably linked. An assay may have perfectly suitable parallelism for a certain population of samples, yet lack it for another population. Therefore, parallelism assessments matter when there is inadequate understanding of the assay’s performance with respect to the moieties and complexes expected to be present in study samples. This understanding can be derived from many sources other than the parallelism assessment including, but not limited to, data from ADA and PD biomarker assays, which provide pertinent information on the composition of the samples.

Biomarker assays
Biomarker assays offer ample opportunity to explore parallelism at multiple stages of assay development, and fit-for-purpose validation and implementation of the assay. Therefore, these assessments matter whenever one wants to: develop the best possible assay for the purpose; understand the assay’s limitations and how it may be implemented; and understand how the resulting sample data may be used. Notably, for late-stage decision-making biomarker assays, the expectations from the regulators’ perspective will be the same as for PK assays; therefore, the questions and considerations posed above will apply in those cases.
Key Term
Cmax: Peak serum/plasma concentration of a therapeutic drug.
How?
Touching upon a recurring theme, the question of how parallelism assessments should be done leads to some common ground as well as to many new questions. Generally speaking, ‘how’ typically speaks to the approach employed to perform the assessment. It is widely agreed that the general approach is similar to that for dilutional linearity assessments, where multiple dilutions of a high concentration sample are performed to generate several samples with concentrations that span the assay’s range [7,9]. Additionally, there has been, if not agreement, then at least a lack of controversy, over the frequently applied criterion of ≤30% CV amongst the in-range measurements back-calculated to neat concentrations as a means to assess presence or absence of parallelism. The lack of controversy, however, is not necessarily indicative of agreement that this is best practice, but rather a reflection that best practice has not yet been established [12]. However, beyond these superficial areas of apparent common understanding, more specific questions surface around whether individual or pooled samples should be evaluated, how many samples to evaluate, sample selection, applicable acceptance criteria, or whether there should even be acceptance criteria.
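To make the frequently applied criterion concrete, here is a minimal sketch of the calculation: each in-range measurement is back-calculated to a neat (undiluted) concentration and the %CV across dilutions is compared with the 30% threshold. The sample data and assay range are hypothetical, and the sketch is illustrative rather than a recommended implementation.

```python
# Minimal sketch of the <=30% CV parallelism check: back-calculate in-range
# measurements to neat concentrations and compute the %CV across dilutions.
import statistics

def parallelism_cv(measured, lloq, uloq):
    """measured: dilution factor -> measured concentration (assay units).
    Returns %CV of the dilution-adjusted (neat) in-range concentrations."""
    neat = [conc * dil for dil, conc in measured.items() if lloq <= conc <= uloq]
    if len(neat) < 2:
        raise ValueError("need at least two in-range dilutions")
    return 100 * statistics.stdev(neat) / statistics.mean(neat)

# Hypothetical incurred sample, four dilutions, assay range 0.5-20 ng/ml
sample = {8: 12.1, 16: 6.3, 32: 3.4, 64: 1.9}
cv = parallelism_cv(sample, lloq=0.5, uloq=20.0)
print(f"{cv:.1f}% CV -> {'parallel' if cv <= 30 else 'nonparallel'}")  # ~10.2% CV
```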
Sample selection: individual versus pooled samples?

PK assays
One of the first questions that arise around sample selection is whether to use pooled or individual samples. There are pros and cons to both approaches (Table 4). The decision to use one approach over the other will likely be linked to the question of why the assessment is being performed. For example, if the reason is to develop a broad understanding of the assay’s ability to measure the biotherapeutic after in vivo exposure, or to note whether relatively global biotransformation of the molecule has occurred in vivo, then pooled samples may provide an answer. If the goal is instead to understand what may be occurring in individual samples – for example, to investigate a sample with apparently aberrant PK – then that assessment will need to be done on an individual sample basis. A second question that arises in both cases is: how many samples should be tested? For pooled samples, ‘how many’ has two aspects: how many samples should be used to create the pool, as well as how many pools should be tested? There are two frequently mentioned advantages associated with using pooled samples, the first being that multiple data sets are not generated for a given sample. Pooling the samples creates an anonymity, which negates the need for setting up, a priori, a system for handling more than one valid result for a given sample. Pooling also offers the advantage of creating a larger volume of sample to test, which can have practical advantages and in some cases enable a parallelism test to be performed that might otherwise be constrained by volume limitations. The downside of using pooled samples is that if nonparallelism is observed, then the source is unclear – is it reflective of all the samples in the pool or due to just one? Conversely, can nonparallelism in a single sample be masked when pooled with other samples that demonstrate parallelism? How might this affect data interpretation? The most obvious advantage of using individual samples is that when lack of parallelism is observed, it can be ascribed to a particular sample. A subsequent investigation can leverage other information that may be known about the sample, such as ADA status and correlation with changes in PD markers.
Table 4. Parallelism using pooled versus individual samples.

| Sample type | Advantages | Disadvantages |
|---|---|---|
| Pooled sample | Multiple data sets are not generated for a given sample; practical value of having adequate sample volume for multiple dilutions | Confounds data interpretation when nonparallelism is observed (is it from one sample or multiple samples in the pool?); factors that could contribute to nonparallelism could be masked as a consequence of pooling |
| Individual sample | Lack of parallelism can be ascribed to a single sample, and other data that inform the composition of moieties in that sample can be leveraged to understand overall results | Generates multiple results for a given sample, necessitating clear criteria for reporting data |
In some cases, increased dilution may be found to mitigate parallelism issues, and the investigation may inform a strategy for re-testing such samples at dilutions that will be more likely to provide a better estimation of drug concentration. Minimally, it should serve as a trigger to look more closely at all the data coming from that sample and how those data might inform the composition of entities in the sample. The downside of using individual samples is the generation of multiple valid results for a given sample. Therefore, a data reporting plan needs to be put in place prior to analysis. There is presently no clear answer as to whether using pooled or individual samples is best and, like so much discussed herein, the decision should likely be made on a case-by-case basis and tied to why the assessment is being undertaken. Whether one uses pooled or individual samples, a clear goal either way is to better understand the sample’s composition and the assay’s ability to detect the analyte of interest.

Biomarker assays
To date, the literature on parallelism in biomarker assays focuses on evaluations done in the method development and prestudy validation phases, not on evaluation of incurred samples from studies that involved biotherapeutic drug administration [5,6]. This may simply be reflective of fit-for-purpose biomarker practices, which are not being driven by guidance, and because a great deal of biomarker work is for sponsors’ internal decision making. Presumably, if implementing a late-stage decision-making biomarker, then expectations would align with those for PK assays and all the considerations mentioned above would apply. In fact, generally speaking, many of the questions posed above will apply to biomarker assays, although there are some other considerations that are particular to biomarker assays. Since parallelism is typically evaluated in the assay development stage, there is no concern regarding the generation of multiple data sets for a given sample, so the use of individual samples is more widespread than for PK assays. Additionally, evaluations such as selectivity assessments using parallelism absolutely require the use of individual samples. However, where it is difficult to obtain samples in sufficient volumes, for example for assays in cerebrospinal fluid matrix, pooling becomes a practical necessity. Although pooled samples may have utility in the assay development and validation stage, the absence of published data on parallelism assessments in study samples leaves a large gap in our understanding.
Sample selection: which samples?

PK assays
It is widely agreed that higher concentration samples need to be chosen in order to allow testing of multiple dilutions that span the breadth of the assay range. While Cmax samples may meet this criterion, a number of questions and considerations need to be taken into account. Assuming a multidose study, should samples be taken from early in the study or late in the study? Later in the study may yield different results, as there may be accumulation or other alteration in levels of binding partners that could affect results. If some sort of biotransformation of the molecule, or significant changes in levels of binding partners such as soluble ligand, is expected in vivo, then sample selection should take this into account. In a thorough evaluation, value may be added by testing both Cmax samples and late postdose samples that have accumulated enough of the anticipated changes while still having enough drug concentration to permit multiple dilutions. In a similar vein, if the ADA status of the samples is known, should ADA+ or ADA- samples be chosen? Which ones are chosen can make the difference between passing and failing. Therefore, the goal of the assessment (the why) needs to be considered. ADA+ samples might be avoided because they will be expected to fail. But what if a parallelism assessment could define a series of dilutions where a relatively parallel response is obtained? Would this enable reporting of a more ‘accurate’ PK result for those samples? Is this even necessary, since the impact of ADA is well recognized? As long as scientific rationale dictates the choice and the ‘why’ is anchored to a better understanding of the data, an argument may certainly be made for avoiding ADA+ samples. However, if only ADA- samples are tested, there should not be an assumption of parallelism for ADA+ samples.

Biomarker assays
For biomarker assays, where at least for now parallelism assessments are most relevant in the assay development and validation phase, the question of ‘which samples’ may be as pragmatic as ‘whichever samples are available’. The use of commercially available samples is commonplace and samples from healthy donors are often the easiest to obtain. However, since the circulating levels of many prospective biomarkers are quite low in the normal population, these samples may offer little value. Even when disease-state samples can be obtained, the biomarker levels may still be too low to enable evaluation of a number of dilutions before the measured concentrations fall below the limit of quantitation, which will limit the extent of parallelism data that can be obtained. Whether additional parallelism evaluations in study samples will add value remains a question. As long as the bulk of biomarker evaluations remain exploratory it is unlikely that this question will be answered in the near term, but it may deserve some consideration.

Acceptance criteria

PK assays
The major questions that come to mind regarding acceptance criteria for parallelism are: what should they be and do we need them? The criterion frequently applied requires ≤30% CV amongst the in-range measurements back-calculated to neat concentrations [7]. However, it is acknowledged that trends towards loss of parallelism can exist in data sets that ‘pass’ this criterion. Therefore, passing this criterion should not be over-interpreted as parallelism having been proven, nor excuse the need to carefully investigate study data that might suggest loss of parallelism. Discussions are ongoing in the bioanalytical community on potential alternative approaches, ranging from tying criteria to assay performance on a case-by-case basis, where tighter criteria would apply to assays that demonstrate tighter precision and accuracy, to statistics-based approaches [12]. The relative merits of these approaches will only become clear upon sharing of data that utilize these approaches and evaluating the relative impact on study data analysis. The value might be envisioned, however, at least in the former case, if one imagines an assay with a demonstrated total error of 10% during prestudy and in-study validation that then just barely passes a parallelism evaluation with a 30% criterion. Scientifically speaking, it would be prudent to try to understand what was happening in the samples that would impact assay performance and account for the deviation from historical assay performance. Accepting the passing parallelism assessment without further thought in such a case would likely be shortsighted. Does this mean that the popularly applied criteria are inappropriate? The likely answer to that question is ‘it depends’ and leads to the second question. Do we need acceptance criteria for parallelism? There is a nuance in this question in that there need to be criteria against which one will
do the evaluation, but how this relates to the acceptability of the method itself is in question. The European Medicines Agency Guideline on bioanalytical method validation includes parallelism as part of its recommendations for LBA validation, suggesting that the assessment is intended to evaluate the assay method [1]. But does a parallelism failure mean that the assay failed or that the sample(s) tested failed? The assay and the samples are inextricably linked in that the same assay may pass parallelism with one set of samples and fail it with another set. In such a case, has the assay itself passed or failed? Practically speaking, the answer to that question could be determined, and manipulated, by which samples were chosen for testing. Scientifically speaking, however, the answer depends upon whether the assay is able to accurately measure the species/moiety that is most relevant to decision making. Therefore, it is challenging, or arguably impossible, to dictate a one-size-fits-all criterion that must be passed for a method to be considered acceptable. Given the inherent complexity and questions surrounding the setting of acceptance criteria, should parallelism assessments, therefore, be viewed more as a characterization? Such an approach would require case-by-case scientific judgment to determine whether certain samples need to be ‘dealt with’ or whether the assay itself is in fact inappropriate or inadequate. Assuming that acceptance criteria (or evaluation criteria) are invoked, several additional questions arise, such as: how many samples should be evaluated? If a single sample (or pooled sample) is tested and passes proposed criteria, has the assay passed parallelism? If multiple samples (or pools) are tested, how many of those samples need to pass individually in order to pass the overall assessment? Do all samples need to pass, or is an approach similar to that used for selectivity appropriate, where 80% of samples need to pass in order for the assay to be deemed acceptable? Do failed samples require investigation? What does failure really mean? What mitigation options are available? For PK assays, developing answers to these questions that are mutually acceptable to both industry and regulators will be critical discussion points moving forward.

Biomarker assays
The questions surrounding acceptance criteria for biomarker assays are in large part similar to those posed above for PK assays. However, given that fit-for-purpose approaches are commonly applied in the biomarker assay realm, there may be a greater level of comfort in crafting scientifically defendable answers on a case-by-case basis.

Future perspective
As evidenced by the myriad of questions posed in this review, application of parallelism assessments for LBAs, particularly those in support of regulated bioanalysis, is relatively immature. This is made evident by a recent search of PubMed that looked for publications with ‘parallelism’ in the title and ‘immunoassay’ in the abstract. This search yielded only five articles, none of which were related to regulated bioanalysis or descriptions of best practices for parallelism assessments for PK and biomarker assays [16–20]. While many of the questions posed herein apply to both PK and biomarker assays, there is greater urgency for answers in the PK assay realm, as some regulatory agencies are implementing or considering requirements for parallelism testing as part of in-study validation for PK assays [1]. Therefore, a clearer understanding and articulation of why, when and how to perform these assessments is desired.
In the near term, clarification on sample selection for the parallelism evaluation and consensus on what constitutes appropriate acceptance criteria, as well as how and when such criteria should be applied, require consideration. Robust discussion as to whether or not parallelism assessments should be overtly recommended in regulatory guidance for PK assays also needs to occur. If routinely recommended, there is potential for added value in that it would result in larger scale generation of data. However, true value would come from broad sharing of high quality data. How will the data be shared in a meaningful way, and how will data quality be ensured? The latter question arises from the potential for misuse if the assessment is being done with an eye to passing rather than towards scientific understanding. As previously discussed, sample selection can dictate whether the assessment passes or fails, and it would be unfortunate if samples that might prove to be the most informative in understanding the assay’s overall performance or limitations were overlooked or excluded.
Table 5. Parallelism summary.

| Topic | PK assays | Biomarker assays | Key questions |
|---|---|---|---|
| Why perform parallelism assessment? | To demonstrate that the (incurred) sample dilution response curve is parallel to the standard concentration response curve; to aid data interpretation when confounding interferences are present in vivo | To demonstrate that the (endogenous analyte-containing) sample dilution response curve is parallel to the standard concentration response curve; to inform assay development and optimization; to set assay minimum required dilution; to set LLOQ for endogenous analyte; to assess assay selectivity | For PK assays, should parallelism assessments be recommended in the regulations for all PK assays or should it be dictated by a risk-based analysis? Should the risk assessment be left in the hands of the individual scientists or in the hands of regulators? |
| When to perform parallelism assessment? | When it is necessary to understand the study data; typically performed during in-study validation | Typically performed at the method development and pre-study validation phases | Is a ‘one and done’ approach appropriate? Should parallelism be revisited when there are changes in patient population and/or dosing regimen? |
| How to perform parallelism assessment? | Multiple dilutions of a high (enough) concentration (of therapeutic analyte) in an incurred sample are performed to generate several samples with concentrations that span the assay’s range | Multiple dilutions of a high concentration (of endogenous analyte) sample are performed to generate several samples with concentrations that span the assay’s range | Which samples should be selected (e.g., Cmax, later time points or both)? How many samples should be evaluated? Should samples be evaluated individually or pooled? What acceptance criteria, if any, should be applied? What does failure really mean? |
In order for guidance to ensure high quality, it would have to include answers to the many questions posed herein. However, this is unrealistic, as such guidance would be highly prescriptive and not broadly implementable. Instead, due consideration should be given to applying a scientifically supported fit-for-purpose approach to parallelism assessments for PK assays.
In order to reach the long-term goal of understanding and developing best practices for implementation of parallelism testing for both PK and biomarker assays, industry and regulators will need to keep the conversations going and commit to generating and reviewing data for the purposes of mutual education. Means to easily share data in open forums to facilitate and build common understandings will need to continue and will be necessary in order to expedite resolution of many of the outstanding questions (Table 5). In our view, during this time that we as a community gather data and build understanding, having a strong scientific rationale for why, when and how we do parallelism assessments (or not), rather than approaching it as a ‘check box’ exercise, is critical.
Executive summary

Why perform parallelism assessment?
• PK assays: to demonstrate that the (incurred) sample dilution response curve is parallel to the standard concentration response curve and to aid data interpretation when confounding interferences are present in vivo.
• Biomarker assays: to demonstrate that the (endogenous analyte-containing) sample dilution response curve is parallel to the standard concentration response curve and to inform assay development and optimization: to set assay minimum required dilution; to set LLOQ for endogenous analyte; to assess assay selectivity.

When to perform parallelism assessment?
• During assay development. PK assays: the requirement for incurred samples limits the opportunity to utilize parallelism assessments during the assay development phase unless the assay is being changed or improved after one or more studies have been conducted. Biomarker assays: the availability of samples with endogenous analyte provides an opportunity to inform the course of the assay’s development and allows a better understanding of what the assay limitations will be and how the resulting data may be used.
• During prestudy validation. PK assays: rarely possible to perform parallelism during prestudy validation due to lack of incurred samples. Biomarker assays: confirms and documents performance observed during method development, defines sensitivity for endogenous analyte and serves as the selectivity assessment.
• During in-study validation. PK assays: typically the first opportunity to conduct parallelism. Biomarker assays: many parallelism assessments typically performed prior to in-study validation; whether additional evaluations in study samples will add value remains a question.
• When it matters: when it is necessary to understand the study data and/or to inform assay development and optimization (primarily biomarker assays).

How to perform parallelism assessment?
• Multiple dilutions of a high concentration incurred sample are performed to generate several samples with concentrations that span the assay’s range.
• Sample selection – individual versus pooled samples? PK assays: pooled samples may help develop a broad understanding of the assay’s ability to measure the biotherapeutic after in vivo exposure or to note whether relatively global biotransformation of the molecule has occurred in vivo; individual samples are used to investigate individual instances of aberrant PK results. Biomarker assays: individual samples are frequently used during the method development stage; pooling of samples may be required with limiting special matrices such as cerebrospinal fluid.
• Sample selection – which samples? PK assays: consider choosing high concentration samples (Cmax) as well as samples from later time points that may have different sample composition. Biomarker assays: since typically assessed during assay development, will depend upon which samples are available; unclear if repetition of the assessment with study samples will provide additional insights – requires generation of data sets.
• Acceptance criteria: a criterion of ≤30% CV amongst the in-range measurements back-calculated to neat concentrations is frequently applied, but recognized as imperfect; debate is ongoing regarding the need for prescriptive criteria at all.

Future perspective
• Parallelism as applied to ligand-binding assays is an evolving area.
• Currently there are more questions than answers.
• Industry and regulators will need to continue open and robust dialogue and data sharing to enable development of data-driven recommendations for best practices.
Development, validation & implementation of PK & biomarker ligand-binding assays of mutual education. Means to easily share data in open forums to facilitate and build common understandings will need to continue and will be necessary in order to expedite resolution of many of the outstanding questions (Table 5). In our view, during this time that we as a community gather data and build understanding, having a strong scientific rationale for why, when and how we do parallelism assessments (or not), rather than approaching it as a ‘check box’ exercise, is critical.
| P erspective
Financial & competing interests disclosure
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. No writing assistance was utilized in the production of this manuscript.
References
Papers of special note have been highlighted as: • of interest; •• of considerable interest.

1. European Medicines Agency, Committee for Medicinal Products for Human Use. Guideline on Bioanalytical Method Validation. European Medicines Agency, London, UK (2011). •• Pertains to regulatory or industry perspectives/positions on generally accepted practices for PK and biomarker assays.
2. Boterman M, Doig M, Breda M et al. Recommendations on the interpretation of the new European Medicines Agency Guideline on Bioanalytical Method Validation by Global CRO Council for Bioanalysis (GCC). Bioanalysis 4(6), 651–660 (2012).
3. DeSilva B, Garofolo F, Rocci M et al. 2012 White Paper on recent issues in bioanalysis and alignment of multiple guidelines. Bioanalysis 4(18), 2213–2226 (2012). •• Pertains to regulatory or industry perspectives/positions on generally accepted practices for PK and biomarker assays.
4. van Amsterdam P, Companjen A, Brudny-Kloeppel M et al. The European Bioanalysis Forum community’s evaluation, interpretation and implementation of the European Medicines Agency Guideline on Bioanalytical Method Validation. Bioanalysis 5(6), 645–659 (2013).
5. Lee JW, Devanarayan V, Barrett YC et al. Fit-for-purpose method development and validation for successful biomarker measurement. Pharm. Res. 23(2), 312–328 (2006). •• Pertains to regulatory or industry perspectives/positions on generally accepted practices for PK and biomarker assays.
6. Valentin MA, Ma S, Zhao A, Legay F, Avrameas A. Validation of immunoassay for protein biomarkers: bioanalytical study plan implementation to support pre-clinical and clinical studies. J. Pharm. Biomed. Anal. 55(5), 869–877 (2011).
7. DeSilva B, Smith W, Weiner R et al. Recommendations for the bioanalytical method validation of ligand-binding assays to support pharmacokinetic assessments of macromolecules. Pharm. Res. 20(11), 1885–1900 (2003). •• Pertains to regulatory or industry perspectives/positions on generally accepted practices for PK and biomarker assays.
8. Myler HA, Given A, Kolz K, Mora JR, Hristopoulos G. Biotherapeutic bioanalysis: a multi-indication case study review. Bioanalysis 3(6), 623–643 (2011). • Pertains to assay format considerations and sample composition (e.g., receptors, antidrug antibodies) considerations.
9. Stevenson L, Kelley M, Gorovits B et al. Consensus and recommendations of the L2 Global Harmonization Team: large molecule specific assay operation for ligand binding assays in support of pharmacometrics. AAPS J. doi:10.1208/s12248-013-9542-y (2013) (Epub ahead of print).
10. Myler HA, McVay S, Kratzsch J. Troubleshooting PEG-hGH detection supporting pharmacokinetic evaluation in growth hormone deficient patients. J. Pharmacol. Toxicol. Methods 61(2), 92–97 (2010). • Pertains to assay format considerations and sample composition (e.g., receptors, antidrug antibodies) considerations.
11. Myler HA, Phillips KR, Dong H et al. Validation and life-cycle management of a quantitative ligand-binding assay for the measurement of Nulojix®, a CTLA-4-Fc fusion protein, in renal and liver transplant patients. Bioanalysis 4(10), 1215–1226 (2012).
12. Stevenson L, Garofolo F, DeSilva B et al. 2013 White Paper on recent issues in bioanalysis: ‘hybrid’ – the best of LBA and LCMS. Bioanalysis 5(23), 2903–2918 (2013). •• Pertains to regulatory or industry perspectives/positions on generally accepted practices for PK and biomarker assays.
13. Lee JW, Kelley M, King LE et al. Bioanalytical approaches to quantify ‘total’ and ‘free’ therapeutic antibodies and their targets: technical challenges and PK/PD applications over the course of drug development. AAPS J. 13(1), 99–110 (2011). • Pertains to assay format considerations and sample composition (e.g., receptors, antidrug antibodies) considerations.
14. Gorovits B, Alley SC, Bilic S et al. Bioanalysis of antibody–drug conjugates: American Association of Pharmaceutical Scientists Antibody–Drug Conjugate Working Group position paper. Bioanalysis 5(9), 997–1006 (2013).
15. Ciotti S, Purushothama S, Ray S. What is going on with my samples? A general approach to parallelism assessment and data interpretation for biomarker ligand-binding assays. Bioanalysis 5(16), 1941–1943 (2013).
16. Gottschalk PG, Dunn JR. Measuring parallelism, linearity, and relative potency in bioassay and immunoassay data. J. Biopharm. Stat. 15(3), 437–463 (2005).
17. Hammond GW, Noble GR, Smith SJ. Lack of parallelism in antibody responses as measured by enzyme immunoassay after infection due to influenza virus A/USSR/77 (H1N1). J. Infect. Dis. 146(6), 827 (1982).
18. Hurley WL, Rejman JJ. beta-Lactoglobulin and alpha-lactalbumin in mammary secretions during the dry period: parallelism of concentration changes. J. Dairy Sci. 69(6), 1642–1647 (1986).
19. O’Connor KA, Brindle E, Shofer JB et al. Statistical correction for non-parallelism in a urinary enzyme immunoassay. J. Immunoassay Immunochem. 25(3), 259–278 (2004).
20. Plikaytis BD, Holder PF, Pais LB, Maslanka SE, Gheesling LL, Carlone GM. Determination of parallelism and nonparallelism in bioassay dilution curves. J. Clin. Microbiol. 32(10), 2441–2447 (1994).

Website
101. Global Bioanalysis Consortium. www.globalbioanalysisconsortium.org