Evaluating the Role and Policy Implications of Using External Evidence in Survival Extrapolations: A Case Study of Axicabtagene Ciloleucel Therapy for Second-Line DLBCL

3.1 Targeted Qualitative Assessment of Agency Appraisals

The available evidence submitted by the company during each HTA is detailed below to provide context on the assessments from the eight HTA agencies. It should be noted that the company did not explicitly leverage an analytical approach to incorporating external evidence in their survival extrapolations, such as using a Bayesian framework or displaying the two trials graphically. However, the company did use the external evidence to qualitatively justify their preferred axi-cel OS distribution [7,8,9,10,11]. The approach that HTA agencies took to selecting a plausible and reflective axi-cel OS parametric distribution and cure fraction is described in this section. The axi-cel OS distributions and cure fractions used in the company’s base-case submissions, as well as those preferred by HTA agencies, are presented in Table 1.

Table 1 Key information in the company HTA submissions [7,8,9,10,11]

3.1.1 Available Evidence Submitted by the Company in the HTA Appraisals

Appraisals were based on ZUMA-7, a pivotal phase III trial, where axi-cel demonstrated improved OS versus SoC (salvage chemoimmunotherapy then HDC-ASCT for responders) [25]. Different ZUMA-7 data cuts were used for the HTA in each country, depending on the availability of evidence at the time of submission. The Interim and Primary analyses used 25-month and 47-month follow-up OS, respectively. In the Interim analysis, the median OS was not reached in the axi-cel group and was 25.7 months (95% confidence interval 17.6, not estimable) in the SoC group [35]. In the Primary analysis, the median OS was not reached for the axi-cel group and was 31.1 months (95% confidence interval 17.1, not estimable) in the SoC group. The evidence that was submitted by the company is summarised in Table 1.

The modelling approach used in all submissions was the same; a three-state partitioned-survival model using event-free survival and OS data, utilising mixture-cure modelling for both axi-cel and SoC. When using mixture-cure modelling, a proportion of the population was considered ‘cured’. This approach was accepted by all agencies. The Interim and Primary OS curves, Kaplan–Meier and cure fractions are presented in Fig. 2. As can be seen, data from the Primary analysis of ZUMA-7 greatly reduced the spread of long-term survival extrapolations relative to the Interim analysis. All of the OS distributions converged with the Primary analysis, including the cure fractions (ranges of 24–54% for the Interim analysis converging to 50–54% for the Primary analysis).
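
The mixture-cure idea described above can be illustrated with a minimal sketch. It is not the company's model: it assumes the textbook form in which survival is the cure fraction plus the uncured share multiplied by a parametric survival function, it ignores background mortality for the cured group, and the Weibull parameters in the example call are hypothetical placeholders.

```python
import math


def weibull_survival(t, scale, shape):
    """Weibull survival S_u(t) for the 'uncured' group (illustrative choice)."""
    return math.exp(-((t / scale) ** shape))


def mixture_cure_survival(t, cure_fraction, scale, shape):
    """Textbook mixture-cure survival: S(t) = pi + (1 - pi) * S_u(t).

    cure_fraction (pi) is the proportion assumed 'cured'; background
    mortality is deliberately omitted in this simplified sketch.
    """
    return cure_fraction + (1.0 - cure_fraction) * weibull_survival(t, scale, shape)


# Hypothetical inputs: a 50% cure fraction, time measured in months.
for months in (0, 12, 60, 240):
    print(months, round(mixture_cure_survival(months, 0.50, 12.0, 1.2), 3))
```

Under this form the curve starts at 1 and flattens towards the cure fraction, which is why the cure fraction largely drives the long-term extrapolation.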

Fig. 2

Axi-cel overall survival: Interim ZUMA-7 (left) and Primary ZUMA-7 (right). MCM mixture-cure modelling

Evidence on the long-term durability (median follow-up of 63.1 months) and curative potential of axi-cel as a 3L+ DLBCL treatment was available from a prior single-arm trial, ZUMA-1 [26, 27]. ZUMA-1 was used in the 2L DLBCL HTA submissions to demonstrate the plausibility of cure, as well as to provide an understanding of the consistency between cure fractions from earlier to later data cuts [29, 36].

As described previously, the survival data from ZUMA-1 could be used to inform which Interim ZUMA-7 axi-cel survival extrapolations are clinically appropriate. ZUMA-1 included people with DLBCL who were receiving their 3L+ treatment. The company considered ZUMA-1 data generalisable to ZUMA-7 because the indication was the same, albeit with ZUMA-7 at an earlier line of treatment. The key difference between ZUMA-1 and ZUMA-7 was that the ZUMA-1 population had a worse prognosis at baseline [37]. The populations matched closely in other key characteristics, such as age, sex and the proportion receiving bridging therapy [27, 35].

Figure 3 displays both the Interim ZUMA-7 axi-cel OS curves and the ZUMA-1 log-logistic axi-cel OS curve, to provide a criterion for determining clinical plausibility of the ZUMA-7 survival extrapolations. The ZUMA-1 log-logistic extrapolation was included as it had the lowest Akaike information criterion and Bayesian information criterion of all the distributions.

Fig. 3

Axi-cel OS: Interim ZUMA-7 extrapolations based on parametric distributions and ZUMA-1 Kaplan–Meier (KM) and the extrapolation based on the log-logistic distribution. MCM mixture-cure modelling

The Interim OS curves that the company viewed as not clinically appropriate were those where the cure fraction or survival curve was lower than in ZUMA-1. The exponential and log-normal ZUMA-7 distributions predicted worse survival than ZUMA-1 (the shaded area below the ZUMA-1 log-logistic curve). Further, the cure fractions for exponential and log-normal were 24% and 25%, respectively, which were substantially lower than in ZUMA-1 (41–45%). The company also excluded the Interim log-logistic extrapolation from consideration because it predicted a similar survival and cure fraction (44%) to that of the ZUMA-1 log-logistic distribution (42%). ZUMA-7 included a population with improved prognosis in an earlier line of treatment, and it was considered unlikely that the survival predictions would be similar to those of a later-line treatment population.

3.1.2 Submissions Using Interim Data Only

The NICE External Assessment Group experts acknowledged the plausibility in the choice of OS distribution and the relevance of ZUMA-1 data. The External Assessment Group ruled out the most pessimistic curves [log-normal (24%) and exponential (25%)] as not clinically appropriate. The decision came down to choosing between the remaining log-logistic and the three grouped curves (generalised gamma, Weibull and Gompertz) because Gamma was not included in these Interim submissions. These four curves were considered “equally clinically plausible” options by the External Assessment Group. NICE considered the log-logistic model (44%) to represent axi-cel OS most appropriately, and stated that despite being a conservative choice, it did not violate external validity (with ZUMA-1 survival at 5 years).

The Direktoratet for Medisinske Produkter (DMP) and Canada’s Drug Agency-L’Agence des Médicaments du Canada (CDA-AMC) did not formally utilise the ZUMA-1 survival data to inform the axi-cel OS extrapolation selection. The DMP deemed the ZUMA-7 data to be too immature to estimate the most likely cure fraction, stating that more follow-up data were required to establish the long-term magnitude of OS benefit. Further, the DMP stated that the ZUMA-7 axi-cel OS data could not be externally validated, but provided no rationale as to why ZUMA-1 was not considered. Ultimately, during HTA engagement, a description of the Primary OS analysis became available and DMP agreed with the company base-case parametric model choice.

CDA-AMC stated that it was unknown whether the ZUMA-1 results were generalisable to a 2L population. CDA-AMC considered the ZUMA-7 cure fractions that were used to justify the choice of curves to be highly optimistic, citing a comparison in cure fractions of 52% at the 2-year OS follow-up (ZUMA-1) and 30–50% at the 5-year event-free survival follow-up in 3L+ DLBCL (real-world evidence for tisagenlecleucel, an alternative CAR T treatment). This comparison could be considered inappropriate because the reported cure fractions were for 3L+ DLBCL and for event-free survival (rather than OS), for a population with lower anticipated survival, and with a different CAR T treatment. CDA-AMC considered the log-logistic model (44%) to represent axi-cel OS most appropriately, based largely on the goodness-of-fit statistics and clinical expert opinion.

3.1.3 Submissions Using Interim and Primary Data

Both the Medical Services Advisory Committee (MSAC) and Tandvårds-och läkemedelsförmånsverket (TLV) approached the Interim submissions in a conservative way and without considering the evidence in full. ZUMA-1 was not directly considered when selecting an axi-cel OS distribution, as MSAC stated that it was a single-arm study in a later line of therapy and offered limited information for comparison with SoC for longer term outcomes in the 2L setting. MSAC did not select a base-case OS distribution for axi-cel, citing too much uncertainty in the ZUMA-7 data.

The TLV did not formally assess the Interim analysis until the Primary analysis was available. For the second submission with the Primary analysis, which was supplemented with the Interim analysis, the TLV preferred the “most conservative form of distribution, log-normal” for axi-cel OS to account for the uncertainty.

3.1.4 Submissions Using Primary Data Only

The submissions to the Haute Autorité de santé (HAS), Danish Medicines Council (DMC) and Zorginstituut Nederland (ZiN) all included Primary data, with no prior submission using the Interim data. Both HAS and the DMC used the parametric distribution from the company’s submission, which was generalised gamma (54%) and gamma (54%), respectively. ZiN preferred the exponential OS distribution for axi-cel, despite exponential ranking lowest with respect to goodness-of-fit statistics. This was selected intentionally to reflect a conservative estimate, to counter the uncertainty in ZUMA-7 OS data where median survival was not reached.

3.1.5 Summary of the Qualitative Assessment of Agency Appraisals

To address perceived uncertainty in the survival data, NICE, CDA-AMC, TLV and ZiN selected conservative axi-cel OS distributions with varied degrees of clinical and statistical justification. For the submissions with the Interim analysis specifically, external evidence was not considered to the full extent and, consequently, the log-logistic axi-cel OS distribution was selected by NICE and CDA-AMC. However, in retrospect, the Interim log-logistic distribution proved to underestimate the axi-cel survival benefit by a substantial margin (Fig. 4), whilst the company-preferred OS curve (generalised gamma) from the Interim analysis was reflective of long-term survival with Primary OS data. The Interim generalised-gamma curve was only considered appropriate by DMP when validated with the Primary OS data to confirm the long-term axi-cel OS.

Fig. 4

Axi-cel overall survival: Primary ZUMA-7 distributions, with the ZUMA-7 Interim company and agency preferred curves. CDA-AMC Canada’s Drug Agency-L’Agence des Médicaments du Canada, KM Kaplan–Meier, MCM mixture-cure modelling, NICE National Institute for Health and Care Excellence

ZUMA-1 was used inconsistently across all assessments. ZUMA-1 was not used beyond clinical plausibility considerations by any agency or by the submitting company. ZUMA-1 was utilised by NICE to exclude two distributions; the four other agencies that assessed Interim data did not consider ZUMA-1 in their choice of preferred OS distribution for axi-cel. The agencies opted to wait for more information (the Primary analysis). It has not been reported in the literature or during an HTA how the decision to include external evidence to varying degrees or to wait for longer term data impacts population health.

3.2 Quantifying the Cost and Benefits of Waiting

3.2.1 Quantifying the Uncertainty Reduction

In Table 2, the per-person EVPI in the Interim analysis for each axi-cel OS curve is presented in Swedish krona (kr). The difference in EVPI between the Primary and the Interim analyses reflects the expected gain from reducing uncertainty through additional data collection; the higher the difference in EVPI, the greater the reduction in uncertainty. The differences in EVPI between the Interim and Primary analyses are presented relative to each respective curve, as well as relative to the average across all axi-cel OS distributions in Table 2. Figure 5 displays this information graphically. The results show that there is a substantial fall in the EVPI as the ZUMA-7 data matured. The average EVPI reduces from 310,755 kr (approximately €28,200) in the Interim analysis to 89,853 kr (approximately €8100) in the Primary analysis. This 71.09% decrease in the average EVPI demonstrates that the additional maturity of the Primary analysis reduced uncertainty by a large degree.
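
For readers less familiar with the measure, the following minimal sketch shows the standard Monte Carlo definition of per-person EVPI: the expected net benefit when the best option can be chosen in every simulated state of the world, minus the expected net benefit of the option that is best on average. The function name and the four rows of net-benefit values are hypothetical; they are not outputs of the economic model discussed here.

```python
from statistics import mean


def evpi(net_benefit_samples):
    """Per-person EVPI from probabilistic-analysis draws.

    net_benefit_samples: list of rows, one per simulated draw; each row
    holds the net benefit of every decision option in that draw.
    """
    n_options = len(net_benefit_samples[0])
    # Current information: commit to the option with the best average.
    best_expected = max(
        mean(row[d] for row in net_benefit_samples) for d in range(n_options)
    )
    # Perfect information: pick the best option within each draw.
    expected_best = mean(max(row) for row in net_benefit_samples)
    return expected_best - best_expected


# Hypothetical draws for two options (illustrative numbers only).
samples = [(10.0, 12.0), (14.0, 9.0), (11.0, 13.0), (15.0, 10.0)]
print(evpi(samples))
```

EVPI is zero when one option is best in every draw and grows as the draws disagree about which option is best, which is why it serves as a measure of decision uncertainty.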

Table 2 Extent of uncertainty reduction demonstrated through the difference in the EVPI between Interim and Primary ZUMA-7 analyses

Fig. 5

Uncertainty reduction between the Interim and Primary analyses. EVPI expected value of perfect information

Table 2 also presents the average EVPI when the individual OS curves are grouped to reflect the views of the company and each agency on the OS estimates for axi-cel in the Interim analysis. The views of the company were supported by a more extensive use of the external evidence, where log-normal and exponential distributions were excluded and log-logistic was considered to be a less viable selection than the other remaining distributions. This was because the company’s view was that survival would be improved in earlier lines of treatment relative to later lines. In contrast, each agency either did not consider this external evidence in the same capacity or did not utilise it to remove the most pessimistic curves. The average EVPI falls as the curves with lower OS are excluded (their exclusion is supported through the consideration of ZUMA-1). For example, if comparing a scenario where only gamma is excluded to a scenario where log-logistic, exponential, gamma and log-normal are excluded (both scenarios in the Interim analysis), then the difference in the EVPI between the Interim and Primary analysis falls from 231,881 to 94,601 kr (a 59% reduction). This demonstrates that using ZUMA-1 data to exclude clinically inappropriate distributions reduced the value of a more mature follow-up and, therefore, decreased uncertainty in the decision (lower EVPIs). This means a lower likelihood of a suboptimal decision (i.e., a decision where the intended outcomes do not match the actual outcomes).
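
The two percentage reductions quoted in this section can be reproduced from the reported figures. The short check below uses only values stated in the text; the helper name `percent_reduction` is introduced here for illustration.

```python
def percent_reduction(before, after):
    """Relative fall from `before` to `after`, expressed in percent."""
    return 100.0 * (before - after) / before


# Average EVPI, Interim versus Primary analysis (kr).
print(round(percent_reduction(310_755, 89_853), 2))

# Difference in EVPI: only gamma excluded versus log-logistic,
# exponential, gamma and log-normal excluded (kr).
print(round(percent_reduction(231_881, 94_601)))
```

These evaluate to 71.09 and 59, matching the reported reductions.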

3.2.2 Quantifying the Costs and Benefits of Delayed Decisions

Table 3 reflects the competing dynamic of lowering the value of further evidence whilst considering the cost of collecting information within plausible time frames. As uncertainty in the Interim analysis reduces, the value in waiting for information will decrease. Further, the longer the time frame to collect evidence, the more costly it is to population health (through delayed access). It should be noted that 1.83 years (or 22 months) is the difference in the median follow-up between the Interim and Primary analysis data. There are two possible results for whether it was advantageous (or not) for the HTA agencies to wait until further evidence gathering was complete (ZUMA-7 Primary analysis). In instances where the answer is ‘Yes’, then the value of waiting is higher than the cost of waiting, resulting in a net gain of health between the stated time frames. In instances of ‘No’, then the value of waiting does not outweigh the cost of waiting, resulting in a net loss of health.
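
The ‘Yes’/‘No’ logic can be sketched as a simple comparison. This is an illustration only: it assumes that the cost of waiting accrues linearly with the length of the delay, which is a simplification introduced here and not necessarily the calculation behind Table 3, and all numeric inputs in the example calls are hypothetical.

```python
def worth_waiting(value_of_waiting, cost_per_year, wait_years):
    """Return 'Yes' if the value of waiting exceeds the cost of waiting.

    Assumption (not from the source): the cost of waiting is the health
    forgone per year of delayed access multiplied by the wait in years.
    """
    cost_of_waiting = cost_per_year * wait_years
    return "Yes" if value_of_waiting > cost_of_waiting else "No"


def break_even_years(value_of_waiting, cost_per_year):
    """Wait length at which value and cost of waiting are equal."""
    return value_of_waiting / cost_per_year


# The 22-month gap in median follow-up, expressed in years.
print(round(22 / 12, 2))

# Hypothetical inputs in arbitrary net-health units.
print(worth_waiting(100.0, 40.0, 2.0))
print(worth_waiting(100.0, 40.0, 3.0))
print(break_even_years(100.0, 40.0))
```

Under this assumed form, lowering the value of waiting (for instance by using external evidence to exclude implausible curves) shortens the break-even wait, which is the pattern described in the text.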

Table 3 Quantitative analysis results: was the value of waiting higher than the cost of waiting?

As described previously, the more ZUMA-1 is considered with respect to axi-cel OS distribution selection, the greater the reduction of uncertainty in the Interim decision. The value of collecting information is subsequently lowered because there is less uncertainty to be mitigated by further evidence gathering. Conversely, if ZUMA-1 is not used to inform axi-cel OS distribution selection, further evidence collection is more valuable than when it is used. Not considering external evidence (where no OS extrapolations are excluded) in favour of waiting for further evidence collection is only beneficial to population health when the time frame for evidence gathering is short enough (2 years or less). If the wait is over 3 years, the cost of waiting exceeds the value of waiting in all presented scenarios and waiting for the information would negatively impact population health. Utilising ZUMA-1 in full by excluding the log-normal, exponential and log-logistic distributions from the Interim analysis (the company’s preferred analysis) reduces the uncertainty in the Interim analysis and lowers the value of further evidence collection. In this scenario, the information provided by the Primary analysis was not worth waiting for unless it could be collected in 6 months or less.

For the NICE assessment specifically, the result depends on what the committee considered plausible during the final decision, which is not clear in the submission documents [20]. To reflect this, each perspective is presented as ‘NICE (1)–(3)’ in Table 3. It is, however, understood that log-normal and exponential distributions were excluded as survival was predicted to be lower than that of ZUMA-1 survival. The log-logistic curve, however, was closer to the ZUMA-1 data (see Fig. 3), and was considered amongst the clinically plausible options. When log-logistic was considered as plausible as the other three remaining distributions (NICE-1), then the information is worth waiting for if the time to collect the information is 1 year or less. If the generalised gamma curve and log-logistic curve were considered equally plausible (NICE-2), the value of collecting the information is substantial because of increased uncertainty in the Interim analysis. The value of collecting the information outweighs the cost of collecting the information only if the wait is 1.83 years (or 22 months) or less; otherwise, this decision negatively impacts population health. The selection of the log-logistic curve (NICE-3) increases the uncertainty in the Interim analysis (relative to all other scenarios in Table 3) and, therefore, raises the value of waiting substantially. A wait for evidence collection over 3 years is estimated to negatively impact population health.
