Kidney failure, which requires dialysis or kidney transplantation, is the most significant long-term complication of CKD for clinicians, patients, and their caregivers. As such, clinical trials aiming to develop new therapies for CKD have traditionally used kidney failure as a component of a composite end point together with doubling of serum creatinine, equivalent to 57% GFR decline, which reflects a large decline in kidney function.1,2 These end points only occur after a prolonged disease course that may extend 10–20 years, a timeframe which is not feasible for clinical trials. Surrogate end points that reliably reflect established disease outcomes could facilitate the conduct of clinical trials with smaller sample size and shorter duration. Progress in the validation of surrogate end points has led to the inclusion of smaller declines in GFR than 57% as a component of a composite kidney end point and the use of the rate of GFR decline (GFR slope) in some settings for full drug approval.3–5
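The 57% figure can be checked with a short calculation. As an assumption not stated in the text, take the 2009 CKD-EPI creatinine equation, in which estimated GFR is proportional to serum creatinine raised to the power −1.209 once creatinine exceeds the sex-specific threshold. Holding all other terms fixed, doubling serum creatinine (written S_cr below) then gives roughly:

```latex
\[
\frac{\mathrm{GFR}_{\text{after}}}{\mathrm{GFR}_{\text{before}}}
  = \left(\frac{2\,S_{\mathrm{cr}}}{S_{\mathrm{cr}}}\right)^{-1.209}
  = 2^{-1.209} \approx 0.43,
\qquad
1 - 0.43 = 0.57 .
\]
```

Other estimating equations use slightly different exponents, so the equivalence is approximate.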
Number (%) of individual components of the hierarchical composite kidney end point in each trial. (A) DAPA-CKD trial. (B) CREDENCE trial. (C) FIDELIO-DKD trial. (D) SONAR trial. (E) RENAAL trial. (F) IDNT trial. (G) ALTITUDE trial. ACM, all-cause mortality; ALTITUDE, Aliskiren Trial in Type 2 Diabetes Using Cardio-Renal Endpoints; CREDENCE, Canagliflozin and Renal Events in Diabetes with Established Nephropathy Clinical Evaluation; DAPA-CKD, Dapagliflozin and Prevention of Adverse Outcomes in CKD; FIDELIO-DKD, FInerenone in reducing kiDnEy faiLure and dIsease prOgression in Diabetic Kidney Disease; IDNT, Irbesartan Diabetic Nephropathy Trial; RENAAL, Reduction of Endpoints in Non-insulin-dependent diabetes mellitus with the Angiotensin II Antagonist Losartan; SONAR, Study Of diabetic Nephropathy with AtRasentan.
The conventional method to assess treatment effects in clinical trials of CKD progression is to define the end point as the time to the first event of the composite outcome, without taking into account the severity or clinical importance of that first event (analyzed using Kaplan–Meier estimates, log-rank tests, or Cox proportional hazards models). This limitation is particularly important when the effects on the different components vary or when the components of lesser clinical importance occur earlier. For example, a participant experiencing a 50% decline in GFR after 9 months is considered to have reached the composite end point. The clinically more impactful event (e.g., requirement for dialysis or kidney transplantation), which may occur later, is ignored in the primary analysis. Moreover, a participant reaching a 50% GFR decline after 9 months is considered to have had a worse outcome than another participant reaching dialysis after 11 months. Thus, the components of the composite end point receive equal weight in the analysis, irrespective of their clinical importance. Another issue with the conventional kidney end points is that the estimated effect of an intervention is determined by the number of patients who reach the outcomes included in the composite end point. In clinical trials in nephrology, these are patients with a faster progression of kidney disease. Other patients who do not experience a sustained large (e.g., 50%) decline in kidney function only contribute exposure and time at risk to the analysis. Ideally, an end point should capture the effect of the intervention in all trial participants. The GFR slope provides an estimate of the effect of an intervention in all participants, both fast and slow progressors, even when they do not experience events included in the conventional composite outcome. However, the conventional composite clinical end point cannot incorporate such a continuous quantitative measure.
To overcome some of these limitations, new approaches for the analysis of composite end points are emerging that prioritize the components by severity and combine dichotomous end points with quantitative (continuous) measures.6 The flexibility of such new end points, in particular the combination of different types of outcomes and the hierarchical structure of the end point components, makes them an attractive alternative to the established kidney end point. We refer to the accompanying review for more background details on the hierarchical composite end point (HCE). In brief, the novel HCE is analyzed using win odds (WOs), which describe the odds that a patient receiving the active treatment has a more favorable outcome than a patient treated with control. For both the hazard ratio (HR) and WOs, a value of 1 corresponds with the null hypothesis of no treatment effect. However, unlike for the HR, where a result of <1 is indicative of a favorable treatment effect, WOs of >1 correspond with a treatment benefit, signifying that treated patients are more likely to have a favorable outcome than control patients. The aim of this study was to develop and validate a novel kidney HCE using a WOs approach.
Methods
Overall Study Design
In this post hoc analysis, we used data from completed Phase 3 placebo-controlled randomized clinical trials that assessed the efficacy and safety of two sodium–glucose cotransporter 2 inhibitors, a nonsteroidal mineralocorticoid receptor antagonist, an endothelin receptor antagonist, two angiotensin receptor blockers, and a direct renin inhibitor on composite end points of kidney failure or death due to kidney disease with GFR decline thresholds of 40%, 50%, and 57% (as prespecified in each trial7–13). We selected these clinical trials because they recruited patients with CKD and demonstrated varying effects on the primary composite kidney end point. We included the Dapagliflozin and Prevention of Adverse Outcomes in CKD (DAPA-CKD) (ClinicalTrials.gov NCT03036150), Canagliflozin and Renal Events in Diabetes with Established Nephropathy Clinical Evaluation (CREDENCE) (NCT02065791), FInerenone in reducing kiDnEy faiLure and dIsease prOgression in Diabetic Kidney Disease (FIDELIO-DKD) (NCT02540993), Study Of diabetic Nephropathy with AtRasentan (SONAR) (NCT01858532), Reduction of Endpoints in Non-insulin-dependent diabetes mellitus with the Angiotensin II Antagonist Losartan (RENAAL) (NCT00308347), Irbesartan Diabetic Nephropathy Trial (IDNT) (NCT00317915), and Aliskiren Trial in Type 2 Diabetes Using Cardio-Renal Endpoints (ALTITUDE) (NCT00549757) trials.
End Point Definitions
We compared treatment effects on the established kidney end point as defined in each trial with the hierarchical composite kidney end point. The definitions of the established kidney end points in each trial are shown in Supplemental Table 1. The defined hierarchical composite kidney end point accounts for the clinical importance of events. We defined the HCE as a composite end point including seven components, which we ranked in order of highest to lowest clinical importance as (1) all-cause mortality; (2) kidney replacement therapy, defined as dialysis for at least 28 days or kidney transplantation; (3) sustained GFR <15 ml/min per 1.73 m2 for at least 28 days; (4) sustained GFR decline from baseline of 57% for at least 28 days; (5) sustained GFR decline of 50%; (6) sustained GFR decline of 40%; and (7) GFR slope. In a Supplemental Analysis, we assessed treatment effects on a kidney-specific HCE, which was defined in the same way as the primary HCE but without inclusion of all-cause mortality. In this Supplemental Analysis, patients who died contributed to the analysis with their event of highest priority before dying.
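The ranking above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the event labels, the `PatientOutcome` type, and the `classify` helper are hypothetical names, and the rule that only events within the 36-month horizon count is an assumption taken from the follow-up window described in the statistical analyses.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Dichotomous components ranked from most to least severe (hypothetical labels).
HIERARCHY = (
    "death",
    "kidney_replacement",
    "gfr_below_15",
    "gfr_decline_57",
    "gfr_decline_50",
    "gfr_decline_40",
)


@dataclass
class PatientOutcome:
    tier: int                     # 0 = death ... 5 = 40% decline, 6 = GFR slope only
    event_month: Optional[float]  # month of the selected event, if any
    gfr_slope: Optional[float]    # used only when no dichotomous event occurred


def classify(event_months: Dict[str, float], gfr_slope: float,
             horizon: float = 36.0) -> PatientOutcome:
    """Pick the most severe event within the horizon; otherwise fall back to GFR slope."""
    for tier, name in enumerate(HIERARCHY):
        month = event_months.get(name)
        if month is not None and month <= horizon:
            return PatientOutcome(tier, month, None)
    return PatientOutcome(len(HIERARCHY), None, gfr_slope)


# A 50% GFR decline at month 9 followed by dialysis at month 11:
# the patient is ranked by the dialysis event, not by the earlier decline.
example = classify({"gfr_decline_50": 9.0, "kidney_replacement": 11.0}, gfr_slope=-4.0)
```

Unlike a time-to-first-event analysis, this selection ignores which event came first and keeps the clinically most important one.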
Statistical Analyses
The patients were analyzed using the intention-to-treat principle: all patients were followed and analyzed irrespective of their adherence to the planned course of treatment and were included in the analysis as randomized. Hence, the dichotomous outcomes occurring during the 36 months of follow-up were included in the analysis irrespective of treatment discontinuation. The follow-up duration varied among trials. Therefore, not every patient had 36 months of follow-up (Table 1). To account for the variable follow-up when constructing the HCE, we extended the follow-up for patients with a shorter follow-up by carrying the clinically most important outcome from the observed follow-up forward to the analysis at month 36.
Table 1 - Baseline characteristics

| Characteristic | DAPA-CKD (N=4304) | CREDENCE (N=4401) | FIDELIO-DKD (N=5674) | SONAR (N=3668) | RENAAL (N=1513) | IDNT (N=1715) | ALTITUDE (N=8561) |
|---|---|---|---|---|---|---|---|
| Enrollment period | 2016–2018 | 2014–2017 | 2015–2018 | 2013–2018 | 1996–1999 | 1996–1999 | 2007–2010 |
| Median follow-up, yr^a | 2.4 | 2.6 | 2.6 | 2.2 | 3.4 | 2.6 | 2.7 |
| Age, yr (SD) | 61.8 (12) | 63.0 (9) | 65.6 (9) | 64.5 (8.8) | 60.2 (7.4) | 58.8 (7.7) | 64.5 (9.8) |
| Female sex, n (%) | 1425 (33.1) | 494 (33.9) | 1681 (29.8) | 946 (25.8) | 557 (36.8) | 367 (32) | 2735 (31.9) |
| Race, n (%) | | | | | | | |
| Asian | 1467 (34.1) | 877 (19.9) | 1440 (25.4) | 1198 (32.7) | 252 (16.7) | 51 (4.4) | 2714 (31.7) |
| Black | 191 (4.4) | 224 (5.1) | 264 (4.7) | 224 (6.1) | 230 (15.2) | 141 (12.3) | 277 (3.2) |
| Other | 356 (8.3) | 369 (8.4) | 378 (6.7) | 136 (3.7) | 296 (19.7) | 103 (9.0) | 696 (8.1) |
| White | 2290 (53.2) | 2931 (66.6) | 3592 (63.3) | 2110 (57.5) | 735 (48.6) | 853 (74.3) | 4850 (56.7) |
| BP, mm Hg (SD) | | | | | | | |
| Systolic | 137.1 (17) | 140.0 (16) | 138.0 (14) | 133.3 (15) | 152.5 (19.3) | 159.4 (20.0) | 137.3 (16) |
| Diastolic | 77.5 (11) | 78.3 (9) | 76.0 (10) | 71.5 (10) | 82.4 (10.4) | 86.9 (11.4) | 74.2 (10) |
| Body weight, kg (SD) | 81.7 (21) | 87.0 (20.7) | 87.2 (20) | 85.7 (20) | 82.2 (20.7) | 87.3 (18.8) | 82.7 (19.4) |
| Hba1c, % (SD) | 7.06 (1.7) | 8.3 (1.3) | 7.7 (1.3) | 7.8 (1.5) | 8.48 (1.6) | 8.1 (1.7) | 7.79 (1.6) |
| GFR, ml/min per 1.73 m2 (SD) | 43.1 (12) | 56.2 (18) | 44.3 (13) | 42.3 (14) | 38.6 (12.4) | 47.2 (17.8) | 56.9 (22.5) |
| GFR, ml/min per 1.73 m2, n (%) | | | | | | | |
| ≥60 | 454 (10.5) | 1769 (40.2) | 656 (11.6) | 468 (12.8) | 79 (5.2) | 265 (23.1) | 2783 (32.5) |
| <60 | 3850 (89.5) | 2632 (59.8) | 5018 (88.4) | 3191 (87.0) | 1434 (94.8) | 873 (76.0) | 5776 (67.5) |
| UACR, mg/g (IQR) | 949 (477–1885) | 927 (463–1833) | 852 (446–1634) | 828 (458–1556) | 1245.5 (558–2544) | 1354 (1054–1748) | 283 (56–889) |
| UACR, mg/g, n (%) | | | | | | | |
| >1000 | 2079 (48.3) | 2053 (46.7) | 2480 (43.7) | 892 (24.5) | 857 (56.6) | 912 (79.4) | 1904 (22.2) |
| ≤1000 | 2225 (51.7) | 2348 (53.3) | 3191 (56.2) | 2771 (75.5) | 656 (43.4) | 224 (19.5) | 6527 (76.2) |
| Baseline medications, n (%) | | | | | | | |
| ACEi | 1353 (31.4) | 1922 (43.7) | 1942 (34.2) | 1319 (36.0) | 737 (48.7) | 507 (44.2) | 3792 (44.3) |
| ARB | 2870 (66.7) | 2480 (56.4) | 3725 (65.7) | 2391 (65.2) | 105 (6.9) | 33 (2.9) | 4787 (55.9) |
| Diuretics | 1882 (43.7) | 2057 (46.7) | 3214 (56.6) | 3157 (86.1) | 878 (58) | 547 (47.6) | 5872 (68.6) |
| Insulin | 1598 (37.1) | 2884 (65.5) | 3637 (64.1) | 2315 (63.1) | 910 (60.1) | 644 (56.1) | 4850 (56.7) |
| Statins | 2794 (64.9) | 3036 (69.0) | 4215 (74.3) | 2707 (73.8) | 507 (33.5) | 299 (26.0) | 5576 (65.1) |

DAPA-CKD, Dapagliflozin and Prevention of Adverse Outcomes in CKD; CREDENCE, Canagliflozin and Renal Events in Diabetes with Established Nephropathy Clinical Evaluation; FIDELIO-DKD, FInerenone in reducing kiDnEy faiLure and dIsease prOgression in Diabetic Kidney Disease; SONAR, Study Of diabetic Nephropathy with AtRasentan; RENAAL, Reduction of Endpoints in Non-insulin-dependent diabetes mellitus with the Angiotensin II Antagonist Losartan; IDNT, Irbesartan Diabetic Nephropathy Trial; ALTITUDE, Aliskiren Trial in Type 2 Diabetes Using Cardio-Renal Endpoints; SD, standard deviation; Hba1c, hemoglobin A1c; UACR, urinary albumin/creatinine ratio; IQR, interquartile range; ACEi, angiotensin-converting–enzyme inhibitor; ARB, angiotensin receptor blocker.
^a Mean follow-up duration provided for RENAAL and IDNT.
We used proportional hazards (Cox) regression models to assess the effect of the active intervention compared with placebo on the risk for first relevant composite kidney end point. We stratified the Cox models for factors used at randomization and adjusted for covariates as originally defined in each clinical trial.
To estimate the treatment effect on total GFR slope, we used a two-slope mixed-effects model accounting for the acute and chronic phases of each trial, where the acute phase was the period up to the first postrandomization visit when the acute treatment effect on GFR was considered fully present.14 The model adjusts for baseline GFR and accounts for different sources of variation in GFR between and within participants and treatment arms. Only on-treatment observations were selected for analysis of GFR slope to avoid potential bias that may result from early discontinuation of treatments with acute reversible effects on GFR. Patients not experiencing any of the dichotomous events defined in the hierarchy contributed to the analysis with their individual GFR slope obtained from this two-slope model.
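As a rough sketch of how a total slope follows from a two-slope model, assume that mean GFR is piecewise linear, with an acute slope up to time t_a and a chronic slope thereafter. This parameterization and its symbols are illustrative assumptions, not the specification used in the trial analyses. The total slope over a horizon T (here T = 3 years) is then the time-weighted average of the two phase slopes:

```latex
\[
\mathbb{E}[\mathrm{GFR}(t)] = \beta_0
  + \beta_{\text{acute}}\,\min(t, t_a)
  + \beta_{\text{chronic}}\,\max(t - t_a, 0),
\]
\[
\beta_{\text{total}}
  = \frac{\mathbb{E}[\mathrm{GFR}(T)] - \mathbb{E}[\mathrm{GFR}(0)]}{T}
  = \frac{t_a\,\beta_{\text{acute}} + (T - t_a)\,\beta_{\text{chronic}}}{T},
  \qquad T \ge t_a .
\]
```

Under this sketch, the treatment effect on total slope is the between-arm difference in the total slope, so an acute drop in GFR on active treatment is offset against a slower chronic decline.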
The HCE is analyzed using WOs,15 an adaptation of the win ratio16 to include ties (a tie is considered a half loss and a half win for each group).17 Every patient in the active group is compared with every patient in the control group, and the patient with an event of a higher priority (more severe) loses against the other patient. The hierarchical comparison of the components of the kidney HCE is shown in Supplemental Table 2. After all possible comparisons are completed, the total number of wins of the active treatment, the total number of losses, and the total number of ties are used to derive win statistics. The WOs are defined as the total number of wins plus half of the ties, divided by the total number of losses plus half of the ties. It should be noted that for the proposed kidney HCE, the proportion of ties is negligible because of the use of timing of events and continuous GFR slope, and hence, the WOs are essentially equal to the win ratio. To account for the differential follow-up times between patients, we performed a Supplemental Analysis where the shared follow-up of two patients was used to select the outcome with the highest priority in a pairwise comparison of patients.16 We calculated WOs and their 95% confidence interval (CI).18
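The pairwise comparison can be illustrated with a small Python sketch. It is not the authors' code: the tuple layout `(tier, event_month, gfr_slope)`, the tier numbering (0 = death through 5 = 40% GFR decline, 6 = GFR slope only), and the within-tier rules (a later event wins; a slower GFR decline wins) are assumptions made for illustration, because the actual comparison rules are given in Supplemental Table 2. The confidence interval calculation is omitted.

```python
from itertools import product

SLOPE_TIER = 6  # tiers 0-5 are dichotomous events (0 = most severe); 6 = GFR slope only


def compare(a, b):
    """Return 1 if patient `a` wins, -1 if `a` loses, 0 for a tie.

    Each patient is a tuple (tier, event_month, gfr_slope).
    """
    tier_a, month_a, slope_a = a
    tier_b, month_b, slope_b = b
    if tier_a != tier_b:
        # The patient with the less severe outcome (higher tier index) wins.
        return 1 if tier_a > tier_b else -1
    if tier_a < SLOPE_TIER:
        # Same event type: assume the later event is the better outcome.
        key_a, key_b = month_a, month_b
    else:
        # Neither patient had an event: the larger (less negative) slope wins.
        key_a, key_b = slope_a, slope_b
    return (key_a > key_b) - (key_a < key_b)


def win_odds(active, control):
    """Win odds = (wins + ties/2) / (losses + ties/2) over all active-control pairs."""
    results = [compare(a, c) for a, c in product(active, control)]
    wins, losses, ties = results.count(1), results.count(-1), results.count(0)
    return (wins + 0.5 * ties) / (losses + 0.5 * ties)


# Toy data: two patients per arm (illustrative values only).
active = [(6, None, -1.0), (1, 20.0, None)]
control = [(6, None, -3.0), (0, 10.0, None)]
print(win_odds(active, control))  # 3 wins, 1 loss, 0 ties -> 3.0
```

The function raises a division error if the active arm wins every comparison, which a production implementation would need to handle.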
Maraca plots were developed to visualize an HCE combining multiple time-to-event outcomes with a single continuous outcome.19 A maraca plot is formed by adjoining end to end, from left to right in order of declining severity, uniformly scaled Kaplan–Meier plots of time to each dichotomous outcome among those without more severe outcomes, with a superimposed boxplot of the continuous outcome. The maraca plot visualizes the contribution of components of an HCE over time.
Results
Patient characteristics of the participants in each clinical trial are shown in Table 1. Mean age ranged between 59 and 66 years, mean GFR between 39 and 57 ml/min per 1.73 m2, and median urinary albumin/creatinine ratio between 283 and 1354 mg/g. In all clinical trials, baseline characteristics were well balanced across randomized patient groups.7–13
Contributions of Individual Components to the Composite End Point
The components of the original primary kidney end point in each trial are shown in Supplemental Table 3. Declines in GFR of 40%, 50%, or 57% were the most common components of the primary composite kidney end point in each trial. For the HCE, all-cause mortality and 40% GFR decline were the most common components (Figure 1). In comparing the original primary kidney end point from each trial with the HCE, the latter included more kidney failure events because all such events were included in patients who did not die during the observation period (Table 2 and Supplemental Table 3).
Table 2 - Comparison of time to first event analysis and win odds in the seven selected trials

| Event | DAPA-CKD | CREDENCE | FIDELIO-DKD | SONAR | RENAAL | IDNT | ALTITUDE |
|---|---|---|---|---|---|---|---|
| Treatment comparison | Dapagliflozin versus placebo | Canagliflozin versus placebo | Finerenone versus placebo | Atrasentan versus placebo | Losartan versus placebo | Irbesartan versus placebo | Aliskiren versus placebo |
| | n; HR (95% CI) | n; HR (95% CI) | n; HR (95% CI) | n; HR (95% CI) | n; HR (95% CI) | n; HR (95% CI) | n; HR (95% CI) |
| All-cause mortality | 247; 0.69 (0.53 to 0.88) | 369; 0.83 (0.68 to 1.02) | 463; 0.90 (0.75 to 1.07) | 162; 0.80 (0.67 to 0.96) | 313; 1.02 (0.81 to 1.27) | 180; 0.92 (0.69 to 1.23) | 734; 1.07 (0.92 to 1.23) |
| Kidney replacement | 174; 0.66 (0.49 to 0.90) | 176; 0.74 (0.55 to 1.00) | 258; 0.86 (0.67 to 1.10) | 287; 0.70 (0.55 to 0.88) | 341; 0.71 (0.58 to 0.88) | 183; 0.77 (0.57 to 1.03) | 229; 1.09 (0.84 to 1.41) |
| GFR <15 ml/min per 1.73 m2 | 204; 0.67 (0.51 to 0.88) | 203; 0.60 (0.45 to 0.80) | 366; 0.82 (0.67 to 1.01) | 114; 0.76 (0.52 to 1.10) | 409; 0.76 (0.62 to 0.91) | 196; 0.61 (0.46 to 0.81) | 175; 1.12 (0.83 to 1.51) |
| 57% GFR decline | 201; 0.61 (0.46 to 0.82) | 156; 0.41 (0.29 to 0.57) | 412; 0.68 (0.55 to 0.82) | 103; 0.62 (0.42 to 0.92) | 359; 0.74 (0.60 to 0.92) | 166; 0.65 (0.48 to 0.89) | 304; 1.10 (0.88 to 1.37) |
| 50% GFR decline | 313; 0.53 (0.42 to 0.67) | 262; 0.53 (0.41 to 0.69) | 638; 0.73 (0.62 to 0.85) | 193; 0.58 (0.44 to 0.78) | 443; 0.80 (0.67 to 0.97) | 248; 0.61 (0.47 to 0.79) | 468; 1.08 (0.90 to 1.30) |
| 40% GFR decline | 538; 0.63 (0.53 to 0.74) | 454; 0.59 (0.48 to 0.71) | 1056; 0.81 (0.72 to 0.92) | 329; 0.81 (0.65 to 1.01) | 598; 0.88 (0.75 to 1.04) | 400; 0.83 (0.68 to 1.01) | 832; 1.12 (0.98 to 1.28) |
| GFR slope^a | 1.12 (0.80 to 1.43) | 1.66 (1.30 to 2.00) | 0.64 (0.40 to 0.89) | 0.60 (0.23 to 0.97) | 1.08 (0.40 to 1.76) | 1.10 (0.47 to 1.74) | -0.30 (-0.6 to 0.01) |
| Treatment effect composite end point | | | | | | | |
| HR (Cox) | 0.61 (0.51 to 0.73) | 0.70 (0.59 to 0.82) | 0.82 (0.73 to 0.93) | 0.71 (0.58 to 0.88) | 0.79 (0.66 to 0.94) | 0.74 (0.59 to 0.94) | 1.08 (0.95 to 1.23) |
| WOs^b | 1.41 (1.32 to 1.52) | 1.48 (1.38 to 1.58) | 1.26 (1.19 to 1.34) | 1.16 (1.07 to 1.25) | 1.13 (1.00 to 1.27) | 1.17 (1.02 to 1.34) | 0.84 (0.80 to 0.88) |

Hazard ratios for the composite end point and components were calculated using Cox proportional hazards regression models and were adjusted for covariates as described in the primary publication of each trial. The win odds for the kidney hierarchical composite end point are shown in the bottom row. Values are n (%). DAPA-CKD, Dapagliflozin and Prevention of Adverse Outcomes in CKD; CREDENCE, Canagliflozin and Renal Events in Diabetes with Established Nephropathy Clinical Evaluation; FIDELIO-DKD, FInerenone in reducing kiDnEy faiLure and dIsease prOgression in Diabetic Kidney Disease; SONAR, Study Of diabetic Nephropathy with AtRasentan; RENAAL, Reduction of Endpoints in Non-insulin-dependent diabetes mellitus with the Angiotensin II Antagonist Losartan; IDNT, Irbesartan Diabetic Nephropathy Trial; ALTITUDE, Aliskiren Trial in Type 2 Diabetes Using Cardio-Renal Endpoints; HR, hazard ratio; CI, confidence interval; WOs, win odds.
^a In each trial, the total GFR slope is defined as the annual decline in GFR from randomization until 36 months of follow-up time. Inclusion criteria differ between trials and do not allow direct comparison of results.
^b Win odds were computed in a hierarchy: all-cause mortality, kidney replacement, GFR <15 ml/min per 1.73 m2, 57%, 50%, 40% GFR decline, and GFR slope.
In the DAPA-CKD, CREDENCE, FIDELIO-DKD, and SONAR trials, 15%–20% of participants experienced one of the time-to-event outcomes (death, kidney failure, or a 57%, 50%, or 40% GFR decline) during the follow-up. GFR slope contributed to the end point in the remaining 80%–85% of participants in each trial. By contrast, in the RENAAL and IDNT trials, approximately 50% of all participants died, experienced kidney failure, or reached an end point based on one of the GFR decline thresholds.
Comparison of HR and WOs
The effects of the interventions on each component of the HCE, analyzed with Cox proportional hazards regression, were broadly consistent, except that in the SONAR, RENAAL, and IDNT trials, the interventions reduced the risks of the kidney-related components but did not reduce the risk of all-cause mortality (Table 2). The effects on GFR slope in all trials were directionally similar when compared with the effects on kidney end points (Table 2).
The WOs in the six clinical trials that had shown a reduction in the risk of the primary composite kidney end point ranged from 1.13 to 1.48, indicating a more favorable outcome for a patient assigned to active compared with placebo treatment (Table 2). In the ALTITUDE trial, which showed no risk reduction of the kidney end point, the WOs were <1 (0.84), also indicating no benefit.
To visualize treatment effects for hierarchical composite kidney end points, we developed maraca plots (Figure 2). These plots can be used to visualize an HCE combining multiple time-to-event outcomes and a single continuous outcome. The maraca plot shows the cumulative percentages of patients experiencing each dichotomous outcome during the fixed follow-up period among those who avoid worse outcomes during that period, combined with a box–whisker plot showing the median and 25th–75th percentiles of the continuous GFR slope distribution for patients without a dichotomous event in each treatment group. The maraca plot demonstrates fewer dichotomous outcomes in the active compared with the placebo group in all trials except for the ALTITUDE trial, which did not demonstrate efficacy of active treatment. The box–whisker component demonstrates that the median rate of GFR decline for patients not experiencing dichotomous outcomes was slower (shift to the right in the maraca plot) in the active compared with the placebo group in all trials except for the ALTITUDE trial.
Maraca plots in each trial. The Maraca plots show the contribution of the different time-to-event end point components; the treatment effect on the different time-to-event components of the composite; and the treatment effect on the continuous GFR slope component, for patients not experiencing any of the dichotomous outcomes. (A) DAPA-CKD trial. (B) CREDENCE trial. (C) FIDELIO-DKD trial. (D) SONAR trial. (E) RENAAL trial. (F) IDNT trial. (G) ALTITUDE trial. CI, confidence interval.
In comparing the HCE with the original primary kidney trial end points, we observed similar directions and magnitudes of the treatment effect estimates (Figures 3 and 4 and Table 2). For example, in DAPA-CKD, the HR for the primary outcome of sustained 50% GFR decline, kidney failure, or renal death was 0.61 (95% CI, 0.51 to 0.73). The WOs for the HCE were 1.41 (95% CI, 1.32 to 1.52; Figures 3 and 4 and Table 2). Similarly, in FIDELIO-DKD, the HR for the primary outcome of sustained 40% GFR decline, kidney failure, or renal death was 0.82 (95% CI, 0.73 to 0.93), and the WOs were 1.26 (95% CI, 1.19 to 1.34). Removing all-cause mortality from the kidney HCE did not substantially alter the results, but in some trials led to numerically higher WOs (Supplemental Figures 1 and 2). The WOs from a shared follow-up approach demonstrated similar results compared with our main analyses (Supplemental Table 4), which supports the robustness of our findings. For example, the WOs using a shared follow-up time in the DAPA-CKD trial were 1.42 (95% CI, 1.32 to 1.52) versus 1.41 (95% CI, 1.32 to 1.52) in our main analysis (Supplemental Table 5).
Forest plot of the treatment effects on primary kidney end point as defined in each trial, GFR slope, and HCE. The left forest plot shows the HR of the treatment effect (time-to-first event) on the primary kidney end point; the forest plot in the middle, the treatment effect on GFR slope; and the right forest plot, the WOs of the treatment effect on the HCE. HCE, hierarchical composite end point; HR, hazard ratio; WOs, win odds.
Scatter plot of treatment estimates on the primary kidney end point and HCE.
Sample Size
Figure 5 compares the sample size requirements and statistical power of the novel HCE using bootstrap resampling of clinical trials with the original primary kidney end point and GFR slope to detect the observed treatment effect for each end point. The resampling procedure used 1000 iterations at each sample size (n=200, 500, increments of 500 until 3000). In four of the six trials that reported a benefit of the examined intervention (except the RENAAL and IDNT trials), the sample size req