Trial Design and Oversight
STHLM3-MRI was a prospective, randomized, population-based trial in men 50 to 74 years of age that evaluated various screening strategies for prostate cancer detection. Here, we report the findings of prespecified analyses in which we evaluated whether MRI followed by targeted and standard biopsy in participants in whom MRI indicated the presence of prostate cancer (experimental biopsy group) was noninferior to standard biopsy (standard biopsy group) for detecting clinically significant prostate cancer in men undergoing prostate cancer screening. Details of the trial design have been published previously13,14; further details of the design and statistical analysis are provided in the Supplementary Appendix, which is available, along with the protocol (which adheres to the SPIRIT 2013 statement15), with the full text of this article at NEJM.org.
The STHLM3-MRI trial used a design that combined a paired-screen-positive step (in which two screening tests were used for all participants) and random assignment to either the experimental biopsy group or the standard biopsy group for all participants who had positive results on either of the two screening tests.16-18 In the paired step, we used a PSA test and the Stockholm3 test to assess the risk of prostate cancer among enrolled participants. The Stockholm3 test is a risk-prediction model that is based on clinical variables (age, first-degree family history of prostate cancer, and previous biopsy), blood biomarkers (total PSA, free PSA, ratio of free PSA to total PSA, human kallikrein 2, macrophage inhibitory cytokine-1, and β-microseminoprotein [MSMB]), and a polygenic risk score for predicting the risk of prostate cancer with a Gleason score of 7 or higher.19,20 Participants with elevated PSA levels (≥3 ng per milliliter) or Stockholm3 scores (≥11%) were randomly assigned, in a 2:3 ratio, to the standard biopsy group or the experimental biopsy group (with the use of computer-generated blocks of five and stratified according to six cancer-risk strata, defined according to the Stockholm3 risk distribution). This design establishes an analytic framework in which multiple screening workflows (or strategies) can be compared according to combinations of conditions for biopsy referral (i.e., PSA only, Stockholm3 score only, or either PSA or Stockholm3) and biopsy method (i.e., standard, MRI-targeted, or MRI-targeted plus standard [with biopsy in the experimental group performed only in men with MRI results suggestive of cancer]).
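The allocation scheme described above (a 2:3 ratio in computer-generated blocks of five, stratified according to six Stockholm3-based risk strata) can be sketched as follows. This is a hypothetical illustration, not the trial's randomization software; the class name, the 0-to-5 stratum indexing, and the seed are assumptions.

```python
import random
from collections import defaultdict

# One block of five holds two standard and three experimental assignments (2:3 ratio).
BLOCK = ["standard"] * 2 + ["experimental"] * 3


class StratifiedBlockRandomizer:
    """Hypothetical sketch: an independent sequence of permuted blocks per risk stratum."""

    def __init__(self, n_strata=6, seed=12345):
        self.n_strata = n_strata
        self.rng = random.Random(seed)      # seeded so the sequence is reproducible
        self.pending = defaultdict(list)    # stratum -> assignments left in its current block

    def assign(self, stratum):
        if not 0 <= stratum < self.n_strata:
            raise ValueError(f"stratum must be between 0 and {self.n_strata - 1}")
        if not self.pending[stratum]:
            block = BLOCK.copy()
            self.rng.shuffle(block)         # random order within the new block
            self.pending[stratum] = block
        return self.pending[stratum].pop()


randomizer = StratifiedBlockRandomizer()
arms = [randomizer.assign(stratum=3) for _ in range(10)]
# Two complete blocks in one stratum always give exactly 4 standard and 6 experimental.
print(arms.count("standard"), arms.count("experimental"))  # 4 6
```

Stratifying keeps the 2:3 balance within each risk stratum, and the short blocks limit any imbalance if enrollment stops partway through a block.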
The analyses reported here examined the safety and efficacy of standard biopsy as compared with a strategy that used MRI-targeted biopsy in a screening workflow in which the only condition for referral for MRI or standard biopsy was a PSA level of 3 ng per milliliter or greater, in accordance with the condition used in the European Randomized Study of Screening for Prostate Cancer (ERSPC). ERSPC provided level 1 evidence of lower prostate cancer mortality among men who were invited to undergo organized PSA screening than among men who were not invited to undergo screening.1,2 In other words, although either an elevated PSA level (≥3 ng per milliliter) or a positive result on the Stockholm3 test was used as the condition for random assignment and subsequent biopsy, the analysis presented here includes only participants who underwent randomization and who had PSA levels of 3 ng per milliliter or greater, irrespective of their Stockholm3 results. Results of analyses of other workflows, including the use of PSA as compared with the Stockholm3 test for biopsy referral, are not shown here.
The trial was approved by the regional ethics review board in Stockholm and monitored by an independent data and safety monitoring board (Section S2 of the Supplementary Appendix). Reporting adhered to START (Standards of Reporting for MRI-targeted Biopsy Studies) and CONSORT (Consolidated Standards of Reporting Trials) guidelines.21,22 The trial was designed by the authors, and data were collected by trial consortium members. The authors assume responsibility for the accuracy and completeness of the data and for the fidelity of the trial to the protocol. No one who is not an author contributed to the writing of the manuscript. The Swedish Research Council and the Swedish Cancer Society funded the trial but had no role in protocol development, data analysis or interpretation, or manuscript preparation.
Men 50 to 74 years of age living in Stockholm County, Sweden, were randomly selected by Statistics Sweden and invited by mail to participate. Men with a previous diagnosis of prostate cancer, a prostate biopsy within 60 days before the invitation, a contraindication to MRI, or severe illness (e.g., metastatic cancer, severe cardiovascular disease, or dementia) were not eligible to participate. Men who had undergone a previous prostate biopsy more than 60 days before the invitation as well as men who had never undergone a prostate biopsy were eligible to participate. Assessment of eligibility, documentation of informed consent, and evaluation of baseline characteristics were conducted through a secure Web portal, and digital laboratory referrals were created automatically.
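The eligibility criteria above amount to a short rule set. A minimal sketch, assuming a hypothetical `Invitee` record whose field names are illustrative only; treating "within 60 days" as 60 days or fewer is also an assumption about the boundary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invitee:
    age: int
    previous_prostate_cancer: bool
    days_since_last_biopsy: Optional[int]  # None if never biopsied
    mri_contraindication: bool
    severe_illness: bool                   # e.g., metastatic cancer, severe cardiovascular disease, dementia


def is_eligible(m: Invitee) -> bool:
    """Apply the eligibility criteria listed in the text (hypothetical encoding)."""
    if not 50 <= m.age <= 74:
        return False
    if m.previous_prostate_cancer or m.mri_contraindication or m.severe_illness:
        return False
    # A biopsy within 60 days before the invitation excludes; an older biopsy or none does not.
    if m.days_since_last_biopsy is not None and m.days_since_last_biopsy <= 60:
        return False
    return True
```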
The prespecified intention-to-treat population for this analysis included all participants with PSA levels of 3 ng per milliliter or greater who underwent randomization. The per-protocol population included participants with PSA levels of 3 ng per milliliter or greater who underwent randomization, adhered to their assigned intervention, and had complete data (MRI and pathology reports). Full details regarding the per-protocol population are provided in the Supplementary Appendix.
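The two analysis populations can be expressed as successive filters over randomized participants. In this sketch the `Participant` fields are hypothetical, and `complete_data` is a single stand-in flag for the MRI and pathology requirements detailed in the Supplementary Appendix.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    randomized: bool
    psa: float             # ng per milliliter
    stockholm3: float      # Stockholm3 risk score, percent (not used to define these populations)
    adhered: bool          # underwent the assigned intervention
    complete_data: bool    # hypothetical flag: required MRI and pathology reports available


def intention_to_treat(participants):
    # Randomized participants with PSA >= 3 ng/ml, irrespective of the Stockholm3 result.
    return [p for p in participants if p.randomized and p.psa >= 3.0]


def per_protocol(participants):
    return [p for p in intention_to_treat(participants) if p.adhered and p.complete_data]


cohort = [
    Participant("A", True, 4.2, 9.0, True, True),     # PSA-positive, adherent
    Participant("B", True, 2.4, 15.0, True, True),    # randomized on Stockholm3 only
    Participant("C", True, 3.1, 20.0, False, False),  # PSA-positive, did not adhere
]
print([p.pid for p in intention_to_treat(cohort)])  # ['A', 'C']
print([p.pid for p in per_protocol(cohort)])        # ['A']
```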
Participants provided blood samples (12 ml of blood plasma in EDTA collection tubes) at one of 60 laboratories in Stockholm County; at each laboratory, a trial nurse verified that the participant did not meet any exclusion criteria and that he understood the informed consent form. PSA was analyzed (B.R.A.H.M.S Kryptor compact PLUS) for all participants. Participants with PSA levels of less than 1.5 ng per milliliter were considered to be at low risk for clinically significant prostate cancer and were recommended to repeat testing in 6 years. For participants with PSA levels of 1.5 ng per milliliter or greater, the Stockholm3 test was performed at the A23 Laboratory (Uppsala, Sweden) as described previously.19,20 Men with PSA levels that were 1.5 ng per milliliter or higher but less than 3 ng per milliliter and who had Stockholm3 scores of less than 11% were judged to have nonelevated risk, did not undergo randomization, and were recommended to repeat testing in 2 years.
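The screening pathway described above (PSA first, Stockholm3 for PSA levels of 1.5 ng per milliliter or greater, then randomization if either test is positive) can be sketched as a triage function. The function name and return labels are hypothetical.

```python
from typing import Optional


def triage(psa: float, stockholm3: Optional[float] = None) -> str:
    """Route a participant by PSA (ng/ml) and Stockholm3 score (percent)."""
    if psa < 1.5:
        return "repeat_in_6_years"        # low risk; Stockholm3 not performed
    if stockholm3 is None:
        raise ValueError("a Stockholm3 score is required when PSA >= 1.5 ng/ml")
    if psa >= 3.0 or stockholm3 >= 11.0:
        return "randomize"                # elevated PSA, elevated Stockholm3, or both
    return "repeat_in_2_years"            # PSA 1.5 to <3 ng/ml with Stockholm3 < 11%
```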
All biopsies were performed by experienced urologists (each of whom had performed >200 procedures) at one of four participating clinics. Men undergoing biopsy were given a prophylactic antibiotic (oral ciprofloxacin, 750 mg). Participants in the standard biopsy group underwent standard transrectal ultrasonography–guided prostate biopsies to obtain 10 to 12 biopsy cores from the peripheral zone of the prostate (apical, midgland, and base).
In the experimental biopsy group, MRI was performed with a biparametric protocol (T2-weighted and diffusion-weighted imaging without contrast enhancement) developed for high-throughput screening (<16 minutes), with 1.5T Magnetom Aera (Siemens) and 3T SIGNA Architect (GE Healthcare) scanners and without an endorectal coil (details of the MRI protocol and quality control are provided in Section S3). Radiology readings were performed at Capio St. Göran’s Hospital, Stockholm, by three uroradiologists; consensus of at least two radiologists was required for each case. Regions suggestive of prostate cancer were scored according to the Prostate Imaging Reporting and Data System (PI-RADS), versions 2.0 and 2.1, on a scale of 1 to 5, with higher scores indicating more clinically suspicious lesions; scores of 3 to 5 defined a positive MRI. A maximum of three clinically significant lesions were identified per participant and delineated for targeted biopsy with the use of dedicated software (MIM Symphony DX, MIM Software). For quality-control purposes, an external uroradiologist who was unaware of the PI-RADS scores assigned by the trial uroradiologists reviewed 99 of the biparametric MRIs, randomly sampled according to PI-RADS score. If no clinically significant lesions were identified, biopsies were not performed except in participants with Stockholm3 scores of 25% or greater (which indicated a high risk of clinically significant cancer despite a negative MRI).23 Otherwise, we used the MRI-fusion technique (bkFusion, BK Medical) to perform transrectal sampling of 3 to 4 biopsy cores targeting each significant lesion. Immediately after the targeted biopsy, the urologist also obtained a standard 10-to-12–core biopsy specimen.
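The biopsy decision in the experimental group can be sketched as a small function. Two points are assumptions not stated in the text: when more than three lesions score PI-RADS 3 or higher, the highest-scoring ones are kept, and men with a negative MRI and a Stockholm3 score of 25% or greater receive a systematic (standard) biopsy because there is no MRI target. The function name is hypothetical.

```python
def plan_experimental_biopsy(lesion_pirads, stockholm3):
    """Return (targeted lesion scores, systematic biopsy?) for the experimental group.

    lesion_pirads: PI-RADS score (1-5) of each lesion reported on MRI.
    stockholm3: Stockholm3 risk score in percent.
    """
    # PI-RADS 3-5 defines a positive MRI; at most three lesions are targeted.
    # Assumption: if more than three qualify, the highest-scoring lesions are kept.
    targets = sorted((s for s in lesion_pirads if s >= 3), reverse=True)[:3]
    if targets:
        # 3 to 4 fusion-guided cores per target, then a 10-to-12-core systematic biopsy.
        return targets, True
    # Negative MRI: no biopsy unless Stockholm3 >= 25%.
    # Assumption: that biopsy is systematic, since there is no MRI target.
    return [], stockholm3 >= 25.0
```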
Pathological assessments were performed at the Unilabs pathology unit (Capio St. Göran’s Hospital, Stockholm) by one of four experienced uropathologists. The Gleason score and the length of cancer (in millimeters) were reported for each biopsy core according to the International Society of Urological Pathology 2014 guidelines.24 The overall Gleason score was reported for each case and for each biopsy method; the score reported for the combined biopsy was the highest overall score across the two biopsy methods.
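Taking the "highest overall score" across methods is simplest on the International Society of Urological Pathology grade scale, because Gleason 3+4 and 4+3 share the sum 7 but correspond to grades 2 and 3. A minimal sketch, assuming a hypothetical encoding in which 0 means a benign result and None means the method was not performed; ranking by grade group is our assumption about how "highest" was operationalized.

```python
from typing import Optional


def combined_grade(targeted: Optional[int], systematic: Optional[int]) -> Optional[int]:
    """Highest ISUP grade across biopsy methods (0 = benign, None = method not performed)."""
    grades = [g for g in (targeted, systematic) if g is not None]
    return max(grades) if grades else None
```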
The primary outcome was the probability of detection of clinically significant prostate cancer, defined as the percentage of participants in each group who received a diagnosis of cancer with a Gleason score of 3+4 or greater (International Society of Urological Pathology grade ≥2). The Gleason score is the sum of a primary (most predominant) grade and a secondary (highest nonpredominant) grade; it ranges from 6 to 10, with higher scores indicating a more aggressive form of prostate cancer. Secondary outcomes included the detection probabilities (i.e., proportions) of benign biopsies, clinically insignificant cancer (defined as a Gleason score of 3+3, or International Society of Urological Pathology grade 1 cancer), cancer with a Gleason score of 4+3 or greater (International Society of Urological Pathology grade ≥3), and serious adverse events (infections treated with antibiotics, hospitalization, or death within 30 days after the biopsy procedure) in each group.
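The outcome definitions rest on the standard mapping from Gleason pattern pairs to International Society of Urological Pathology grade groups. The sketch below encodes that mapping and a mutually exclusive outcome label; the function names and labels are hypothetical, and "clinically significant" corresponds to the labels grade_2 and grade_3_plus together.

```python
from typing import Optional, Tuple


def isup_grade(primary: int, secondary: int) -> int:
    """Map a biopsy Gleason pattern pair to an ISUP grade group (1-5)."""
    if not (3 <= primary <= 5 and 3 <= secondary <= 5):
        raise ValueError("biopsy Gleason patterns are expected to lie between 3 and 5")
    total = primary + secondary
    if total == 6:
        return 1                            # 3+3
    if total == 7:
        return 2 if primary == 3 else 3     # 3+4 -> grade 2, 4+3 -> grade 3
    if total == 8:
        return 4                            # 4+4, 3+5, 5+3
    return 5                                # Gleason score 9 or 10


def classify(gleason: Optional[Tuple[int, int]]) -> str:
    """Outcome label for one biopsied participant (None = benign biopsy)."""
    if gleason is None:
        return "benign"
    grade = isup_grade(*gleason)
    if grade == 1:
        return "grade_1"                    # clinically insignificant
    return "grade_2" if grade == 2 else "grade_3_plus"
```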
All participants were followed for a minimum of 200 days after receiving PSA test results. Men who underwent biopsy were followed for at least 30 days after the biopsy for monitoring of adverse events, and participants who underwent radical prostatectomy before October 22, 2020, were followed until prostatectomy pathology results were available.
We planned to invite 50,000 men to participate in screening, assuming 25% participation and that 13% of participants would have PSA levels of 3 ng per milliliter or greater.19 This number would yield 1625 participants with PSA levels of 3 ng per milliliter or greater undergoing randomization. With a noninferiority margin of 4 percentage points and an alpha level of 2.5%, and assuming a relative detection probability of clinically significant cancer of 1.3 in favor of the experimental biopsy group (on the basis of previous studies25), 80% adherence to the assigned intervention, and an 18% detection probability of clinically significant cancer in the standard biopsy group, we estimated that the trial would have more than 90% power to show the noninferiority of the experimental biopsy strategy to the standard biopsy strategy. The noninferiority margin was agreed upon at a consensus group meeting that included urologists, oncologists, and statisticians.
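The enrollment arithmetic (50,000 × 25% × 13% = 1625, split 2:3 into 650 and 975) can be checked directly, and a rough power figure obtained from a normal approximation for a difference of two proportions. The power function is a swapped-in textbook approach, not necessarily the calculation the investigators used; treating the alpha as one-sided and modeling 80% adherence as a proportional shrinkage of each group are assumptions.

```python
from statistics import NormalDist

# Enrollment arithmetic from the text, in integer percentages to avoid rounding noise.
invited = 50_000
participants = invited * 25 // 100           # 25% participation -> 12,500
randomized = participants * 13 // 100        # 13% with PSA >= 3 ng/ml -> 1,625
n_standard = randomized * 2 // 5             # 2:3 allocation -> 650
n_experimental = randomized - n_standard     # -> 975


def noninferiority_power(p_std, relative, margin, n_std, n_exp, alpha=0.025, adherence=0.8):
    """Normal-approximation power for H0: p_exp - p_std <= -margin (one-sided alpha).

    Assumption: only adherent participants contribute, so each group is shrunk by `adherence`.
    """
    p_exp = p_std * relative
    m_std, m_exp = n_std * adherence, n_exp * adherence
    se = (p_std * (1 - p_std) / m_std + p_exp * (1 - p_exp) / m_exp) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((p_exp - p_std + margin) / se - z_alpha)


power = noninferiority_power(0.18, 1.3, 0.04, n_standard, n_experimental)
print(randomized, n_standard, n_experimental, f"{power:.2f}")
```

Under these assumptions the approximation comfortably exceeds 90%, in line with the text; the trial's own calculation may have handled nonadherence differently.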
For the primary and secondary outcomes, absolute differences in detection probabilities and two-sided 95% Wald confidence intervals were computed (without adjustment for the variable used for stratification at randomization). If the lower boundary of the two-sided 95% confidence interval for the absolute difference in the detection of clinically significant cancer between the experimental biopsy group and the standard biopsy group was greater than −4 percentage points, the experimental strategy would be deemed to be noninferior; if the lower boundary was greater than 0, the experimental strategy would be deemed to be superior. Prespecified subgroup analyses were performed according to age strata (50 to 59, 60 to 69, and 70 to 74 years), PSA strata (3 to 3.9, 4 to 9.9, and ≥10 ng per milliliter), and previous biopsy (yes or no). Analyses were performed in the intention-to-treat population (analyses in the per-protocol population were also performed for the primary outcome). In a prespecified sensitivity analysis, we used model-based multiple imputation to impute the outcome status with respect to clinically significant cancer for participants who did not undergo MRI or biopsy examinations (details regarding the imputation procedure are provided in Section S4). The imputation procedure was constructed to take into account the effect of the MRI result on missing outcome status.26
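The noninferiority and superiority rules can be made concrete with a Wald interval for a difference of two proportions. A minimal sketch; the function names are hypothetical, and the counts in the example are made up for illustration and are not trial data.

```python
from statistics import NormalDist


def wald_difference(x_exp, n_exp, x_std, n_std, level=0.95):
    """Difference in detection proportions (experimental - standard) with a two-sided Wald CI."""
    p_exp, p_std = x_exp / n_exp, x_std / n_std
    diff = p_exp - p_std
    se = (p_exp * (1 - p_exp) / n_exp + p_std * (1 - p_std) / n_std) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
    return diff, diff - z * se, diff + z * se


def verdict(lower, margin=-0.04):
    """Decision rule from the text, applied to the lower confidence bound."""
    if lower > 0:
        return "superior"
    if lower > margin:
        return "noninferior"
    return "noninferiority not shown"


# Hypothetical counts: 210 of 1000 vs. 108 of 600 participants with detected cancer.
diff, lower, upper = wald_difference(210, 1000, 108, 600)
print(f"{diff:+.3f} ({lower:+.3f} to {upper:+.3f}) -> {verdict(lower)}")
```

Here the lower bound falls between −4 percentage points and 0, so the illustrative comparison would be judged noninferior but not superior.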
To further assess the effect of missing outcome status owing to participants not undergoing recommended MRI or biopsy procedures, we conducted two post hoc analyses. First, we assessed whether the results from the multiple imputation analysis were robust to deviations from the missing-at-random assumption by allowing the primary outcome to be missing not at random. Second, to account for incomplete adherence to the protocol that depended on baseline covariates (and, in the experimental biopsy group, on the MRI result), we used inverse probability weighting to estimate the differences in detection probabilities of clinically significant prostate cancer, clinically insignificant prostate cancer, and benign biopsy findings.
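The idea behind inverse probability weighting is that participants with complete outcomes are up-weighted by the inverse of their estimated probability of completing the protocol, so they also stand in for similar participants who did not. In this sketch that probability is simply the observed share within a single categorical stratum (such as MRI result), a hypothetical simplification of a covariate model; with one categorical stratifier, the weighted estimate equals the stratum-specific observed proportions standardized to the whole group. The data are made up, and a between-group difference would be the experimental-group estimate minus the standard-group estimate.

```python
from collections import Counter


def ipw_detection_probability(records):
    """Inverse-probability-weighted detection probability within one trial group.

    records: (stratum, observed, detected) tuples; `detected` is ignored when
    `observed` is False. P(observed | stratum) is estimated as the observed share
    of each stratum (hypothetical simplification of a covariate model).
    """
    records = list(records)
    n_total = Counter(stratum for stratum, _, _ in records)
    n_observed = Counter(stratum for stratum, observed, _ in records if observed)
    weighted_cases = weighted_n = 0.0
    for stratum, observed, detected in records:
        if not observed:
            continue
        weight = n_total[stratum] / n_observed[stratum]   # 1 / estimated P(observed | stratum)
        weighted_cases += weight * detected
        weighted_n += weight
    return weighted_cases / weighted_n


# Made-up group: completion and cancer detection both differ by MRI result.
group = (
    [("mri_positive", True, 1)] * 4 + [("mri_positive", True, 0)] * 4
    + [("mri_positive", False, 0)] * 2
    + [("mri_negative", True, 0)] * 5 + [("mri_negative", False, 0)] * 5
)
naive = sum(d for _, o, d in group if o) / sum(1 for _, o, _ in group if o)
print(round(naive, 3), ipw_detection_probability(group))
```

The complete-case proportion (4 of 13) overstates detection because MRI-positive participants are overrepresented among completers; the weighted estimate restores each stratum to its full size.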
We performed a prespecified analysis that ignored results of the standard biopsy in men with positive MRI results to estimate results of performing only targeted biopsy, and a post hoc analysis that ignored biopsy outcome for participants who had negative MRI results but who were at high risk (Stockholm3 score of ≥25%) in order to estimate results if these participants had not undergone biopsies. No adjustment for multiplicity was made. P values are reported only for the primary outcome.27 For secondary outcomes and subgroup analyses, the reported two-sided 95% confidence intervals for the individual contrasts have not been adjusted for multiplicity and should be interpreted with caution. The analysis plan was approved by the data and safety monitoring board.
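Both alternative-strategy analyses can be expressed as recomputing each participant's outcome after discarding some biopsy results. A minimal sketch, assuming a hypothetical encoding in which grades are International Society of Urological Pathology grade groups (0 = benign, None = not performed); the function and flag names are illustrative only.

```python
def significant_detected(targeted, systematic, mri_positive,
                         drop_systematic_if_mri_positive=False,
                         drop_biopsy_if_mri_negative=False):
    """True if a grade >= 2 cancer is detected under the chosen scenario."""
    if mri_positive and drop_systematic_if_mri_positive:
        systematic = None                 # targeted-biopsy-only scenario
    if not mri_positive and drop_biopsy_if_mri_negative:
        targeted = systematic = None      # as if no biopsy had been performed
    grades = [g for g in (targeted, systematic) if g is not None]
    return bool(grades) and max(grades) >= 2
```

Averaging this indicator over each group gives the detection probability under the corresponding strategy.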