- Received February 24, 2014
- Revision received April 11, 2014
- Accepted April 15, 2014
- Published online October 1, 2014.
- Kazem Rahimi, DM∗,†,‡,∗,
- Derrick Bennett, PhD§,
- Nathalie Conrad, MSc∗,‖,
- Timothy M. Williams, MD∗,
- Joyee Basu, MD∗,
- Jeremy Dwight, MD‡,
- Mark Woodward, PhD∗,¶,
- Anushka Patel, PhD¶,#,
- John McMurray, MD∗∗ and
- Stephen MacMahon, PhD∗,¶
- ∗George Institute for Global Health, University of Oxford, Oxford, United Kingdom
- †Division of Cardiovascular Medicine, University of Oxford, Oxford, United Kingdom
- ‡Department of Cardiology, Oxford University Hospitals NHS Trust, Oxford, United Kingdom
- §Clinical Trial Service Unit and Epidemiological Studies Unit, University of Oxford, Oxford, United Kingdom
- ‖IBM, Global Business Services, Business Analytics & Optimization, Zurich, Switzerland
- ¶The George Institute for Global Health, Sydney, Australia
- #The George Institute for Global Health, Hyderabad, India
- ∗∗BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, Scotland, United Kingdom
- ∗Reprint requests and correspondence: Dr. Kazem Rahimi, George Institute for Global Health, Oxford Martin School, University of Oxford, 34 Broad Street, Oxford OX1 3BD, United Kingdom.
Objectives This study sought to review the literature for risk prediction models in patients with heart failure and to identify the most consistently reported independent predictors of risk across models.
Background Risk assessment provides information about patient prognosis, guides decision making about the type and intensity of care, and enables better understanding of provider performance.
Methods MEDLINE and EMBASE were searched from January 1995 to March 2013, followed by hand searches of the retrieved reference lists. Studies were eligible if they reported at least 1 multivariable model for risk prediction of death, hospitalization, or both in patients with heart failure and reported model performance. We ranked reported individual risk predictors by their strength of association with the outcome and assessed the association of model performance with study characteristics.
Results Sixty-four main models and 50 modifications from 48 studies met the inclusion criteria. Of the 64 main models, 43 models predicted death, 10 hospitalization, and 11 death or hospitalization. The discriminatory ability of the models for prediction of death appeared to be higher than that for prediction of death or hospitalization or prediction of hospitalization alone (p = 0.0003). A wide variation between studies in clinical settings, population characteristics, sample size, and variables used for model development was observed, but these features were not significantly associated with the discriminatory performance of the models. A few strong predictors emerged for prediction of death; the most consistently reported predictors were age, renal function, blood pressure, blood sodium level, left ventricular ejection fraction, sex, brain natriuretic peptide level, New York Heart Association functional class, diabetes, weight or body mass index, and exercise capacity.
Conclusions There are several clinically useful and well-validated death prediction models in patients with heart failure. Although the studies differed in many respects, the models largely included a few common markers of risk.
Heart failure is a common and complex condition (1–3). Despite recent advances in diagnosis and management, average outcomes in patients with heart failure remain poor and highly variable (4). Risks among subgroups of patients with heart failure often vary several-fold and may change substantially over time. Hence, understanding expected risks and communicating anticipated future disease trajectories to patients and their families constitute important aspects of patient-physician interactions in heart failure (5,6). More specifically, knowledge of future risks can help patients and clinicians make informed decisions about the initiation and intensity of treatment, such as device therapy, disease monitoring, or end-of-life care according to the individual patient’s need and potential for benefit (7,8). Identification of low-risk patients, on the other hand, could help reduce patient anxiety and avoid costly interventions of questionable value (7,8).
However, how to best estimate risk in patients with heart failure is less clear (6,8,9). A substantial body of published data has shown that patients’ and clinicians’ intuitive judgments about future risk tend to be inaccurate and highly variable (10–14). This is partly due to our inability as individual people to simultaneously consider and process information about multiple factors. Furthermore, single predictors of risk are rarely sufficient for accurate estimation of risk for common conditions such as heart failure (15). A solution to this problem is to estimate risk from a combination of several predictors by using a statistical multivariable model (15–17).
There has recently been a rapid increase in the number of statistical models available. However, without a comprehensive overview, it remains unclear which, if any, should be applied in clinical care. Therefore, we reviewed contemporary published reports for multivariable statistical models for prediction of death, hospitalization, or both and assessed their utility for clinical decision making.
We undertook this systematic review according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.
Search strategy for identification of relevant studies
We searched MEDLINE and EMBASE from January 1995 to March 2013 for articles with terms or subject terms “re-admission” or “mortality” or “death” or “model” or “predict” and “heart failure.” The search was limited to human studies; there was no language restriction. We also hand searched the reference lists of eligible studies as well as reviews relating to this subject for identification of additional relevant publications (the detailed search strategy is presented in Online Appendix 1).
Review methods and selection criteria
Two reviewers independently screened all titles and abstracts and made decisions regarding potential eligibility after full-text review. Discrepancies in judgment were resolved by a third reviewer. Studies were eligible if they reported multivariable models for prediction of risk of death, hospitalization, or death or hospitalization in people with heart failure; the derived model included at least 50 patients who experienced an event during the observation period, because studies with fewer cases are unlikely to be sufficiently robust for widespread clinical or administrative use; and they assessed model performance. We excluded studies that focused on single predictors of risk only, because these are prone to reporting overly optimistic findings due to a number of methodological limitations (15). We placed no restrictions on study setting, participant characteristics, or geographic regions.
For each included study, the following information was extracted: study and patient characteristics, candidate variables considered for model derivation, final model variables and their strength of association with the outcome, analytical methods, and model discrimination, calibration, and validation, as reported by the authors. Discrimination is the ability of a statistical model to distinguish those subjects who experience the outcome from those who do not. It is usually reported using the C statistic (also called the C index). A C statistic of 1 indicates perfect discrimination, whereas a C statistic of 0.5 indicates discrimination no better than chance. We defined a C index of <0.6 as poor, 0.6 to 0.7 as modest, and >0.7 as good. Calibration is defined as how closely observed estimates of absolute risk agree with expected estimates from the risk prediction model and is best assessed graphically. We also recorded internal or external validation of the model, with the former being an assessment of model fit and the latter being an assessment of model generalizability. Internal validation is determined on the basis of the same data used to develop the model and is usually assessed via bootstrapping (18). External validation is assessed by how well the developed model performs on an independent sample (19).
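As an illustration of the discrimination measure described above, the following is a minimal Python sketch of a C statistic for a binary outcome, together with the poor/modest/good thresholds used in this review. The function names and the example risks and outcomes are hypothetical and are not taken from any included study. The sketch ignores censoring, so it does not reproduce the survival-data versions of the C index that many of the reviewed models would have reported.

```python
from itertools import product


def c_statistic(predicted_risks, outcomes):
    """Share of (event, non-event) pairs in which the patient with the
    event has the higher predicted risk; tied risks count as half."""
    events = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            concordant += 1.0
        elif event_risk == non_event_risk:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def classify_discrimination(c):
    """Thresholds stated in the review: <0.6 poor, 0.6 to 0.7 modest, >0.7 good."""
    if c < 0.6:
        return "poor"
    if c <= 0.7:
        return "modest"
    return "good"


# Hypothetical predicted 1-year risks of death and observed outcomes (1 = died).
risks = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
died = [1, 1, 1, 0, 0, 0]

c = c_statistic(risks, died)
print(round(c, 3), classify_discrimination(c))
```

In this toy example, 8 of the 9 event/non-event pairs are ranked correctly, so the model ranks risk well even though nothing is said about whether the predicted absolute risks are calibrated.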
We explored whether a priori defined individual methodological characteristics were associated with the discrimination of the risk prediction model via Kruskal-Wallis nonparametric 1-way analysis of variance. Models from the same study that looked at the same outcome but over a different length of follow-up, or had also reported a variant of the main model, were excluded from these analyses. The methodological characteristics considered were type of outcome, derivation sample size, source population (i.e., administrative, trial, patient records), and study design (i.e., retrospective or prospective). For those studies that reported multiple models for the same outcome but with different follow-up times, we investigated whether the time horizon was associated with the level of discrimination. To identify the most consistent and strongest risk predictors across studies, we took the following approach. First, similar individual predictors were grouped together (e.g., brain natriuretic peptide [BNP] and N-terminal pro–B-type brain natriuretic peptide [NT-proBNP] into BNP and systolic or diastolic blood pressure into blood pressure). Then, for each model’s top 10 predictors, we assigned a weight on the basis of the predictor’s discriminative ability rank as reported by the authors (usually chi-square, z score, or p values in the final model). For studies that indicated such ranking, the weight was set from 10 to 1, with the strongest predictor weighted as 10. For studies in which the ranking could not be extracted from the published report, all predictors were equally weighted as 5. We finally computed the weighted score by summing all weights for each predictor across all studies. Sensitivity analyses were performed by excluding models for which actual ranking of the predictors was not available, with no impact on the final selection of predictors.
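The predictor-ranking procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code used for the review: the grouping table, the data structure, and the three example models are hypothetical, and counting a grouped predictor only once per model (at its best rank) is an assumption the text does not specify.

```python
from collections import defaultdict

TOP_N = 10           # only each model's top 10 predictors are weighted
UNRANKED_WEIGHT = 5  # weight used when no ranking could be extracted

# Hypothetical grouping table following the two examples given in the text.
GROUPS = {
    "NT-proBNP": "BNP",
    "systolic blood pressure": "blood pressure",
    "diastolic blood pressure": "blood pressure",
}


def score_predictors(models):
    """Sum rank-based weights for each grouped predictor across models.

    Each model is a dict with 'predictors' (strongest first when ranked)
    and 'ranked' (True if the authors reported a ranking).
    """
    totals = defaultdict(int)
    for model in models:
        # Group similar predictors; keep the first (best-ranked) occurrence.
        grouped = list(dict.fromkeys(GROUPS.get(p, p) for p in model["predictors"]))
        if model["ranked"]:
            for position, name in enumerate(grouped[:TOP_N]):
                totals[name] += TOP_N - position  # 10 for the strongest, down to 1
        else:
            for name in grouped:
                totals[name] += UNRANKED_WEIGHT
    # Highest total first; ties broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical models, for illustration only.
models = [
    {"ranked": True, "predictors": ["age", "NT-proBNP", "systolic blood pressure"]},
    {"ranked": True, "predictors": ["renal function", "age"]},
    {"ranked": False, "predictors": ["age", "BNP", "diabetes"]},
]

scores = score_predictors(models)
for name, total in scores:
    print(name, total)
```

The sensitivity analysis mentioned in the text corresponds to calling `score_predictors` on only those models with `ranked` set to `True` and checking whether the leading predictors change.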
Review of 2,678 abstracts and additional hand searches led to the identification of 43 main models for prediction of death, 10 main models for prediction of hospitalization, and 11 main models for prediction of death or hospitalization. Another 50 modifications or simplifications of the main models were identified.
Clinical settings for model development
The settings from which patients were identified varied widely between the studies. Twenty-five studies (52%) included hospitalized patients only (including those presenting to the emergency department), 10 studies (21%) included patients presenting to outpatient clinics, and the remaining 13 studies (27%) either had a mixed setting (i.e., hospital or outpatient clinics) or did not specify this further (Online Table 1). Among the studies that included hospitalized patients, the timing of data collection for predictors varied from a few hours after admission to up to the pre-discharge phase. Consequently, the time horizon of the models developed ranged from events occurring early during admission (i.e., over a few days) to more long-term outcome prediction after discharge (i.e., over a few years). A large proportion of studies developed models that used data collected in the United States only (23 studies; 48%), and a further 3 studies (6%) included U.S. sites. We did not find any studies that were from low- and middle-income countries (LMICs), although some studies were multinational studies that included data from LMICs (Online Table 1).
Heart failure subtypes
Sixteen studies (33%) were restricted to patients with heart failure who had left ventricular systolic dysfunction and 1 study to patients with preserved systolic function. The remaining 31 studies (65%) either did not make any reference to left ventricular systolic function or included all patients with heart failure. Fifteen of 26 studies that entered left ventricular ejection fraction as a categorical variable into the model retained ejection fraction as an independent risk predictor in the final model. Overall, ejection fraction was 1 of the top 5 predictors of risk of death across all studies evaluated in this review.
Candidate variables for risk models and data sources
Selection of candidate variables for model development and the burden of data collection were highly variable. For example, 1 study considered a list of more than 300 diagnoses from administrative records as predictors of outcome but did not include any clinical, physiological, or laboratory variables (20). Another study included psychological factors (21), whereas other studies included novel biomarkers (22). Overall, 14 studies (29%) used routinely available administrative databases for model development, of which 4 combined these with some information from patient records. Nine studies (19%) relied on patient records only and 26 studies (54%) prospectively collected information for the study purposes, of which 19 studies (40%) were on the basis of clinical trial data (Online Table 1).
Models for prediction of risk of death
All 43 studies reported the risk of all-cause mortality (Online Table 2); 1 study (23) also reported on coronary death and another (24) also reported on heart failure mortality. Models for these specific causes of death had a slightly better discriminatory power than those for all-cause mortality (Online Appendix 2). Two studies considered repeated measures of risk over time (in addition to baseline clinical information), which appeared to improve their predictive ability compared with measurements at a single point in time (25,26). Most models incorporated demographics, comorbidities, and physiological and laboratory measures for risk prediction. Fewer studies considered social and quality-of-life variables as potential predictors (Online Table 2).
The median number of final predictors for the most comprehensive model reported in these studies was 9 (range: 3 to 314). A few variables emerged as the most consistent and strongest predictors of risk: age, renal function, blood pressure, sodium level, ejection fraction, sex, BNP (or NT-proBNP) level, New York Heart Association functional class, diabetes, weight/body mass index, and exercise capacity. Their inclusion as a candidate or final variable for each of the 43 models is presented in Figure 1. The discriminative ability of the models was modest to good (C index range of 0.60 to 0.89) (Online Table 2).
Models for prediction of risk of hospitalization
Admission to the hospital for any cause was the outcome in 8 models and heart failure readmission in the other 2 models. The median number of final predictors reported in these studies was 5 (range: 4 to 29). A few recurrent predictors emerged: age, sex, renal function, cardiovascular disease, and heart rate. However, because only 1 study reported the ranking of the predictive value of variables and the total number of studies (10) was relatively small, we could not reliably report the strongest predictors of risk across studies. The discriminative ability of the models varied from modest to good (C index range of 0.60 to 0.82) (Online Table 3).
Models for prediction of risk of death or hospitalization
The median number of final predictors for the main model reported in these studies was 10 (range of 5 to 12) (Online Table 4). Among individual predictors across all models, the following variables emerged as the most consistent and strongest predictors of risk: renal function, BNP (or NT-proBNP) level, history of heart failure, age, and blood pressure. The discriminative ability of the models was modest to good (C index range of 0.61 to 0.80).
Characteristics that were associated with the C index
We investigated whether the discriminative ability of the 60 main models that quantified the C index was associated with individual methodological or other design characteristics of the predictive models. The mean reported C index was strongly associated with the type of outcome assessed, with models of risk of death giving higher C indexes than models of death or readmission, or readmission only (p for heterogeneity = 0.0003) (Figure 2). There was no association between the reported C index and the derivation sample size, the source of the data, and the design of the study (p > 0.21 for all). We also found that when different time horizons were reported for the same models, the size of the C index was generally inversely correlated with the length of follow-up. However, it was not possible to conduct formal statistical analyses because many studies did not report a measure of precision for the C index.
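The comparison of C indexes across the three outcome types relies on the Kruskal-Wallis test named in the methods. The sketch below shows how such a test could be computed in plain Python. The C index values are hypothetical and are not the values extracted in this review. The sketch omits the correction for ties, and with so few values per group the chi-square approximation is rough. The closed-form p value holds only for exactly three groups (2 degrees of freedom).

```python
import math


def average_ranks(values):
    """1-based ranks of the pooled values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction (an assumption of this sketch)."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3.0 * (n_total + 1)


def p_value_three_groups(h):
    """Chi-square upper tail with 2 degrees of freedom, which equals exp(-h / 2)."""
    return math.exp(-h / 2.0)


# Hypothetical C indexes grouped by outcome type, for illustration only.
death = [0.75, 0.78, 0.80, 0.73]
hospitalization = [0.62, 0.65, 0.68]
death_or_hospitalization = [0.66, 0.70, 0.71]

h = kruskal_wallis_h([death, hospitalization, death_or_hospitalization])
p = p_value_three_groups(h)
print(round(h, 3), round(p, 4))
```

Because the test works on ranks rather than raw values, it does not require the C indexes to be normally distributed, which suits a small and heterogeneous set of published estimates.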
We did not identify any studies that formally evaluated the impact of the reported risk prediction model on the management of patients with heart failure.
We identified 48 studies that reported 64 main multivariable models (43 for prediction of risk of death, 10 for hospitalization, and 11 for death or hospitalization). The results of our study showed that despite the multiple differences in clinical settings, population characteristics, and use of candidate variables, a few variables emerged as consistent and strong predictors of risk across different studies. For prediction of death, these variables comprised age, renal function, blood pressure, sodium level, ejection fraction, sex, BNP (or NT-proBNP) level, New York Heart Association functional class, diabetes, weight/body mass index, and exercise capacity. With the exception of the type of outcome to be predicted and the duration of follow-up, none of the other study features investigated were found to be significantly associated with the ability of the models to discriminate between those who are likely to experience an event and those who are not. In particular, there was no evidence to suggest that differences in sample sizes, sources of data collection (e.g., clinical trial or routine records), or study designs (prospective vs. retrospective) were significantly associated with reported C indexes.
Although similarities between the studies suggest that more than 1 risk model is likely to prevail for wider clinical use, we identified a few areas in which differences between the models were significant and could affect clinical decision making. One major source of heterogeneity in risk discrimination was the type of outcome to be predicted. On average, models designed to predict the combined outcome of death or hospitalization, or of hospitalization only, had a poorer discriminative ability than those designed to predict death. This may be because hospitalization is genuinely more difficult to predict than death (perhaps because the decision about who to admit to the hospital is much more dependent on health care supply) or because there has been less focus on this type of outcome (as evidenced by the smaller number of published reports). Irrespective of the underlying causes of such differences, we believe that efforts are needed to increase the performance of such models in the future to make them clinically more useful.
Another important difference between the models reported was the extent to which they have been validated. Overall, 36% (23 of 64) of the identified models had validated their findings in an independent cohort; among these studies, the authors mainly reported the discriminatory ability of the externally validated model (i.e., how the model ranked risk) but not its calibration (i.e., the differences in observed and predicted absolute risks in an independent cohort). Because none of the identified models in our study were exclusively derived, recalibrated, or validated in patients from LMICs, validation and possible recalibration of existing models in LMICs will be a welcome addition to the existing evidence base.
From a clinical perspective, our study suggests that a number of risk prediction tools are suitable for use in clinical practice, in particular when the outcome of interest is death. For example, the recently reported model by Senni et al. (28) has a very good discriminatory ability for predicting death at 1 year (C-statistic of 0.88), has been externally validated (C-statistic of 0.83), and enables calculation of risk in a wide range of patients with heart failure on the basis of easily obtainable risk markers. Another useful recent model used information from 30 prospective studies and approximately 40,000 patients with heart failure to derive a simple risk calculator for prediction of death for up to 3 years (29). The very large size of this study and the derivation of patients from wide geographic regions provide a uniquely robust and generalizable tool to quantify the prognosis of individual patients. However, these 2 models did not include BNP because such information was not available and it may be that inclusion of such biomarkers could further improve the predictive ability of these models. For prediction of death early after presentation to the hospital or emergency department, we found the risk models reported by Peterson et al. (30) and Lee et al. (31) to be particularly valuable because of their high discriminative abilities, independent validation in large cohorts of patients with heart failure with a wide spectrum of risk, and the relative simplicity of the risk calculators from the users’ perspective.
If many useful risk calculators exist, why are they not routinely used in clinical practice? It could be argued that many of the models identified in our study have only been reported recently and that uptake will increase over time. However, in other clinical areas in which models have been available for some time, they still seem to be underused (32,33). Previous research on barriers to more widespread use of cardiovascular prediction tools has shown that many clinicians find risk calculation too time consuming and are not convinced of the value of information derived from these models (32). We certainly agree that there is much room for improving the usability of risk prediction tools, in parallel to further research into improving model performances. For example, development of automated data capturing systems and better techniques for risk visualization would help to minimize user burden and could facilitate communication of risks and uncertainties with patients and their families.
We identified more than 60 multivariable risk prediction models for death, hospitalization, or both in patients with heart failure. Although these models differed in many respects, a few common and strong markers of risk have emerged. Several risk calculators for prediction of death were identified that had sufficiently high performance properties for wider clinical use. However, the same was not the case for prediction of hospitalization.
Supported by the National Institute for Health Research Oxford Biomedical Research Centre Programme. The work of the George Institute for Global Health is supported by the Oxford Martin School. Dr. Rahimi holds a National Institute for Health Research Career Development Fellowship. Ms. Conrad is an employee of IBM. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and Acronyms
- BNP: brain natriuretic peptide
- LMIC: low- and middle-income country
- NT-proBNP: N-terminal pro–B-type brain natriuretic peptide
- American College of Cardiology Foundation
- McMurray J.J.V.,
- Stewart S.
- Cleland J. National Heart Failure Audit: April 2011 to March 2012. National Institute for Cardiovascular Outcomes Research (NICOR), The Institute of Cardiovascular Science. London: University College, 2012. Available at: http://www.ucl.ac.uk/nicor/audits/heartfailure/documents/annualreports/hfannual12-13.pdf. Accessed August 2014.
- Dickstein K.,
- Cohen-Solal A.,
- Filippatos G.,
- et al.
- McMurray J.J.,
- Adamopoulos S.,
- Anker S.D.,
- et al.
- Allen L.A.,
- Stevenson L.W.,
- Grady K.L.,
- et al.
- Braunwald E.
- Grover S.A.,
- Lowensteyn I.,
- Esrey K.L.,
- Steinert Y.,
- Joseph L.,
- Abrahamowicz M.
- Moons K.G.,
- Kengne A.P.,
- Woodward M.,
- et al.
- Moons K.G.,
- Kengne A.P.,
- Grobbee D.E.,
- et al.
- Vazquez R.,
- Bayes-Genis A.,
- Cygankiewicz I.,
- et al.
- Wedel H.,
- McMurray J.J.,
- Lindberg M.,
- et al.
- Subramanian D.,
- Subramanian V.,
- Deswal A.,
- Mann D.L.
- Cowie M.R.,
- Sarkar S.,
- Koehler J.,
- et al.
- Wennberg J.E.
- Pocock S.J.,
- Ariti C.A.,
- McMurray J.J.,
- et al.
- Peterson P.N.,
- Rumsfeld J.S.,
- Liang L.,
- et al.