Division of Health AI

Northwell Health


© 2026 Division of Health AI, Northwell Health


Publications

58 peer-reviewed publications in journals including Nature Communications, PNAS, JAMA, and Nature Machine Intelligence.


22 publications matching filters

Translational Vision Science & Technology

Artificial Intelligence-Driven Differentiation Between Uveal Melanoma and Nevus Based on Fundus Photographs: A Systematic Review and Meta-Analysis

Uveal melanoma (UM) is the most common intraocular malignancy in adults, with high metastatic risk and poor prognosis. Current screening and triaging methods for melanocytic choroidal tumors face inherent limitations, particularly in regions with limited access to specialized ocular oncologists. This systematic review and meta-analysis evaluated artificial intelligence-driven approaches for differentiating uveal melanoma from nevus based on fundus photographs. Analysis included machine learning models with pooled sensitivity of 85% (95% CI 82–87%), specificity of 86% (82–88%), and a C-index of 0.87 (0.84–0.90), with convolutional neural networks as the main method used. Deep learning models achieved AUC scores of 94-95%, outperforming ophthalmologists using standard risk assessment criteria.

Nature Communications

Beyond episodic early warning systems: a continuous clinical alert system for early detection of in-hospital deterioration

Efficient patient monitoring on medical-surgical wards is crucial to prevent adverse events. Standard episodic inpatient assessment of vital signs can miss changes in health status and delay risk recognition. This study developed a wearable-based deep learning model using only 9 inputs to identify the onset of deterioration earlier than traditional early warning systems. The model could generalize to produce clinical alerts ahead of rapid response team (RRT) interventions, unplanned intensive care unit (ICU) transfers, intubations, cardiac arrests, and in-hospital deaths. Using multiple stages of validation on 888 adult non-ICU inpatient visits, the RNN model predicted both periods of elevated MEWS scores (ROC AUC 0.89 ± 0.3, PR AUC 0.58 ± 0.14) and adverse clinical outcomes (accuracy: 81.8%) up to an average of 17 hours in advance.

Journal of Cardiovascular Electrophysiology

Leveraging Implantable Cardiac Defibrillator Remote Transmissions to Predict the Occurrence of Atrial Fibrillation

Atrial fibrillation (AF) and heart failure (HF) frequently coexist in patients, with development of AF often preceding HF decompensation. This study evaluated whether daily remote monitoring of implantable cardioverter-defibrillator (ICD) parameters could predict AF occurrence using machine learning techniques in a real-world cohort. Data from patients with primary prevention ICDs transmitted daily to the Northwell centralized remote monitoring center between 2012 and 2021 were analyzed. An XGBoost model was trained to predict AF occurrence with a 3-day time horizon using a 14-day data collection sequence in 207 patients (69.0% male, median age 65.0 years, median ejection fraction 30%). The model predicted AF occurrence within the following 3 days in 49 (23.7%) patients after a median of 36 months post-implant with an AUROC of 0.79 and AUPRC of 0.10. Key variables included RV and RA sensing amplitudes and pulse width, suggesting machine learning approaches have potential to predict AF from daily remote ICD monitoring.
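The 14-day collection sequence and 3-day prediction horizon described above can be pictured as a sliding pairing of feature windows with labels. The following is a minimal sketch under stated assumptions, not the study's pipeline: the function name `make_windows`, the single-parameter input, and the one-day stride are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    daily_values: Sequence[float],
    af_days: Sequence[bool],
    lookback: int = 14,  # days of ICD data fed to the model (from the abstract)
    horizon: int = 3,    # days ahead in which AF is predicted (from the abstract)
) -> List[Tuple[List[float], int]]:
    """Pair each lookback window with a label: 1 if AF occurs in the next `horizon` days."""
    samples = []
    last_start = len(daily_values) - lookback - horizon
    for start in range(last_start + 1):  # one-day stride is an assumption
        end = start + lookback
        features = list(daily_values[start:end])
        label = int(any(af_days[end:end + horizon]))
        samples.append((features, label))
    return samples


if __name__ == "__main__":
    values = [float(d) for d in range(20)]  # synthetic daily measurements
    af = [False] * 20
    af[16] = True                           # synthetic AF episode on day 16
    for features, label in make_windows(values, af):
        print(len(features), label)
```

Each sample holds 14 consecutive daily values, and its label looks only at the 3 days that follow, so no information from the prediction horizon leaks into the features.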

Cancers

A Systematic Review of the Applications of Deep Learning for the Interpretation of Positron Emission Tomography Images of Patients with Lymphoma

This systematic review examined the applications of deep learning for the interpretation of lymphoma positron emission tomography (PET) images. From 71 papers initially retrieved, 21 studies with a total of 9402 participants were ultimately included. The proposed deep learning models achieved promising performance in various medical tasks, including detection, histological subtyping, differential diagnosis, and prognostication. AI-based analysis of lymphoma whole-body FDG-PET/CT can inform all phases of clinical management including staging, prognostication, treatment planning, and treatment response evaluation. AI methods demonstrated promising predictive performance (AUC range = 0.68–0.85) on PET-based images, with higher values for deep learning methods. AI techniques for lymphoma PET evaluation are designed to assist physicians in handling large volumes of scans through rapid and accurate calculations.

International Journal of Medical Informatics

Longitudinal dynamic clinical phenotypes of in-hospital COVID-19 patients across three dominant virus variants in New York

This study analyzed the dynamic clinical phenotypes of more than 35,000 COVID-19 patients admitted to New York hospitals over a two-year period (March 2020 to May 2022) that spanned all major COVID-19 waves and three dominant virus variants (alpha, delta, omicron). Patients were grouped into four distinct clusters with unique demographics, treatment profiles, and mortality outcomes. The temporal progression of these phenotypes throughout the COVID-19 pandemic demonstrated increased variability across the waves of three dominant viral variants. Four distinct clinical phenotypes remained robust in multi-site validation and were associated with different mortality rates. Although the lung phenotype with high inflammation was most prevalent at admission, the lung phenotype with low inflammation consistently prevailed thereafter. Most patients transitioned to other phenotypes as time progressed, highlighting the dynamic nature of disease progression during hospitalization.

Journal of Surgical Oncology

Machine learning to predict completion of treatment for pancreatic cancer

While chemotherapy enhances survival rates for pancreatic cancer patients after surgery, fewer than 60% complete adjuvant therapy, with a smaller fraction undergoing neoadjuvant treatment. This study aimed to predict which patients would complete pre- or postoperative chemotherapy through machine learning, grouping patients with resectable pancreatic cancer into those who completed all intended treatments and those who did not. Researchers applied logistic regression with lasso penalization and an extreme gradient boosting model for prediction. Among 208 patients with a median age of 69 (49.5% female, 62% white), neoadjuvant and adjuvant chemotherapies were received by 26% and 47.1%, respectively, but only 49% completed all treatments. Negative prognostic factors included worsening diabetes, age, congestive heart failure, high body mass index, family history of pancreatic cancer, initial bilirubin levels, and tumor location in the pancreatic head. Discrimination (AUROC) was 0.67 for both models, with performance expected to improve with larger datasets.

Cancers

Deep Learning Applications in Pancreatic Cancer

Pancreatic cancer is one of the most lethal gastrointestinal malignancies. Despite advances in cross-sectional imaging, chemotherapy, radiation therapy, and surgical techniques, the 5-year overall survival is only 12%. With the advent and rapid adoption of artificial intelligence (AI), specifically deep learning (DL), into healthcare systems, there is potential for utilizing AI applications across the entire pancreatic cancer patient journey. This review examines the current applications of DL and other AI modalities in the diagnosis, management, monitoring, and prognostic assessment of patients with pancreatic cancer. The scope covers diagnostic imaging, surgical planning, therapeutic monitoring, and development of novel biomarkers. We conducted a comprehensive review of English language publications from January 2019 to November 2023 in the PubMed database using keywords including pancreatic cancer, deep learning, radiomics, large language models, and generative adversarial networks.

JMIR Formative Research

Detection of Common Respiratory Infections, Including COVID-19, Using Consumer Wearable Devices in Health Care Workers: Prospective Model Validation Study

Background: Respiratory viral infections, including COVID-19, are difficult to detect based on symptoms alone. Consumer wearable devices that monitor physiological signals offer a new avenue for respiratory infection detection. Objective: To evaluate the performance of a consumer wearable physiology-based respiratory infection detection algorithm in health care workers. Methods: This prospective study included 577 participants from Northwell Health in New York between January 6, 2022, and July 20, 2022. Participants wore a smartwatch that generated real-time alerts using resting heart rate, respiratory rate, and heart rate variability measured during sleep. Alerts were cross-referenced with results from respiratory viral panel testing. Results: Across 512 alert instances involving a respiratory viral panel test result, 63 had confirmed positive test results (COVID-19 or other respiratory infections detected via polymerase chain reaction or rapid home test). The system provided advance warning of respiratory viral infections as well as other physical or emotional stress events.

Digital Health

Prediction of intrapartum fever using continuously monitored vital signs and heart rate variability

Background: Fever during labor is associated with maternal and neonatal morbidity. Early identification of at-risk patients would enable timely clinical intervention. Objective: To develop and validate a predictive model of intrapartum fever using continuously monitored vital signs and heart rate variability (HRV). Methods: This was a prospective cohort study of 1,155 women in active labor. Raw vital signs and calculated HRV metrics were evaluated for their ability to predict fever (temperature >38.0°C) using logistic regression. Results: Fever was detected in 48 women (4.2%). Compared to afebrile mothers, febrile mothers had significantly decreased heart rate variability measures (SDNN and RMSSD) at 2-3 hours before fever onset (P<0.001). A predictive model built using continuous vital signs data outperformed a model built from episodic vital signs, with area under the curve of 0.81.
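For readers unfamiliar with the two HRV measures named above, the sketch below computes SDNN (standard deviation of beat-to-beat intervals) and RMSSD (root mean square of successive differences) from a list of RR intervals in milliseconds. It is an illustration only: the function names are hypothetical, the sample (n - 1) standard deviation is an assumption, and the study's preprocessing (artifact removal, window length) is not described in this summary.

```python
import math
from typing import Sequence


def sdnn(rr_ms: Sequence[float]) -> float:
    """Sample standard deviation of RR intervals (ms). Uses n - 1 (an assumption)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    mean = sum(rr_ms) / len(rr_ms)
    variance = sum((x - mean) ** 2 for x in rr_ms) / (len(rr_ms) - 1)
    return math.sqrt(variance)


def rmssd(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive RR-interval differences (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    rr = [800.0, 810.0, 790.0, 800.0]  # synthetic intervals, not study data
    print(round(sdnn(rr), 2), round(rmssd(rr), 2))
```

Lower values of both measures indicate less beat-to-beat variation, which is the direction of change the abstract reports in febrile mothers before fever onset.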

Bioelectronic Medicine

A radiographic, deep transfer learning framework, adapted to estimate lung opacities from chest x-rays

Objective: To develop and validate a deep learning framework for estimating chest X-ray (CXR) lung opacity severity, which could assist radiologists in standardizing opacity assessment. Methods: We developed a transfer learning framework using 38,079 training CXR images and validated against expert radiologist annotations using 286 out-of-sample images. Three neural network architectures (ResNet-50, VGG-16, and CheXNet) were tested with different segmentation and data balancing strategies. Results: ResNet-50 with undersampling and no region-of-interest segmentation provided optimal performance. The model's opacity score predictions showed superior agreement with radiologist scores compared to inter-radiologist agreement. The framework provides automated opacity quantification while maintaining high concordance with expert radiologist assessments.

Clinical Imaging

The association of clinically relevant variables with chest radiograph lung disease burden quantified in real-time by radiologists upon initial presentation in individuals hospitalized with COVID-19

Objectives: We aimed to correlate lung disease burden on presentation chest radiographs (CXR), quantified at the time of study interpretation, with clinical presentation in patients hospitalized with coronavirus disease 2019 (COVID-19). Material and methods: This retrospective cross-sectional study included 5833 consecutive adult patients, aged 18 and older, hospitalized with a diagnosis of COVID-19. Lung disease burden was quantified in real-time by 118 radiologists on 5833 CXRs at the time of exam interpretation with each lung annotated by the degree of lung opacity as clear (0%), mild (1-33%), moderate (34-66%), or severe (67-100%). Results: COVID-19 lung disease burden quantified in real-time on presentation CXR was characterized by demographics, comorbidities, emergency severity index, Charlson Comorbidity Index, vital signs, and lab results. An absence of opacities in COVID-19 may be associated with poor oral intake and a prerenal state as evidenced by the association of clear CXRs with a low eGFR, hypernatremia, and hypoglycemia.
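The four opacity grades listed above map a percentage of lung involvement to a category. A minimal sketch of that mapping follows. The function name `opacity_category` is hypothetical, and the handling of fractional values that fall between the stated integer bands (for example 33.5%) is an assumption, since the summary only gives whole-number ranges.

```python
def opacity_category(percent: float) -> str:
    """Map percent lung opacity to the grade bands quoted in the abstract.

    Bands: clear (0%), mild (1-33%), moderate (34-66%), severe (67-100%).
    Fractional values are assigned to the next band up once they exceed a
    band's upper bound; that tie-breaking rule is an assumption.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0:
        return "clear"
    if percent <= 33:
        return "mild"
    if percent <= 66:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    for value in (0, 20, 50, 90):
        print(value, opacity_category(value))
```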

Patient-Centered Outcomes Research Institute (PCORI)

Developing and Testing Models for COVID-19 Health Outcomes

Background: Patients who present at the emergency department with COVID-19 need accurate prognostication to support their decisions, but a highly dynamic pandemic poses special challenges for predicting patient outcomes. Objectives: We aimed to develop clinical prediction models (CPMs) to support shared decision-making for COVID-19 care. We also aimed to evaluate geographic transportability by assessing model performance across different data sets (from different countries) as well as temporal transportability by assessing model performance within the same data set across time periods. We convened focus groups with COVID-19 care providers, survivors, and surrogates to elicit feedback about care-related decision-making during the COVID-19 pandemic. Methods: We developed clinical prediction models to predict the probability of mortality, the need for mechanical ventilation (MV) or intensive care unit (ICU) admission, mortality if a patient is placed on MV, and length of stay (LOS) in the ICU.

Medical Decision Making

US and Dutch Perspectives on the Use of COVID-19 Clinical Prediction Models: Findings from a Qualitative Analysis

Background: Clinical prediction models (CPMs) for COVID-19 may support clinical decision making. Objective: To understand attitudes toward using COVID-19 CPMs among health care providers, survivors, and surrogates in the United States and Netherlands. Methods: Qualitative study using online focus groups and interviews conducted between January 2021 and July 2021 in the United States (4 focus groups) and May to July 2021 in the Netherlands (3 focus groups and 4 interviews). Results: Many providers had reservations about CPM validity and patient-level outcome interpretation. However, survivors and surrogates indicated they would have found this information useful for decision making. Providers perceived CPMs as most useful for resource allocation, triage, research, and educational purposes. Conclusions: There is a disconnect between CPM development and clinical implementation, highlighting the need for better communication and integration of provider and patient perspectives.

Nature Communications

Development and Validation of Self-Monitoring Auto-Updating Prognostic Models of Survival for Hospitalized COVID-19 Patients

Objective: To develop and validate self-monitoring, auto-updating prognostic models of survival for hospitalized COVID-19 patients. Methods: We analyzed demographic, laboratory, and clinical data from 34,912 hospitalized COVID-19 patients (March 2020 to May 2022) using a 2,000-patient sliding window incremented at 500-patient intervals to detect calibration drift. Results: Calibration performance drift was immediately detected with only minor fluctuations in discrimination. Dynamically updated models significantly improved overall calibration compared to static models across different waves, variants, race, and sex. Net-benefit analyses showed positive benefits. Conclusions: This is the first study to perform dynamic updating of COVID-19 prognostic survival models to correct for calibration drift. The methodology can be extended to other clinical prognostic models, improving accuracy and clinical utility.
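The sliding-window scheme described above (2,000 patients per window, advanced 500 patients at a time) can be sketched as follows. This is an illustration under assumptions rather than the study's method: the function names are hypothetical, patients are assumed to be ordered by admission, and the drift signal shown (mean predicted risk minus observed event rate) is a simple stand-in for whatever calibration measure the authors used.

```python
from typing import List, Sequence, Tuple


def sliding_windows(n_patients: int, window: int = 2000, step: int = 500) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for full windows over an ordered cohort."""
    return [(s, s + window) for s in range(0, n_patients - window + 1, step)]


def calibration_gap(outcomes: Sequence[int], predicted: Sequence[float]) -> float:
    """Mean predicted risk minus observed event rate; 0 means calibrated in the large.

    A stand-in drift measure (assumption), not the metric used in the paper.
    """
    if not outcomes or len(outcomes) != len(predicted):
        raise ValueError("outcomes and predicted must be non-empty and equal length")
    return sum(predicted) / len(predicted) - sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    windows = sliding_windows(34_912)  # cohort size from the abstract
    print(len(windows), windows[0], windows[-1])
    # Synthetic example: the model predicts 50% risk but only 25% of patients had the event.
    print(calibration_gap([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5]))
```

In a monitoring loop, a gap that moves away from zero from one window to the next would be the trigger to refit or recalibrate the model on recent patients.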

BMC Medicine

Prognostic models for COVID-19 needed updating to warrant transportability over time and space

Prognostic models for COVID-19 developed during the first pandemic wave were evaluated for transportability to subsequent waves and geographic settings. The study included patients presenting to the emergency department with suspected COVID-19 admitted to 12 hospitals in the New York City area and 4 large Dutch hospitals. Second-wave patients (September-December 2020) were used to evaluate models developed on first-wave patients (March-August 2020). Two prognostic models were evaluated: the Northwell COVID-19 Survival (NOCOS) model developed on NYC data and the COVID Outcome Prediction in the Emergency Department (COPE) model developed on Dutch data. Frequent updating of prognostic models is likely to be required for transportability over time and space during a dynamic pandemic.

Journal of Medical Internet Research

A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation

Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality. This study derived a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department. Data were collected from patients with COVID-19 admitted to Northwell Health acute care hospitals between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1%) were placed on invasive mechanical ventilation within 48 hours of admission. Three predictive models were trained and validated using cross-hospital validation. The XGBoost model had the highest mean accuracy (0.919; area under the curve=0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics.
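The summary above does not spell out how cross-hospital validation was set up. One common reading is leave-one-hospital-out: train on every hospital but one, test on the held-out hospital, and repeat. The sketch below illustrates that reading only; the function name and the choice of leave-one-out are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple


def cross_hospital_splits(
    hospital_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_hospital, train_indices, test_indices) for each hospital.

    Leave-one-hospital-out is an assumed interpretation of "cross-hospital validation".
    """
    for held_out in sorted(set(hospital_ids)):
        train = [i for i, h in enumerate(hospital_ids) if h != held_out]
        test = [i for i, h in enumerate(hospital_ids) if h == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    ids = ["A", "B", "A", "C"]  # synthetic hospital labels, one per patient
    for held_out, train, test in cross_hospital_splits(ids):
        print(held_out, train, test)
```

Because no patient from the held-out hospital appears in training, each fold measures how well the model transfers to a site it has not seen.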

Journal of Clinical Monitoring and Computing

Efficacy of continuous monitoring of maternal temperature during labor using wireless axillary sensors

This study aimed to determine whether continuous measurement of temperature during labor is feasible, accurate, and more effective than manual measurements for detecting fever. Women were recruited on admission in labor at greater than 35 weeks gestational age with less than 6 cm cervical dilation. Sensors were affixed in the axilla, which transmitted every 4 minutes by Bluetooth to a dedicated tablet. Conventional temperature measurements were taken every 3-6 hours per routine. Of 336 subjects recruited, 155 had both greater than 4 hours of continuous data and greater than 2 manual temperature measurements. Of 15 episodes of fever greater than 38 degrees C detected by both methods, 13 were detected earlier by continuous monitoring (9 of those more than 1 hour earlier). Manual measurements missed 32 fevers greater than 38 degrees C and 13 fevers greater than 38.5 degrees C that were identified by continuous monitoring. Continuous measurement of maternal temperature for the duration of labor is practical and accurate.

npj Digital Medicine

Let Sleeping Patients Lie, avoiding unnecessary overnight vitals monitoring using a clinically based deep-learning model

Sleep disruptions due to unnecessary overnight vital sign monitoring are associated with delirium, cognitive impairment, weakened immunity, hypertension, increased stress, and mortality. A recurrent deep neural network was developed that incorporates past values of a small set of vital signs and predicts overnight stability for any given patient-night. The model was trained and evaluated using data from a multi-hospital health system between 2012 and 2019, with approximately 2.3 million admissions and 26 million vital sign assessments. The algorithm is agnostic to patient location, condition, and demographics, and relies only on sequences of five vital sign measurements, a calculated Modified Early Warning Score, and patient age. The model enables safe avoidance of overnight monitoring for approximately 50% of patient-nights, while only misclassifying 2 out of 10,000 patient-nights as stable.

Nature Machine Intelligence

External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19

External validation of a previously published interpretable mortality prediction model for COVID-19 patients was conducted. The study demonstrated that the model does not perform as a triage tool based on the internal validation dataset provided by the original authors. The decision algorithm was not portable to the external validation dataset, both with unmodified and optimized parameters. Specifically, the precision was 0.48 for predicting mortality, meaning that over half of the patients that the model predicted would die actually survived. The accuracy was 0.88 and the F1 score was 0.41. These results emphasized the importance of externally validating models before their widespread adoption in actual clinical practice.
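The reported precision and F1 score together imply a recall (sensitivity) value, because F1 is the harmonic mean of precision and recall. The sketch below rearranges that definition. The function name is hypothetical, and since the inputs are rounded figures from the summary, the result is only approximate and is not a number reported by the study.

```python
def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall: F1 must be less than 2 * precision")
    return f1 * precision / denominator


if __name__ == "__main__":
    # Precision 0.48 and F1 0.41 are the rounded values quoted in the summary.
    recall = recall_from_precision_f1(0.48, 0.41)
    print(f"implied recall is roughly {recall:.2f}")
```

With the rounded inputs above, the implied recall is roughly 0.36, which would mean the model missed most of the patients who died despite its 0.88 accuracy.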

Bioelectronic Medicine

Machine learning to assist clinical decision-making during the COVID-19 pandemic

The number of cases from the coronavirus disease 2019 (COVID-19) global pandemic has overwhelmed existing medical facilities and forced clinicians, patients, and families to make pivotal decisions with limited time and information. While machine learning (ML) methods have been previously used to augment clinical decisions, there is now a demand for "Emergency ML." Throughout the patient care pathway, there are opportunities for ML-supported decisions based on collected vitals, laboratory results, medication orders, and comorbidities. This perspective highlights the utility of evidence-based prediction tools in a number of clinical settings, and how similar models can be deployed during the COVID-19 pandemic to guide hospital frontlines and healthcare administrators to make informed decisions about patient care and managing hospital volume.

JAMA

Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area

A total of 5700 patients with a median age of 63 years were included in this case series, with 39.7% female. The most common comorbidities were hypertension (3026; 56.6%), obesity (1737; 41.7%), and diabetes (1808; 33.8%). At triage, 30.7% of patients were febrile, 17.3% had a respiratory rate greater than 24 breaths/min, and 27.8% received supplemental oxygen. Among the 2634 patients who had been discharged or had died by the study end point, 373 (14.2%) were treated in the intensive care unit during hospitalization, and 320 (12.2%) received invasive mechanical ventilation.

IEEE Transactions on Biomedical Engineering

Towards personalized closed-loop mechanical CPR: a model relating carotid blood flow to chest compression rate and duration

The objective was to determine if it is possible to model the response of the carotid blood flow to different chest compression waveforms as a function of time during resuscitation from cardiac arrest. Several approaches were tested to predict the carotid blood flow generated by the next chest compression based on knowledge of the duration of resuscitation, the chest compression rate, and the last compression's carotid blood flow. A single physiological metric, carotid blood flow, combined with information about the duration of resuscitation and the compression rate was sufficient to model and predict carotid blood flow in the next compression. This suggests that closed-loop mechanical CPR is a viable medical device target.