AI versus the spinal surgeons in the management of controversial spinal surgery scenarios

Authors
Mehmet, S.
Elmarawany, M. N.
Harding, I.
Bowey, A. J.
Andrews, J.
Chan, D.
Jayasuriya, R.
Srinivas, S.
Tomlinson, J.
Bayley, E.
Issue Date
2025-04-03
Type
Journal Article
Language
eng
Keywords
Artificial intelligence, Clinical scenarios, Complex Spine, Large Language Models, Spine
Abstract
AIMS: The use of artificial intelligence (AI) in spinal surgery is expanding, yet its ability to match the diagnostic and treatment planning accuracy of human surgeons remains unclear. This study aims to compare the performance of three AI models (ChatGPT-3.5, ChatGPT-4, and Google Bard) with that of experienced spinal surgeons in controversial spinal scenarios.
METHODS: A questionnaire comprising 54 questions was presented to ten spinal surgeons on two occasions, four weeks apart, to assess consistency. The same questionnaire was also presented to ChatGPT-3.5, ChatGPT-4, and Google Bard, each generating five responses per question. Responses were analyzed for consistency and agreement with human surgeons using Kappa values. Thematic analysis of AI responses identified common themes and evaluated the depth and accuracy of AI recommendations.
RESULTS: Test-retest reliability among surgeons showed Kappa values from 0.535 to 1.00, indicating moderate to perfect reliability. Inter-rater agreement between surgeons and AI models was generally low, with nonsignificant p-values. Fair agreement was observed between surgeons' second-occasion responses and ChatGPT-3.5 (Kappa = 0.24) and ChatGPT-4 (Kappa = 0.27). AI responses were detailed and structured, while surgeons provided more concise answers.
CONCLUSIONS: AI large language models are not yet suitable for complex spinal surgery decisions but hold potential for preliminary information gathering and emergency triage. Legal, ethical, and accuracy issues must be addressed before AI can be reliably integrated into clinical practice.
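The agreement statistic reported above can be illustrated with a minimal sketch. The abstract does not say which Kappa variant or interpretation scale the authors used, so this example assumes two-rater Cohen's Kappa and the commonly cited Landis and Koch bands (which are consistent with the "fair" and "moderate" labels in the results). The function names and the sample answers are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters answering the same categorical questions.

    Assumption: the study's exact Kappa variant is not stated in the abstract;
    this is the standard two-rater form, kappa = (p_o - p_e) / (1 - p_e).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must answer the same non-empty set of questions")
    n = len(rater_a)
    # Observed agreement: fraction of questions with identical answers.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal answer frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; Kappa is undefined.
        return math.nan
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a Kappa value to the Landis and Koch descriptive bands (assumed scale)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical answers to four questions (illustrative only, not study data).
surgeon = ["operate", "operate", "conservative", "conservative"]
model = ["operate", "conservative", "conservative", "conservative"]
example_kappa = cohen_kappa(surgeon, model)
example_label = landis_koch_label(example_kappa)
```

Under this assumed scale, the reported values of 0.24 and 0.27 fall in the "fair" band and 0.535 falls in the "moderate" band, matching the wording of the results.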
Citation
Mehmet S, Elmarawany MN, Harding I, Bowey AJ, Andrews J, Chan D, et al. AI versus the spinal surgeons in the management of controversial spinal surgery scenarios. European spine journal : official publication of the European Spine Society, the European Spinal Deformity Society, and the European Section of the Cervical Spine Research Society. 2025.
Publisher
Springer Nature
License
© 2025. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
Journal
European spine journal