Chinese Journal of Magnetic Resonance (波谱学杂志)


A Method for Nuclear Spin State Control Based on Reinforcement Learning

Fu Gaocheng, Zhang Shiji, Huang Kai, Wei Daxiu, Yao Yefeng

  1. Institute of Magnetic Resonance and Molecular Imaging in Medicine, Shanghai Key Laboratory of Magnetic Resonance, School of Physics, East China Normal University, Shanghai 200241, China
  • Received: 2026-03-20 Revised: 2026-04-21 Accepted: 2026-05-11
  • Corresponding authors: Wei Daxiu; Yao Yefeng E-mail: dxwei@phy.ecnu.edu.cn; yfyao@phy.ecnu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2023YFF1204801)

Abstract: High-fidelity preparation of target spin states underpins high-sensitivity detection in nuclear magnetic resonance (NMR) spectroscopy. The conventional gradient ascent pulse engineering (GRAPE) algorithm is sensitive to its initial guess and prone to becoming trapped in local extrema, whereas reinforcement learning (RL) used on its own often suffers from policy divergence caused by sparse rewards when applied to complex quantum systems. To address these limitations, this paper proposes a hybrid optimization framework that runs RL and GRAPE in series. The framework first exploits the model-free nature of RL to search globally over evolution trajectories, then applies GRAPE for local, continuous gradient-based fine-tuning. Liquid-state NMR experiments on the two-spin ¹H system of citric acid show that the method stably generates robust pulses with a theoretical fidelity exceeding 0.99 despite chemical shift drifts induced by pH variations. In addition, the markedly shortened pulse duration mitigates the coherence loss caused by relaxation. This control strategy, which balances evolution time against parameter robustness, provides a reliable methodological basis for detecting complex metabolites in clinical spectroscopy.

Key words: Nuclear Magnetic Resonance Spectroscopy, Optimal Control Pulse, Citric Acid, Reinforcement Learning
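
The two-stage idea described in the abstract (a global search that supplies a starting pulse, followed by local gradient refinement) can be illustrated with a minimal sketch. This is not the authors' implementation. A plain random search stands in for the RL agent, and central finite differences stand in for the analytic gradients used by GRAPE. The offsets, coupling constant, pulse grid, amplitude bound, target state, and the way the pH-induced shift drift is modelled are all hypothetical values chosen only so that the example runs.

```python
import numpy as np

# Spin-1/2 operators (factor 1/2 included) and the 2x2 identity.
SX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
SY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
ID = np.eye(2, dtype=complex)

def on_spin(op, k):
    """Embed a single-spin operator on spin k (0 or 1) of a two-spin system."""
    return np.kron(op, ID) if k == 0 else np.kron(ID, op)

I1X, I1Y, I1Z = (on_spin(s, 0) for s in (SX, SY, SZ))
I2X, I2Y, I2Z = (on_spin(s, 1) for s in (SX, SY, SZ))

# Hypothetical parameters, NOT taken from the paper.
NU1, NU2, J = -8.0, 8.0, 15.0   # resonance offsets and scalar coupling, Hz
N_STEPS, DT = 10, 2e-3          # number of pulse slices, slice length in s
MAX_AMP = 50.0                  # RF amplitude bound, Hz
DRIFTS = (-2.0, 0.0, 2.0)       # assumed changes of the shift difference, Hz

COUPLING = 2 * np.pi * J * (I1X @ I2X + I1Y @ I2Y + I1Z @ I2Z)
FX, FY = I1X + I2X, I1Y + I2Y
RHO0 = I1Z + I2Z                # thermal-equilibrium deviation state
TARGET = I1X + I2X              # illustrative target state only
NORM = np.sqrt(np.trace(RHO0 @ RHO0).real * np.trace(TARGET @ TARGET).real)

def fidelity(amps, drift=0.0):
    """Normalized overlap of the evolved state with TARGET.

    amps has shape (N_STEPS, 2): x and y RF amplitudes in Hz per slice.
    drift widens the shift difference by `drift` Hz (assumed drift model).
    """
    h_free = COUPLING + 2 * np.pi * ((NU1 - drift / 2) * I1Z
                                     + (NU2 + drift / 2) * I2Z)
    rho = RHO0
    for ax, ay in amps:
        h = h_free + 2 * np.pi * (ax * FX + ay * FY)
        w, v = np.linalg.eigh(h)                      # h is Hermitian
        u = (v * np.exp(-1j * w * DT)) @ v.conj().T   # exp(-i h DT)
        rho = u @ rho @ u.conj().T
    return np.trace(TARGET.conj().T @ rho).real / NORM

def robust_fidelity(amps):
    """Mean fidelity over the assumed set of shift drifts."""
    return float(np.mean([fidelity(amps, d) for d in DRIFTS]))

def global_search(rng, n_candidates=200):
    """Stage 1 stand-in: keep the best of several random pulses (not RL)."""
    best, best_f = None, -np.inf
    for _ in range(n_candidates):
        cand = rng.uniform(-MAX_AMP, MAX_AMP, size=(N_STEPS, 2))
        f = robust_fidelity(cand)
        if f > best_f:
            best, best_f = cand, f
    return best, best_f

def refine(amps, n_iter=20, step=200.0, eps=1e-3):
    """Stage 2: gradient ascent with finite-difference gradients.

    A step is accepted only if it improves the objective; otherwise the
    step size is halved, so the returned fidelity never decreases.
    """
    amps = amps.copy()
    f = robust_fidelity(amps)
    for _ in range(n_iter):
        grad = np.zeros_like(amps)
        for idx in np.ndindex(amps.shape):
            d = np.zeros_like(amps)
            d[idx] = eps
            grad[idx] = (robust_fidelity(amps + d)
                         - robust_fidelity(amps - d)) / (2 * eps)
        trial = np.clip(amps + step * grad, -MAX_AMP, MAX_AMP)
        f_trial = robust_fidelity(trial)
        if f_trial > f:
            amps, f = trial, f_trial
        else:
            step *= 0.5
    return amps, f

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    start, f_start = global_search(rng)
    tuned, f_tuned = refine(start)
    print(f"after global search: {f_start:.4f}, after refinement: {f_tuned:.4f}")
```

Averaging the fidelity over several drift values is one simple way to ask for a pulse that tolerates chemical shift changes. The fidelity this toy setup reaches depends entirely on the assumed parameters and says nothing about the results reported in the paper.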