Multi-task Alzheimer's disease classification based on adversarial learning and cross-attention

Fig. 1 Overall framework of the network

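1.2 CNN encoder and adversarial learning

The CNN encoder used in this work extracts features from the MRI and PET images with a 3D convolutional neural network. Its structure follows VGGNet^[18] but is simplified to use fewer 3D convolutional layers, as shown in Fig. 2. Each convolutional block consists of one convolutional layer with stride 1 and padding 1, one batch normalization (BN) layer, and one LeakyReLU activation layer. Each max-pooling (Maxpool) or average-pooling (Avgpool) layer has a stride of 2. For MRI and PET inputs of size 1×H×W×D, the encoder produces a feature map of size C×(H/16)×(W/16)×(D/16), where C = 128 is the number of channels. This module supplies the initial modality-specific feature representations used by the subsequent stages.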

Fig. 2 CNN encoder
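The H/16 reduction can be checked with a small shape calculation. This is only a sketch under stated assumptions: the 3×3×3 convolution kernel, the pooling kernel of 2 without padding, and the count of four pooling stages are not given in the text (four stages are inferred from the factor of 16), and the function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Length of one spatial axis after a convolution (kernel size 3 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Length of one spatial axis after unpadded pooling (kernel size 2 is an assumption)."""
    return (size - kernel) // stride + 1


def encoder_output_shape(spatial, channels=128, num_stages=4):
    """Trace a 1 x H x W x D volume through `num_stages` conv + pool stages."""
    dims = list(spatial)
    for _ in range(num_stages):
        dims = [conv_out(d) for d in dims]  # stride 1, padding 1: size unchanged
        dims = [pool_out(d) for d in dims]  # stride 2: each axis is halved (floor)
    return (channels, *dims)


# Input resolution reported in Section 2.1 of the paper.
print(encoder_output_shape((79, 95, 79)))
```

Because the input sizes are not multiples of 16, each axis is rounded down at every pooling stage under these assumptions, so the H/16 in the text should be read as an approximate size.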

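Inspired by generative adversarial networks^[19], this paper proposes an adversarial-learning-based feature extraction module that maps the relevant MRI and PET features into a shared latent space; its structure is shown in Fig. 1. The module uses a discriminator to determine whether an input feature map comes from MRI or PET, and applies an adversarial loss to optimize the feature representations extracted by the CNN encoders so that the features of the two modalities are progressively aligned in the shared latent space. As the adversarial loss converges, the MRI and PET features become increasingly similar, which provides a unified and effective input representation for the subsequent attention mechanism. The discriminator consists of one fully connected layer and one ReLU activation layer. Before a feature map is fed to the discriminator, it is converted into a feature vector by global average pooling. To enable adversarial training, a gradient reversal layer (GRL) is placed before the discriminator. In the forward pass the GRL acts as an identity transform; in the backward pass it multiplies the gradient by a negative constant, reversing its direction. With the GRL, the whole network can be trained end to end. The adversarial loss is given in Eq. (1):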

(1)

$\begin{array}{c}{L}_{T}={\mathrm{min}}_{{G}_{\text{MRI}},{G}_{\text{PET}}}{\mathrm{max}}_{T}V(T,{G}_{\text{MRI}},{G}_{\text{PET}})\\ =\mathbb{E}\left[\mathrm{ln}T({G}_{\text{MRI}}(A))\right]+\mathbb{E}\left[\mathrm{ln}(1-T({G}_{\text{PET}}(B)))\right]\end{array}$

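where $V$ is the value function; $T$ is the discriminator, a probability function with $T\in (0,1)$; ${\mathrm{max}}_{T}$ denotes choosing the parameters of the discriminator $T$ that maximize $V$; ${\mathrm{min}}_{{G}_{\text{MRI}},{G}_{\text{PET}}}$ denotes choosing the parameters of ${G}_{\text{MRI}}$ and ${G}_{\text{PET}}$ that minimize $V$; $A$ and $B$ are the input MRI and PET images; ${G}_{\text{MRI}}$ and ${G}_{\text{PET}}$ denote the features that the CNN encoders extract from MRI and PET, respectively; and $\mathbb{E}$ is the mathematical expectation, computed as the average of the value function over all samples.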
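The following minimal sketch evaluates the value function of Eq. (1) on a batch of discriminator outputs and mimics the behaviour of a gradient reversal layer. It is illustrative only: the function and class names, the `scale` parameter, and the example probabilities are assumptions, and a real implementation would hook the backward step into an automatic differentiation framework.

```python
import math


def value_function(t_mri, t_pet):
    """V = E[ln T(G_MRI(A))] + E[ln(1 - T(G_PET(B)))], averaged over a batch.

    t_mri: discriminator outputs for MRI features (probability of "MRI").
    t_pet: discriminator outputs for PET features (probability of "MRI").
    """
    mri_term = sum(math.log(p) for p in t_mri) / len(t_mri)
    pet_term = sum(math.log(1.0 - p) for p in t_pet) / len(t_pet)
    return mri_term + pet_term


class GradientReversal:
    """Identity in the forward pass; gradient multiplied by -scale in the backward pass."""

    def __init__(self, scale=1.0):
        self.scale = scale  # hypothetical strength of the reversal

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.scale * g for g in grad_output]


# A discriminator that separates the modalities well gives a high V ...
confident = value_function([0.9, 0.8], [0.1, 0.2])
# ... while aligned features leave it guessing (T = 0.5), so V = 2 ln 0.5.
confused = value_function([0.5, 0.5], [0.5, 0.5])
print(confident, confused)
```

The discriminator is trained to increase $V$, while the reversed gradient pushes the encoders toward the "confused" case, in which the two modalities can no longer be told apart.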

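1.3 Cross-attention fusion module

Inspired by the multi-scale cross-attention mechanism^[20], this paper designs a multimodal cross-attention fusion module (MCFM). The module lets the class token of one branch interact with the patch tokens of the other branch, so that information is passed and shared efficiently between branches, multimodal features are fused more effectively, and the overall feature representation is improved.

MCFM contains two sub-blocks, each with two parts. The first part is a multi-head cross-attention (MC) mechanism, shown for the PET branch in Fig. 3. The patch tokens ${X}_{\text{patch}}^{\text{MRI}}\in {R}^{N\times D}$ are first taken from the MRI branch and concatenated with the PET class token ${X}_{\text{cls}}^{\text{PET}}$ to obtain ${X}^{\text{'}\text{PET}}\in {R}^{(N+1)\times D}$, as in Eq. (2):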

(2)

${X}^{\text{'PET}}=\left[{X}_{\text{cls}}^{\text{PET}}‖{X}_{\text{patch}}^{\text{MRI}}\right]$

Fig. 3 Structure of MC

The MC operation can be expressed as:

(3)

$Q={X}_{\text{cls}}^{\text{PET}}{W}_{Q},K={X}^{\text{'}\text{PET}}{W}_{K},V={X}^{\text{'}\text{PET}}{W}_{V}$

(4)

$M=\text{Softmax}\left(\frac{Q{K}^{T}}{\sqrt{D/h}}\right)$

(5)

$\text{MC}({X}^{\text{'}\text{PET}})=MV$

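where ${W}_{Q},{W}_{K},{W}_{V}\in {R}^{D\times (D/h)}$ are three learnable weight matrices, $D$ is the dimension of the input features, and $h$ is the number of heads. Unlike full self-attention, whose computation and memory cost grow quadratically with the number of tokens, MC uses only the class token as the query, so computing the attention map $M$ is linear in the number of tokens, which reduces computation and improves the efficiency of the whole pipeline. The original PET class token ${X}_{\text{cls}}^{\text{PET}}$ is then added element-wise to the class token produced by MC: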

(6)

${y}_{\text{cls}}^{\text{PET}}={X}_{\text{cls}}^{\text{PET}}+\text{MC}({X}^{\text{'}\text{PET}})$

Next, ${y}_{\text{cls}}^{\text{PET}}$ is concatenated with ${y}_{\text{patch}}^{\text{PET}}$ to obtain ${z}^{\text{PET}}\in {R}^{(N+1)\times D}$:

(7)

${z}^{\text{PET}}={y}_{\text{cls}}^{\text{PET}}‖{y}_{\text{patch}}^{\text{PET}}$

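The second part consists of a feed-forward network (FFN) with a nonlinear activation function and a residual connection. The FFN transforms the input features with two linear layers; its output is added to ${z}^{\text{PET}}$ through the residual connection, and the sum is layer-normalized again to give the final output features ${Z}^{\text{MRI}}$, which strengthens the representational capacity of the tokens. The computation of ${Z}^{\text{MRI}}$ is: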

(8)

${Z}^{\text{MRI}}=\text{LN[FFN(LN}({z}^{\text{PET}}))+{z}^{\text{PET}}]$
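Equations (2) to (6) for the PET branch can be sketched in plain Python as follows. This is a simplified illustration rather than the authors' implementation: it uses a single head (h = 1, so the weight matrices are D×D and the residual sum in Eq. (6) has matching dimensions), random weights, and hypothetical helper names; how multiple heads are merged is not described in the text.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(x_cls_pet, x_patch_mri, w_q, w_k, w_v, num_heads=1):
    """One MC head: the PET class token attends to [PET cls || MRI patches]."""
    x_prime = [x_cls_pet] + x_patch_mri          # Eq. (2): (N+1) x D
    q = matmul([x_cls_pet], w_q)                 # Eq. (3): only the class token is a query
    k = matmul(x_prime, w_k)
    v = matmul(x_prime, w_v)
    scale = math.sqrt(len(x_cls_pet) / num_heads)
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)[0]                   # one score per token: linear in N
    attn = softmax([s / scale for s in scores])  # Eq. (4)
    out = matmul([attn], v)[0]                   # Eq. (5)
    return attn, out


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
D, N = 4, 3                                      # toy sizes chosen for illustration
x_cls = random_matrix(1, D)[0]
x_patch = random_matrix(N, D)
w_q, w_k, w_v = (random_matrix(D, D) for _ in range(3))

attn, mc_out = cross_attention(x_cls, x_patch, w_q, w_k, w_v)
y_cls = [a + b for a, b in zip(x_cls, mc_out)]   # Eq. (6): residual on the class token
print(len(attn), y_cls)
```

Because the query is a single token, the attention map has only N+1 entries, which is the source of the linear cost noted in the text.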

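1.4 Multi-task branches

Multi-task learning is an efficient learning approach that handles several related tasks at once and thereby captures both what the tasks share and how they differ. Studies have shown that several common brain disorders exhibit pronounced brain aging, and that the gap between brain age and chronological age has a degree of genetic overlap with AD^[17]. This suggests that brain age prediction is strongly related to AD classification. Since soft parameter sharing may incur a high computational cost, this work adopts a multi-task learning framework with hard parameter sharing. Sharing weights in the feature extraction layers effectively reduces model complexity.

To optimize the two tasks jointly, the model uses two loss functions: the loss ${\mathcal{L}}_{\text{CE}}$ for AD classification and the mean squared error (MSE) loss ${\mathcal{L}}_{\text{age}}$ for brain age regression, computed as follows: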

(9)

${\mathcal{L}}_{\text{CE}}=-\frac{1}{N}\sum _{i=1}^{N}\left[{y}_{i}\mathrm{ln}({p}_{i})+(1-{y}_{i})\mathrm{ln}(1-{p}_{i})\right]$

(10)

${\mathcal{L}}_{\text{age}}=-\mathrm{ln}\left[\text{Softmax}\left(\frac{1}{N}\sum _{i=1}^{N}\frac{1}{2}({y}_{i}-{p}_{i})\right)\right]$

The overall loss of the multi-task model, which balances the magnitudes of the two task losses, is:

(11)

$\mathcal{L}={\lambda }_{1}{\mathcal{L}}_{\text{CE}}+{\lambda }_{2}{\mathcal{L}}_{\text{age}}$

where ${\lambda }_{1}$ and ${\lambda }_{2}$ are the coefficients of the classification and regression losses, respectively, computed as:

(12)

${\lambda }_{1}=\frac{{e}^{{\mathcal{L}}_{\text{CE}}}}{{e}^{{\mathcal{L}}_{\text{CE}}}+{e}^{{\mathcal{L}}_{\text{age}}}}$

(13)

${\lambda }_{2}=\frac{{e}^{{\mathcal{L}}_{\text{age}}}}{{e}^{{\mathcal{L}}_{\text{CE}}}+{e}^{{\mathcal{L}}_{\text{age}}}}$

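Exponentially normalizing the losses adjusts the weights of the two tasks automatically. If the classification loss is much larger than the brain age prediction loss, ${\lambda }_{1}$ increases and ${\lambda }_{2}$ decreases, so the network focuses more on classification and relatively less on brain age prediction. This weighting lets the model balance the influence of the tasks during training according to their losses, so that each task is adequately optimized.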
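A minimal sketch of Eqs. (11) to (13), assuming the two task losses are available as plain numbers. The function names are hypothetical, and the max-subtraction is a standard numerical-stability step that does not change the resulting weights; whether the weights are treated as constants during backpropagation is not stated in the text.

```python
import math


def dynamic_weights(loss_ce, loss_age):
    """Softmax over the two losses: Eq. (12) for lambda_1, Eq. (13) for lambda_2."""
    m = max(loss_ce, loss_age)          # stability shift; the ratios are unchanged
    e_ce = math.exp(loss_ce - m)
    e_age = math.exp(loss_age - m)
    total = e_ce + e_age
    return e_ce / total, e_age / total


def total_loss(loss_ce, loss_age):
    """Eq. (11): weighted sum of the classification and brain age losses."""
    lam1, lam2 = dynamic_weights(loss_ce, loss_age)
    return lam1 * loss_ce + lam2 * loss_age


# Illustrative loss values only: the larger loss receives the larger weight.
lam1, lam2 = dynamic_weights(2.0, 0.5)
print(lam1, lam2, total_loss(2.0, 0.5))
```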

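2 Experimental setup

2.1 Data and parameter settings

This study uses the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset^[21] and selects subjects who have both MRI and 18F-FDG PET images: 210 AD patients, 257 MCI patients and 267 NC subjects. Because ADNI does not label pMCI and sMCI directly, the MCI patients who progressed to AD within 36 months^[22] and those who did not were selected manually from the publicly available ADNI follow-up records, giving 152 pMCI and 179 sMCI cases. The chronological age associated with each scan was also recorded during data selection and used as the label for brain age prediction. In preprocessing, the FMRIB Software Library FSL^[23] (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) was first used for skull stripping of the MRI images. Each PET image was then linearly registered to its corresponding MRI image, the MRI image was registered to the MNI152_T1_1mm template by an affine transformation, and the same transformation parameters were applied to the PET image. Finally, the PET images were smoothed with a Gaussian kernel of 6 mm full width at half maximum, and the MRI and PET images were resampled to a common size of 79×95×79 to keep the data consistent.

Five-fold cross-validation is used to evaluate the model. The dataset is first randomly divided into a training set and a test set at a ratio of 4 : 1. During training, the training set is further divided into five subsets, four of which are used for training and the remaining one for validation. The average of the five test results is reported as the performance of the model.

The experiments were implemented in Python and trained for 80 iterations on an NVIDIA RTX 3060 with 12 GB of memory. The batch size was 16, the learning rate was 0.0004 for the classification task and 0.0003 for the brain age prediction task, and the Adam optimizer was used.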
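One possible reading of this protocol is sketched below: a random 4 : 1 split into training pool and test set, followed by five train/validation folds over the training pool. The function name, the seed, the lack of class stratification, and the interleaved fold assignment are all assumptions; the text does not specify these details.

```python
import random


def split_protocol(n_samples, test_ratio=0.2, n_folds=5, seed=0):
    """Return (test indices, list of (train, validation) index pairs)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = round(n_samples * test_ratio)
    test, train_pool = indices[:n_test], indices[n_test:]

    # Deal the training pool into n_folds roughly equal subsets.
    folds = [train_pool[i::n_folds] for i in range(n_folds)]

    splits = []
    for i in range(n_folds):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return test, splits


# 210 AD + 257 MCI + 267 NC = 734 subjects, as reported in the text.
test_idx, splits = split_protocol(734)
print(len(test_idx), [len(val) for _, val in splits])
```

Each of the five models would then be evaluated on `test_idx` and the five results averaged.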

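2.2 Evaluation metrics

To evaluate the proposed AD diagnosis method quantitatively, several metrics are computed: accuracy (ACC), specificity (SPE), sensitivity/recall (SEN), precision (PRE) and F1 score (F1). They are defined as: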

(14)

$\text{ACC=}\frac{\text{TP+TN}}{\text{TP+FP+TN+FN}}$

(15)

$\text{SPE=}\frac{\text{TN}}{\text{FP+TN}}$

(16)

$\text{SEN=}\frac{\text{TP}}{\text{TP+FN}}$

(17)

$\text{PRE=}\frac{\text{TP}}{\text{TP+FP}}$

(18)

$\text{F1=}\frac{\text{2TP}}{\text{2TP+FN+FP}}$

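where true positives (TP) are positive samples predicted correctly, true negatives (TN) are negative samples predicted correctly, false negatives (FN) are positive samples predicted incorrectly, and false positives (FP) are negative samples predicted incorrectly.

For brain age prediction, the mean absolute error (MAE) between the predicted and actual ages is used as the evaluation metric, as given in Eq. (19):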
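Equations (14) to (18) translate directly into code. In this sketch the function name and the confusion-matrix counts are hypothetical; for the three-class task the counts would have to be obtained per class (for example one-vs-rest) and then averaged, which the text does not detail.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, SPE, SEN, PRE and F1 from binary confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # Eq. (14)
        "SPE": tn / (fp + tn),                   # Eq. (15)
        "SEN": tp / (tp + fn),                   # Eq. (16)
        "PRE": tp / (tp + fp),                   # Eq. (17)
        "F1": 2 * tp / (2 * tp + fn + fp),       # Eq. (18)
    }


# Illustrative counts, not taken from the paper.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```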

(19)

$\text{MAE=}\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|}$

where ${y}_{i}$ is the true label and ${\widehat{y}}_{i}$ is the predicted value.
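Equation (19) can be computed as in the short sketch below; the function name and the ages are hypothetical values used only for illustration.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Eq. (19): average absolute difference between true and predicted ages."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("both sequences must have the same length")
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


# Illustrative ages in years, not taken from the paper.
mae = mean_absolute_error([70.0, 75.0, 80.0], [72.0, 74.0, 85.0])
print(round(mae, 2))
```

Since the absolute value is symmetric, the result does not depend on which argument holds the predictions.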

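3 Results and analysis

3.1 Brain age prediction

Although medicine has no single agreed definition of brain age, neuroimaging studies usually take the chronological age of healthy individuals as the label for training a model that predicts the brain age of a given brain image. In this study, the chronological age of the NC group is used as the ground-truth brain age label. To examine the relationship between brain age prediction and AD classification, a single-task brain age prediction model was trained on the NC group with five-fold cross-validation and tested on the AD, MCI and NC groups. The results are shown in Table 1: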

Table 1 Results of brain age prediction

Group	Number of samples	MAE
AD	210	6.34
MCI	257	5.71
NC	267	3.52

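Table 1 shows that the brain age prediction model performs best on NC, with a test MAE of 3.52; the error rises to 5.71 on MCI and is largest on AD, with an MAE of 6.34. The prediction error therefore increases with disease severity. For subjects with severe cognitive impairment, the predicted brain age tends to be higher than the chronological age. This suggests that AD pathology may accelerate structural brain aging, and that brain age prediction can capture, to some extent, neuroimaging features related to AD progression. Introducing brain age prediction as an auxiliary task in the multi-task framework therefore helps the AD classification model perceive pathological brain changes. The design exploits the ability of brain age prediction to capture subtle structural changes in the brain, and uses the brain age difference between disease and healthy groups to give the classifier biologically interpretable auxiliary information.

3.2 AD/MCI/NC and pMCI/sMCI classification results and ablation study

To verify the contribution of each module to AD/MCI/NC classification and to compare single-modality input with multimodal fusion, ablation experiments were performed; the results are given in Table 2. The baseline is a CNN model; when MRI and PET images are both used as input, it fuses them by simple feature concatenation. With a single modality, the PET model is more accurate than the MRI model. A possible reason is that 18F-FDG PET detects abnormal glucose metabolism in specific regions such as the temporal lobe, posterior cingulate and inferior parietal lobule, and these changes already appear in early AD, which makes PET more sensitive for early diagnosis. MRI, by contrast, relies mainly on brain atrophy, a structural change that usually becomes evident only after the disease has progressed to some extent. Using MRI and PET together gives higher accuracy than either single-modality model.

ALMT combines adversarial learning with multi-task learning and reaches an accuracy of 87.66%. Although adversarial learning is included, so that the model can tell whether an image is MRI or PET, the later stages do not exploit the interaction between the two modalities, and its contribution to performance is probably small. ALMT also adds multi-task learning, however, and its accuracy is 2.48 percentage points higher than that of the CNN model with MRI and PET input. ALMCFM combines adversarial learning with cross-attention: adversarial learning reduces the discrepancy between the modalities, and cross-attention then exploits the interaction between them, which raises accuracy further above ALMT. MCFMMT combines cross-attention with multi-task learning. Cross-attention helps information exchange between modalities, but without adversarial learning some feature discrepancy between MRI and PET remains, which may impair fusion. MCFMMT does include brain age prediction as an auxiliary task, and multi-task learning improves its classification performance: its accuracy is lower than that of ALMCFM but 3.33 percentage points higher than that of the feature-concatenation CNN.

MACNet (ALMCFMMT) is the model proposed in this paper. It performs best, with the highest accuracy and F1 score, 91.10% and 91.01%, respectively. Fig. 4 shows the ROC curves under five-fold cross-validation for the different module combinations.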

Table 2 Ablation results of different modules

Modality	AL	MCFM	MT	ACC/(%)	F1/(%)	SEN/(%)	SPE/(%)	PRE/(%)
MRI				74.79(±1.46)	72.38	73.39	87.38	75.31
PET				80.27(±2.65)	78.27	78.77	88.62	82.41
MRI+PET				85.18(±2.59)	83.15	83.60	91.61	83.16
MRI+PET+age	√		√	87.66(±2.48)	87.66	88.09	93.14	87.61
MRI+PET+age	√	√		90.00(±2.23)	89.50	89.26	94.95	90.28
MRI+PET+age		√	√	88.51(±2.45)	87.04	87.80	93.44	89.99
MRI+PET+age	√	√	√	91.10(±1.79)	91.01	90.53	95.53	90.90

Fig. 4 ROC curves of different modules (AD, MCI, and NC). (a) MRI as the input to the CNN model; (b) PET as the input to the CNN model; (c) MRI and PET as the inputs to the CNN model; (d) MRI, PET and age as the inputs to the ALMT model; (e) MRI, PET and age as the inputs to the ALMCFM model; (f) MRI, PET and age as the inputs to the MCFMMT model; (g) MRI, PET and age as the inputs to the MACNet model

To identify the brain regions that matter most for classification, Grad-CAM^[24] was applied to the MRI images of three AD patients and three MCI patients randomly selected from the test set. Fig. 5 shows the MACNet heatmaps for these two groups. The regions activated for AD and MCI images overlap clearly, and the activated regions are highly consistent across individuals. They mainly include the hippocampus, amygdala, ventricles, parietal lobe and right temporal lobe.

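Fig. 5 Neural network heatmaps of AD and MCI

To further evaluate the model, experiments were also run on the pMCI versus sMCI classification task; the results are given in Table 3. SACNet takes a single modality as input and uses self-attention. The proposed MACNet achieves better overall performance on this task: its accuracy is 6.23 and 3.20 percentage points higher than that of SACNet with MRI alone and with PET alone, respectively. Fig. 6 shows the five-fold cross-validation AUC values of SACNet with MRI and with PET input, together with the ROC curves of MACNet. Table 3 also shows that the SPE of MACNet is lower than that of SACNet, and that its PRE is lower than that of SACNet using MRI only.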

Table 3 Classification results of pMCI and sMCI

Method	MRI	PET	ACC/(%)	F1/(%)	SEN/(%)	SPE/(%)	PRE/(%)
SACNet	√		70.30(±3.66)	69.91	64.49	77.99	79.67
SACNet		√	73.33(±3.65)	73.04	76.11	70.00	75.49
MACNet	√	√	76.53(±2.85)	76.36	85.00	67.33	75.82

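Fig. 6 ROC curves of different modules (pMCI and sMCI). (a) MRI as the input to the SACNet model; (b) PET as the input to the SACNet model; (c) MRI and PET as the inputs to the MACNet model

The lower SPE and PRE of MACNet may be caused by redundancy or interference between the features of the two modalities during fusion. If the MRI features discriminate negative samples well, ambiguous information from the PET features in the negative region lowers SPE. Conversely, although some PET features are distinctive in the positive region and, combined with MRI features, strengthen the overall recognition of positive samples, the fused model may still be affected by the PET features, so that the PRE of MACNet does not exceed that of the model using MRI alone.

3.3 Comparative experiments

To verify the effectiveness of MACNet, it was compared with 3D VGG16, 3D ResNet34 and 3D ViT on the AD/MCI/NC classification task. These models are representative of their families: 3D VGG16 is a classic deep convolutional network that is widely used in medical image analysis and often serves as a performance baseline; 3D ResNet34 uses residual connections to alleviate vanishing gradients in deep networks, so that deeper networks can be trained effectively, and is an important representative of convolutional neural networks; 3D ViT is based on self-attention, which captures global dependencies and allows interaction and fusion between the features of different modalities, reflecting the trend in multimodal learning. Comparing against these different architectures gives a fuller picture of the classification performance of MACNet. In the experiments, MRI and PET images were fed separately into each comparison model, and the features of the two modalities were fused by simple concatenation at the classification stage to produce the final result. The comparison used the same data as the proposed method, the parameter settings of Section 2.1, and five-fold cross-validation.

Table 4 lists the classification performance of each network. MACNet performs best, with an accuracy of 91.10%, an F1 score of 91.01%, a sensitivity of 90.53%, a specificity of 95.53% and a precision of 90.90%. These results indicate that MACNet distinguishes the AD, MCI and NC classes effectively and outperforms the comparison models. Among the comparison models, 3D VGG16 relies mainly on a plain stack of shallow-to-deep layers for feature extraction, which limits its feature representation and results in lower accuracy. 3D ResNet34 alleviates vanishing gradients through residual connections and extracts features better, so it improves on 3D VGG16. 3D ViT captures long-range dependencies with self-attention and can therefore handle more complex features, but it does not fully exploit the interaction between the two modalities and does not perform well. Although the specificity of MACNet is slightly lower than that of 3D VGG16 and 3D ResNet34, its accuracy, sensitivity and precision are clearly higher than those of all comparison models. The relative drop in specificity may be because MACNet strengthens the recognition of AD and MCI and raises the detection rate, but may also misclassify as positive some normal samples with atypical imaging features. Overall, MACNet shows stronger AD/MCI/NC classification performance on the test set, which indicates its potential for medical image classification.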

Table 4 Results of the comparative experiments

Model	ACC/(%)	F1/(%)	SEN/(%)	SPE/(%)	PRE/(%)
3D VGG16	85.18(±2.59)	83.15	83.60	91.61	83.16
3D ResNet34	86.30(±2.41)	86.41	86.78	93.14	86.24
3D ViT	76.71(±2.63)	76.28	76.62	88.61	76.64
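MACNet (this work)	91.10(±1.79)	91.01	90.53	95.53	90.90

3.4 Comparison with other studies

Table 5 compares the proposed method with results reported in other papers. Huang et al.^[25] used 2145 MRI and FDG-PET images from 1211 subjects, extracted features from the hippocampal region of interest and fed them to VGG, achieving 90.10% accuracy for AD versus NC. Song et al.^[9] registered MRI and PET images, fused the gray matter regions, and fed the fused images to a 3D Simple CNN for classification, obtaining 74.54% accuracy on the AD/MCI/NC task; despite the image fusion strategy, the overall performance is modest. Zhang et al.^[26] proposed a generative adversarial network based on pyramid attention, which combines the gray matter region of MRI with PET images to fuse multimodal medical image information and classifies the fused images with a neural network, reaching 89.9% accuracy on the three-class AD/MCI/NC task. Abuhmed et al.^[27] proposed an interpretable multi-task regression model that combines MRI and PET images with neuropsychological scales, neuropathology and cognitive scores, obtaining 84.95% accuracy on the AD/MCI/NC task. Neuropsychological and cognitive scores, however, are susceptible to subjective rating, which may affect the results. Compared with these studies, the proposed model, which uses only imaging data and age, performs well on AD/MCI/NC classification, with an accuracy of 91.10%, a sensitivity of 90.53% and a specificity of 95.53%.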

Table 5 Comparison of different research works

Reference	Image type	Classification model	Subjects	Accuracy/(%)
Huang et al.^[25]	MRI+PET	VGG	1211 subjects	90.10
Song et al.^[9]	MRI+PET	3D Simple CNN	95 AD/160 MCI/126 NC	74.54
Zhang et al.^[26]	MRI+PET	PA-Net	370 subjects	89.90
Abuhmed et al.^[27]	MRI+PET+neuropsychology+neuropathology+cognitive scores	BiLSTM+random forest (multi-task)	1371 subjects	84.95
This work	MRI+PET+age	MACNet	210 AD/257 MCI/267 NC	91.10

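4 Conclusion

This paper proposes a new multi-task model based on adversarial learning and cross-attention for the early diagnosis of AD. Because MRI and PET features differ, fusing them directly may harm classification performance. Adversarial learning is therefore introduced to map the features of the two modalities into a common space, which reduces the discrepancy between modalities and improves fusion. Unlike the conventional fusion strategy based on channel concatenation, this work uses a Transformer-based cross-attention mechanism to fuse features more effectively and improve classification. Brain age prediction is also used as an auxiliary task for AD/MCI/NC classification, and the results show that multi-task learning contributes to classification performance. Experiments on the public ADNI dataset show that the proposed model performs well on early AD diagnosis, with an accuracy of 91.10%, a sensitivity of 90.53% and an F1 score of 91.01%. To further validate the model, experiments were also run on pMCI versus sMCI, where MACNet reaches an accuracy of 76.53%.

The study has several limitations. First, AD/MCI/NC and pMCI/sMCI were classified separately; future work could extend to four-class AD/pMCI/sMCI/NC classification to assess diagnostic performance across disease stages more fully. Second, the method is supervised and depends on labeled data; semi-supervised classification could be considered to reduce this dependence, make better use of unlabeled data and improve generalization. Finally, the model has so far been validated only on the public ADNI dataset. Although ADNI is broadly representative, it cannot fully reflect performance in real clinical settings, and independent external clinical datasets will be considered to further assess the stability and robustness of the model in practice.

Conflict of interest

None

References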

[1] ANDERSON N D. State of the science on mild cognitive impairment (MCI)[J]. CNS Spectrums, 2019, 24(1): 78-87. DOI:10.1017/S1092852918001347 PMID:30651152

[2] CUMMINGS, ZHOU, LEE, et al. Alzheimer's disease drug development pipeline: 2023[J]. Alzh Dement-TRCI, 2023, 9(2): e12385.

[3] BEACH T G, MONSELL S E, PHILLIPS L E, et al. Accuracy of the clinical diagnosis of Alzheimer disease at National Institute on Aging Alzheimer Disease Centers, 2005-2010[J]. J Neuropathol Exp, 2012, 71(4): 266-273.

[4] LIU, JIN, ZENG, et al. Image enhancement guided object detection in visually degraded scenes[J]. IEEE T Neur Net Lear, 2024, 35(10): 14164-14177.

[5] KONG, ZHANG, ZHU, et al. Multi-modal data Alzheimer's disease detection based on 3D convolution[J]. Biomed Signal Proces, 2022, 75: 103565. DOI:10.1016/j.bspc.2022.103565

[6] ZHANG, LI, ZHANG, et al. Multi-modal deep learning model for auxiliary diagnosis of Alzheimer's disease[J]. Neurocomputing, 2019, 361: 185-195. DOI:10.1016/j.neucom.2019.04.093

[7] MENG, LIU, FAN, et al. Multi-modal neuroimaging neural network-based feature detection for diagnosis of Alzheimer's disease[J]. Front Aging Neurosci, 2022, 14: 911220. DOI:10.3389/fnagi.2022.911220

[8] HAN K, PAN H W, WEI, et al. Alzheimer's disease classification method based on multi-modal medical images[J]. J Tsinghua Univ (Sci Technol), 2020, 60(8): 664-671,682.

[9] SONG, ZHENG, LI, et al. An effective multimodal image fusion method using MRI and PET for Alzheimer's disease diagnosis[J]. Front Digit Health, 2021, 3: 637386. DOI:10.3389/fdgth.2021.637386

[10] LOGAN, WILLIAMS B G, FERREIRA DA SILVA, et al. Deep convolutional neural networks with ensemble learning and generative adversarial networks for Alzheimer's disease image data classification[J]. Front Aging Neurosci, 2021, 13: 720226. DOI:10.3389/fnagi.2021.720226

[11] ZHANG Y X, WU X H, TANG L L, et al. Alzheimer's disease classification method based on multimodal data[J]. J Comput Appl, 2023, 43(S2): 298.
张昀枭, 吴晓红, 唐荔莉, 等. 基于多模态数据的阿尔兹海默病分类方法[J]. 计算机应用, 2023, 43(S2): 298.

[12] GU J J, WANG Y J. Hybrid attention and multiscale module for Alzheimer's disease classification[J]. Chinese J Magn Reson, 2025, 42(2): 103-116. DOI:10.11938/cjmr20243132
顾佳佳, 王远军. 混合注意力和多尺度模块的阿尔茨海默病分类方法[J]. 波谱学杂志, 2025, 42(2): 103-116.

[13] VASWANI, SHAZEER, PARMAR, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000-6010.

[14] DOSOVITSKIY, BEYER, KOLESNIKOV, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv (2021-06-03) [2025-05-30]. https://arxiv.org/abs/2010.11929.

[15] ZHU, TAN, LIN, et al. Efficient self-attention mechanism and structural distilling model for Alzheimer's disease diagnosis[J]. Comput Biol Med, 2022, 147: 105737. DOI:10.1016/j.compbiomed.2022.105737

[16] KUSHOL, MASOUMZADEH, HUO, et al. Addformer: Alzheimer's disease detection from structural mri using fusion transformer[C]// 2022 IEEE 19th I Symp Biomed Imaging (ISBI). IEEE, 2022: 1-5.

[17] KAUFMANN, VAN DER MEER, DOAN N T, et al. Common brain disorders are associated with heritable patterns of apparent aging of the brain[J]. Nat Neurosci, 2019, 22(10): 1617-1623. DOI:10.1038/s41593-019-0471-7 PMID:31551603

[18] SIMONYAN, ZISSERMAN. Very deep convolutional networks for large-scale image recognition[C]// International Conference on Learning Representations (ICLR). 2015: 1-14.

[19] GOODFELLOW, POUGET-ABADIE, MIRZA, et al. Generative adversarial nets[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems, 2014, 2: 2672-2680.

[20] CHEN C F R, FAN, PANDA. Crossvit: Cross-attention multi-scale vision transformer for image classification[C]// 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 2021: 347-356.

[21] JACK JR C R, BERNSTEIN M A, FOX N C, et al. The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods[J]. J Magn Reson Imaging, 2008, 27(4): 685-691. DOI:10.1002/jmri.21049 PMID:18302232

[22] ESKILDSEN S F, COUPE, GARCIA-LORENZO, et al. Prediction of Alzheimer's disease in subjects with mild cognitive impairment from the ADNI cohort using patterns of cortical thinning[J]. NeuroImage, 2013, 65: 511-521. DOI:10.1016/j.neuroimage.2012.09.058 PMID:23036450

[23] JENKINSON, BECKMANN C F, BEHRENS T E J, et al. FSL[J]. NeuroImage, 2012, 62(2): 782-790. DOI:10.1016/j.neuroimage.2011.09.015 PMID:21979382

[24] SELVARAJU R R, COGSWELL, DAS, et al. Grad-cam: Visual explanations from deep networks via gradient-based localization[C]// 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 618-626.