
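1. Introduction

This article documents how to use a Transformer attention mechanism to improve the feature-extraction part of YOLOv10. The Transformer models global features through self-attention.

2. Overview of the Transformer attention mechanism

2.1 Design motivation

Traditional convolutional neural networks have local receptive fields and therefore lack global modeling capability. The Transformer uses self-attention so that every spatial position can interact with every other position.

2.2 Module structure

The Transformer attention block consists of:

- Multi-head self-attention: computes several attention heads in parallel
- Layer normalization: stabilizes training
- Feed-forward network: applies a nonlinear transformation

3. Implementation of the Transformer attention mechanism

```python
import torch
import torch.nn as nn


class TransformerAttention(nn.Module):
    def __init__(self, c1, num_heads=4, mlp_ratio=4.0):
        super().__init__()
        # c1 must be divisible by num_heads (required by nn.MultiheadAttention)
        self.norm1 = nn.LayerNorm(c1)
        self.attn = nn.MultiheadAttention(c1, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(c1)
        hidden = int(c1 * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(c1, hidden),
            nn.GELU(),
            nn.Linear(hidden, c1),
        )

    def forward(self, x):
        b, c, h, w = x.size()
        # (B, C, H, W) -> (B, H*W, C): one token per spatial position
        x = x.flatten(2).transpose(1, 2)
        x = self.norm1(x)
        x, _ = self.attn(x, x, x)
        # Note: as written, the block has no residual connections
        x = self.norm2(x)
        x = self.mlp(x)
        # (B, H*W, C) -> (B, C, H, W); reshape does not depend on memory layout
        x = x.transpose(1, 2).reshape(b, c, h, w)
        return x
```

4. Innovation module

Integrate the TransformerAttention module into the Backbone and Neck of YOLOv10.

```yaml
# yolov10n_transformer.yaml
backbone:
  - [-1, 1, Conv, [64, 3, 2]]                 # stride 2
  - [-1, 1, C2f, [64, True]]
  - [-1, 1, TransformerAttention, [64, 4]]
  - [-1, 1, Conv, [128, 3, 2]]                # stride 4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, TransformerAttention, [128, 4]]
  - [-1, 1, Conv, [256, 3, 2]]                # stride 8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, TransformerAttention, [256, 8]]
  - [-1, 1, Conv, [512, 3, 2]]                # stride 16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, TransformerAttention, [512, 8]]
  - [-1, 1, Conv, [1024, 3, 2]]               # stride 32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, TransformerAttention, [1024, 8]]
  - [-1, 1, SPPF, [1024, 5]]
  - [-1, 1, TransformerAttention, [1024, 8]]
```

5. Expected results

| Model | mAP@0.5 | mAP@0.5:0.95 | Parameters |
|---|---|---|---|
| YOLOv10n | 52.3% | 27.9% | 2.7M |
| YOLOv10n-Transformer | 53.5% | 29.0% | 4.5M |

Project environment configuration

- Python 3.8.10
- PyTorch 2.0.0
- CUDA 11.8
- Ultralytics 8.3.13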
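Because self-attention compares every token with every other token, its cost grows with the square of the number of spatial positions. The sketch below gives a rough estimate of that cost for each TransformerAttention layer in the backbone YAML from section 4. It rests on assumptions that are not in the original post: a 640x640 input, batch size 1, float32 values, and attention weights stored as one dense N x N matrix per head. The helper names `attention_cost` and `summarize` are illustrative only.

```python
# Rough cost estimate for full self-attention at each backbone stage.
# Assumptions (not from the original post): 640x640 input, batch size 1,
# float32 attention weights, one dense N x N matrix materialized per head.

def attention_cost(input_size, stride, num_heads, bytes_per_value=4):
    """Return (tokens, attention_matrix_bytes) for one image at the given stride."""
    side = input_size // stride          # feature-map height and width
    tokens = side * side                 # one token per spatial position
    matrix_bytes = num_heads * tokens * tokens * bytes_per_value
    return tokens, matrix_bytes


# (stride, num_heads) for the TransformerAttention layers in the YAML,
# one entry per distinct stride (the stride-32 stage uses the module twice).
STAGES = [(2, 4), (4, 4), (8, 8), (16, 8), (32, 8)]


def summarize(input_size=640):
    """Return a list of (stride, tokens, GiB of attention weights)."""
    rows = []
    for stride, heads in STAGES:
        tokens, nbytes = attention_cost(input_size, stride, heads)
        rows.append((stride, tokens, nbytes / 2**30))
    return rows


if __name__ == "__main__":
    for stride, tokens, gib in summarize():
        print(f"stride {stride:>2}: {tokens:>6} tokens, ~{gib:.3f} GiB")
```

Under these assumptions the stride-2 layer sees 102,400 tokens and would need roughly 156 GiB for its attention weights alone, while the stride-32 layers see only 400 tokens. This suggests the early-stage placements may be impractical at this input size, and that keeping the module at the deeper, lower-resolution stages is one option worth testing.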