Hello everyone, and welcome to the first installment of the dual-modality remote sensing object detection close-reading series. This issue gives a close reading of the IEEE TGRS 2025 paper C²DFF-Net, which targets visible-infrared dual-modality small-object detection in remote sensing. We cover the paper's innovations, module-by-module breakdowns, runnable code, and follow-up innovation ideas for top journals, so researchers and algorithm engineers can reuse it directly.

Paper title: C2DFF-Net for Object Detection in Multimodal Remote Sensing Images
Code: https://github.com/FPGAzzy/C2DFF-Net

1. Abstract

To address the problems of visible-infrared multimodal small-object detection in remote sensing — insufficient exploitation of complementary cross-modal information, missing spatial-frequency cross-domain features, parameter and compute budgets too large for onboard deployment, and poor robustness under complex illumination — this paper proposes C²DFF-Net, an ultra-lightweight cross-modal, cross-domain differential feature fusion network. On a lightweight YOLOv8n baseline, it builds a two-stream feature-extraction backbone; a Cross-modal Differential Feature Interaction Module (CDFIM) strengthens complementary inter-modal interaction; a Cross-domain Gated Self-Attention module (CGSA) performs joint spatial-frequency feature fusion and global context modeling; and an Adaptive iLlumination-aware Mask (ALM) training strategy mitigates the model's modality bias. On the DroneVehicle, VEDAI, and FLIR benchmarks, the method reaches 85.7% mAP50 with only 6.58M parameters and 14.6 GFLOPs, achieving state-of-the-art results, and is further validated through a real UAV onboard deployment. It clearly outperforms existing methods in the accuracy-compute trade-off and has strong engineering value.

2. Introduction

2.1 Background and Significance

With the spread of high-resolution satellites and unmanned aerial vehicles (UAVs), remote sensing object detection is widely used in smart cities, security surveillance, traffic management, and disaster response. Compared with natural images, remote sensing targets are extremely small (typically about 32×32 pixels), occupy few pixels, sit in cluttered backgrounds, and face highly variable illumination. Single-modality detection has clear weaknesses: visible images are rich in texture but fail at night and under low light, overexposure, or glare; infrared images are illumination-invariant but low-resolution, texture-poor, and noisy. Visible-infrared multimodal fusion enables stable all-day, all-weather detection and has become the mainstream route. Two core bottlenecks remain, however: (1) shallow fusion — concatenation or addition in the spatial domain only, which ignores cross-modal differential features and complementary spatial-frequency information and is not robust for small objects against cluttered backgrounds; (2) insufficient lightweighting — global modeling methods such as Transformers are accurate but so heavy in parameters and computation that real-time inference on UAVs, satellites, and other edge devices is infeasible.

2.2 Main Contributions

- C²DFF-Net, a lightweight multimodal small-object detection framework that fuses cross-modal, cross-domain differential features deeply while keeping computational overhead very low.
- The CDFIM module, inspired by the differential amplifier: it extracts cross-modal differential features and strengthens interaction with channel-spatial attention, suited to elongated remote sensing targets.
- The CGSA module: FFT-based spatial-frequency global modeling, an improved polarized self-attention, and adaptive gating for efficient, low-redundancy feature fusion.
- The ALM adaptive illumination mask strategy, which dynamically balances the two modalities and improves robustness to overexposure, glare, and darkness, with no extra inference cost.
- State-of-the-art results on three public datasets, plus a real UAV onboard deployment that validates engineering practicality.

3. Related Work

3.1 Multimodal Remote Sensing Object Detection

Multimodal fusion falls into pixel-level, feature-level, and decision-level approaches. Pixel-level fusion is cheap but interacts weakly across modalities and is sensitive to noise; decision-level fusion merges detection results after the fact, models no inter-modal dependencies, and is computationally expensive; feature-level fusion is learned end to end, fully exploits complementary information, and has become mainstream. Existing methods such as SuperYOLO, ICAFusion, and MMFDet are either not accurate enough or too heavy — none simultaneously meets the three requirements of small-object performance, deep modality fusion, and lightweight deployment.

3.2 Attention Mechanisms and Frequency-Domain Information

Channel attention (SENet), spatial-channel attention (CBAM), and channel prior convolutional attention (CPCA) are widely used for fine-grained feature enhancement. Frequency-domain information (via the FFT) captures global context at very low cost; SFINet, LF-MDet, and others show that spatial-frequency fusion markedly improves small-object detection, but it has not yet been effectively combined with cross-modal differential features.

3.3 Lightweight Detection Networks

The YOLOv5/8/11 series sharply reduces parameter counts while preserving accuracy, making it well suited to edge deployment. This paper builds a two-stream network on a YOLOv8n baseline and improves fusion performance without significantly increasing computation.

4. Method

4.1 Overall Architecture

C²DFF-Net is an end-to-end design of two-stream feature extraction + multi-scale module enhancement + a single detection head: the two-stream backbone extracts convolutional features independently from the visible and infrared images; CDFIM modules are inserted at the multi-scale feature levels to strengthen cross-modal differential interaction; CGSA performs spatial-frequency cross-domain fusion of the two modal features; the ALM strategy dynamically masks inputs during training to balance modality learning; and the standard YOLOv8 detection head outputs boxes, classes, and confidences. All modules are plug-and-play and can be dropped into any two-stream detection model.

4.2 Cross-modal Differential Feature Interaction Module (CDFIM)

Design motivation: plain addition or concatenation introduces heavy redundancy and fails to highlight the complementary differences between the two modalities. CDFIM borrows the differential amplifier principle: amplify the differential-mode signal (complementary information) and suppress the common-mode signal (redundant information).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CPCA_ChannelAttention(nn.Module):
    # This class needs no interface change; kept as-is
    def __init__(self, input_channels, internal_neurons):
        super(CPCA_ChannelAttention, self).__init__()
        self.fc1 = nn.Conv2d(in_channels=input_channels, out_channels=internal_neurons,
                             kernel_size=1, stride=1, bias=True)
        self.fc2 = nn.Conv2d(in_channels=internal_neurons, out_channels=input_channels,
                             kernel_size=1, stride=1, bias=True)
        self.input_channels = input_channels

    def forward(self, inputs):
        x1 = F.adaptive_avg_pool2d(inputs, output_size=(1, 1))
        x1 = self.fc1(x1)
        x1 = F.relu(x1, inplace=True)
        x1 = self.fc2(x1)
        x1 = torch.sigmoid(x1)
        x2 = F.adaptive_max_pool2d(inputs, output_size=(1, 1))
        x2 = self.fc1(x2)
        x2 = F.relu(x2, inplace=True)
        x2 = self.fc2(x2)
        x2 = torch.sigmoid(x2)
        x = x1 + x2
        x = x.view(-1, self.input_channels, 1, 1)
        return inputs * x  # multiply the channel weights x with the original input


class CPCA(nn.Module):
    # Interface changed to (c1, c2) to fit YOLO
    def __init__(self, c1, c2, channelAttention_reduce=4):
        super().__init__()
        # Interface adaptation: the forward uses torch.cat, which doubles the channel count,
        # so channels must be the sum of the two streams to keep conv1 from failing
        channels = sum(c1) if isinstance(c1, list) else c1 * 2
        out_channels = c2
        # Everything below is kept verbatim from the original
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=(1, 1), padding=0)
        self.ca = CPCA_ChannelAttention(input_channels=channels,
                                        internal_neurons=channels // channelAttention_reduce)
        self.dconv5_5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2, groups=channels)
        # groups=channels: each input channel is convolved independently (depthwise convolution)
        self.dconv1_7 = nn.Conv2d(channels, channels, kernel_size=(1, 7), padding=(0, 3), groups=channels)
        self.dconv7_1 = nn.Conv2d(channels, channels, kernel_size=(7, 1), padding=(3, 0), groups=channels)
        self.dconv1_11 = nn.Conv2d(channels, channels, kernel_size=(1, 11), padding=(0, 5), groups=channels)
        self.dconv11_1 = nn.Conv2d(channels, channels, kernel_size=(11, 1), padding=(5, 0), groups=channels)
        self.dconv1_21 = nn.Conv2d(channels, channels, kernel_size=(1, 21), padding=(0, 10), groups=channels)
        self.dconv21_1 = nn.Conv2d(channels, channels, kernel_size=(21, 1), padding=(10, 0), groups=channels)
        self.conv2 = nn.Conv2d(channels, out_channels, kernel_size=(1, 1), padding=0)
        self.act = nn.GELU()

    def forward(self, x):
        inputs = torch.cat((x[0], x[1]), dim=1)  # channel-concatenate the visible and infrared features
        # Global Perceptron
        inputs = self.conv1(inputs)
        inputs = self.act(inputs)
        inputs = self.ca(inputs)
        x_init = self.dconv5_5(inputs)
        x_1 = self.dconv1_7(x_init)
        x_1 = self.dconv7_1(x_1)
        x_2 = self.dconv1_11(x_init)
        x_2 = self.dconv11_1(x_2)
        x_3 = self.dconv1_21(x_init)
        x_3 = self.dconv21_1(x_3)
        x = x_1 + x_2 + x_3 + x_init
        spatial_att = self.conv1(x)
        out = spatial_att * inputs
        out = self.conv2(out)
        return out


class CDFIM(nn.Module):
    # Interface changed to (c1, c2) to fit YOLO
    def __init__(self, c1, c2, channelAttention_reduce=4):
        super().__init__()
        # Interface adaptation: the forward uses the difference x[0] - x[1], so the channel
        # count is unchanged and a single stream's channel count suffices
        channels = c1[0] if isinstance(c1, list) else c1
        out_channels = c2
        # Everything below is kept verbatim from the original
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=(1, 1), padding=0)
        self.ca = CPCA_ChannelAttention(input_channels=channels,
                                        internal_neurons=channels // channelAttention_reduce)
        self.dconv5_5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2, groups=channels)
        self.dconv1_7 = nn.Conv2d(channels, channels, kernel_size=(1, 7), padding=(0, 3), groups=channels)
        self.dconv7_1 = nn.Conv2d(channels, channels, kernel_size=(7, 1), padding=(3, 0), groups=channels)
        self.dconv1_11 = nn.Conv2d(channels, channels, kernel_size=(1, 11), padding=(0, 5), groups=channels)
        self.dconv11_1 = nn.Conv2d(channels, channels, kernel_size=(11, 1), padding=(5, 0), groups=channels)
        self.dconv1_21 = nn.Conv2d(channels, channels, kernel_size=(1, 21), padding=(0, 10), groups=channels)
        self.dconv21_1 = nn.Conv2d(channels, channels, kernel_size=(21, 1), padding=(10, 0), groups=channels)
        self.conv2 = nn.Conv2d(channels, out_channels, kernel_size=(1, 1), padding=0)
        self.act = nn.GELU()

    def forward(self, x):
        inputs = x[0] - x[1]  # differential-mode signal between the visible and infrared features
        # Global Perceptron
        inputs = self.conv1(inputs)
        inputs = self.act(inputs)
        inputs = self.ca(inputs)
        # Add the enhanced differential feature back onto each modality
        inputs0 = inputs + x[0]
        inputs1 = inputs + x[1]
        inputs = inputs0 + inputs1
        x_init = self.dconv5_5(inputs)
        x_1 = self.dconv1_7(x_init)
        x_1 = self.dconv7_1(x_1)
        x_2 = self.dconv1_11(x_init)
        x_2 = self.dconv11_1(x_2)
        x_3 = self.dconv1_21(x_init)
        x_3 = self.dconv21_1(x_3)
        x = x_1 + x_2 + x_3 + x_init
        spatial_att = self.conv1(x)
        out = spatial_att * inputs
        out = self.conv2(out)
        return out
```

Module gains: mAP50 improves by 1.8% at the cost of only 0.26M extra parameters and 0.7 GFLOPs.

4.3 Cross-domain Gated Self-Attention Module (CGSA)

CGSA consists of three parts — CFE (cross-domain feature extraction) → SAFF (self-attention feature fusion) → AG (adaptive gating) — realizing joint spatial-frequency modeling.

(1) CFE, cross-domain feature extraction: the FFT provides lightweight global modeling. Both modal features are transformed with the FFT; amplitude and phase are separated, concatenated along channels, fused through a 1×1 convolution bottleneck, and transformed back to the spatial domain with the inverse FFT. The spatial-frequency differential features are then

F̂V = Conv1×1(FV − FVI),  F̂I = Conv1×1(FI − FVI)

where FVI denotes the fused cross-domain feature.

(2) SAFF: an improved polarized self-attention.

(3) AG: an adaptive gating unit.
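The CFE step just described (FFT → amplitude/phase split → fusion → inverse FFT) can be sketched in isolation. This is a minimal illustration, not the paper's implementation: the function name `fuse_freq` and the simple averaging of amplitude and phase are assumptions standing in for the paper's learned 1×1 convolution bottleneck.

```python
import torch


def fuse_freq(vis, ir):
    """Minimal sketch of spatial->frequency fusion: FFT both modalities,
    average amplitude and phase, then invert back to the spatial domain."""
    H, W = vis.shape[-2:]
    V = torch.fft.rfft2(vis, dim=(-2, -1))
    I = torch.fft.rfft2(ir, dim=(-2, -1))
    amp = (torch.abs(V) + torch.abs(I)) / 2      # fused amplitude (the paper learns this with a conv)
    pha = (torch.angle(V) + torch.angle(I)) / 2  # fused phase
    fused = torch.complex(amp * torch.cos(pha), amp * torch.sin(pha))
    return torch.fft.irfft2(fused, s=(H, W), dim=(-2, -1))


vis = torch.rand(1, 8, 32, 32)
ir = torch.rand(1, 8, 32, 32)
out = fuse_freq(vis, ir)
assert out.shape == vis.shape
```

In the paper the fused result is further differenced against each modality's original feature before the 1×1 convolutions, yielding the spatial-frequency differential features.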
```python
class CGSA(nn.Module):
    # Serial variant (channel attention only)
    def __init__(self, channel=64, out_channels=64):
        super().__init__()
        self.ch_wv = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.ch_wq = nn.Conv2d(channel, 1, kernel_size=(1, 1))
        self.softmax_channel = nn.Softmax(1)
        self.softmax_spatial = nn.Softmax(-1)
        self.ch_wz = nn.Conv2d(channel // 2, channel, kernel_size=(1, 1))
        self.ln = nn.LayerNorm(channel)
        self.sigmoid = nn.Sigmoid()
        self.sp_wv = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.sp_wq = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.agp = nn.AdaptiveAvgPool2d((1, 1))
        self.conv1 = nn.Conv2d(2, 1, kernel_size=(1, 1))
        self.conv2 = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.conv = nn.Conv2d(channel * 2, channel, kernel_size=(1, 1))
        # One learnable weight per branch, initialized to 1. As nn.Parameter they are
        # learned with the model: the two modalities start equally important, and the
        # balance is adjusted as training proceeds.
        self.w = nn.Parameter(torch.ones(2))
        self.ww = nn.Parameter(torch.ones(2))
        self.ga1 = GatedMultimodalLayer(channel, channel)
        self.ga2 = GatedMultimodalLayer(channel, channel)
        self.fre_process = Freprocess(channel)

    def forward(self, x):
        rgb, ir = x[0], x[1]           # e.g. (1, 64, 32, 32)
        b, c, h, w = x[0].size()
        rgb, ir = self.fre_process(rgb, ir)  # spatial-frequency cross-domain features (CFE)
        # Channel-only self-attention
        channel_wv_rgb = self.ch_wv(rgb)                        # bs, c//2, h, w
        channel_wq_rgb = self.ch_wq(rgb)                        # bs, 1, h, w
        channel_wv_rgb = channel_wv_rgb.reshape(b, c // 2, -1)  # bs, c//2, h*w
        channel_wq_rgb = channel_wq_rgb.reshape(b, -1, 1)       # bs, h*w, 1
        channel_wv_ir = self.ch_wv(ir)
        channel_wq_ir = self.ch_wq(ir)
        channel_wv_ir = channel_wv_ir.reshape(b, c // 2, -1)
        channel_wq_ir = channel_wq_ir.reshape(b, -1, 1)
        channel_wq_rgb_ir = channel_wq_rgb + channel_wq_ir      # bs, h*w, 1
        # Alternative: concatenate the two queries and fuse with conv1 instead of adding
        channel_wq_rgb_ir = self.softmax_channel(channel_wq_rgb_ir)
        channel_wz_rgb = torch.matmul(channel_wv_rgb, channel_wq_rgb_ir).unsqueeze(-1)  # bs, c//2, 1, 1
        channel_weight_rgb = self.sigmoid(self.ln(self.ch_wz(channel_wz_rgb).reshape(b, c, 1)
                                                  .permute(0, 2, 1))).permute(0, 2, 1).reshape(b, c, 1, 1)
        channel_wz_ir = torch.matmul(channel_wv_ir, channel_wq_rgb_ir).unsqueeze(-1)
        channel_weight_ir = self.sigmoid(self.ln(self.ch_wz(channel_wz_ir).reshape(b, c, 1)
                                                 .permute(0, 2, 1))).permute(0, 2, 1).reshape(b, c, 1, 1)
        out_rgb = channel_weight_rgb * rgb
        out_ir = channel_weight_ir * ir
        # Original PSA would multiply residually then add the two modalities:
        # out = torch.add(out_rgb, out_ir)
        # Instead, gated fusion in the style of the LSTM gated multimodal unit (AG)
        channel_out_rgb = self.ga1(out_ir, rgb)
        channel_out_ir = self.ga2(out_rgb, ir)
        out = torch.cat((channel_out_rgb, channel_out_ir), 1)   # bs, 2c, h, w
        out = self.conv(out)  # 1x1 conv to restore the channel count
        return out


class PolarizedSelfAttention(nn.Module):
    # Serial variant with cross-modal Q/V multiplication
    def __init__(self, channel=64, out_channels=64):
        super().__init__()
        self.ch_wv = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.ch_wq = nn.Conv2d(channel, 1, kernel_size=(1, 1))
        self.softmax_channel = nn.Softmax(1)
        self.softmax_spatial = nn.Softmax(-1)
        self.ch_wz = nn.Conv2d(channel // 2, channel, kernel_size=(1, 1))
        self.ln = nn.LayerNorm(channel)
        self.sigmoid = nn.Sigmoid()
        self.sp_wv = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.sp_wq = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.agp = nn.AdaptiveAvgPool2d((1, 1))
        self.conv1 = nn.Conv2d(2, 1, kernel_size=(1, 1))
        self.conv2 = nn.Conv2d(channel, channel // 2, kernel_size=(1, 1))
        self.conv = nn.Conv2d(channel * 2, channel, kernel_size=(1, 1))

    def forward(self, x):
        rgb, ir = x[0], x[1]
        b, c, h, w = x[0].size()
        # Channel-only self-attention
        channel_wv_rgb = self.ch_wv(rgb).reshape(b, c // 2, -1)  # bs, c//2, h*w
        channel_wq_rgb = self.ch_wq(rgb).reshape(b, -1, 1)       # bs, h*w, 1
        channel_wv_ir = self.ch_wv(ir).reshape(b, c // 2, -1)
        channel_wq_ir = self.ch_wq(ir).reshape(b, -1, 1)
        # Cross-modal interaction: each modality's V attends to the other's Q
        channel_wq_rgb = self.softmax_channel(channel_wq_rgb)
        channel_wq_ir = self.softmax_channel(channel_wq_ir)
        channel_wz_rgb = torch.matmul(channel_wv_rgb, channel_wq_ir).unsqueeze(-1)  # bs, c//2, 1, 1
        channel_wz_ir = torch.matmul(channel_wv_ir, channel_wq_rgb).unsqueeze(-1)
        channel_weight_rgb = self.sigmoid(self.ln(self.ch_wz(channel_wz_rgb).reshape(b, c, 1)
                                                  .permute(0, 2, 1))).permute(0, 2, 1).reshape(b, c, 1, 1)
        channel_weight_ir = self.sigmoid(self.ln(self.ch_wz(channel_wz_ir).reshape(b, c, 1)
                                                 .permute(0, 2, 1))).permute(0, 2, 1).reshape(b, c, 1, 1)
        out_rgb = channel_weight_rgb * rgb
        out_ir = channel_weight_ir * ir
        channel_out_rgb = torch.add(out_ir, rgb)
        channel_out_ir = torch.add(out_rgb, ir)
        # Spatial-only self-attention (serial)
        spatial_wv_rgb = self.sp_wv(channel_out_rgb).reshape(b, c // 2, -1)  # bs, c//2, h*w
        spatial_wq_rgb = self.agp(self.sp_wq(channel_out_rgb)).permute(0, 2, 3, 1).reshape(b, 1, c // 2)
        spatial_wv_ir = self.sp_wv(channel_out_ir).reshape(b, c // 2, -1)
        spatial_wq_ir = self.agp(self.sp_wq(channel_out_ir)).permute(0, 2, 3, 1).reshape(b, 1, c // 2)
        spatial_wq_rgb = self.softmax_spatial(spatial_wq_rgb)
        spatial_wq_ir = self.softmax_spatial(spatial_wq_ir)
        spatial_wz_rgb = torch.matmul(spatial_wq_ir, spatial_wv_rgb)         # bs, 1, h*w
        spatial_weight_rgb = self.sigmoid(spatial_wz_rgb.reshape(b, 1, h, w))
        out_rgb = spatial_weight_rgb * channel_out_rgb
        spatial_wz_ir = torch.matmul(spatial_wq_rgb, spatial_wv_ir)
        spatial_weight_ir = self.sigmoid(spatial_wz_ir.reshape(b, 1, h, w))
        out_ir = spatial_weight_ir * channel_out_ir
        spatial_out_rgb = torch.add(out_ir, channel_out_rgb)
        spatial_out_ir = torch.add(out_rgb, channel_out_ir)
        out = torch.cat((spatial_out_rgb, spatial_out_ir), 1)  # bs, 2c, h, w
        out = self.conv(out)  # 1x1 conv to restore the channel count
        return out


class GatedMultimodalLayer(nn.Module):
    # The AG (adaptive gating) unit
    def __init__(self, in_channels, out_channels=64):
        super(GatedMultimodalLayer, self).__init__()
        self.hidden1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0, bias=False)
        self.hidden2 = nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0, bias=False)
        self.hidden_sigmoid = nn.Conv2d(out_channels * 2, 1, kernel_size=1, bias=False)
        # Activation functions
        self.tanh_f = nn.Tanh()
        self.sigmoid_f = nn.Sigmoid()

    def forward(self, x1, x2):
        # Project each modality and apply the activation
        h1 = self.tanh_f(self.hidden1(x1))
        h2 = self.tanh_f(self.hidden2(x2))
        # Concatenate the raw inputs and compute the gate signal
        x = torch.cat((x1, x2), dim=1)
        z = self.sigmoid_f(self.hidden_sigmoid(x))
        # Gated weighted sum
        return z * h1 + (1 - z) * h2


class Freprocess(nn.Module):
    # Frequency-domain processing (the CFE step)
    def __init__(self, channels):
        super(Freprocess, self).__init__()
        self.channels = channels
        self.pre1 = nn.Conv2d(channels, channels, kernel_size=(1, 1))
        self.pre2 = nn.Conv2d(channels, channels, kernel_size=(1, 1))
        self.amp_fuse = nn.Sequential(nn.Conv2d(channels, channels // 2, 1, 1, 0),
                                      nn.LeakyReLU(0.1, inplace=False),
                                      nn.Conv2d(channels // 2, channels, 1, 1, 0))
        self.pha_fuse = nn.Sequential(nn.Conv2d(channels, channels // 2, 1, 1, 0),
                                      nn.LeakyReLU(0.1, inplace=False),
                                      nn.Conv2d(channels // 2, channels, 1, 1, 0))
        self.conv = nn.Conv2d(channels * 2, channels, kernel_size=(1, 1))
        self.amp_fuse_tpami = nn.Sequential(nn.Conv2d(2 * channels, channels, 1, 1, 0),
                                            nn.LeakyReLU(0.1, inplace=False),
                                            nn.Conv2d(channels, channels, 1, 1, 0))
        self.pha_fuse_tpami = nn.Sequential(nn.Conv2d(2 * channels, channels, 1, 1, 0),
                                            nn.LeakyReLU(0.1, inplace=False),
                                            nn.Conv2d(channels, channels, 1, 1, 0))
        self.post1 = nn.Conv2d(channels, channels, 1, 1, 0)
        self.post2 = nn.Conv2d(channels, channels, 1, 1, 0)
        self.ca1 = CPCA_ChannelAttention(input_channels=channels, internal_neurons=channels // 4)
        self.ca2 = CPCA_ChannelAttention(input_channels=channels, internal_neurons=channels // 4)
        self.act = nn.GELU()

    def forward(self, vis0, ir0):
        _, _, H, W = vis0.shape
        vis = torch.fft.rfft2((self.pre1(vis0) + 1e-8).to(torch.float32), dim=(-2, -1))  # FFT of the visible stream
        ir = torch.fft.rfft2((self.pre2(ir0) + 1e-8).to(torch.float32), dim=(-2, -1))    # FFT of the infrared stream
        vis_pha = torch.angle(vis)
        vis_amp = torch.abs(vis)
        ir_amp = torch.abs(ir)
        ir_pha = torch.angle(ir)
        # Amplitude/phase fusion following the TPAMI-style design
        amp_fuse = self.amp_fuse_tpami(torch.cat([vis_amp, ir_amp], 1))
        pha_fuse = self.pha_fuse_tpami(torch.cat([vis_pha, ir_pha], 1))
        real = amp_fuse * torch.cos(pha_fuse) + 1e-8
        imag = amp_fuse * torch.sin(pha_fuse) + 1e-8
        out = torch.complex(real, imag) + 1e-8
        out = torch.abs(torch.fft.irfft2(out, s=(H, W), norm="backward"))
        # Spatial-frequency differential features
        vis_out = out - vis0
        ir_out = out - ir0
        vis_out = self.post1(vis_out)
        ir_out = self.post2(ir_out)
        return vis_out, ir_out
```

Module gains: mAP50 improves by 2.0%, channel feature redundancy drops markedly, and the added computation is negligible.

4.4 Adaptive Illumination-aware Mask (ALM)

Motivation: training naturally biases the model toward the visible modality, so infrared information is under-used and performance degrades under extreme illumination.
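The core of the ALM strategy — rank patches by mean brightness, black-mask the darkest K on the visible image, and split the masking of the brightest K between the two modalities — can be sketched compactly before looking at the full preprocessing script. This is an illustrative miniature, not the paper's code: the function name `alm_mask`, the single-image interface, and the parameter defaults are assumptions.

```python
import numpy as np


def alm_mask(rgb, N=8, K=4, p=2, rng=None):
    """Minimal sketch of ALM patch selection: split an HxWx3 image into NxN
    patches, mask the darkest K patches (forcing infrared learning) and
    randomly mask p of the brightest K (handling glare/overexposure)."""
    rng = rng or np.random.default_rng(0)
    h, w = rgb.shape[0] // N, rgb.shape[1] // N
    # Mean brightness of every patch, flattened to N*N values
    brightness = np.array([[rgb[i*h:(i+1)*h, j*w:(j+1)*w].mean()
                            for j in range(N)] for i in range(N)]).ravel()
    order = np.argsort(brightness)
    dark, bright = order[:K], order[-K:]
    masked = rgb.copy()
    for idx in list(dark) + list(rng.choice(bright, p, replace=False)):
        i, j = divmod(int(idx), N)
        masked[i*h:(i+1)*h, j*w:(j+1)*w] = 0  # black-mask the selected patch
    return masked


img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
out = alm_mask(img)
assert out.shape == img.shape
```

The remaining K − p bright patches would then be masked on the infrared image instead, as the full script below does.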
```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import random
import os
import shutil

# Folders for the visible and infrared images
rgb_folder_path = r"G:\KeYan\DATA\VEDAI_1024\images\train_RGB"
ir_folder_path = r"G:\KeYan\DATA\VEDAI_1024\image\train_IR"
rgb_output_folder = r"G:\KeYan\DATA\VEDAI_1024\images\train"
ir_output_folder = r"G:\KeYan\DATA\VEDAI_1024\image\train"

# Make sure the output folders exist
if not os.path.exists(rgb_output_folder):
    os.makedirs(rgb_output_folder)
if not os.path.exists(ir_output_folder):
    os.makedirs(ir_output_folder)

# Collect all file names in both folders
rgb_image_names = [f for f in os.listdir(rgb_folder_path) if f.endswith(".png")]
ir_image_names = [f for f in os.listdir(ir_folder_path) if f.endswith(".png")]

# Make sure the two folders contain matching file names
assert set(rgb_image_names) == set(ir_image_names), "file names do not match"

# Parameters
N = 20
K = 10
p = 5  # assume p <= K
process_ratio = 0.3  # fraction of images to process

# Randomly select 30% of the images for masking
process_indices = random.sample(range(len(rgb_image_names)),
                                int(len(rgb_image_names) * process_ratio))

# Process the images in a batch
for i, image_name in enumerate(rgb_image_names):
    # Load the visible and infrared images
    rgb_image_path = os.path.join(rgb_folder_path, image_name)
    ir_image_path = os.path.join(ir_folder_path, image_name)
    rgb_image = np.array(Image.open(rgb_image_path).convert("RGB"))
    ir_image = np.array(Image.open(ir_image_path).convert("L"))  # assume the infrared image is grayscale

    # Apply masking only if the current image index was selected
    if i in process_indices:
        # Split the image into N x N patches
        height, width, _ = rgb_image.shape
        patch_height = height // N
        patch_width = width // N

        # Collect patches and their mean brightness
        patches_rgb = []
        brightness = []
        for r in range(N):
            for c in range(N):
                patch = rgb_image[r*patch_height:(r+1)*patch_height,
                                  c*patch_width:(c+1)*patch_width]
                patches_rgb.append(patch)
                brightness.append(np.mean(patch))  # simple mean as the brightness measure

        # Sort patch indices by brightness (ascending)
        sorted_indices = np.argsort(brightness)
        top_k_indices = sorted_indices[:K]      # K darkest patches (dark regions)
        bottom_k_indices = sorted_indices[-K:]  # K brightest patches (bright regions)

        # Randomly pick p of the brightest patches to mask on the RGB image
        random_indices_rgb = random.sample(bottom_k_indices.tolist(), p)

        # Work on copies of the originals
        augmented_rgb_image = rgb_image.copy()
        augmented_ir_image = ir_image.copy()

        # Dark patches usually correspond to night scenes: black-mask them on the
        # RGB image to force the network to learn infrared features
        for idx in top_k_indices:
            r, c = divmod(idx, N)
            augmented_rgb_image[r*patch_height:(r+1)*patch_height,
                                c*patch_width:(c+1)*patch_width] = 0

        # Bright patches usually correspond to daytime; accounting for strong light
        # and overexposure, learn from both modalities: mask p of them on RGB ...
        for idx in random_indices_rgb:
            r, c = divmod(idx, N)
            augmented_rgb_image[r*patch_height:(r+1)*patch_height,
                                c*patch_width:(c+1)*patch_width] = 0

        # ... and mask the remaining K - p bright patches on the infrared image
        remaining_indices_ir = [idx for idx in bottom_k_indices.tolist()
                                if idx not in random_indices_rgb]
        for idx in remaining_indices_ir:
            r, c = divmod(idx, N)
            augmented_ir_image[r*patch_height:(r+1)*patch_height,
                               c*patch_width:(c+1)*patch_width] = 0

        # Save the augmented images
        plt.imsave(os.path.join(rgb_output_folder, image_name), augmented_rgb_image)
        plt.imsave(os.path.join(ir_output_folder, image_name), augmented_ir_image, cmap="gray")
    else:
        # Otherwise copy the originals to the output folders unchanged
        shutil.copy(rgb_image_path, os.path.join(rgb_output_folder, image_name))
        shutil.copy(ir_image_path, os.path.join(ir_output_folder, image_name))
```

Pipeline: split the visible image into N×N patches and compute each patch's mean brightness; select the darkest K and the brightest K patches; mask the dark patches on the visible image (forcing the network to learn infrared features); for the bright patches (which may contain overexposure/glare), randomly mask K/2 on the infrared image and K/2 on the visible image; masking probability 30%; best parameters N = 8, K = 4.

Advantages: a pure training-time strategy with no inference cost; a further +0.5% mAP50; markedly better robustness under complex illumination.

5. Discussion

5.1 Module Effectiveness

CDFIM highlights complementary cross-modal differences and suppresses redundancy, improving interaction efficiency. In CGSA, the FFT replaces a Transformer for global modeling, the lightweight PSA cuts computation, and the AG gate refines the fusion. ALM removes modality bias at the data level without adding any model burden, with a clear robustness gain.

5.2 Advantages over SOTA

Highest accuracy, with the clearest margins on small objects and under complex illumination; lightest model, at only 6.58M parameters and 14.6 GFLOPs, fit for edge deployment; strong generality — the modules are plug-and-play, and the ALM strategy transfers to any dual-modality model.

6. Conclusion and Outlook

6.1 Conclusion

This paper proposes C²DFF-Net, a lightweight multimodal remote sensing small-object detection network whose three innovations — cross-modal differential interaction, spatial-frequency cross-domain fusion, and illumination-adaptive masking — deliver SOTA accuracy while staying extremely lightweight. The method is validated on DroneVehicle, VEDAI, and FLIR, and a UAV onboard deployment is completed, providing a practical high-accuracy, low-compute, high-robustness solution for all-day remote sensing small-object detection.

6.2 Limitations

The method relies on strict spatio-temporal registration of the visible-infrared pair, and inference speed on embedded hardware still has room for optimization.

6.3 Future Work

Build an end-to-end registration-detection model to reduce dependence on preprocessing; apply quantization, pruning, and distillation to further accelerate embedded inference; extend to SAR, multispectral, hyperspectral, and other remote sensing modalities.

Follow-up innovation ideas for top venues: 8 directions that can go straight into a paper (suited to TGRS, CVPR, ICME, JSTARS, and similar venues):

1. Unregistered dual-modality fusion: C²DFF-Net relies on strict registration; add a feature-alignment module for end-to-end registration + detection to handle real-world registration error.
2. Mamba frequency-domain fusion: replace the FFT with Mamba to improve long-range dependency modeling while staying lightweight (a hot 2025 topic).
3. Multimodal extension (RGB + IR + SAR): extend from two modalities to three with cross-modal differential adaptive fusion for multi-source remote sensing data.
4. Illumination-adaptive inference: turn ALM from a training strategy into inference-time dynamic weighting that adjusts each modality's contribution to the lighting in real time.

More directions will follow in future updates — essential material for follow-up innovation aimed at top journals. The author has also compiled a dedicated collection of dual-modality detection papers, shared free with followers; follow to receive it.