This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
DHAFGan: A Dense Hybrid Attention Fusion Generative Adversarial Network for Infrared and Visible Image Fusion
Authors
Abstract
To address shortcomings of current infrared and visible image fusion algorithms, namely insufficient perception of typical features, poor visual quality of the fusion results, and underutilization of important secondary information, this paper proposes an infrared and visible image fusion algorithm based on shallow-deep feature extraction and dual-channel hybrid attention. First, a shallow-deep feature extraction module is constructed: shallow convolutional layers extract surface-level features, while deep multi-scale receptive-field units extract deep semantic information from the source images, achieving multi-level multimodal feature extraction. Second, a Dual-Channel Hybrid Attention Fusion Module (DCAFM) is constructed, in which spatial attention focuses on the salient regions of the image and channel attention strengthens informative feature channels, enhancing the fusion of multimodal features. Finally, primary and secondary feature loss functions are formulated to constrain both the generator and the discriminator, facilitating the extraction of latent secondary feature information from the source images. Experiments on the DroneVehicle dataset demonstrate that the proposed algorithm achieves superior performance in both subjective visual evaluation and objective metrics. Quantitative evaluation shows that our method outperforms seven state-of-the-art approaches, achieving the highest scores in standard deviation (SD = 9.3541), mutual information (MI = 2.4321), and peak signal-to-noise ratio (PSNR = 65.7852), and ranking second in average gradient (AG = 3.9854). The fused images not only align with human visual perception but also retain rich detail, effectively preserving both dominant and subtle features from the source modalities.
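The abstract describes fusing infrared and visible features with a combination of channel attention (reweighting feature channels) and spatial attention (emphasizing salient regions). The paper's actual DCAFM architecture is not given on this page, so the following is only a minimal NumPy sketch of that general spatial-plus-channel attention fusion idea; all function names (`channel_attention`, `spatial_attention`, `dcafm_fuse`) and the additive fusion rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # Global average pooling over spatial dims -> one sigmoid-gated weight per channel.
    w = _sigmoid(feat.mean(axis=(1, 2)))            # shape (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # Mean and max across channels -> one sigmoid-gated saliency weight per pixel.
    s = _sigmoid((feat.mean(axis=0) + feat.max(axis=0)) / 2.0)  # shape (H, W)
    return feat * s[None, :, :]

def dcafm_fuse(ir_feat, vis_feat):
    # Hypothetical hybrid-attention fusion: channel attention reweights each
    # modality's channels, the reweighted maps are summed, and spatial
    # attention then emphasizes salient regions of the fused map.
    fused = channel_attention(ir_feat) + channel_attention(vis_feat)
    return spatial_attention(fused)

rng = np.random.default_rng(0)
ir = rng.standard_normal((8, 16, 16)).astype(np.float32)   # (C, H, W) infrared features
vis = rng.standard_normal((8, 16, 16)).astype(np.float32)  # (C, H, W) visible features
out = dcafm_fuse(ir, vis)
print(out.shape)  # (8, 16, 16)
```

In a real network these attention weights would come from learned convolutional layers rather than fixed pooling statistics; the sketch only shows how the two attention branches compose on fused multimodal features.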
DOI
https://doi.org/10.31223/X5V777
Subjects
Computer Engineering
Keywords
Image Fusion, Multi-Scale Receptive Field, Dual-Channel Hybrid Attention, Primary and Secondary Feature Loss
Dates
Published: 2026-03-21 16:21
Last Updated: 2026-03-21 16:21
License
CC BY Attribution 4.0 International