Simultaneous Enhancement and Super-Resolution for Underwater Imagery
Gautam Tata
SESR is a generative model designed to solve the problem of simultaneous enhancement and super-resolution (SESR) for underwater images. The paper introduces SESR to improve visual perception for underwater robot vision systems, addressing challenges such as color degradation, blurriness, and low contrast. The model is built on a residual-in-residual network and operates efficiently, making it suitable for near real-time applications.
Problem Definition and Approach
The task tackled by SESR is to enhance the perceptual quality of underwater images while also providing super-resolution at 2×, 3×, or 4× the original resolution. The underwater domain presents unique challenges due to optical properties like attenuation, refraction, and backscatter, which distort colors and reduce visibility. While traditional enhancement or super-resolution methods exist, they typically address these problems separately. The novelty here lies in the unified approach—combining enhancement and super-resolution to simultaneously solve both issues.
SESR Architecture
SESR utilizes a residual-in-residual dense network architecture. This design allows the model to learn hierarchical features across multiple scales efficiently. The model also incorporates a saliency prediction module to focus on important foreground regions in the image, improving global contrast and guiding the enhancement process. The hierarchical nature of the network is key for addressing the chrominance-specific distortions in underwater images.
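To make the residual-in-residual idea concrete, the sketch below shows a minimal PyTorch-style residual dense block and an outer residual wrapper around a stack of such blocks. This is not the authors' implementation; the layer counts and channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Densely connected conv layers with a local residual connection."""
    def __init__(self, channels=64, growth=32, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # 1x1 conv fuses all densely collected features back to the input width
        self.fuse = nn.Conv2d(channels + n_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual

class ResidualInResidual(nn.Module):
    """Several RDBs wrapped in an outer (global) residual connection."""
    def __init__(self, channels=64, n_blocks=4):
        super().__init__()
        self.blocks = nn.Sequential(
            *[ResidualDenseBlock(channels) for _ in range(n_blocks)])

    def forward(self, x):
        return x + self.blocks(x)  # global residual over the block stack
```

The nested skip connections let gradients bypass entire block stacks, which is what makes deep multi-scale feature learning stable and efficient here.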
Feature Extraction and Saliency
The network is composed of Residual Dense Blocks (RDBs) that facilitate multi-scale feature learning. Two branches with different kernel sizes (3×3 and 5×5) are used to extract local features, which are then fused to create a global feature map. The Auxiliary Attention Network (AAN) is introduced to predict a saliency map, which identifies key regions in the image. This saliency map helps the model focus on enhancing these regions, resulting in more accurate restoration of color, sharpness, and contrast.
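The dual-branch extraction and the saliency head can be sketched as follows; the channel counts, activation choices, and the exact fusion layer are assumptions for illustration rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class DualKernelFeatures(nn.Module):
    """Extract local features with 3x3 and 5x5 kernels and fuse them."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.branch3 = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, feat_ch, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, kernel_size=1)

    def forward(self, x):
        f3 = torch.relu(self.branch3(x))
        f5 = torch.relu(self.branch5(x))
        return self.fuse(torch.cat([f3, f5], dim=1))  # fused global feature map

class SaliencyHead(nn.Module):
    """Predict a single-channel foreground saliency map from fused features."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch // 2, 1, 3, padding=1),
            nn.Sigmoid(),  # values in [0, 1] mark salient foreground pixels
        )

    def forward(self, feats):
        return self.conv(feats)
```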
Training and Objective Functions
Training the SESR model involves a multi-modal objective function. The model is trained using the UFO-120 dataset, a new dataset introduced in the paper with 1500 training samples and 120 test samples. The training process includes the following loss functions:
- Saliency Loss: Measures the accuracy of the predicted saliency map using a cross-entropy-based loss.
- Contrast Loss: Quantifies the recovery of foreground pixel intensity, addressing the lack of contrast caused by the greenish-blue hue typical of underwater images.
- Color Loss: Compares the enhanced image with ground truth, using wavelength-dependent chrominance terms to specifically handle underwater color degradation.
- Sharpness Loss: Evaluates the recovery of sharpness by analyzing image gradients, which helps to mitigate blurriness in underwater images.
- Content Loss: Ensures the model restores high-level features by comparing outputs with the ground truth using features from a pre-trained VGG-19 network.
Together, these loss functions guide the model to simultaneously enhance image quality and increase spatial resolution, even under challenging underwater conditions; a sketch of how such a combined objective can be assembled follows below.
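The sketch below combines the terms listed above into one weighted objective. The weights, the VGG-19 layer cut, the gradient-based sharpness term, and the saliency-weighted stand-in for the contrast loss are all illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen, pre-trained VGG-19 features for the content (perceptual) term.
_vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def _gradients(img):
    """Finite-difference image gradients, used by the sharpness term."""
    return img[..., :, 1:] - img[..., :, :-1], img[..., 1:, :] - img[..., :-1, :]

def sesr_objective(pred_hr, gt_hr, pred_sal, gt_sal,
                   w_sal=1.0, w_contrast=0.5, w_color=1.0,
                   w_sharp=0.5, w_content=0.5):
    # Saliency loss: cross-entropy between predicted and ground-truth maps.
    l_sal = F.binary_cross_entropy(pred_sal, gt_sal)
    # Contrast loss (rough stand-in): saliency-weighted pixel difference,
    # emphasizing recovery of foreground intensity.
    l_contrast = F.l1_loss(pred_hr * gt_sal, gt_hr * gt_sal)
    # Color loss: pixel-wise difference against the ground truth.
    l_color = F.l1_loss(pred_hr, gt_hr)
    # Sharpness loss: match image gradients to penalize blur.
    dxp, dyp = _gradients(pred_hr)
    dxg, dyg = _gradients(gt_hr)
    l_sharp = F.l1_loss(dxp, dxg) + F.l1_loss(dyp, dyg)
    # Content loss: feature-space distance in the frozen VGG-19.
    l_content = F.l1_loss(_vgg(pred_hr), _vgg(gt_hr))
    return (w_sal * l_sal + w_contrast * l_contrast + w_color * l_color
            + w_sharp * l_sharp + w_content * l_content)
```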
Results and Evaluation
The SESR model was evaluated on several underwater datasets, including UFO-120. The model's performance was measured using standard metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and the Underwater Image Quality Measure (UIQM). The results showed that SESR outperforms state-of-the-art models in both enhancement and super-resolution tasks. Specifically:
SESR achieved higher PSNR and SSIM values than existing methods, indicating better reconstruction quality and structural preservation. The model also excelled in UIQM, demonstrating superior color restoration, contrast enhancement, and sharpness recovery. Moreover, the authors performed ablation studies to analyze the contribution of each loss function. The results revealed that removing the saliency-driven contrast loss led to significantly lower contrast in the enhanced images, confirming the importance of this component in guiding the model’s enhancement process.
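For reference, PSNR and SSIM can be computed per image pair with scikit-image as sketched below; UIQM has no standard library implementation and is omitted here. The helper name and the uint8/255 assumption are illustrative.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred, gt):
    """PSNR and SSIM for one enhanced/ground-truth pair.

    `pred` and `gt` are HxWx3 uint8 arrays at the same resolution.
    """
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return psnr, ssim
```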
Real-Time Feasibility
One of the key advantages of SESR is its computational efficiency. The model is capable of near real-time performance on a single-board computer (e.g., Nvidia AGX Xavier), running at 7.75 frames per second. This efficiency is achieved through design choices such as skip connections and dense feature extraction, which reduce the computational load while maintaining high performance.
The model’s memory footprint is only 10 MB, making it suitable for deployment in embedded systems like underwater robots. These robots can benefit from real-time image enhancement and super-resolution to improve their navigation and operational capabilities in murky underwater environments.
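A frames-per-second figure like the one reported can be estimated with a simple timing loop; the sketch below assumes a CUDA device and a placeholder input resolution, and is not the authors' benchmarking code.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_shape=(1, 3, 240, 320), iters=100, device="cuda"):
    """Rough frames-per-second estimate for fixed-size inputs."""
    model = model.to(device).eval()
    x = torch.rand(*input_shape, device=device)
    for _ in range(10):                     # warm-up iterations
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()            # flush pending GPU work
    start = time.time()
    for _ in range(iters):
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return iters / (time.time() - start)
```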
Generalization to Terrestrial Imagery
Although SESR is primarily designed for underwater imagery, it also generalizes well to terrestrial images. The model was evaluated on standard datasets like Set5 and Set14, where it achieved competitive results in terms of PSNR and SSIM. This suggests that the model’s architecture and training process can be applied to a broader range of image enhancement and super-resolution tasks beyond the underwater domain.
Conclusion
The paper presents SESR as a unified solution for the simultaneous enhancement and super-resolution of underwater images. By leveraging a residual-in-residual network with saliency prediction, the model achieves state-of-the-art performance on both tasks, offering real-time capability for deployment in visually guided robots. The introduction of the UFO-120 dataset also sets a new benchmark for underwater image enhancement and super-resolution, facilitating future research in this area.
SESR’s performance on both underwater and terrestrial images makes it a promising tool for a variety of applications, from underwater exploration to general image restoration tasks.