Author: HOU Xinjiang
Researchers from the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, have developed a novel autofocus method that harnesses the power of deep learning to dynamically select regions of interest (ROI) in grayscale images.
Their findings, published in the journal Sensors, address the limitations of traditional autofocus methods and pave the way for more precise and efficient image focusing in various applications.
Autofocus technology is crucial in civilian fields, especially where rapid and accurate target capture is essential. Traditional autofocus methods can be divided into active and passive categories. Active focusing relies on external sensors, increasing costs and complexity. In contrast, passive focusing assesses image quality to control focus, but fixed focusing windows and evaluation functions often lead to focusing failures, especially in complex scenes.
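To illustrate what a passive method with a fixed focusing window and an evaluation function looks like, here is a minimal sketch. It is not taken from the paper: the variance-of-Laplacian score, the `window` tuple layout, and the function names are all assumptions chosen for illustration.

```python
def laplacian_variance(image, window):
    """Sharpness score: variance of a 4-neighbour Laplacian inside a fixed window.

    image  -- 2D list of grayscale values
    window -- (top, left, height, width); a hypothetical fixed focusing window
    """
    top, left, height, width = window
    rows, cols = len(image), len(image[0])
    responses = []
    # Clamp the window so every pixel visited has all four neighbours.
    for r in range(max(top, 1), min(top + height, rows - 1)):
        for c in range(max(left, 1), min(left + width, cols - 1)):
            lap = (image[r - 1][c] + image[r + 1][c]
                   + image[r][c - 1] + image[r][c + 1]
                   - 4 * image[r][c])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def sharpest_frame(stack, window):
    """Return the index of the frame with the highest score in the window."""
    scores = [laplacian_variance(frame, window) for frame in stack]
    return max(range(len(stack)), key=scores.__getitem__)
```

Because the score only sees pixels inside `window`, a target that lies outside the window, or a bright spot inside it, can steer the search to the wrong frame. This is the kind of failure the article attributes to fixed focusing windows.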
Moreover, the lack of comprehensive datasets has hindered the widespread adoption of deep learning methods in autofocus. Traditional image-based autofocus solutions suffer from issues like misjudging light spots and focal breathing, where changes in camera zoom and light intensity during focusing can affect image sharpness evaluation.
To overcome these challenges, the research team embarked on a three-step approach. First, they constructed a comprehensive dataset of grayscale image sequences with continuous focusing adjustments, capturing diverse scenes from simple to complex and at varying focal lengths. This dataset serves as a valuable resource for training and evaluating autofocus algorithms.
Next, they transformed the autofocus problem into an ordinal regression task, proposing two focusing strategies: full-stack search and single-frame prediction. These strategies enable the network to adaptively focus on salient regions within the frame, eliminating the need for pre-selected focusing windows.
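The article does not say how the focus labels are encoded, so the following is only a sketch of one common way to pose ordinal regression: a focus position among `num_positions` ordered steps becomes a series of yes/no questions of the form "is the true position greater than step j?". The function names and the 0.5 threshold are assumptions for illustration.

```python
def encode_ordinal(position, num_positions):
    """Encode a 0-based focus position as num_positions - 1 binary targets.

    Target j is 1 when the true position is greater than j, so the encoding
    preserves the ordering between focus positions.
    """
    if not 0 <= position < num_positions:
        raise ValueError("position out of range")
    return [1 if position > j else 0 for j in range(num_positions - 1)]


def decode_ordinal(probabilities, threshold=0.5):
    """Recover a position by counting how many thresholds are exceeded."""
    return sum(1 for p in probabilities if p > threshold)
```

For example, `encode_ordinal(2, 5)` gives `[1, 1, 0, 0]`, and `decode_ordinal([0.9, 0.8, 0.3, 0.1])` returns `2`. Unlike plain classification, a prediction that is off by one step differs from the target in only one entry, so near misses are penalized less than large errors.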
Finally, the team designed a MobileViT network equipped with a linear self-attention mechanism. This lightweight yet powerful network achieves dynamic autofocus with minimal computational cost, ensuring fast and accurate focusing.
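The article names a linear self-attention mechanism without detailing it. As a rough illustration, the sketch below follows the separable self-attention idea associated with MobileViTv2, which is an assumption about the design rather than the team's confirmed implementation. The weight arguments are hypothetical, and the final output projection is omitted for brevity.

```python
import math


def linear_self_attention(tokens, w_score, w_key, w_value):
    """Separable (linear-cost) self-attention over a list of tokens.

    tokens  -- n vectors of length d
    w_score -- length-d vector giving one scalar score per token
    w_key, w_value -- d x d matrices stored as lists of rows
    """
    def matvec(matrix, vec):
        return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

    # One scalar per token: cost grows with n, not with n * n token pairs.
    scores = [sum(w * x for w, x in zip(w_score, t)) for t in tokens]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # A single context vector summarises all tokens.
    keys = [matvec(w_key, t) for t in tokens]
    dim = len(keys[0])
    context = [sum(wt * k[i] for wt, k in zip(weights, keys)) for i in range(dim)]

    # Each token's ReLU-activated value is modulated by the shared context.
    outputs = []
    for t in tokens:
        value = [max(0.0, v) for v in matvec(w_value, t)]
        outputs.append([v * c for v, c in zip(value, context)])
    return outputs
```

Standard self-attention compares every token with every other token. Here, each token contributes to, and is modulated by, one shared context vector, which is what keeps the computational cost low enough for lightweight networks.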
Experimental results demonstrate the effectiveness of the proposed method. The full-stack search strategy achieved a mean absolute error (MAE) of 0.094 with a focusing time of 27.8 milliseconds, while the single-frame prediction strategy achieved an MAE of 0.142 with a focusing time of 27.5 milliseconds. These results underscore the strong performance of the deep learning-based autofocus method, especially in complex scenes where traditional approaches struggle.
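For readers unfamiliar with the metric, MAE is the average absolute difference between predicted and true values. The sketch below shows the calculation. The article does not state the unit of the reported MAE, and the numbers in the usage note are invented purely to show the arithmetic.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and ground truth."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

With made-up focus positions, `mean_absolute_error([3, 5, 7], [3, 6, 7])` is one third: a single prediction is off by one step across three samples.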
This research presents an advancement in autofocus technology, offering several key benefits. By dynamically selecting regions of interest, the method ensures that the most important features within the scene are accurately focused. This is particularly important in applications such as surveillance, photography, and machine vision, where capturing crisp images is crucial.
Moreover, the lightweight MobileViT network with its linear self-attention mechanism makes the method computationally efficient, enabling real-time autofocus even on resource-constrained devices. This opens up new possibilities for embedded systems and mobile devices that require fast and accurate focusing capabilities.
The successful development of this deep learning-based autofocus method underscores the potential of AI in enhancing traditional imaging technologies. Future research could explore the application of this method to color images and video sequences, further broadening its impact. Additionally, optimizing the network architecture and focusing strategies could lead to even faster and more accurate focusing, pushing the boundaries of autofocus technology.
WU Chuan
Changchun Institute of Optics, Fine Mechanics and Physics
E-mail: wuchuan@ciomp.ac.cn