Monocular RGB to Depth Conversion Model for Greenhouse Tomato Scene

Authors: GAO Wang, DENG Hanbing, XING Zhihong, ZHU Yanqiang

Funding: National Key Research and Development Program of China (2022YFD2002303-01); General Program of the Basic Scientific Research Projects of the Liaoning Provincial Department of Education (JYTM20231303)

    Abstract:

In greenhouse environments, fast, high-precision, and low-cost acquisition of scene depth information is crucial for agricultural machine vision systems in tasks such as tomato phenotype analysis, autonomous harvesting, and multimodal joint segmentation. An attention-embedded monocular depth estimation network that converts the RGB modality to the depth modality, the RGB-to-depth conversion network (RDCN), was proposed to address the failure of traditional algorithms to fully exploit the encoder's feature extraction capability, their low depth estimation accuracy, and their blurred boundaries. First, ResNeXt101 was employed to replace the original ResNet101 backbone, extracting feature maps from different levels and integrating them into the Laplacian pyramid branches; this emphasized the scale differences among features and enhanced the depth and breadth of feature fusion. To strengthen the model's capacity for capturing global information and contextual interaction, a shuffle attention module (SAM) was introduced, which also mitigated the loss of local detail caused by down-sampling. Second, to address the blurred boundaries of the predicted depth maps, a depth refinement module (DRM) was embedded to perceive depth variations near objects in the predicted feature maps. For the study, an RGB-D image acquisition platform for tomatoes was built in a solar greenhouse using an Azure Kinect DK depth camera. To ensure dataset diversity, images were collected at different times of day under the varying light intensities of the greenhouse, and the training set was augmented with three methods, horizontal mirroring, random rotation, and color jittering, yielding 8515 aligned RGB-D tomato image pairs. Experimental results showed that, with the shuffle attention module and the depth refinement module, the model predicted depth information accurately in greenhouse scenes: compared with the baseline model, RDCN reduced the mean relative error, root mean square error, log root mean square error, and log mean error on the test set by 20.5%, 10.3%, 8.3%, and 21.8%, respectively, and improved accuracy under the 1.25, 1.25², and 1.25³ thresholds by 3.2%, 1.2%, and 1.0%, respectively. The depth maps generated by the network were visually complete and clear overall, with abundant texture detail, performing especially well in regions with complex geometry and large depth variation. These results indicate that RDCN can obtain high-quality depth information from RGB data in greenhouse environments, providing technical support for monocular-sensor-based agricultural machinery navigation in greenhouse scenes and for the application of depth images in multimodal tasks.
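For reference, the four error metrics and the threshold accuracies reported above are the standard monocular depth estimation measures. A minimal sketch of their usual definitions follows, where d_i is the predicted depth at pixel i, d_i^* the ground-truth depth, and N the number of valid pixels; the paper's exact formulation may differ slightly, for example in the base of the logarithm used for the log mean error:

    \text{AbsRel} = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert d_i - d_i^* \rvert}{d_i^*}
    \qquad
    \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( d_i - d_i^* \right)^2}

    \text{RMSE}_{\log} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \log d_i - \log d_i^* \right)^2}
    \qquad
    \text{MAE}_{\log} = \frac{1}{N} \sum_{i=1}^{N} \lvert \log_{10} d_i - \log_{10} d_i^* \rvert

    \delta_k = \frac{1}{N} \left\lvert \left\{ i : \max\left( \frac{d_i}{d_i^*}, \frac{d_i^*}{d_i} \right) < 1.25^k \right\} \right\rvert, \quad k \in \{1, 2, 3\}

The paired augmentation described in the abstract also has a practical subtlety worth making explicit: the geometric transforms (horizontal mirroring, random rotation) must be applied identically to the RGB image and its aligned depth map, while color jittering is photometric and touches the RGB image only. The snippet below is an illustrative PyTorch/torchvision sketch under those assumptions, not the authors' code; the rotation range and jitter strengths are assumed values.

    import random
    import torchvision.transforms.functional as TF
    from torchvision.transforms import ColorJitter

    # Photometric augmentation is applied to the RGB image only: altering the
    # values of the depth map would corrupt the ground-truth labels.
    color_jitter = ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)  # assumed strengths

    def augment_pair(rgb, depth, max_angle=5.0):  # rotation range is an assumed value
        # Horizontal mirroring: the same flip is applied to both inputs so every
        # RGB pixel stays aligned with its depth value.
        if random.random() < 0.5:
            rgb, depth = TF.hflip(rgb), TF.hflip(depth)
        # Random rotation: one shared angle, applied to both, preserves alignment.
        angle = random.uniform(-max_angle, max_angle)
        rgb = TF.rotate(rgb, angle)
        depth = TF.rotate(depth, angle)
        # Color jittering: RGB only; the depth map is left untouched.
        rgb = color_jitter(rgb)
        return rgb, depth

torchvision's default nearest-neighbour interpolation in rotate is the safe choice for the depth map, since interpolating across object boundaries would blend foreground and background depths into invalid values.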

Cite this article:

GAO Wang, DENG Hanbing, XING Zhihong, ZHU Yanqiang. Monocular RGB to Depth Conversion Model for Greenhouse Tomato Scene[J]. Transactions of the Chinese Society for Agricultural Machinery, 2025, 56(6): 499-508, 574.

復(fù)制
相關(guān)視頻

分享
文章指標(biāo)
  • 點(diǎn)擊次數(shù):
  • 下載次數(shù):
  • HTML閱讀次數(shù):
  • 引用次數(shù):
History
  • Received: 2024-09-22
  • Published online: 2025-06-10