Human gesture recognition under degraded environments using 3D-integral imaging and deep learning

Opt Express. 2020 Jun;28(13):19711-19725. 432792. doi: 10.1364/OE.396339.

Human gesture recognition under degraded environments using 3D-integral imaging and deep learning.

Gokul Krishnan
Rakesh Joshi
Timothy O'Connor
Filiberto Pla
Bahram Javidi

PMID: 32672242 DOI: 10.1364/OE.396339.

Abstract

In this paper, we propose a spatio-temporal human gesture recognition algorithm under degraded conditions using three-dimensional integral imaging and deep learning. The proposed algorithm leverages the advantages of integral imaging with deep learning to provide an efficient human gesture recognition system under degraded environments such as occlusion and low illumination conditions. The 3D data captured using integral imaging serves as the input to a convolutional neural network (CNN). The spatial features extracted by the convolutional and pooling layers of the neural network are fed into a bi-directional long short-term memory (BiLSTM) network. The BiLSTM network is designed to capture the temporal variation in the input data. We have compared the proposed approach with conventional 2D imaging and with the previously reported approaches using spatio-temporal interest points with support vector machines (STIP-SVMs) and distortion invariant non-linear correlation-based filters. Our experimental results suggest that the proposed approach is promising, especially in degraded environments. Using the proposed approach, we find a substantial improvement over previously published methods and find 3D integral imaging to provide superior performance over the conventional 2D imaging system. To the best of our knowledge, this is the first report that examines deep learning algorithms based on 3D integral imaging for human activity recognition in degraded environments.
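The data flow described in the abstract (per-frame convolution and pooling, followed by a bidirectional LSTM over the frame sequence) can be illustrated with a minimal NumPy sketch. This is not the authors' network: the layer sizes, the random untrained weights, the mean-over-time aggregation before the classifier, and all function names are assumptions chosen for illustration, and a random array stands in for a sequence of frames reconstructed by integral imaging.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(frame, kernels):
    """Valid 2D cross-correlation of one (H, W) frame with C kernels, then ReLU."""
    k = kernels.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(frame, (k, k))
    out = np.einsum("hwij,cij->chw", windows, kernels)
    return np.maximum(out, 0.0)

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over each (C, H, W) feature map."""
    c, h, w = fmap.shape
    h, w = h - h % size, w - w % size
    blocks = fmap[:, :h, :w].reshape(c, h // size, size, w // size, size)
    return blocks.max(axis=(2, 4))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(in_dim, hidden):
    """Random (untrained) weights for one LSTM direction."""
    return (rng.normal(0, 0.1, (4 * hidden, in_dim)),
            rng.normal(0, 0.1, (4 * hidden, hidden)),
            np.zeros(4 * hidden))

def lstm_pass(seq, W, U, b):
    """Run one LSTM direction over seq of shape (T, D); returns (T, hidden)."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x in seq:
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)

def classify_clip(clip, kernels, fwd, bwd, W_out):
    """CNN features per frame -> BiLSTM over time -> softmax over gesture classes."""
    feats = np.stack([max_pool(conv2d_relu(f, kernels)).ravel() for f in clip])
    h_fwd = lstm_pass(feats, *fwd)
    h_bwd = lstm_pass(feats[::-1], *bwd)[::-1]   # re-align backward states in time
    h = np.concatenate([h_fwd, h_bwd], axis=1)   # (T, 2 * hidden)
    logits = W_out @ h.mean(axis=0)              # assumption: average over time
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical sizes: 8 frames of 16x16 pixels, 4 kernels of 3x3, 3 gesture classes.
T, SIDE, N_KERNELS, HIDDEN, N_CLASSES = 8, 16, 4, 16, 3
clip = rng.random((T, SIDE, SIDE))               # stand-in for reconstructed frames
kernels = rng.normal(0, 0.1, (N_KERNELS, 3, 3))
feat_dim = N_KERNELS * ((SIDE - 2) // 2) ** 2    # 4 maps of 7x7 after conv and pool
fwd, bwd = init_lstm(feat_dim, HIDDEN), init_lstm(feat_dim, HIDDEN)
W_out = rng.normal(0, 0.1, (N_CLASSES, 2 * HIDDEN))

probs = classify_clip(clip, kernels, fwd, bwd, W_out)
print(probs)
```

Because the weights are random, the printed class probabilities carry no meaning; the sketch only shows how spatial features from each frame feed a recurrent layer that reads the sequence in both temporal directions.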