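[1] CHEN Qiaohong, LOU Yangbo, SUN Qi, et al. Visual question answering model based on multimodal gate self-attention mechanism[J]. Journal of Zhejiang Sci-Tech University, 2022, 47-48(Natural Sciences, No. 3): 413.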
CHEN Qiaohong, LOU Yangbo, SUN Qi, et al. Visual question answering model based on multimodal gate self-attention mechanism[J]. Journal of Zhejiang Sci-Tech University, 2022, 47-48(Natural Sciences, No. 6): 413.