[1] Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6077-6086.
[2] Cornia M, Stefanini M, Baraldi L, et al. Meshed-memory transformer for image captioning[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 10575-10584.
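[3] Yan R Y, Liu X L. Visual question answering model combining bottom-up attention mechanism and memory network[J]. Journal of Image and Graphics, 2020, 25(5): 993-1006. (in Chinese)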
[4] Gao P, Jiang Z, You H, et al. Dynamic fusion with intra- and inter-modality attention flow for visual question answering[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 6632-6641.
[5] Peng G, You H X, Zhang Z P, et al. Multi-modality latent interaction network for visual question answering[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019: 5824-5834.
[6] Xie N, Lai F, Doran D, et al. Visual entailment: A novel task for fine-grained image understanding. (2019-01-20)[2021-11-09].
[7] Gurari D, Li Q, Stangl A J, et al. VizWiz grand challenge: Answering visual questions from blind people[C]//2018 IEEE/CVF Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3608-3617.
[8] Ren F J, Zhou Y Y. CGMVQA: A new classification and generative model for medical visual question answering[J]. IEEE Access, 2020, 8: 50626-50636.
[9] Chen K, Wang J, Chen L C, et al. ABC-CNN: An attention based convolutional neural network for visual question answering. (2016-04-03)[2021-11-09].
[10] Lu J S, Yang J W, Batra D, et al. Hierarchical question-image co-attention for visual question answering[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: MIT, 2016, 29: 289-297.
[1]陈巧红,漏杨波,方贤.基于空间关系聚合与全局特征注入的视觉问答模型[J].浙江理工大学学报,2023,49-50(自科六):764.
CHEN Qiaohong, LOU Yangbo, FANG Xian. A visual question answering model based on spatial relationship aggregation and global feature injection[J]. Journal of Zhejiang Sci-Tech University, 2023, 49-50(Natural Sciences, No. 3): 764.