[1] Ramesh A, Dhariwal P, Nichol A, et al. Hierarchical text-conditional image generation with CLIP latents. (2022-04-13) [2023-03-06].
[2] Jian Y N, Yu F X, Singh S, et al. Stable Diffusion for aerial object detection. (2023-11-21) [2023-11-30].
[3] Kuang Z Y, Zhang J X, Huang Y Y, et al. Advancing urban renewal: An automated approach to generating historical arcade facades with Stable Diffusion models. (2023-11-20) [2023-11-30].
[4] Chang D, Shi Y, Gao Q, et al. MagicDance: Realistic human dance video generation with motions & facial expressions transfer. (2023-11-18) [2023-11-30].
[5] Luo H S, Ji L, Zhong M, et al. CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning [J]. Neurocomputing, 2022, 508(C): 293-304.
[6] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
[7] Borji A. Generated faces in the wild: Quantitative comparison of Stable Diffusion, Midjourney and DALL-E 2. (2023-06-05) [2023-11-30].
[8] Gal R, Alaluf Y, Atzmon Y, et al. An image is worth one word: Personalizing text-to-image generation using textual inversion. (2023-08-02) [2023-11-30].
[9] Ruiz N, Li Y Z, Jampani V, et al. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation [C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, BC, Canada. IEEE, 2023: 22500-22510.
[10] Gallego V. Personalizing text-to-image generation via aesthetic gradients. (2023-09-25) [2023-11-30].