Faculty


Xingyu Zeng (曾星宇)

Distinguished Professor, Doctoral Supervisor

Email: zengxingyu@suat-sz.edu.cn
Biography

Doctoral supervisor and Distinguished Professor at the Institute of Artificial Intelligence, Shenzhen University of Advanced Technology. He previously served as Senior Algorithm Director at SenseTime, where he led the algorithm teams of several business lines. His research focuses on computer vision and multimodal large models. He has published 30+ papers at top venues including CVPR, ICCV, ECCV, and TPAMI, with 4k+ citations and an h-index of 24. He has repeatedly served as a reviewer for CVPR, ICCV, and other leading international conferences and journals. He received a Google PhD Fellowship and, as a core team member, won multiple championship and runner-up finishes in the ImageNet visual recognition challenge.


Research Areas

His research focuses on generative multimodal large models, including unified understanding-generation model architectures, complex reasoning and 3D spatial perception, and interpretability.

Main Research Directions

1. Early research focused on deep-learning-driven computer vision:

(1) Image/video classification and object detection: efficient feature representations based on convolutional neural networks, multi-scale feature fusion, and methods for improving detection accuracy and efficiency.

(2) Human pose estimation and action recognition: modeling human poses and action categories in video with deep networks, with emphasis on combining temporal information, contextual features, and motion features.

(3) Cross-modal tasks: alignment and fusion between the visual and language modalities, e.g., image captioning and image-text retrieval.

2. Recent research focuses on generative multimodal large models, covering unified understanding-generation architectures, complex reasoning and 3D perception, and interpretability:

(1) Unified understanding-generation multimodal large models: building a single architecture that integrates perception, understanding, and generation, enabling generalization, transfer, and joint optimization across multimodal tasks.

(2) Complex reasoning and 3D spatial perception in large models: using reinforcement learning to improve multimodal chain-of-thought reasoning, fine-grained generation control, and 3D spatial reasoning.

(3) Interpretability and safety of large models: controllability, causal reasoning, and risk-mitigation mechanisms in the generation process of multimodal large models.

Education and Work Experience

Education

2012.8–2016.7, The Chinese University of Hong Kong, Electronic Engineering, Ph.D.

2007.9–2011.7, University of Science and Technology of China, Electronic Information Engineering, Bachelor's degree

2007.9–2011.7, University of Science and Technology of China, Finance, Bachelor's degree (double degree)

Work Experience

2025.7–present, Shenzhen University of Advanced Technology, Distinguished Full Professor

2016.8–2025.7, SenseTime, Senior Algorithm Director and industry algorithm lead

2011.11–2012.7, The Chinese University of Hong Kong, Research Assistant

Academic Achievements

Honors

Google PhD Fellowship

Multiple championship and runner-up finishes across editions of the ImageNet visual recognition challenge

Research Projects and Awards

2021.6–2024.6, Shenzhen Park Development Agency of the Hetao Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone, "Research on Key Computer Vision Technologies for Industry Empowerment", funding RMB 44 million, core member

2022.1–2024.12, Shenzhen Science and Technology Innovation Bureau, key project 2021N083 "R&D of Key Technologies for a Large-Scale Open Computing Platform for Intelligent Vision", funding RMB 6 million, participant

2019.3–2021.3, Shenzhen Science and Technology Innovation Bureau, key project 20180236 "R&D of Key Technologies for Ultra-Large-Scale Visual Analysis for Smart Cities", funding RMB 4.5 million, participant

2020.8–2022.3, Shenzhen Science and Technology Innovation Bureau, "R&D of Key Technologies for COVID-19 Prevention, Control, and Auxiliary Diagnosis Based on New-Generation Information Technology", funding RMB 10 million, participant

Representative Publications

[1] Chengqi Duan, Rongyao Fang, Yuqing Wang, Kun Wang, Linjiang Huang, Xingyu Zeng, Hongsheng Li, Xihui Liu. "GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning." arXiv preprint arXiv:2505.17022 (2025).

[2] Yize Zhang, Tianshu Wang, Sirui Chen, Kun Wang, Xingyu Zeng, Hongyu Lin, Xianpei Han, Le Sun, Chaochao Lu. "ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search." arXiv preprint arXiv:2504.10893 (2025).

[3] Rongyao Fang, Chengqi Duan, Kun Wang, Linjiang Huang, Hao Li, Shilin Yan, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Xihui Liu, Hongsheng Li. "GoT: Unleashing reasoning capability of multimodal large language model for visual generation and editing." arXiv preprint arXiv:2503.10639 (2025).

[4] Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Qing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao, Xueqian Wang. "TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Industry Systems." Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track. 2024.

[5] Rongyao Fang, Chengqi Duan, Kun Wang, Hao Li, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Hongsheng Li, Xihui Liu. "PUMA: Empowering unified MLLM with multi-granular visual generation." arXiv preprint arXiv:2410.13861 (2024).

[6] Chenyang Zhao, Kun Wang, Xingyu Zeng, Rui Zhao, Antoni B. Chan. "Gradient-based Visual Explanation for Transformer-based CLIP." International Conference on Machine Learning. PMLR, 2024.

[7] Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu. "CLEAR: Can Language Models Really Understand Causal Graphs?" Findings of the Association for Computational Linguistics: EMNLP 2024.

[8] Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu. "Causal Evaluation of Language Models." arXiv preprint arXiv:2405.00622 (2024).

[9] Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao. "TPTU: Task planning and tool usage of large language model-based AI agents." NeurIPS 2023 Foundation Models for Decision Making Workshop. 2023.

[10] Yuhan Sun, Mukai Li, Yixin Cao, Kun Wang, Wenxiao Wang, Xingyu Zeng, Rui Zhao. "To be or not to be? An exploration of continuously controllable prompt engineering." arXiv preprint arXiv:2311.09773 (2023).

[11] Zhixuan Liang, Xingyu Zeng, Rui Zhao, Ping Luo. "MeanAP-Guided Reinforced Active Learning for Object Detection." arXiv preprint arXiv:2310.08387 (2023).

[12] Guoqiang Jin, Fan Yang, Mingshan Sun, Ruyi Zhao, Yakun Liu, Wei Li, Tianpeng Bao, Liwei Wu, Xingyu Zeng, Rui Zhao. "SeqCo-DETR: Sequence Consistency Training for Self-supervised Object Detection with Transformers." BMVC 2023.

[13] Shaobo Lin, Xingyu Zeng, Rui Zhao. "Explore the Power of Dropout on Few-shot Learning." arXiv preprint arXiv:2301.11015 (2023).

[14] Shaobo Lin, Kun Wang, Xingyu Zeng, Rui Zhao. "Explore the power of synthetic data on few-shot object detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

[15] Shaobo Lin, Kun Wang, Xingyu Zeng, Rui Zhao. "An effective crop-paste pipeline for few-shot object detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

[16] Guoqiu Li, Guanxiong Cai, Xingyu Zeng, Rui Zhao. "Scale-aware spatio-temporal relation learning for video anomaly detection." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.

[17] Shaobo Lin, Xingyu Zeng, Rui Zhao. "A Unified Framework with Meta-dropout for Few-shot Learning." arXiv preprint arXiv:2210.06409 (2022).

[18] Shaobo Lin, Xingyu Zeng, Shilin Yan, Rui Zhao. "Three-stage training pipeline with patch random drop for few-shot object detection." Proceedings of the Asian Conference on Computer Vision. 2022.

[19] Yingjie Cai, Buyu Li, Zeyu Jiao, Hongsheng Li, Xingyu Zeng, Xiaogang Wang. "Monocular 3D object detection with decoupled structured polygon estimation and height-guided depth estimation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 07. 2020.

[20] Xinzhu Ma, Shinan Liu, Zhiyi Xia, Hongwen Zhang, Xingyu Zeng, Wanli Ouyang. "Rethinking pseudo-LiDAR representation." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIII 16. Springer International Publishing, 2020.

[21] Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang. "Adapting object detectors with conditional domain normalization." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer International Publishing, 2020.

[22] Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, Xiaogang Wang. "GS3D: An efficient 3D object detection framework for autonomous driving." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[23] Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang. "Crafting GBD-Net for object detection." IEEE Transactions on Pattern Analysis and Machine Intelligence 40.9 (2017): 2109-2123.

[24] Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang. "T-CNN: Tubelets with convolutional neural networks for object detection from videos." IEEE Transactions on Circuits and Systems for Video Technology 28.10 (2017): 2896-2907.

[25] Jingwei Guan, Shuai Yi, Xingyu Zeng, Wai-Kuen Cham, Xiaogang Wang. "Visual importance and distortion guided deep image quality assessment framework." IEEE Transactions on Multimedia 19.11 (2017): 2505-2520.

[26] Wanli Ouyang, Xingyu Zeng, Xiaogang Wang. "Learning mutual visibility relationship for pedestrian detection with a deep model." International Journal of Computer Vision 120 (2016): 14-27.

[27] Xingyu Zeng, Wanli Ouyang, Bin Yang, Junjie Yan, Xiaogang Wang. "Gated bi-directional CNN for object detection." Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14. Springer International Publishing, 2016.

[28] Xingyu Zeng, Wanli Ouyang, Xiaogang Wang. "Window-object relationship guided representation learning for generic object detections." arXiv preprint arXiv:1512.02736 (2015).

[29] Wanli Ouyang, Xingyu Zeng, Xiaogang Wang. "Partial occlusion handling in pedestrian detection with a deep model." IEEE Transactions on Circuits and Systems for Video Technology 26.11 (2015): 2123-2137.

[30] Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang. "Learning deep representation with large-scale attributes." Proceedings of the IEEE International Conference on Computer Vision. 2015.

[31] Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang. "DeepID-Net: Deformable deep convolutional neural networks for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[32] Wanli Ouyang, Xingyu Zeng, Xiaogang Wang. "Single-pedestrian detection aided by two-pedestrian detection." IEEE Transactions on Pattern Analysis and Machine Intelligence 37.9 (2014): 1875-1889.

[33] Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang. "DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection." arXiv preprint arXiv:1409.3505 (2014).

[34] Xingyu Zeng, Wanli Ouyang, Meng Wang, Xiaogang Wang. "Deep learning of scene-specific classifier for pedestrian detection." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part III 13. Springer International Publishing, 2014.

[35] Xingyu Zeng, Wanli Ouyang, Xiaogang Wang. "Multi-stage contextual deep learning for pedestrian detection." Proceedings of the IEEE International Conference on Computer Vision. 2013.

[36] Wanli Ouyang, Xingyu Zeng, Xiaogang Wang. "Modeling mutual visibility relationship in pedestrian detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.

Over one hundred patents granted or under examination; selected patents:

1. Vehicle lamp detection methods and apparatuses, methods and apparatuses for implementing intelligent driving, media and devices, US Patent, grant no. US10984266B2

2. Object three-dimensional detection method and apparatus, intelligent driving control method and apparatus, medium and device, US Patent, grant no. US11100310B2

3. Three-dimensional object detection method and device, method and device for controlling smart driving, medium and apparatus, US Patent, grant no. US11138756B2

4. Forward collision control method and apparatus, electronic device, program, and medium, US Patent, grant no. US11643076B2

5. Method for predicting direction of movement of target object, vehicle control method, and device, US Patent, grant no. US11710243B2
