[1] Newell A, Simon H A. The logic theory machine: A complex information processing system [J]. IRE Transactions on Information Theory, 1956, 2(3): 61-79.
[2] McCulloch W S, Pitts W H. A logical calculus of the ideas immanent in nervous activity [J]. Bulletin of Mathematical Biophysics, 1943, 5(4): 115-133.
[3] Hebb D O. The organization of behavior: A neuropsychological theory [M]. John Wiley & Sons, 1949.
[4] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006, 18(7): 1527-1554.
[5] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database [C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009: 248-255.
[6] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[7] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context [C]. European Conference on Computer Vision (ECCV), 2014: 740-755.
[8] Krasin I, Duerig T, Alldrin N, et al. Open Images: A public dataset for large-scale multi-label and multi-class image classification [EB/OL]. https://github.com/openimages, 2017.
[9] Nickolls J, Buck I, Garland M, et al. Scalable parallel programming with CUDA [J]. ACM Queue, 2008, 6(2): 40-53.
[10] Chetlur S, Woolley C, Vandermersch P, et al. cuDNN: Efficient primitives for deep learning [R]. arXiv preprint arXiv:1410.0759, 2014.
[11] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [C]. Advances in Neural Information Processing Systems (NeurIPS), 2017: 5998-6008.
[12] Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-Training [R]. OpenAI, 2018.
[13] Radford A, Wu J, Child R, et al. Language Models are Unsupervised Multitask Learners [R]. OpenAI, 2019.
[14] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [C]. Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019: 4171-4186.
[15] Brown T B, Mann B, Ryder N, et al. Language Models are Few-Shot Learners [C]. Advances in Neural Information Processing Systems (NeurIPS), 2020: 1877-1901.
[16] Carion N, Massa F, Synnaeve G, et al. End-to-End Object Detection with Transformers [C]. European Conference on Computer Vision (ECCV), 2020: 213-229.
[17] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [C]. International Conference on Learning Representations (ICLR), 2021.
[18] Christiano P F, Leike J, Brown T B, et al. Deep Reinforcement Learning from Human Preferences [R]. arXiv preprint arXiv:1706.03741, 2017.
[19] Gao L, Schulman J, Hilton J. Scaling Laws for Reward Model Overoptimization [C]. International Conference on Machine Learning (ICML), 2023: 10835-10866.
[20] OpenAI. GPT-4 Technical Report [R]. OpenAI, 2023.
[21] Jacobs R A, Jordan M I, Nowlan S J, et al. Adaptive mixtures of local experts [J]. Neural Computation, 1991, 3(1): 79-87.
[22] OpenAI. Introducing GPT-5 for developers [EB/OL]. OpenAI, 2025.
[23] Dettmers T, Pagnoni A, Holtzman A, et al. QLoRA: Efficient Finetuning of Quantized LLMs [R]. arXiv preprint arXiv:2305.14314, 2023.
[24] Hu E J, Shen Y, Wallis P, et al. LoRA: Low-Rank Adaptation of Large Language Models [C]. International Conference on Learning Representations (ICLR), 2022.
[25] Li M, Lin Y, Zhang Z, et al. SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models [R]. arXiv preprint arXiv:2411.05007, 2024.
[26] Wei J, Wang X, Schuurmans D, et al. Chain-of-thought prompting elicits reasoning in large language models [C]. Advances in Neural Information Processing Systems (NeurIPS), 2022: 24824-24837.
[27] Kojima T, Gu S S, Reid M, et al. Large language models are zero-shot reasoners [C]. Advances in Neural Information Processing Systems (NeurIPS), 2022: 22194-22207.
[28] Yao S, Yu D, Zhao J, et al. Tree of thoughts: Deliberate problem solving with large language models [C]. Advances in Neural Information Processing Systems (NeurIPS), 2023.
[29] Madaan A, Tandon N, Gupta P, et al. Self-Refine: Iterative Refinement with Self-Feedback [R]. arXiv preprint arXiv:2303.17651, 2023.
[30] Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision [C]. International Conference on Machine Learning (ICML), 2021: 8748-8763.
[31] Wang P, Bai S, Tan S, et al. Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution [R]. arXiv preprint arXiv:2409.12191, 2024.
[32] Driess D, Xia F, Sajjadi M S M, et al. PaLM-E: An Embodied Multimodal Language Model [C]. International Conference on Machine Learning (ICML), 2023: 8469-8488.
[33] Open X-Embodiment Collaboration. Open X-Embodiment: Robotic Learning Datasets and RT-X Models [C]. IEEE International Conference on Robotics and Automation (ICRA), 2024: 6892-6903.