Code implementations for several papers

爱生活爱珂珂 2024-12-27 14:54:15

Code implementations for several papers:

《Long-Form Speech Generation with Spoken Language Models》(2024) GitHub: github.com/google-deepmind/librispeech-long

《DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs》(2024) GitHub: github.com/MengLcool/SliMM

《DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT》(2024) GitHub: github.com/YvanYin/DrivingWorld [fig1]

《WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents》(2024) GitHub: github.com/elated-sawyer/WALL-E [fig2]

《VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks》(2024) GitHub: github.com/OpenMOSS/VLABench

《MINIMA: Modality Invariant Image Matching》(2024) GitHub: github.com/LSXI7/MINIMA [fig3]

《DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery》(2024) GitHub: github.com/DroneSplat/anonymous_code

《DriveMM: All-in-One Large Multimodal Model for Autonomous Driving》(2024) GitHub: github.com/zhijian11/DriveMM [fig4]

《Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction》(2024) GitHub: github.com/CHELSEA234/Dense-Face

《Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models》(2024) GitHub: github.com/KbsdJames/omni-math-rule

《GraphAgent: Agentic Graph Language Assistant》(2024) GitHub: github.com/HKUDS/GraphAgent

《Sound bubbles on hearables》(2024) GitHub: github.com/chentuochao/Sound_Bubble

《ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights》(2024) GitHub: github.com/Gabesarch/ICAL

