HorizonRobotics/HoloAgent


HoloAgent Logo

Project | 📄 arXiv | 🤗 Dataset | Docker

HoloAgent is a unified embodied-agent framework for general-purpose robots, integrating closed-loop execution, 3D spatial memory, and robot skills for real-world tasks.

🔥 News

  • [2026.06] HoloAgent-0 is released. Code is under preparation and will be released soon.
  • [2025.09] FSR-VLN is released for fast-and-slow vision-language navigation.

✅ Release Status

  • HoloAgent-0 code update
  • HoloAgent-0 project page & paper
  • FSR-VLN code

🧩 Components

  • Embodied AgentOS: Coordinates high-level task planning, runtime feedback, and closed-loop robot execution.
  • 3D Spatial Memory: Grounds robot reasoning in physical-world spatial representations for long-horizon tasks.
  • Embodied Skills: Connects agent decisions to executable robot navigation and manipulation skills.
  • FSR-VLN: Provides fast-and-slow vision-language navigation with a hierarchical multi-modal scene graph.
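To make the 3D Spatial Memory component concrete, here is a minimal sketch of a memory that is updated online and queried by position. It is an illustration only, not the HoloAgent implementation: the names `MemoryEntry`, `SpatialMemory`, `update`, and `nearest` are hypothetical, and the flat label-to-position store is an assumed simplification of whatever representation the project actually uses.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryEntry:
    """One remembered object (hypothetical structure)."""
    label: str
    position: tuple[float, float, float]  # assumed: metres in a shared map frame


@dataclass
class SpatialMemory:
    """Toy spatial memory keyed by label; not the project's real data structure."""
    entries: dict[str, MemoryEntry] = field(default_factory=dict)

    def update(self, label: str, position: tuple[float, float, float]) -> None:
        # A new observation overwrites the previous position for the same label.
        self.entries[label] = MemoryEntry(label, position)

    def nearest(
        self,
        query_position: tuple[float, float, float],
        label_contains: str = "",
    ) -> Optional[MemoryEntry]:
        # Keep only entries whose label contains the requested substring,
        # then return the one closest to the query position.
        candidates = [e for e in self.entries.values() if label_contains in e.label]
        if not candidates:
            return None
        return min(candidates, key=lambda e: math.dist(e.position, query_position))
```

A planner could call `nearest(robot_position, "mug")` to pick a goal, and call `update` whenever perception reports that an object has moved.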

🧠 Framework for Closed-Loop Robot Execution

AgentOS turns language instructions into monitored skill graphs and closes the loop across spatial retrieval, execution, memory updates, and recovery.

Overview of the HoloAgent-0 framework
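The closed-loop idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not AgentOS itself: the skill graph is reduced to an ordered sequence, the shared memory is a plain dictionary, recovery is modeled as a bounded retry, and every name (`SkillNode`, `execute_closed_loop`, `navigate`, `make_flaky_grasp`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SkillNode:
    """One step of a plan (hypothetical structure)."""
    name: str
    run: Callable[[dict], bool]  # reads/updates memory, returns True on success
    max_retries: int = 1         # assumed recovery policy: retry a bounded number of times


def execute_closed_loop(plan: list[SkillNode], memory: dict) -> tuple[bool, list]:
    """Run skills in order, monitoring each outcome and retrying on failure."""
    log = []
    for node in plan:
        for attempt in range(node.max_retries + 1):
            ok = node.run(memory)
            log.append((node.name, attempt, ok))  # runtime feedback for monitoring
            if ok:
                break
        else:
            # All attempts failed: stop and report, so a planner could replan.
            return False, log
    return True, log


def navigate(memory: dict) -> bool:
    # Spatial retrieval: look up the goal in memory, then record the new pose.
    memory["robot_at"] = memory["mug_location"]
    return True


def make_flaky_grasp() -> Callable[[dict], bool]:
    calls = {"n": 0}

    def grasp(memory: dict) -> bool:
        calls["n"] += 1
        if calls["n"] == 1:
            return False  # simulated first-attempt failure
        memory["holding"] = "mug"  # memory update on success
        return True

    return grasp
```

Running a plan of `navigate` followed by a flaky grasp with `max_retries=2` succeeds on the second grasp attempt, and the returned log records every attempt, which is the information a monitor would need to decide between retrying and replanning.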

🤖 Real-Robot Demonstrations

Compressed previews from real-hardware deployments. Full-resolution videos are available on the project page.

  • Navigation and Dance Coordination: Coordinate navigation and humanoid motion across robots.
  • Long-Horizon Mobile Manipulation: Decompose long-horizon manipulation into navigation, grasping, placement, and recovery.
  • Active Exploration in a New Environment: Explore new spaces and update 3D memory online.
  • Interactive Humanoid Command Execution: Follow open-ended commands with navigation and embodied actions.
  • A Day with a Robot Companion: Combine language, 3D reasoning, navigation, interaction, and action.
  • A Day in the Life of a Robot Guide: Guide users through workspaces with spatial-memory-aware routes.

🤖 FSR-VLN

Project | 📄 arXiv | Chinese introduction

FSR-VLN is the HoloAgent navigation component, combining a Hierarchical Multi-modal Scene Graph with Fast-to-Slow Navigation Reasoning for efficient long-range spatial reasoning.

Overall Framework
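As a rough illustration of fast-to-slow reasoning over a hierarchical scene graph, the sketch below tries a cheap match first and escalates to a slower reasoner only when the match is weak. Everything here is an assumption made for the example: the two-level room-to-object graph, the word-overlap score, the `threshold` and `top_k` parameters, and the `slow_reasoner` callback (a stand-in for a heavier model) are hypothetical and do not describe the actual FSR-VLN algorithm.

```python
from typing import Callable, Optional

# Assumed simplification: room name -> object labels found in that room.
SceneGraph = dict[str, list[str]]
Candidate = tuple[str, str]  # (room, object)


def fast_match(graph: SceneGraph, instruction: str) -> list[tuple[float, str, str]]:
    """Score every (room, object) pair by word overlap with the instruction."""
    words = set(instruction.lower().split())
    scored = []
    for room, objects in graph.items():
        for obj in objects:
            tokens = set(obj.lower().split()) | {room.lower()}
            score = len(words & tokens) / len(tokens)
            scored.append((score, room, obj))
    return sorted(scored, reverse=True)  # best score first


def navigate_goal(
    graph: SceneGraph,
    instruction: str,
    slow_reasoner: Callable[[str, list[Candidate]], Candidate],
    threshold: float = 0.6,
    top_k: int = 3,
) -> Optional[tuple[str, str, str]]:
    """Return (room, object, path), where path records which stage decided."""
    ranked = fast_match(graph, instruction)
    if not ranked:
        return None
    best_score, room, obj = ranked[0]
    if best_score >= threshold:
        return room, obj, "fast"  # confident cheap match, skip the slow stage
    # Weak match: hand a short candidate list to the slower reasoner.
    candidates = [(r, o) for _, r, o in ranked[:top_k]]
    room, obj = slow_reasoner(instruction, candidates)
    return room, obj, "slow"
```

With this structure, an explicit instruction such as "go to the red mug in the kitchen" resolves in the fast stage, while a vague one such as "find something to drink from" falls through to the slow reasoner, so the expensive step runs only when needed.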

πŸ— Getting Started

The current repository includes FSR-VLN and navigation-agent setup. HoloAgent-0 code will be added in a future release.

1. Semantic Mapping and Retrieval Pipeline

  • Task: Implement the semantic mapping and retrieval system based on the instructions in fsr_vln/README.md.
  • Steps:
    1. Download the necessary pre-trained model checkpoints.
    2. Download and configure the required datasets.
    3. Set up the environment and dependencies as specified.
    4. Run the complete pipeline to verify its functionality for semantic mapping and visual place retrieval.

2. Navigation Agent Setup and Execution

  • Task: Set up and test the navigation agent according to nav_agent/README.md.
  • Steps:
    1. Install all required dependencies for the navigation environment.
    2. Configure the necessary parameters and environment settings.
    3. Execute the navigation agent to ensure it runs successfully and performs its intended tasks.

📚 Publications & Citation

If you find our project useful, please consider citing it:

@misc{holoagent2026holoagent0,
      title={HoloAgent-0: A Unified Embodied Agent Framework with 3D Spatial Memory},
      year={2026},
      eprint={2606.23565},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2606.23565},
}
@misc{zhou2025fsrvlnfastslowreasoning,
      title={FSR-VLN: Fast and Slow Reasoning for Vision-Language Navigation with Hierarchical Multi-modal Scene Graph}, 
      author={Xiaolin Zhou and Tingyang Xiao and Liu Liu and Yucheng Wang and Maiyue Chen and Xinrui Meng and Xinjie Wang and Wei Feng and Wei Sui and Zhizhong Su},
      year={2025},
      eprint={2509.13733},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2509.13733}, 
}

πŸ™ Acknowledgements

This project builds on and is inspired by several outstanding open-source projects: OVO, HOV-SG, rerun, dimos, and openclaw.


βš–οΈ License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

About

A unified, agentic system for general-purpose robots, enabling multi-modal perception, mapping and localization, and autonomous mobility and manipulation, with intelligent interaction with users.
