Workflow for completing natural-language request with metric-semantic representation of environment
DOI:
https://doi.org/10.54939/1859-1043.j.mst.102.2025.12-22
Keywords:
Natural-language request; Path planning; Task planning; Metric-semantic map; 3D scene graph.
Abstract
In mobile robotics and autonomous systems, a natural-language request can be completed by converting it into high-level and low-level tasks. Accomplishing such a request requires implementing both types of tasks, together with an efficient method to bridge them, and this remains an open problem. This work presents a two-phase workflow (figure 1), consisting of Comprehension and Implementation, that is built on a metric-semantic map to address this problem. In the Comprehension phase, also known as automated planning, the natural-language request is converted into actionable plans using semantic information from the map. These plans are then passed to the Implementation phase, where tasks such as navigation or manipulation are executed using geometric information from the map. We also conduct an experiment that illustrates how a natural-language request is carried out on a specific metric-semantic representation of the environment, namely a 3D Scene Graph, covering the complete sequence from creating the 3D Scene Graph to obtaining a feasible output path. Finally, this work highlights limitations that need to be addressed in the future to improve the proposed workflow.
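The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy scene graph (`PLACES`, `EDGES`, `OBJECTS`), the keyword-matching `comprehend` function, and the choice of A* search for path planning are all assumptions made for illustration. A real system would use an automated planner and a scene graph built from sensor data.

```python
import heapq
import math

# Hypothetical toy scene graph: places hold geometry, objects hold semantics.
PLACES = {"p0": (0.0, 0.0), "p1": (2.0, 0.0), "p2": (2.0, 2.0), "p3": (4.0, 2.0)}
EDGES = {"p0": ["p1"], "p1": ["p0", "p2"], "p2": ["p1", "p3"], "p3": ["p2"]}
OBJECTS = {
    "chair": {"room": "office", "place": "p2"},
    "cup": {"room": "kitchen", "place": "p3"},
}


def comprehend(request):
    """Comprehension phase (toy): turn a request into a symbolic plan.

    Uses only semantic labels; keyword matching stands in for a real planner.
    """
    words = request.lower().replace(".", " ").split()
    for label, info in OBJECTS.items():
        if label in words:
            return [("navigate", info["place"]), ("reach", label)]
    return []  # no known object mentioned in the request


def distance(a, b):
    """Euclidean distance between two place nodes."""
    return math.dist(PLACES[a], PLACES[b])


def a_star(start, goal):
    """Implementation phase (toy): A* search over the place layer."""
    open_heap = [(distance(start, goal), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while open_heap:
        _, cost, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        for nxt in EDGES[node]:
            new_cost = cost + distance(node, nxt)
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                priority = new_cost + distance(nxt, goal)
                heapq.heappush(open_heap, (priority, new_cost, nxt, path + [nxt]))
    return None  # goal not reachable from start


def run(request, start):
    """Bridge the two phases: plan symbolically, then ground each navigation step."""
    paths = []
    for action, argument in comprehend(request):
        if action == "navigate":
            paths.append(a_star(start, argument))
            start = argument
    return paths


if __name__ == "__main__":
    print(run("Bring me the cup.", "p0"))
```

In this sketch the semantic layer alone decides where to go, and the geometric layer alone decides how to get there. That split mirrors the division between the Comprehension and Implementation phases.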