2 results for “topic:multi-modal-llm”
[CVPR 2025] UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Python tool for capturing and logging human-computer interactions. Generates rich datasets for training multi-modal LLMs for autonomous computer control. Records screenshots, mouse and keyboard input, and audio.