Multi-modal AI agent mimics human thinking for long video analysis and reasoning
Although artificial intelligence (AI) technology is evolving rapidly, AI models still struggle to understand long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform reasoning and question-answering tasks on long videos by emulating the way humans think.
from Tech Xplore - electronic gadgets, technology advances and research news https://ift.tt/sKFnIhr