Can we convince AI to answer harmful requests?

New research from EPFL demonstrates that even the most recent large language models (LLMs), despite undergoing safety training, remain vulnerable to simple input manipulations that can cause them to behave in unintended or harmful ways.

from Tech Xplore - electronic gadgets, technology advances and research news https://ift.tt/4a9L0oP