Toxicity Filter Improvements

Related products: Agentic Reasoning Engine
  • April 30, 2026

Ajay Merchia

We've made some updates to the toxicity filter!

You can now choose which of five fixed content categories your Moveworks AI Assistant will engage on. These categories replace the old "Specific Topic Toxicity" model and its free-form custom topic exceptions.

The five categories, all blocked by default:

  • Violent: physical violence
  • Sexual Content or Sexual Acts: sexually explicit content
  • Suicide & Self-Harm: mental health and self-harm topics
  • Unethical Acts: content promoting unethical behavior
  • Jailbreak: attempts to bypass the Assistant's safety instructions


Opt in only to the categories your organization wants the Assistant to engage on. Uncheck a category at any time to block it again. Selections apply to every future interaction.
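
The opt-in behavior can be pictured as a simple allowlist. The following is a minimal Python sketch of that logic under stated assumptions, not Moveworks code: the `Category` enum, the `is_blocked` function, and the `allowed` set are hypothetical names chosen for illustration, and the real setting is changed through the admin UI described in this post.

```python
from enum import Enum


class Category(Enum):
    """The five fixed categories named in this post (hypothetical model)."""
    VIOLENT = "Violent"
    SEXUAL = "Sexual Content or Sexual Acts"
    SELF_HARM = "Suicide & Self-Harm"
    UNETHICAL = "Unethical Acts"
    JAILBREAK = "Jailbreak"


def is_blocked(category: Category, allowed: set[Category]) -> bool:
    """A category is blocked unless it has been explicitly allowed."""
    return category not in allowed


# Default state: nothing is allowed, so every category is blocked.
allowed: set[Category] = set()
default_blocked = [c.value for c in Category if is_blocked(c, allowed)]

# Opting in to one category leaves the other four blocked.
allowed.add(Category.SELF_HARM)
blocked_after_opt_in = [c.value for c in Category if is_blocked(c, allowed)]

# Unchecking returns the category to being blocked.
allowed.discard(Category.SELF_HARM)
blocked_after_uncheck = [c.value for c in Category if is_blocked(c, allowed)]

print(default_blocked)
print(blocked_after_opt_in)
print(blocked_after_uncheck)
```

The point of the sketch is that the default is "block everything" and each selection is an explicit exception, so forgetting to configure a category never loosens the filter.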

A common reason to allow a category: allowing Suicide & Self-Harm lets employees who ask for mental health support be routed to your EAP knowledge instead of receiving a generic decline.

Where to configure

Chat Platforms → Display Configurations → Moveworks AI Assistant Display Settings & Disclaimers → Safety Guard: Allowed Content Categories

To verify your settings are working

After saving, test directly in your Moveworks AI Assistant:

  • Send a message that falls into a blocked category: the Assistant should decline and return a refusal message.
  • Send a message that falls into an allowed category: the Assistant should respond normally.

Note: Changes may take a few minutes to take effect after saving.
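
If you want a written checklist before testing by hand, the expected outcome per category can be derived from your selections. This is a small Python sketch under stated assumptions: `build_test_plan` and `CATEGORIES` are hypothetical helper names, it does not call any Moveworks API, and you would still send the test messages yourself in the Assistant.

```python
# Category names as listed in this post.
CATEGORIES = (
    "Violent",
    "Sexual Content or Sexual Acts",
    "Suicide & Self-Harm",
    "Unethical Acts",
    "Jailbreak",
)


def build_test_plan(allowed):
    """Map each category to the behavior expected when testing manually."""
    unknown = set(allowed) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    return {
        name: "respond normally" if name in allowed else "decline"
        for name in CATEGORIES
    }


# Example: only Suicide & Self-Harm has been allowed (illustrative choice).
plan = build_test_plan({"Suicide & Self-Harm"})
for name, expected in plan.items():
    print(f"{name}: Assistant should {expected}")
```

Rejecting unknown names up front guards against typos in the checklist that would otherwise be silently treated as "blocked".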


A few things to know

  • This is the Moveworks-specific safety check. Underlying OpenAI/Azure toxicity protections still apply and aren't configurable here.
  • Align with HR, Legal, and Security before allowing any category — selections apply to every user in your org.



Read how to configure the Toxicity filter here.

1 reply

ajohanson
  • Employee
  • May 1, 2026

@gsachdev FYI