We've made some updates to the toxicity filter!
You can now choose which of five fixed content categories your Moveworks AI Assistant will engage on. The old "Specific Topic Toxicity" model with free-form custom topic exceptions has been replaced.
The five categories, all blocked by default:
- Violent: physical violence
- Sexual Content or Sexual Acts: sexually explicit content
- Suicide & Self-Harm: mental health and self-harm topics
- Unethical Acts: content promoting unethical behavior
- Jailbreak: attempts to bypass the Assistant's safety instructions
Opt in to only the categories your organization wants the Assistant to engage on. Uncheck a category at any time to return it to the blocked default. Selections apply to every future interaction.
A common reason to allow a category: enabling Suicide & Self-Harm lets employees who ask for mental health support get routed to your Employee Assistance Program (EAP) knowledge instead of receiving a generic decline.
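Conceptually, this works as a default-deny allow-list: every category is blocked until an admin explicitly opts it in, and unchecking returns it to blocked. Here is a minimal sketch of that model; the category names come from the list above, but the `SafetyGuard` class and its methods are hypothetical illustrations, not a Moveworks API:

```python
from enum import Enum


class ContentCategory(Enum):
    """The five fixed content categories, per the list above."""
    VIOLENT = "Violent"
    SEXUAL = "Sexual Content or Sexual Acts"
    SELF_HARM = "Suicide & Self-Harm"
    UNETHICAL = "Unethical Acts"
    JAILBREAK = "Jailbreak"


class SafetyGuard:
    """Hypothetical model of the Safety Guard allow-list (not a Moveworks API)."""

    def __init__(self) -> None:
        # Default-deny: nothing is allowed until an admin opts in.
        self.allowed: set[ContentCategory] = set()

    def allow(self, category: ContentCategory) -> None:
        """Opt a category in; the Assistant will engage on it."""
        self.allowed.add(category)

    def block(self, category: ContentCategory) -> None:
        """Uncheck a category, returning it to the blocked default."""
        self.allowed.discard(category)

    def should_engage(self, category: ContentCategory) -> bool:
        return category in self.allowed


guard = SafetyGuard()
guard.allow(ContentCategory.SELF_HARM)   # e.g., to route users to EAP resources
assert guard.should_engage(ContentCategory.SELF_HARM)
assert not guard.should_engage(ContentCategory.JAILBREAK)  # still blocked by default
```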
Where to configure
Chat Platforms → Display Configurations → Moveworks AI Assistant Display Settings & Disclaimers → Safety Guard: Allowed Content Categories
To verify your settings are working
After saving, test directly in your Moveworks AI Assistant:
- Send a message that falls into a blocked category; the Assistant should decline with a refusal message.
- Send a message that falls into an allowed category; the Assistant should respond normally.
Note: Changes may take a few minutes to take effect after saving.
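If you run these checks through automation, a verification pass could look like the sketch below. The `send_to_assistant` helper and the refusal-detection heuristic are placeholders for however your organization reaches the Assistant programmatically; no public API is implied here. The two probes mirror the manual steps above.

```python
# Hypothetical smoke test for the two manual checks above.
# `send_to_assistant` is a placeholder for your own chat platform integration;
# Moveworks does not expose this function.

def send_to_assistant(message: str) -> str:
    raise NotImplementedError("Wire this to your chat platform integration.")


def looks_like_refusal(reply: str) -> bool:
    # Crude heuristic; adjust the phrases to match your Assistant's actual refusal copy.
    return any(phrase in reply.lower() for phrase in ("can't help", "unable to assist"))


def verify_safety_guard(blocked_probe: str, allowed_probe: str) -> None:
    # Blocked category: expect a refusal message.
    assert looks_like_refusal(send_to_assistant(blocked_probe)), \
        "Expected a refusal for the blocked-category probe."
    # Allowed category: expect a normal answer.
    assert not looks_like_refusal(send_to_assistant(allowed_probe)), \
        "Expected a normal response for the allowed-category probe."
```

Remember to wait a few minutes after saving before running the probes, per the note above.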
A few things to know
- This is the Moveworks-specific safety check. Underlying OpenAI/Azure toxicity protections still apply and aren't configurable here.
- Align with HR, Legal, and Security before allowing any category; selections apply to every user in your organization.
Read how to configure the Toxicity filter here.