We've made some updates to the toxicity filter!
You can now choose which of five fixed content categories your Moveworks AI Assistant will engage on. The old "Specific Topic Toxicity" model with free-form custom topic exceptions has been replaced.
The five categories, all blocked by default:
- Violent: physical violence
- Sexual Content or Sexual Acts: sexually explicit content
- Suicide & Self-Harm: mental health and self-harm topics
- Unethical Acts: content promoting unethical behavior
- Jailbreak: attempts to bypass the Assistant's safety instructions
Opt in to only the categories your organization wants the Assistant to engage on. Uncheck a category at any time to return it to the blocked default. Selections apply to every future interaction.
A common reason to allow a category: enabling Suicide & Self-Harm lets employees who ask for mental health support get routed to your Employee Assistance Program (EAP) knowledge instead of receiving a generic decline.
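Conceptually, this works as a default-deny allow-list: every category is blocked until an admin explicitly opts it in, and unchecking returns it to blocked. Here is a minimal sketch of that model; the category names come from the list above, but the `SafetyGuard` class and its methods are hypothetical illustrations, not a Moveworks API:

```python
from enum import Enum


class ContentCategory(Enum):
    """The five fixed content categories, per the list above."""
    VIOLENT = "Violent"
    SEXUAL = "Sexual Content or Sexual Acts"
    SELF_HARM = "Suicide & Self-Harm"
    UNETHICAL = "Unethical Acts"
    JAILBREAK = "Jailbreak"


class SafetyGuard:
    """Hypothetical model of the Safety Guard allow-list (not a Moveworks API)."""

    def __init__(self) -> None:
        # Default-deny: nothing is allowed until an admin opts in.
        self.allowed: set[ContentCategory] = set()

    def allow(self, category: ContentCategory) -> None:
        """Opt a category in; the Assistant will engage on it."""
        self.allowed.add(category)

    def block(self, category: ContentCategory) -> None:
        """Uncheck a category, returning it to the blocked default."""
        self.allowed.discard(category)

    def should_engage(self, category: ContentCategory) -> bool:
        return category in self.allowed


guard = SafetyGuard()
guard.allow(ContentCategory.SELF_HARM)   # e.g., to route users to EAP resources
assert guard.should_engage(ContentCategory.SELF_HARM)
assert not guard.should_engage(ContentCategory.JAILBREAK)  # still blocked by default
```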
Where to configure
Chat Platforms → Display Configurations → Moveworks AI Assistant Display Settings & Disclaimers → Safety Guard: Allowed Content Categories
To verify your settings are working
After saving, test directly in your Moveworks AI Assistant:
- Send a message that falls into a blocked category; the Assistant should decline with a refusal message.
- Send a message that falls into an allowed category; the Assistant should respond normally.
Note: Changes may take a few minutes to take effect after saving.
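If you run these checks through automation, a verification pass could look like the sketch below. The `send_to_assistant` helper and the refusal-detection heuristic are placeholders for however your organization reaches the Assistant programmatically; no public API is implied here. The two probes mirror the manual steps above.

```python
# Hypothetical smoke test for the two manual checks above.
# `send_to_assistant` is a placeholder for your own chat platform integration;
# Moveworks does not expose this function.

def send_to_assistant(message: str) -> str:
    raise NotImplementedError("Wire this to your chat platform integration.")


def looks_like_refusal(reply: str) -> bool:
    # Crude heuristic; adjust the phrases to match your Assistant's actual refusal copy.
    return any(phrase in reply.lower() for phrase in ("can't help", "unable to assist"))


def verify_safety_guard(blocked_probe: str, allowed_probe: str) -> None:
    # Blocked category: expect a refusal message.
    assert looks_like_refusal(send_to_assistant(blocked_probe)), \
        "Expected a refusal for the blocked-category probe."
    # Allowed category: expect a normal answer.
    assert not looks_like_refusal(send_to_assistant(allowed_probe)), \
        "Expected a normal response for the allowed-category probe."
```

Remember to wait a few minutes after saving before running the probes, per the note above.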
A few things to know
- This is the Moveworks-specific safety check. Underlying OpenAI/Azure toxicity protections still apply and aren't configurable here.
- Align with HR, Legal, and Security before allowing any category; selections apply to every user in your organization.
Read how to configure the Toxicity filter here.