Toxicity Filter Improvements
We've updated the toxicity filter. You can now choose which of five fixed content categories your Moveworks AI Assistant will engage on. The previous "Specific Topic Toxicity" model, which used free-form custom topic exceptions, has been replaced.

The five categories, all blocked by default:

- Violent: physical violence
- Sexual Content or Sexual Acts: sexually explicit content
- Suicide & Self-Harm: mental health and self-harm topics
- Unethical Acts: content promoting unethical behavior
- Jailbreak: attempts to bypass the Assistant's safety instructions

Opt in to only the categories your organization wants the Assistant to engage on. Uncheck a category at any time to return it to blocked. Selections apply to every future interaction.

A common reason to allow a category: allowing Suicide & Self-Harm lets employees asking for mental health support get routed to your EAP knowledge instead of receiving a generic decline.

Where to configure

Chat Platforms → Display Configurations → Moveworks AI Assistant Display Settings & Disclaimers → Safety Guard: Allowed Content Categories

To verify your settings are working

After saving, test directly in your Moveworks AI Assistant:

- Send a message that falls into a blocked category: the Assistant should decline and return a refusal message.
- Send a message that falls into an allowed category: the Assistant should respond normally.

Note: changes may take a few minutes to take effect after saving.

A few things to know

- This is the Moveworks-specific safety check. Underlying OpenAI/Azure toxicity protections still apply and aren't configurable here.
- Align with HR, Legal, and Security before allowing any category; selections apply to every user in your org.

Read how to configure the Toxicity filter here.