I wanted to share a useful architecture for empowering non-technical teams—like Sales, Finance, or Operations—to query complex databases using natural language. The logic is system-agnostic and can be applied to any SQL-based data source.
The Problem: The "SQL Gap"
In many organizations, critical business data is locked behind complex SQL databases. Non-technical users often have to wait for analysts, navigate clunky BI tools, or attempt to write SQL themselves (leading to errors or inefficiency).
The Solution: Natural Language to SQL
This architecture allows users to ask questions like "What was the total revenue for Customer X in Q3?" or "Show me the top 5 products by volume last year"Â and receive real-time data directly in MW Assistant. The plugin automates the translation of intent into an executable SQL query.
Architecture Overview
1. Data Modeling & Training
The core of the solution is a high-quality system prompt. Rather than just listing table names, provide a Data Model that includes descriptions, synonyms, and example values to help the LLM map business terms to your schema.
<attribute_info>
  <attribute>DIVISION</attribute>
  <description>Identifies the specific business unit or product category</description>
  <example_values>Computer, hardware</example_values>
  <synonyms>Division, Demand plan, customer divisions</synonyms>
</attribute_info>
Few-Shot Training: Include ~50-100 question-qnswer pairs. This teaches the agent which tables to JOIN and the exact SQL syntax required (e.g., Snowflake-specific functions) for your environment.
2. SQL Generation Step (LLM Action)
Use mw.generate_text_action to process the user prompt and output a structured SQL string.Â
Guided by your system prompt and few-shot examples (Q&A pairs), it performs two critical tasks:
- Intent Deciphering: Maps ambiguous terms (e.g., "top performance") to specific SQL functions (e.g., SUM(ESTIMATED_REVENUE)).
- Parameter Extraction: Identifies and filters for dates or customer names directly from the user’s message.
3. Secure Execution
Generated SQL string is passed to an HTTP Action that communicates with the system. This is a managed service or API endpoint that holds the necessary credentials to interact with your data warehouse.Â
Once the query is executed, the Output Mapper ingests API responses (like JSON arrays) and presents them in a user-friendly format.
Tips
- Practice Iterative Learning: Treat your prompt as a living document. Regularly review failed queries and add them as new examples to improve accuracy over time.
-
Scripted Cleanup for API Readiness:Â Since LLMs often wrap output in Markdown, use a Script Action to sanitize the query before execution. This handles Markdown removal and ensures URL encoding.
- Enforce Result Limits: To prevent overwhelming the interface with massive payloads or triggering complex pagination logic, instruct the agent to always include a LIMITÂ clause in the generated SQL (e.g., "Limit results to the top 20 rows"). This ensures fast response times and cleaner data presentation.
- Schema Pruning: Only provide the LLM with the tables and columns absolutely necessary for the specific use case. Overloading the prompt with an entire warehouse schema increases the risk of incorrect queries.
Have you implemented a similar "Talk to your Data" use case? Let me know your thoughts or any "gotchas" you've encountered in the comments!