7.0 Duplicate

Cosine similarity or similarity models within built-in actions and a Moveworks data lake

Related products: Agent Studio
  • March 14, 2025
  • 4 replies
  • 28 views


Hi Moveworks community!!

We have a few use cases where we need to analyze large tabular datasets and retrieve records based on similarity.

I was thinking it might be interesting for Moveworks to have its own data lake for storing tabular files, plus a built-in action that uses a similarity model (k-nearest neighbors or cosine similarity) to retrieve items based on similarity.
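
For reference, this is the usual definition of cosine similarity between two feature vectors. The symbols a and b are illustrative only (for example, a vector for the user's query and a vector for one table row); they are not Moveworks names.

```latex
\[
\operatorname{sim}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
\]
```

A k-nearest-neighbors lookup would then return the k rows whose vectors score highest against the query vector.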

I know Moveworks has data ingestion connectors that work with knowledge and SharePoint data, but tabular data might also be interesting, since API data analysis is limited and Moveworks cannot process one thousand json api items.

For example, in one of our use cases we cannot use a contains filter to look up data because the data quality is poor, whereas similarity models work very well there. Similarity would also come in handy for building recommendation systems within the chatbot, for example to recommend trainings and finance data.

I know most of this can be handled via middleware, but that is exactly what we want to avoid; we would like to do everything on Moveworks :)

Just an idea!

Thank you!

4 replies

Ajay Merchia
  • Community Manager
  • March 19, 2025

Great idea ​@João Durand. Re this:

Moveworks cannot process one thousand json api items.

 

We’re actually building a structured data analyzer that CAN process that much data. It’s like a code interpreter that writes code to analyze large volumes of tabular data.

Can you share more about the exact data table & use case you’re interested in? I can see if there’s a good fit.



Hi ​@Ajay Merchia!

Sounds promising!

We recently got a use case that connects to the service offerings/IT portfolio table in ServiceNow to do a lookup.

This table has 2000 records.

We are currently using a contains filter on the name (because Moveworks cannot process such a large amount of data), but the data quality is bad, with weird characters and spaces, and a contains filter assumes the user knows the name of the service.

I did a PoC using cosine similarity in an Azure Function to fix this, and it worked surprisingly well: the user could just describe the service rather than supply only a keyword, and the model would still find it. However, that approach relies on middleware, which we don't want to use, so we dropped the idea.
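
To make the idea concrete, here is a minimal sketch of a cosine-similarity lookup over messy names. It is not the PoC itself: it assumes character trigram counts as the vectors (a purely lexical stand-in, so it tolerates odd characters, spacing, and typos but does not understand free-form descriptions the way an embedding model might), and the `SERVICES` list and all function names are hypothetical.

```python
import math
import re
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, fold Unicode compatibility forms, and collapse every
    run of non-alphanumeric characters into a single space."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def trigram_vector(text: str) -> Counter:
    """Count the character trigrams of the normalized, space-padded text."""
    padded = f"  {normalize(text)} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query: str, names: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k names most similar to the query, best first."""
    query_vec = trigram_vector(query)
    scored = [(cosine_similarity(query_vec, trigram_vector(n)), n) for n in names]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical records standing in for a table with poor data quality.
SERVICES = [
    "  E-mail &  Calendar   (Exchange) ",
    "VPN__Remote-Access",
    "Laptop  Provisioning##",
    "SAP Finance Reporting",
]

if __name__ == "__main__":
    for score, name in top_matches("email calender", SERVICES, k=2):
        print(f"{score:.2f}  {name!r}")
```

A contains filter on "email calender" would return nothing here, while the similarity ranking still puts the Exchange entry first. For 2000 records, the row vectors could be computed once and reused across queries.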

Furthermore, we also have a trainings use case where we are planning to recommend trainings to users; this would solve that problem as well.

Essentially a recommendation system based on large data.

 


Ajay Merchia
  • Community Manager
  • March 19, 2025

Ah I see - this is like fuzzy matching on the list of possible responses. We’re planning on solving this actually by allowing you to index your records and then (w/ Slot Resolvers) tie them to your plugin using a data type. I would follow up and +1 this feature.

 


Ajay Merchia
  • Community Manager
  • March 19, 2025
Updated idea status: 1. New → 7.0 Duplicate
Idea merged into:

All the votes from this idea have been transferred.