In the field of Customer Experience (CX), from AI assistants and voice calls to human agent chats and email ticketing systems, the underlying natural language conversations and interactions are a black box. Opening up that black box means visualizing the organic natural language flows to pinpoint breaking points and failures in order to better serve the end customer.
The issue is amplified in Gen AI assistants. Unlike classic dialog trees designed manually, Gen AI assistants are free-form, with little observability and monitoring.
Regular iterative changes to CX need monitoring, including A/B tests to track improvements. This requires diving deeper into customer conversation data flows and acting on those insights to drive continuous improvement in the quality of support service, with improved retention and revenue as a side effect.
How to monitor and analyze conversational CX flows?
At Cuein, it all starts with raw conversation transcripts or recordings between agents (bot or human) and the end customer. The Cuein AI platform extracts hundreds of signals, aka “cues”, from each conversation, such as the user query, the underlying root cause, the data collected to clarify the user query, and agent actions. It also bubbles up cues like frustration and confusion to give additional understanding of the interaction.
The platform then aggregates these cues across millions of conversations and groups them into actionable variations or conversation flows. Let’s dive deeper into these flows, what they look like and some design considerations for the product.
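As a rough illustration of the aggregation step, the sketch below groups conversations by their sequence of cues and counts how often each sequence occurs. It is a minimal sketch under stated assumptions, not the platform's actual implementation: the `conversations` data, the node labels and the `aggregate_flows` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical input: one list of cue labels per conversation.
# The labels are illustrative only and are not taken from real data.
conversations = [
    ["User Intent: Food Delivery Estimate", "Agent Response: Provide ETA"],
    ["User Intent: Food Delivery Estimate", "Agent Response: Provide ETA"],
    ["User Intent: Food Delivery Estimate", "Confusion", "Agent Transfer"],
]


def aggregate_flows(conversations):
    """Count how often each distinct cue sequence (flow) occurs."""
    return Counter(tuple(nodes) for nodes in conversations)


flows = aggregate_flows(conversations)

# Print flows from most to least frequent.
for path, count in flows.most_common():
    print(count, " -> ".join(path))
```

Counting whole sequences keeps the example short. A production system would more likely merge shared prefixes into a tree or graph so that common paths are drawn once.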
The anatomy of conversation flows
Design considerations for Conversation Flows
Here are some of the design considerations that emerged over numerous iterations of the conversation flow.
Node Taxonomy
There is a lot of historical literature on message (aka utterance) categorization and taxonomies. Classic academic papers including MultiDoGO, the Schema-Guided Dialogue dataset and MultiWOZ have attempted to construct realistic conversations and define their own message categories.
The node taxonomy is inspired by the above references but tailored for the future of conversational interfaces. The nodes essentially have two parts to their anatomy.
The first is a domain-agnostic taxonomy derived from a combination of some of these taxonomies. Categories like “Greeting”, “Goodbye” and “Request More” also exist, but since they are less meaningful parts of the conversation, they are not shown in our conversation flow.
Categories like User Intent, Agent Response, Agent Transfer, Confusion are the meat of the conversation and help surface the relevant and most actionable aspects that need attention.
The second part of the anatomy consists of intent categories, response categories and pieces of information collected as part of a transaction (like email, order number, etc.), which are specific to the dataset being analyzed. These vary by vertical domain, and even within the same domain by use case.
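The two-part node anatomy described above could be modeled roughly as follows. This is a sketch under assumptions rather than the actual schema: the `Node` class, its field names and the “Ask For Order Number” response label are hypothetical. Only the domain-agnostic category names and the idea of hiding low-signal categories come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class NodeType(Enum):
    """Part 1: domain-agnostic categories named in the text."""
    USER_INTENT = "User Intent"
    AGENT_RESPONSE = "Agent Response"
    AGENT_TRANSFER = "Agent Transfer"
    CONFUSION = "Confusion"
    GREETING = "Greeting"
    GOODBYE = "Goodbye"
    REQUEST_MORE = "Request More"


# Categories that are classified but not shown in the flow view.
HIDDEN_TYPES = frozenset(
    {NodeType.GREETING, NodeType.GOODBYE, NodeType.REQUEST_MORE}
)


@dataclass(frozen=True)
class Node:
    node_type: NodeType                   # part 1: domain-agnostic category
    label: Optional[str] = None           # part 2: dataset-specific category
    collected_data: Tuple[str, ...] = ()  # part 2: e.g. email, order number


def visible_nodes(nodes: List[Node]) -> List[Node]:
    """Drop the low-signal categories before building a flow."""
    return [n for n in nodes if n.node_type not in HIDDEN_TYPES]


# Hypothetical conversation; labels are illustrative only.
conversation = [
    Node(NodeType.GREETING),
    Node(NodeType.USER_INTENT, "Food Delivery Estimate"),
    Node(NodeType.AGENT_RESPONSE, "Ask For Order Number", ("order number",)),
    Node(NodeType.GOODBYE),
]

flow_nodes = visible_nodes(conversation)
print([f"{n.node_type.value}: {n.label}" for n in flow_nodes])
```

Keeping the domain-agnostic type separate from the dataset-specific label means the same flow logic can be reused across verticals while only the label vocabulary changes.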
Our taxonomy continues to evolve with new use cases and needs constant review and tweaking to stay efficient and effective for the end product.
Filter and Search Flows
Another challenge in visualizing natural language conversation flows is getting to the actionable pieces quickly without having to explore all the permutations and combinations. This constraint gave birth to an opinionated filter abstraction that enables slicing, dicing and segmentation to derive actionable improvements.
Intents or contact reasons are used as the primary way to drill down into a specific subset of flows. As seen below, Food Delivery Estimate is the selected intent for the example. If a conversation involves multiple intents, the additional intents also show up.
There is also a zoom option (1x to 3x) which shows more of the top flows for that intent by frequency of occurrence. The gradual addition of more flows makes it easier to consume the information at different levels.
In addition to filtering by user intent and contact reason, dynamic filtering of the flows is supported along dimensions like Confusion, Deflection, Inferred CSAT, etc.
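The intent filter, zoom level and dimension filters can be sketched together as a single query over conversations. This is a hedged illustration only: the record layout, the `top_flows` helper and the `flows_per_zoom` parameter are assumptions, since the text does not say how many flows each zoom level adds.

```python
from collections import Counter

# Hypothetical conversation records; field names and values are illustrative.
PATH_A = ("User Intent", "Agent Response")
PATH_B = ("User Intent", "Confusion", "Agent Transfer")
PATH_C = ("User Intent", "Agent Response", "User Intent", "Agent Response")

conversations = [
    {"intents": {"Food Delivery Estimate"}, "path": PATH_A, "confusion": False},
    {"intents": {"Food Delivery Estimate"}, "path": PATH_A, "confusion": False},
    {"intents": {"Food Delivery Estimate"}, "path": PATH_B, "confusion": True},
    {"intents": {"Food Delivery Estimate", "Refund Request"}, "path": PATH_C, "confusion": False},
    {"intents": {"Refund Request"}, "path": PATH_A, "confusion": False},
]


def top_flows(conversations, intent, zoom=1, flows_per_zoom=1, predicate=None):
    """Return the most frequent flows for an intent.

    zoom (1 to 3) widens the result to zoom * flows_per_zoom flows;
    flows_per_zoom is an assumed knob, not a documented product setting.
    predicate optionally filters on another dimension, e.g. confusion.
    """
    if zoom not in (1, 2, 3):
        raise ValueError("zoom must be 1, 2 or 3")
    selected = [
        c for c in conversations
        if intent in c["intents"] and (predicate is None or predicate(c))
    ]
    counts = Counter(c["path"] for c in selected)
    return counts.most_common(zoom * flows_per_zoom)


# 1x zoom: only the single most frequent flow for the intent.
print(top_flows(conversations, "Food Delivery Estimate", zoom=1))

# 3x zoom: more of the less frequent flows become visible.
print(top_flows(conversations, "Food Delivery Estimate", zoom=3))

# Dimension filter: only conversations where confusion was detected.
print(top_flows(conversations, "Food Delivery Estimate",
                predicate=lambda c: c["confusion"]))
```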
Lastly, power users can navigate flows through keyword and semantic search over the entire conversation transcript.
Generalizations and compression to separate signal from noise
Messages are the components of a conversation. Initially, we modeled every single message in the flows, which resulted in many different paths that differed only by insignificant variations.
For instance, similar agent answers might be split into multiple messages, interleaved with occasional filler messages, user acknowledgements and clarifications with repetitions. The noise inherent in natural language conversations thus explodes the number of flows.
We evaluated and experimented with many generalization and compression strategies before settling on a high-level, noise-free flow view and node semantics that do not lose critical information about frustration, confusion and abandonment in the conversations.
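One simple compression strategy, shown here purely as an illustration and not necessarily one of the strategies the team adopted, is to drop filler nodes and then merge consecutive repeats of the same node. The `FILLER` label set and the `compress` helper are hypothetical. Cues such as Confusion are deliberately left out of the filler set so they survive compression.

```python
from itertools import groupby

# Hypothetical low-signal labels to drop. Critical cues such as
# "Confusion" or "Frustration" are deliberately not included here.
FILLER = frozenset({"Filler", "User Acknowledgement"})


def compress(path):
    """Drop filler nodes, then collapse consecutive duplicate nodes."""
    kept = [node for node in path if node not in FILLER]
    # groupby groups adjacent equal items, so each run collapses to one node.
    return tuple(label for label, _ in groupby(kept))


raw_path = [
    "User Intent",
    "Agent Response",
    "Filler",
    "Agent Response",
    "User Acknowledgement",
    "Confusion",
    "Agent Response",
]

compressed = compress(raw_path)
print(" -> ".join(compressed))
```

Filtering before collapsing matters: an agent answer split in two by a filler message becomes a single node, which is the kind of insignificant variation described above.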
Conclusion
As language emerges as the new UI, conversation interfaces need a flow visualizer to provide transparency into breaking points. Real-world conversation flows are complex and nuanced, and need multiple design considerations to be insightful and actionable. At Cuein, we invested years in the research and development of conversation flows in production environments and continue to explore the balance between noise and signal.