How it began
Garden Games is a company that manufactures and sells outdoor garden games. The company began in 1997 when Stuart Cardy made the first Hi-Tower for his twins' first birthday party: a creative solution to the problem of entertaining all ages outdoors. John Cardy now runs the business.
Garden Games interacts with many customers via email, and John was becoming concerned that employees were spending a large amount of their time answering simple queries like:
- Where is my order?
- Do you deliver to Italy?
- Please cancel my order
John wanted his employees to spend more time creating innovative garden game ideas.
Customer emails were already turned into tickets using the Zendesk helpdesk system. However, agents still needed to spend time researching answers, looking up products and checking orders. John felt that if a computer could answer these queries, Garden Games employees could spend more time on the creative and complex side of the business.
John shared these ideas with Rock Solid Knowledge director Andrew Clymer at a business networking event. Andrew was sure our team could help, and we started working with Garden Games to realise this vision.
The heart of all Garden Games product management is held in KHAOS Control, an inventory system. KHAOS Control manages customer invoices, orders, products, pricing, stock status, customer details, and more.
A customer might ask about a product or their order, supplying an order number like BGH123456. The Garden Games staff would have to:
- Log in to KHAOS Control
- Find the order or product
- Check the status
- Relay that back to the customer
Garden Games has two online shops, each receiving orders either directly or via Amazon/eBay. Each shop and channel requires its own specialists and tone of voice.
Where and how you sell a product affects your audience and how you interact with the customer. A customer asking “Where is my invoice?” on Amazon needs a different answer from someone buying directly.
Garden Games felt that a machine powered by artificial intelligence could answer these kinds of requests. We dubbed it a helper bot, whose role would be to remove the manual steps, automatically returning information in a user-friendly reply. The helper bot would tailor the reply by shop and channel, allowing Garden Games to adjust the content and instructions as needed.
Machine learning choices
Now we knew the problem, we could start building an experimental prototype. Was this solution even possible? Rock Solid Knowledge hadn't built anything like this before, and at the time we weren't overly experienced in the machine learning space. We were confident that we had the technical skills and discipline to learn, but starting a new kind of project is always tricky.
In addition to a helper bot that could automatically answer repetitive customer queries, we required that Garden Games employees could customise their own reply content and business rules. And we needed to be able to spin this up without a machine learning PhD.
Narrowing down the myriad technical options available to us, we considered:
- Zendesk’s Answer Bot
- Hand-crafting our own machine learning solution
- Buying an existing AI solution
- Developing against IBM’s Watson Assistant
- Using a Microsoft offering called Language Understanding Services (LUIS)
Our company is familiar with the Microsoft ecosystem, plus we needed full control of the solution to be able to integrate it with third-party services.
LUIS is accessible via Microsoft Azure and has a free tier, so we took the plunge and signed up.
For all intents and purposes
Microsoft LUIS is part of Microsoft’s Cognitive Services offering, a speech, text analytics and machine translation cloud service available on Microsoft Azure.
LUIS applies custom machine learning intelligence to process and interpret natural language text. Think of natural language as a question you might ask a device such as Amazon Echo or Google Home. You don’t need to speak an exact set of words because the device can interpret the intention of what you are trying to say. LUIS does the same – it predicts overall meaning and pulls out relevant information from a phrase.
If we send LUIS a query, or utterance, it can determine the intention, or intent. Key terms to know are:
- Utterance: the phrase to interpret, e.g. “Where is my order?”
- Intent: an action to perform, e.g. “Check Order Status”
Imagine LUIS hearing a phrase and saying, “Which bucket should this go in?”.
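To make the bucket metaphor concrete, here is a minimal Python sketch of that final step: given per-intent confidence scores, pick the highest scoring bucket, or give up if nothing is confident enough. The intent names, scores, and threshold are made up for illustration; the real scoring happens inside LUIS.

```python
# Hypothetical per-intent confidence scores, as LUIS might return them
# for the utterance "Where is my order?". The numbers are made up.
scores = {
    "CheckOrderStatus": 0.97,
    "GetDeliveryInformation": 0.02,
    "CancelOrder": 0.01,
}

def top_intent(intent_scores, threshold=0.5):
    """Pick the highest scoring intent, or None if nothing is confident enough."""
    intent, score = max(intent_scores.items(), key=lambda kv: kv[1])
    return intent if score >= threshold else None

print(top_intent(scores))  # CheckOrderStatus
```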
You can preload LUIS with prebuilt intents that are useful to your problem space. We preloaded LUIS with a set of intents from the shopping domain, as well as crafting our own.
We enabled an optional LUIS feature to send the utterance to a sentiment service that returns a sentiment score. This informs us whether the utterance is negative, neutral or positive, allowing us to adjust our replies further if a query is very negative.
As well as predicting intents and sentiment, LUIS can extract data items from an utterance. These data items are known as entities.
Entities are wide-ranging and can be prebuilt or custom built (like intents). In the utterance “Do you deliver to Italy?” the entity is Italy, one of the geography entity types.
You can load LUIS with:
- Prebuilt domain entities (e.g. calendar-related)
- Prebuilt non-domain entities (e.g. temperature)
- Custom-built entities
Entities can also be:
- Machine-learned: entities trained over time
- Non machine-learned: a hard-coded set of entities, such as a product list
So, what is the model powering LUIS under the hood? And first of all, what is a machine learning model anyway?
We can consider a machine learning model to be a function with learnable parameters that is trained on data from our problem space, mapping an input to the desired output. For example, our LUIS model takes an utterance as the input and transforms it into an expected output comprising intents, entities and sentiment.
We can’t inspect the LUIS model, but we can assume it uses a mixture of models to supply the functionality. We can presume that:
- LUIS classifies the sentences it finds into intents or categories, so it uses a classification algorithm
- LUIS must be using supervised learning because we have to manually label intent and entity data*
* In contrast, with unsupervised machine learning we would have left LUIS to discover clusters of similar utterances on its own, with no labels or instructions.
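To illustrate what supervised intent classification means at its simplest, here is a toy Python sketch. It is nothing like the models LUIS actually uses, and the intents and phrases are invented: we train on labelled (utterance, intent) pairs, build a bag of words per intent, and classify new text by word overlap.

```python
from collections import Counter

# Tiny labelled training set: supervised learning needs (utterance, intent)
# pairs, just as we had to label data in LUIS. All examples are illustrative.
TRAINING = [
    ("where is my order", "CheckOrderStatus"),
    ("has my order shipped", "CheckOrderStatus"),
    ("do you deliver to italy", "GetDeliveryInformation"),
    ("can you ship to france", "GetDeliveryInformation"),
    ("please cancel my order", "CancelOrder"),
    ("i want to cancel", "CancelOrder"),
]

def train(examples):
    """Build one bag-of-words 'centroid' per intent."""
    centroids = {}
    for text, intent in examples:
        centroids.setdefault(intent, Counter()).update(text.lower().split())
    return centroids

def classify(text, centroids):
    """Score each intent by word overlap with its bag and return the best."""
    words = set(text.lower().split())
    def overlap(intent):
        return sum(centroids[intent][w] for w in words)
    return max(centroids, key=overlap)

model = train(TRAINING)
print(classify("could you cancel my order please", model))  # CancelOrder
```

A real classifier would weight words, handle unseen vocabulary, and output calibrated scores, but the supervised shape is the same: labelled examples in, a predicted bucket out.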
Accurately does it
The more often the model matches the input to the correct output, the higher the model accuracy. To increase prediction accuracy, LUIS needs to train, learn, and extract entities from many utterances per intent. LUIS uses experience gained during machine learning to get better at predicting what the user wants, evolving the model over time.
It might seem that adding more intents means more development time and complication. In fact, adding suitable intents helps clarify the model: supplying a bucket of similar phrases strengthens LUIS's ability to match other intents by sharpening the differences between them. That said, intents that are too alike may skew results.
Once we understood the fundamental concepts of LUIS and were more familiar with the space, we spun up a prototype for Garden Games that:
- Sent an utterance
- Used LUIS to predict the intention of the utterance
- Sent back a customised reply from Umbraco
It is hard to decide on a good score for a machine learning solution that handles human interactions. We agreed that for the project to be successful, we would need the helper bot to correctly handle over 30% of customer queries.
This seemed realistic, but how hard would it be?
The story so far...
We had designed a prototype of a helper bot that would automate replies to common Garden Games customer emails. To succeed, the helper bot had to handle 30% of queries as accurately as a human.
The next step was setting up the helper bot to work with Zendesk. When a customer sends an email to a Zendesk helpdesk, Zendesk turns it into a ticket (or adds it to an existing ticket). We hooked into the ticket creation event by creating a Zendesk trigger that sends the ticket to our custom API controller. This controller orchestrates what happens next, and its first job is to strip out the parts of an email we don't need.
Everyone has a unique email style; some like to tell stories with tangents and pleasantries, while others prefer to get straight to the point. Many emails have extra content that we naturally ignore when reading and don’t process. There are also disclaimers, forwards, and signatures muddying the content.
Also, Microsoft charges based on the number of LUIS requests, so the fewer utterances (sentences) we send per email, the cheaper it becomes.
To reduce this email noise, our helper bot needed to pre-process the ticket content.
We created a list of editable email forwards in Umbraco, so the helper bot could recognise where a forward begins, chopping off all text after that point.
We soon found we were getting excessively long sentences, many with no punctuation at all. We used, and contributed to, a library called OpenNLP (NLP standing for natural language processing) to detect and split content into appropriate sentences.
After some further text sanitisation, such as removing special characters and HTML, we could send each cleaned utterance to a LUIS developer API endpoint.
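The pre-processing steps above can be sketched roughly as follows. This is a simplified Python illustration, not our production code: the forward markers are examples of the editable list held in Umbraco, and the real system used OpenNLP for sentence detection, which handles far more edge cases than a regex.

```python
import re

# Example forward markers; in the real system the list is editable in Umbraco.
FORWARD_MARKERS = ["-----Original Message-----", "Begin forwarded message:"]

def preprocess(email_body):
    """Strip HTML, chop off forwarded content, remove special characters,
    and split the remainder into sentences (utterances)."""
    text = re.sub(r"<[^>]+>", " ", email_body)       # remove HTML tags
    for marker in FORWARD_MARKERS:                   # cut at the first forward
        idx = text.find(marker)
        if idx != -1:
            text = text[:idx]
    text = re.sub(r"[^\w\s.,?!'-]", " ", text)       # drop special characters
    sentences = re.split(r"(?<=[.?!])\s+", text)     # naive sentence split
    return [s.strip() for s in sentences if s.strip()]

body = "<p>Hi! Where is my order BGH123456?</p>-----Original Message-----old stuff"
print(preprocess(body))
```

Fewer, cleaner utterances per email also means fewer LUIS requests, which keeps the per-email cost down.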
What do LUIS responses mean?
As described earlier, LUIS processes each utterance into an overall response containing the top scoring intent, any extracted entities, and a sentiment score.
LUIS offers a management dashboard that lets us take a holistic view of what is happening within our instance. Here you can manage intents and entities, and test, train, and publish the LUIS model.
We can also add utterances and review the returned response data to understand what they are made up of. A typical LUIS testing result for the utterance “Do you deliver to Italy?” might return this result:
- Utterance: "Do you deliver to Italy?"
- Top scoring intent: "GetDeliveryInformation"
- Score: 0.999
- Entities: "italy - type geographyV2"
- Sentiment: "positive (score 0.7104479)"
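Programmatically, a response like the one above can be flattened into the handful of fields the helper bot acts on. Here is a Python sketch using an illustrative JSON shape; the exact field names vary between LUIS API versions, so check the version you target.

```python
# A canned response in roughly the shape a LUIS prediction endpoint returns.
# The field names here are illustrative, not a guaranteed schema.
response = {
    "query": "Do you deliver to Italy?",
    "topScoringIntent": {"intent": "GetDeliveryInformation", "score": 0.999},
    "entities": [{"entity": "italy", "type": "geographyV2"}],
    "sentimentAnalysis": {"label": "positive", "score": 0.7104479},
}

def summarise(resp):
    """Flatten a prediction response into the fields the helper bot acts on."""
    return {
        "intent": resp["topScoringIntent"]["intent"],
        "score": resp["topScoringIntent"]["score"],
        "entities": [(e["entity"], e["type"]) for e in resp["entities"]],
        "sentiment": resp["sentimentAnalysis"]["label"],
    }

print(summarise(response))
```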
Some sentences always follow a set template, which means we can add them in LUIS as patterns and set them against their required intent. An example is an auto-generated email reply from a company with a standard subject:
“Flower Butterfly Teepee is on its way! Order and invoice included”
We can switch out the dynamic entity variables and add this as a pattern, matched against a “Get Item Details” intent. That way, when LUIS comes across this utterance, it will have 100% confidence that someone is asking for details of a product.
If the intent relates to an order or product, our warehouse service fetches data from the customer’s KHAOS Control instance (e.g. product stock information). By building a response automatically, the customer service agent does not have to query multiple systems.
Additionally, our helper bot regularly caches KHAOS Control product data so we can update the LUIS product entities list.
With these steps, the helper bot can build up a comprehensive and appropriate customer reply by updating the original Zendesk ticket via the Zendesk Support API.
Depending on the logic required, the bot can update the original Zendesk ticket with:
- Whether to show a private or public reply
- Priority, such as urgent for very negative queries or cancellations
- Custom tags like “manual reply” or “stock low”
Private Zendesk replies mean that only the Zendesk agent will see the reply, while public replies are shared with the customer. You can imagine how important it is to have confidence in an intent before making it public; we monitor an intent's private replies carefully until we are confident enough to make them live.
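A ticket update like the ones above goes to the Zendesk Support API as a JSON body (a `PUT` to `/api/v2/tickets/{id}.json`). Here is a Python sketch of building that body; the reply text and tags are illustrative.

```python
def build_ticket_update(reply_text, public, priority=None, tags=None):
    """Build the JSON body for a Zendesk Support API ticket update
    (PUT /api/v2/tickets/{id}.json). public=False makes the comment
    a private note that only agents can see."""
    ticket = {"comment": {"body": reply_text, "public": public}}
    if priority is not None:
        ticket["priority"] = priority
    if tags is not None:
        ticket["tags"] = tags
    return {"ticket": ticket}

# A very negative cancellation query: private note, urgent, tagged for review.
payload = build_ticket_update(
    "Suggested reply: your order has been cancelled and refunded.",  # illustrative
    public=False,
    priority="urgent",
    tags=["manual reply"],
)
print(payload)
```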
It was also important that we made the automated reply good enough to aid both the customer and the agent in resolving their issue. It needed to answer the query effectively, and also be able to be customised and fine-tuned as needed throughout the year.
To enable Garden Games to customise their replies, we chose Umbraco, the open source content management system (CMS). Our Umbraco setup allows fine control over the way the helper bot handles an intent, letting Umbraco content editors structure the entire content of email replies.
Content editors can edit various thresholds and settings, giving them fine-grained control to:
- Choose when to ignore a low scoring intent
- Decide when the sentiment is negative enough to trigger a reassuring email
- Choose the minimum number of stock items to trigger an internal low stock flag
- Edit the list of email forward markers at which processing is cut off
To enable this level of fine tuning, each intent in LUIS has a matching Umbraco intent. Content editors can tweak settings against an intent to:
- Specify different content for each shop and channel
- Decide the intent’s helpdesk priority
- Tag intents as needing a manual helpdesk reply
- Check the stock level
- Assign the ticket to a specific helpdesk agent
- Always show a public response in the helpdesk instead of a private note
- Decide how multiple intents relate to each other
- Ignore intents, or choose not to process them as negative
For example, a response including a cancellation is set as always urgent and assigned to a specific helpdesk agent, excluding all other intents from the reply.
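Rules like the cancellation example could be expressed as a small decision function over the editor-configured settings. This Python sketch is purely illustrative: the setting names, thresholds, and intents are invented, not our actual Umbraco configuration.

```python
# Hypothetical per-intent settings, as a content editor might configure in Umbraco.
SETTINGS = {
    "CancelOrder": {"min_score": 0.6, "priority": "urgent", "public": False,
                    "assignee": "senior-agent", "exclusive": True},
    "CheckOrderStatus": {"min_score": 0.5, "priority": "normal", "public": True},
}

def plan_reply(predictions, sentiment_score, negativity_threshold=0.3):
    """Decide how to handle a list of (intent, score) predictions.
    An 'exclusive' intent, such as a cancellation, suppresses all others."""
    accepted = [(i, s) for i, s in predictions
                if i in SETTINGS and s >= SETTINGS[i]["min_score"]]
    exclusive = [(i, s) for i, s in accepted if SETTINGS[i].get("exclusive")]
    if exclusive:
        accepted = exclusive[:1]  # e.g. the cancellation excludes other intents
    plan = {"intents": [i for i, _ in accepted]}
    if accepted:
        lead = SETTINGS[accepted[0][0]]
        plan["priority"] = lead["priority"]
        plan["public"] = lead["public"]
        if "assignee" in lead:
            plan["assignee"] = lead["assignee"]
    if sentiment_score < negativity_threshold:
        plan["priority"] = "urgent"  # a very negative query is escalated
    return plan

print(plan_reply([("CheckOrderStatus", 0.9), ("CancelOrder", 0.8)], 0.5))
```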
To further improve clarity, editors can choose to discard any intents returned from LUIS that are not part of the crux of the message, including:
- Email signatures
- Sign offs
- Privacy warnings
Each entity (data item) returned from a LUIS response has a matching entity item in Umbraco. The reply can be further customised, with each entity adding contextual text to the reply, depending on the intent.
I love seeing an architectural diagram to show the flow of the application, especially one with several moving parts like our helper bot. Here's a diagram of the workflow of our helper bot:
Optimising the model
Rock Solid Knowledge completed the bulk of helper bot development by January 2020 and were spending time playing “Where will LUIS put the intent?”.
LUIS was learning, and as we crept past 20% accuracy, it became a fun challenge to hit 30%. How did we know if we were getting it right, and how could we improve things if not?
Optimising and improving the model is a major part of machine learning. The LUIS dashboard allows us to review the overall prediction accuracy and investigate problematic intents like dominant intents that weigh heavily and need balancing out, or an intent that is too similar to another intent.
Testing the model
Monitoring the success of our helper bot involved more than viewing a LUIS dashboard. To track the bot’s overall accuracy, we:
- Recorded whether an intent reply passed or failed based on manual checks
- Built a test tool to observe how our app was performing against an untrained dataset (end to end)
- Learned how to use the LUIS batch testing facility to test the accuracy of our model on an individual utterance basis
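The pass/fail records above boil down to a simple accuracy calculation over a labelled test set the model was not trained on. A minimal Python sketch, with invented results:

```python
def accuracy(results):
    """results: list of (expected_intent, predicted_intent) pairs from a
    labelled test set the model was not trained on."""
    if not results:
        return 0.0
    correct = sum(1 for expected, predicted in results if expected == predicted)
    return correct / len(results)

# Illustrative batch: 3 of 4 predictions match the hand-labelled intent.
batch = [
    ("CheckOrderStatus", "CheckOrderStatus"),
    ("GetDeliveryInformation", "GetDeliveryInformation"),
    ("CancelOrder", "CheckOrderStatus"),
    ("CheckOrderStatus", "CheckOrderStatus"),
]
print(f"accuracy: {accuracy(batch):.0%}")  # accuracy: 75%
```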
LUIS models are versioned and you can publish them to two different slots, staging and production. If we found the LUIS accuracy decreased on staging, it meant we needed to test and train LUIS to strengthen the model before pushing the model updates to live.
Training data at different times of year skewed things because customer requests differ between summer and the festive periods. Our model balanced out as we gathered more data throughout the year.
What happened when we went live?
We were thrilled when we hit 30%! This was a fantastic moment for all of us. Garden Games spoke about the bot like it was a colleague, helping them reduce the manual replies to the many emails they received daily.
Our helper bot reduces the customer service workload and enables Garden Games employees to spend time on the complex cases that require creative thinking.
Rock Solid Knowledge plan to productise this and build it into a full Zendesk app. We are also building a free Zendesk app that analyses and prioritises tickets based on positivity. Look out for our upcoming release, and please let us know if a helper bot is something that you would like to try out for your company.