Hello guys, this post is really overdue as I left this project some months ago, but I think it is still useful to share 🙂
If you are using React, AWS Lex and Kubernetes to develop your chatbot, then you might find this article useful, as this was the kind of tech stack I used on that previous project.
I am going to walk through the test/release approach, which I believe worked quite well: it caught different types of bugs, and we had continuous development with fully automated release pipelines and only manual tests for new features. We could, however, have improved on the API integration part (that one required a lot of maintenance).
You need to understand a bit about how NLP (Natural Language Processing) works, so you can plan your regression pack and exploratory tests around the possible scenarios/questions/utterances.
If you think about how your brain learns to communicate, you will notice it needs a kind of manual to associate words with actions/objects/feelings, etc. NLP is the set of techniques that lets a machine do something similar: understand and interpret human language, so the bot can work out which intent a user's sentence belongs to.
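To make this concrete, you can think of the regression pack as a map from each intent to the utterance variations (including common mistypes) you expect the bot to handle. A minimal sketch in plain Node, with hypothetical intent and utterance names:

```javascript
// Hypothetical intent map: each intent lists the utterance variations
// (including common mistypes) that should trigger it.
const intentMap = {
  CheckBalance: ['what is my balance', 'whats my balnce', 'show my balance'],
  OrderCard:    ['get my card', 'I need a new card', 'order a crd'],
};

// Flatten the map into regression-test cases: one (utterance, expectedIntent)
// pair per variation, ready to feed into an automated test runner.
function toTestCases(map) {
  return Object.entries(map).flatMap(([intent, utterances]) =>
    utterances.map((utterance) => ({ utterance, expectedIntent: intent }))
  );
}

console.log(toTestCases(intentMap).length); // 6 test cases
```

Each pair then becomes one scenario in the regression pack, which keeps the exploratory effort focused on the variations the pack does not cover yet.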
You can have a set of different types of tests, but these are the ones I focused on most and how we used them:
– Exploratory tests for the new features (manual tests);
– E2E (UI) tests with an example of each functionality for the regression pack (automated tests);
– API integration tests with happy-path scenarios (automated tests);
– Utterances vs expected intents (a data test to check if the phrases were triggering the correct intent/AWS response);
– Performance tests to review the performance of the internal microservices (automated tests).
Performing exploratory tests on the new features will give you a better understanding of how the algorithm understands and replies to mistypes and user sentences. It is also about testing the usability of the bot independently of the platform (mobile, web). This type of test is essential to be certain the user experience is good from beginning to end. Imagine if the bot popped up every time you navigated through the pages on the website? That would be really annoying, right?
The scope for this kind of test was really reduced: just the acceptance criteria and checking the usability of the bot. As always, there are some exceptions where you might have to test some negative scenarios as well, especially if it is a brand-new functionality.
The regression test pack was built with TestCafe and contained only the happy path, randomly choosing the utterance for each intent. Some of the intents were customised in my previous project, so we had to run some tests to ensure that the cards and response messages were rendering correctly on the bot.
We were always using real data, integrating with the real QA server. As the aim was not to test AWS but the bot itself, we had a script that ran before the tests to download the latest bot version (intents/utterances/slots) into a JSON file, and we used this data to run the tests on the bot against the server.
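The "random utterance per intent" part can be a small helper that reads the exported bot JSON and picks one sample utterance for each intent. The `resource.intents[].sampleUtterances` shape below is an assumption based on the Lex export format, so adjust it to whatever your export actually contains:

```javascript
// Sketch: pick one random utterance per intent from an exported Lex bot JSON.
// The `resource.intents[].sampleUtterances` layout is an assumption based on
// the Lex V1 export format -- check it against your own export file.
function randomUtterancePerIntent(botExport, rng = Math.random) {
  return botExport.resource.intents.map((intent) => ({
    intent: intent.name,
    utterance:
      intent.sampleUtterances[Math.floor(rng() * intent.sampleUtterances.length)],
  }));
}

// Example export fragment (hypothetical data):
const botExport = {
  resource: {
    intents: [
      { name: 'CheckBalance', sampleUtterances: ['what is my balance', 'show my balance'] },
      { name: 'OrderCard', sampleUtterances: ['get my card'] },
    ],
  },
};

console.log(randomUtterancePerIntent(botExport));
```

Injecting the `rng` function makes the choice reproducible in a test run while staying random in the regression pack.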
If you are not mocking the server and you want to see how the chatbot behaves in a real end-user flow, just be aware that this type of test has pros and cons: sometimes TestCafe failed because of a problem with the client, and sometimes because of the server. That is OK if you want to make sure the chatbot works smoothly from start to end and you have end-to-end control (backend/frontend).
You can mock the AWS Lex responses using the chatbot JSON file, so you return the response expected for each utterance. In this case you are not testing whether the entire flow is correct, but you are making sure the bot renders as expected. Just be aware that you would then be doing UI tests, not E2E tests, but this is a good approach if you don't have full control of the product (backend/frontend).
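One way to drive such a mock is to build a lookup from utterance to canned response out of the same bot JSON, and then serve it from whatever request-mocking facility your UI test framework offers (TestCafe has `RequestMock`, for instance). A sketch with hypothetical field names:

```javascript
// Build an utterance -> mocked-response lookup from the bot JSON so the UI
// tests can answer the Lex calls locally. The field names here are
// hypothetical -- map them onto whatever your export really contains.
function buildMockResponses(intents) {
  const lookup = new Map();
  for (const { name, sampleUtterances, responseMessage } of intents) {
    for (const utterance of sampleUtterances) {
      lookup.set(utterance.toLowerCase(), {
        intentName: name,
        message: responseMessage,
        dialogState: 'Fulfilled',
      });
    }
  }
  return lookup;
}

const lookup = buildMockResponses([
  {
    name: 'OrderCard',
    sampleUtterances: ['Get my card'],
    responseMessage: 'Which card would you like?',
  },
]);
console.log(lookup.get('get my card').intentName); // OrderCard
```

Normalising the utterances to lower case keeps the lookup tolerant of the casing the UI sends.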
For the integration tests between the internal microservices, you can use Postman collections and run them on the command line with Newman.
The cons:
– It is horrible for maintenance (think how often you are going to update these scripts);
– You need to add timeout conditions to wait for the real server to reply/the response data to change (even though these waits helped us find bugs in the database);
– It takes a long time to run, as it depends on the server's conditions;
– Debugging is not great; you need to add print-outs most of the time.
The pros:
– It is easy to write the scripts;
– It is fast to get running; you can add hundreds of flows in minutes;
– You can separate the types of tests with collections and use them for feature targeting.
In the end, the best approach would be to add contract tests for the majority of the scenarios: you would save time on maintenance, wouldn't need to add retries with timeouts, etc.
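A contract test, in its simplest form, pins down the shape of the response rather than its live data, which removes most of the waiting and retrying. A minimal, framework-free sketch of the idea (in practice you would use a proper tool such as Pact; the contract and response below are hypothetical):

```javascript
// Minimal contract check: verify a response has the agreed fields with the
// agreed types, ignoring the actual values (which change on a real server).
function matchesContract(response, contract) {
  return Object.entries(contract).every(
    ([field, type]) => typeof response[field] === type
  );
}

// Hypothetical contract for a microservice the bot calls:
const accountContract = { accountId: 'string', balance: 'number', currency: 'string' };

const response = { accountId: 'abc-123', balance: 42.5, currency: 'GBP' };
console.log(matchesContract(response, accountContract)); // true
console.log(matchesContract({ accountId: 1 }, accountContract)); // false
```

Because only field names and types are checked, the test never has to wait for the server's data to reach a particular state.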
You can use a framework like Artillery to create the scripts and check how the services behave under a certain condition, for example X requests per second. Usually you will have a KPI as a requirement, and from this you will create your script. It is pretty simple, and you can simulate the user's flows.
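As a sketch, an Artillery script that ramps the arrival rate up against a hypothetical `/chat` endpoint could look like this (the target URL, endpoint and rates are placeholders for whatever your KPI requires):

```yaml
config:
  target: "https://qa.example.com"   # placeholder QA server
  phases:
    - duration: 60        # run for 60 seconds
      arrivalRate: 5      # start at 5 new virtual users per second
      rampTo: 20          # ramp up to 20 per second
scenarios:
  - name: "Ask for a card"
    flow:
      - post:
          url: "/chat"              # hypothetical bot endpoint
          json:
            message: "get my card"
```

The `phases` section is where you translate the KPI (requests per second, duration) into load, and each scenario `flow` simulates one user journey.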
AWS Lex used to have quite a low limit on requests per second (maybe they have upgraded that by now). You will observe that, depending on the load you send, requests start failing before they even reach your service. Also, as AWS suggests, the $LATEST version of the bot should only be used for manual testing, not for automated tests, so keep this in mind.
You can also check how to create a performance script in a previous post here, and if you need to run these tests inside the same Kubernetes cluster (so you are able to reach the internal microservices), you can check it here.
If you want to test whether the intent responses are the expected ones, you can create scripts using the AWS CLI Lex Models commands to export the bot to a JSON file, and then extract the utterances and the expected intent responses from this JSON.
With the JSON file in hand, you will be able to go through all the intents and all the utterances and check whether they return the expected responses (cards, messages, links). A common error is when you have similar/duplicated utterances and the bot responds with a different intent than the expected one, because it is misunderstanding the sentence and getting confused with an utterance from another intent.
For example, if you have an utterance like “get my card” that calls an intent X, and then another utterance “get my credit card” in an intent Y, the bot can get confused and use intent X to reply to both instead of recognising that they belong to different intents. This happens because the bot is constantly learning what the user means and tries to pick the most probable intent.
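You can catch the duplicated-utterance case automatically by scanning the exported JSON for utterances that appear under more than one intent. A simple exact-match sketch (real confusion also comes from near-duplicates like the example above, which this does not catch):

```javascript
// Find utterances that appear (after normalisation) under more than one
// intent -- the classic cause of the bot replying with the wrong intent.
function findConflictingUtterances(intents) {
  const seen = new Map(); // normalised utterance -> list of intent names
  for (const { name, sampleUtterances } of intents) {
    for (const u of sampleUtterances) {
      const key = u.trim().toLowerCase();
      if (!seen.has(key)) seen.set(key, []);
      seen.get(key).push(name);
    }
  }
  return [...seen].filter(([, names]) => names.length > 1);
}

const conflicts = findConflictingUtterances([
  { name: 'IntentX', sampleUtterances: ['get my card'] },
  { name: 'IntentY', sampleUtterances: ['Get my card', 'get my credit card'] },
]);
console.log(conflicts); // [['get my card', ['IntentX', 'IntentY']]]
```

Running a check like this before each release keeps the intents/utterances map robust as the bot grows.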
Some challenges that you might face during the chatbot development:
– Duplicated utterances across different intents (utterances triggering the wrong intent) – be sure you have a robust map of intents/utterances;
– The quality of the AWS Lex console, and support being really slow to fix bugs (2–3 months to release a fix);
– Capturing all the possible ways to ask for a question/service/etc. – try to get some help from call-centre people.
There is a really good talk about how to test chatbots by Lina Zubyte, where she explains more about exploratory tests and the comprehension and character of the bot.
Also, if you are interested, this is an AI & Machine Learning AWS tech talk from the 2018 Web Summit in Lisbon, Portugal: