Objective:

We created this COVID-19 care assistant in June 2020, in the early days of the pandemic, to give the general public an interactive way to navigate the confusion and misinformation. Through interactive Q/A, the chatbot provides guidance about the disease and safety measures, generates a risk assessment report, and recommends actions to take at each risk level.

Data sources:

WHO, Harvard Medical School, CDC, and government websites

Steps:

  • Data acquisition, via APIs or scraping: This involved collecting a large corpus of unstructured textual data from public authorities and research journals.
  • Data preprocessing: To create structured data pipelines and streamlined datasets, a data engineer cleans the data, selects the relevant parts, integrates the sources, and finally aligns the datasets into a congruent data structure. The overall data mining process covers knowledge extraction from the data and analysis of data patterns and trends. The steps below were carried out for this purpose.
  • Topic modelling: As a standard practice in natural language processing, NLP engineers create a topic model, a statistical model for exploring the primary topics or themes that dominate a collection of documents. This text-mining tool also helps uncover hidden semantic structures in large text corpora. The idea is to branch the body of text data into a first-level tree structure. Each topic branch can then be expanded further to create a knowledge graph.
  • Named entity recognition and relation extraction: NER supports deeper understanding and classification of the structured text data, and relation extraction identifies the relationships between the entities. Both are key tasks of information extraction and text analytics, and they improve the speed and quality of relevant responses.
  • Text summarization: AI-based summarization boosts the performance of data indexing. Simply put, it helps the health assistant give users more concise and faster responses. It also makes data management more efficient.
  • Conversation design: The goal is an engaging health assistant that gives impromptu yet relevant answers, so the conversation flows must be as detailed and as diverse as possible. A user may take any conversation path and jump to any point in it. Creating new conversation paths and retraining the chatbot is an iterative and frequent process, but a strong POC-level (version 0) AI assistant must cover these basics: define the audience, define the core services and KPIs of the AI assistant, and conceptualize a chatbot persona based on them.
  • Q/A generation: The creation of new conversation paths was also assisted by automated Q/A generation. Synthetic data of coherent questions and answers helped attain a scalable ML life cycle.
  • Deployment on a UI as a webhook: The POC version was deployed on the web with a responsive UI. Users can interact with the COVID-19 care assistant through the chat window and get assistance.
  • Integration with social media: After successful testing on the web, the revised version was integrated with Telegram and Facebook for easy access.
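
To make the data preprocessing step more concrete, here is a minimal sketch of cleaning and deduplicating scraped text using only the Python standard library. The function names (`clean_text`, `build_dataset`), the record layout, and the sample records are assumptions made up for illustration; they are not the pipeline that was actually used for the assistant.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")   # naive HTML tag matcher, enough for a sketch
WS_RE = re.compile(r"\s+")        # any run of whitespace


def clean_text(raw: str) -> str:
    """Strip markup, decode entities, and normalize whitespace."""
    text = TAG_RE.sub(" ", raw)                  # drop HTML tags first
    text = html.unescape(text)                   # then decode entities such as &amp;
    text = unicodedata.normalize("NFKC", text)   # fold compatibility characters
    return WS_RE.sub(" ", text).strip()


def build_dataset(records):
    """Clean each record, skip empty and duplicate texts, keep the source label."""
    seen = set()
    dataset = []
    for rec in records:
        text = clean_text(rec["text"])
        key = text.casefold()                    # case-insensitive duplicate check
        if not text or key in seen:
            continue
        seen.add(key)
        dataset.append({"source": rec["source"], "text": text})
    return dataset


# Hypothetical scraped snippets, invented for this example.
raw_records = [
    {"source": "who", "text": "<p>Wash your hands &amp; keep distance.</p>"},
    {"source": "cdc", "text": "Wash your hands &amp;   keep distance."},
    {"source": "cdc", "text": "<div>  </div>"},
]
dataset = build_dataset(raw_records)
print(dataset)
```

In this sketch the second record is dropped as a duplicate of the first and the third is dropped because nothing is left after cleaning. A real pipeline would likely use a proper HTML parser and fuzzier duplicate detection.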
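
The conversation design step notes that a user may jump to any point in a conversation. One simple way to picture this is an intent router that maps each incoming message to the closest known conversation path, whatever the user was doing before. The sketch below uses plain bag-of-words cosine similarity from the standard library; the intent names, example utterances, `route` function, and threshold are all hypothetical, and a production assistant would more likely rely on a trained NLU model.

```python
import math
import re
from collections import Counter

# Hypothetical intents and example utterances, invented for illustration.
INTENTS = {
    "symptoms": ["what are the symptoms", "i have fever and cough"],
    "safety": ["how do i stay safe", "should i wear a mask"],
    "risk_assessment": ["check my risk level", "start risk assessment"],
}


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def route(message, threshold=0.3):
    """Return the best-matching intent, or 'fallback' below the threshold."""
    query = bag_of_words(message)
    best_intent, best_score = "fallback", 0.0
    for intent, examples in INTENTS.items():
        for example in examples:
            score = cosine(query, bag_of_words(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else "fallback"


print(route("Should I wear a mask outside?"))
print(route("I want to check my risk"))
print(route("tell me a joke"))
```

Because every message is scored against all intents, the user can switch from a safety question to the risk assessment at any turn. Messages that match nothing well fall through to a fallback path, which is where new conversation paths are typically discovered during retraining.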

Want to leverage text / voice / video data? The TotemX labs team can get you started, from AI strategy to successful monetization of data and SOTA NLP tech! Reach out to TotemX labs NLP experts today!
