Where to Start
In the knowledge that we are seeking, the subjects are typically your customers or consumers, and the objects are typically elements of your products and services. The relationships are the customers’ views and opinions; their sentiments. The first step in creating a model is to determine which objects you are interested in. Objects can be grouped into categories, and the objects within each category are arranged as a hierarchy. The depth of the hierarchy for a category depends on the level of granularity needed to determine the knowledge that you can act upon, and also on the volume of texts that will match the category.
For example, the ordering channel for a product could be categorized as “Online”, “Call Center” or “Store”. “Staff” could be divided into “Managers”, “Associates”, “Cashiers”, etc. Remember that you will need to obtain these keywords from your customers’ verbatim comments, so create sub-categories based on the terminology that your customers actually use. It is tempting to refine the hierarchy to many levels, but consumers often do not distinguish things at that level of detail, so it may not be worthwhile to spend time defining a complex model.
On the other hand, not refining the model sufficiently will mean losing out on valuable intelligence. For example, having a single category of “Flight” does not allow you to separate out analysis of different aspects of the flight, such as legroom, cabin service, menu choice, entertainment, etc. These are items that customers are likely to express opinions about, so it makes sense to build these hierarchy levels into the categorization model.
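As a rough illustration of such a hierarchy, the sketch below models the “Flight” category and its sub-categories as a small tree and reports the most specific categories that a comment matches. The class name, function name, and keyword lists are hypothetical choices made for this example, and the simple substring matching stands in for whatever matching a real text analytics tool performs.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a category hierarchy (hypothetical structure)."""
    name: str
    keywords: set[str] = field(default_factory=set)
    children: list["Category"] = field(default_factory=list)


def match_paths(node, text, path=()):
    """Return the most specific category paths whose keywords occur in text.

    Matching is naive, case-insensitive substring search; this is an
    assumption for illustration, not how any particular product works.
    """
    current = path + (node.name,)
    lowered = text.lower()
    child_hits = []
    for child in node.children:
        child_hits.extend(match_paths(child, text, current))
    if child_hits:
        # A sub-category matched, so report it instead of the parent.
        return child_hits
    if any(keyword in lowered for keyword in node.keywords):
        return [current]
    return []


# Illustrative keywords; in practice they come from customers' own wording.
flight = Category("Flight", {"flight"}, [
    Category("Legroom", {"legroom", "leg room"}),
    Category("Cabin Service", {"cabin crew", "steward", "attendant"}),
    Category("Menu Choice", {"meal", "menu", "food"}),
    Category("Entertainment", {"movie", "entertainment", "film"}),
])

refined = match_paths(flight, "The legroom was tight but the movies were great")
coarse = match_paths(flight, "Our flight was delayed")
```

With the sub-categories in place, the first comment is split into separate “Legroom” and “Entertainment” matches, while the second comment falls back to the top-level “Flight” category because it mentions no finer-grained aspect.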
While you are thinking about the objects being discussed in the texts and how they can be classified, you should also think about how opinions about those objects can be categorized. For example, what opinions could be expressed about the staff in a store, or about the flight experience? How helpful were the staff in the store, or how knowledgeable were they? How comfortable were the seats, and how extensive was the choice of in-flight entertainment? These are Attributes, and they need to be thought through carefully so that they can be segmented into useful groupings that can realistically be determined from analysis of textual content. While we might be interested in consumers’ opinions of the change in color of our logo, they are unlikely to express an unsolicited opinion about it. If we were to ask them for an explicit opinion using an open-ended question, then we would already have the context and would not need to look for it in the text.
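One possible way to picture Attributes is as a second lookup attached to each category. The sketch below is a minimal example under that assumption: the attribute names and cue words are invented for illustration, and a real model would derive them from the language customers use.

```python
# Hypothetical attributes per category, each with illustrative cue words.
ATTRIBUTES = {
    "Staff": {
        "Helpfulness": ("helpful", "assist"),
        "Knowledge": ("knowledgeable", "knew", "expert"),
    },
    "Flight": {
        "Seat Comfort": ("comfortable", "cramped", "seat"),
        "Entertainment Choice": ("movie", "entertainment", "film"),
    },
}


def attributes_mentioned(category, text):
    """List the attributes of a category whose cue words occur in the text.

    Uses naive, case-insensitive substring matching (an assumption made
    to keep the example short).
    """
    lowered = text.lower()
    return [
        name
        for name, cues in ATTRIBUTES.get(category, {}).items()
        if any(cue in lowered for cue in cues)
    ]


staff_attrs = attributes_mentioned(
    "Staff", "The associate was really helpful but not very knowledgeable"
)
logo_attrs = attributes_mentioned("Staff", "I noticed the logo changed color")
```

The second comment matches nothing, which mirrors the point about the logo: an attribute is only useful in the model if customers are likely to mention it unprompted.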
Asking the Right Questions
One way to approach building your categorization model and associated attributes is to think about how you would ask structured questions. With text analytics, you are effectively asking questions of the open-ended text, and the answers are in the structured output you get in the categorization model and the sentiment analysis of the opinions in the text.
Using an open-ended question instead of a series of closed questions means you are not explicitly asking about a particular aspect that you are interested in, such as the friendliness of the cabin staff. However, you can imagine asking that question of the open-ended text responses and use that thought to help build your categorization model. This can also help you decide which questions to ask explicitly in a closed-question format, and which questions to leave to the analysis of open-ended text responses.
In a survey, the way an open-ended question is worded affects the answers. If you say to a customer “Tell us why you like the product”, then you will get answers that name aspects of the product but do not necessarily state opinions, such as “The price”, “The design”, “The range of add-on modules available”, etc. This is fine as long as you have previously determined from a closed question that the customer actually likes the product, as you can then derive actionable insight from this. Furthermore, an open-ended question that is non-neutrally worded (such as “What are we doing well?” or “What do we need to improve?”) will likely have unpredictable effects upon any sentiment engine’s results.
If you are considering analyzing unsolicited comments, for example from social media or call-center notes, then ideally you should analyze these comments alongside your survey comments. In this case your survey question should be less loaded, for example “Please provide any further comments you have about the product”. Your text analysis will then ask the same questions of the solicited response text as it asks of the unsolicited text from support or social media interactions.