A Discussion of Text Analytics with Michael Tupanjanin

What follows is the next in a series of interviews I conducted at the Net Promoter Conference in San Francisco last month. If you missed my video interview with Dr. Ming Duong-van, you’ll want to click over and watch that fascinating conversation. Still to come is an interview with Satmetrix CEO Richard Owen. This interview is with Michael Tupanjanin, the CEO of Metavana. It was conducted on the morning of the day Metavana and Satmetrix announced a partnership to create a social Net Promoter Score called the SparkScore.

Dana Stanley: We’re here at the Net Promoter Conference in San Francisco with Michael Tupanjanin, CEO of Metavana, as well as the company’s CMO, Romi Mahajan.

Michael, why don’t you go ahead and tell people who aren’t familiar with Metavana a little bit about your company.

Michael Tupanjanin: Sure, so Metavana was started about three and a half years ago by a guy named Ming Duong-van.  Dr. Ming is very well known in the academic circles primarily as a physicist. He was actually the co-founder of chaos theory. And he’s spent a lot of time studying the text analytics market and has, I think, done some incredible breakthroughs, scientific breakthroughs, specifically the algorithms that he’s written for Metavana that really take a look at text, specifically in the social web, and really uncover the true meaning and opinions that people have on the social web.

Dana Stanley: So when you’re talking about the social web and text, give me a practical sense of what type of data your software’s analyzing.

Michael Tupanjanin: Well, I think just about every piece of text as far as I know is unstructured on the social web, which can be incredibly chaotic. So if you think about the correlation of people that have studied chaos theory and the clusters of galaxies, you’re actually able to apply that scientific principle to the social web, where the conversations are unstructured, the sentence and the grammatical structures are completely wacky, and the content itself is very unstructured. Being able to actually get meaning out of those sentence structures is a very difficult thing to do.

Dana Stanley:  What are some examples of how folks are using the Metavana technology to gain insights?

Michael Tupanjanin: We have a couple customers, like Marriott, they have a customer service group that spends a lot of time looking at the social web analyzing things like the basic things, what was your stay like at our hotel? Were the beds OK? Were the towels OK? Was the room service OK? And they’re always analyzing those pieces of information to see how they could improve their service.

We have another company that’s using our technology for smartphones. So right now, the smartphone market is incredibly competitive. We have a clear leader in iPhone, and they’re trying to figure out what their competitive advantage is. What kind of things can they put into their product to make them better? They’re also looking at customer service issues.

Dana Stanley: What do you say to people who throw out the idea that not all sentiment is on the web, that the people who participate in the web, that’s just a segment of all that sentiment that people need to pay attention to.

Michael Tupanjanin: That’s a good question. I’m a neophyte in market research. But here’s my impression. Market research is actually somewhat limited in terms of the sample size, right? You send out a survey to a bunch of people, but the sample size of the social web’s a lot larger than the sample size that you send out to people through your surveys themselves. And I think there’s also a predisposition amongst people that actually are willing to fill out a survey, as opposed to people that are just expressing their opinions on the web where it’s a little less stilted, and you actually probably get more meaningful information back.

Romi Mahajan: Dana, can I just pop in on that?

Dana Stanley: Absolutely.

Romi Mahajan: I think it’s a very prescient question about how big, how complete is your set, right? And clearly, the social web is not everything, but there are 845 million people on Facebook. There are 250 million, bordering on now 270 million tweets a day. And each of these expresses something. Now, not all of them express sentiment, but a lot do. I think where normal, canonical market research needs to grow and evolve is in the notion of active data collection versus passive data collection, where what people are expressing on the social web is– they’re expressing it while in the context, their natural context.

They’re not being prompted. And so you get a different set of data, right? You get maybe a more natural set, a more authentic set, but a different set. In reality, when you put these two sets together, you get the truth. But the fact that structured data is easier to come by and unstructured data is harder to decipher, that’s what gives a company like Metavana room to maneuver.

Dana Stanley: Where do you think companies are in terms of their approach to this? Are companies diving into sentiment analysis? Are they wary? How would you assess that?

Michael Tupanjanin: I think that the market, in general, is incredibly interested. And I’ll take it to a higher level called text analytics as opposed to sentiment analysis.

Dana Stanley: Sure.

Michael Tupanjanin: I think the market’s incredibly confused. I think the market’s incredibly chaotic right now. There are lots of solutions that are available in the market. And I think a lot of the solutions are incredibly complex to actually do implementations to. So traditionally, a lot of those sentiment analysis or text analytics seem to reside with the knowledge management people inside major companies. And I think there’s a huge opportunity to actually now take it out to the masses, to the functional leaders, the sales leaders, the marketing leaders, the product management leaders, the research leaders, where they really haven’t had access to this kind of technology before.

I think there’s a lot of latent demand for it, but there’s also a confusion because I think so many different companies are approaching it in so many different ways. And I think traditionally the accuracy levels have not been that great. So I think there’s a little bit of skepticism, too.

Dana Stanley: So help me understand Metavana’s unique approach.

Michael Tupanjanin: So without getting into a long, scientific explanation – what it all comes down to is the algorithms that you write and how accurate they are and the principles that you apply. Traditionally, there’s been two approaches to what we’ll call text analytics. There’s been the natural language processing approach and then the more machine-learning approach.

The natural language processing approach tends to be very highly curated, with a lot of human intervention, actually looking at grammatical structures and trying to develop taxonomies to be able to pull out the meaning, versus the statistical approach, which is much more automated and based specifically on algorithms themselves. Traditionally, people have felt that the statistical approach is less accurate, that the natural language processing approach is more accurate.

However, the natural language processing approach tends to be not scalable because you have to spend a lot of time going through taxonomies versus having a more statistical approach, which is much more scalable. We tend to be more towards the statistical end, but the algorithms that we have written have taken accuracy to a whole new level, up to over 95%.

Romi Mahajan: Dana, it’s a great question. I think Michael answered it correctly on the scientific side. When we think about our business in general, right, we think about three core principles around why we think we’re unique. One is clearly accuracy, right? So whereas the industry is offering scarcely better than a coin toss accuracy, we’re offering one standard deviation away from perfect, so 95%, 96%. The second is what we call accessibility. We don’t believe that customer satisfaction understanding the social web should be sequestered or siloed someplace in the CSAT division of a company. It’s really for everyone.

So we’re building a system that allows anyone in the corporation to be able to take– to interpret the social web. Accessibility is the next thing, true enterprise scale. And the third thing is scalability. We believe that our business model is going to offer the ability for anyone, regardless of price point, regardless of degree to which they believe in the social web or not, to access the social web. So those three principles we think make us unique.

Dana Stanley: That’s great. One thing that stood out, you mentioned the accuracy level. I’m just curious, how do you measure accuracy, or how do you self-evaluate as your algorithms presumably evolve?

Michael Tupanjanin: Yeah, we actually have to do it the old fashioned way. We literally will take– we recently did about 3,000 quotes that we actually rated, and we sat down with a bunch of high school kids and actually had them go through sentence by sentence by sentence and see, how would you score this sentence? And how did the machine score the sentence?

Dana Stanley: So you’re basically giving them homework?

Michael Tupanjanin: Absolutely. There is no other way to do it, because otherwise you do it in some kind of automated way, and again, people question whether or not that’s the right way to do it.

Romi Mahajan: The thing is, once you go through the high school exercise, then the system learns on its own. But you have to go through the initial validation period to make sure that if someone leaves Starbucks and says, man, that Americano was awesome, that somebody’s verifying that that’s a positive comment.
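
The validation exercise described here comes down to comparing, sentence by sentence, the score a human gave with the score the machine gave. What follows is a minimal sketch of that comparison in Python, not Metavana’s actual procedure: the label names, the sample data, and the helper functions `agreement_rate` and `disagreements` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def agreement_rate(human_labels, machine_labels):
    """Return the fraction of sentences where human and machine labels match."""
    if len(human_labels) != len(machine_labels):
        raise ValueError("both label lists must cover the same sentences")
    if not human_labels:
        raise ValueError("at least one labeled sentence is required")
    matches = sum(h == m for h, m in zip(human_labels, machine_labels))
    return matches / len(human_labels)


def disagreements(human_labels, machine_labels):
    """Count each (human, machine) label pair where the two disagree."""
    return Counter(
        (h, m) for h, m in zip(human_labels, machine_labels) if h != m
    )


# Hypothetical ratings for five sentences (illustrative data only).
human = ["positive", "negative", "neutral", "positive", "negative"]
machine = ["positive", "negative", "positive", "positive", "negative"]

rate = agreement_rate(human, machine)
confusions = disagreements(human, machine)
print(f"agreement: {rate:.0%}")
print(confusions)
```

Counting the disagreeing pairs separately shows which kinds of sentences the machine tends to misread, which a single agreement percentage hides.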

Dana Stanley: Yeah, and how do you account for evolving language, and Urban Dictionary entries, and the fluid nature of language?

Michael Tupanjanin: Yeah, so the way the process is set up, we actually– one of our unique things is that we actually do things on a domain by domain basis. So we, for example, we’ll start with smartphones as a category. We’ll start with printers as a category, hotels, or airlines. And each of those domains has its own specific language. And one of the things that we do is the engine goes out and actually crawls and trains itself on the language of that particular domain. So that’s one of the reasons that we get such high accuracy rates.

But the reality, as you said, is that language continues to evolve. And new words of slang appear all the time. So we found that we have to at least have the engine retrain itself every quarter. And it’s not a manual process. It’s literally simply going out and crawling the same data sources and doing almost like a QA process on the data sources for about a week, and then it’s updated itself on the slang. What it also does is it updates itself on categories. So what the engine does when it goes out and crawls, versus having a taxonomy that’s kind of predetermined, it actually will develop its own taxonomy based on organically what seems to be the right category.

So, for example, we crawled the airline industry, and lo and behold, the categories that came up were seating, crew, entertainment, waiting lines at the airport, baggage handling, all the things you would suspect. But at some point, there could be other categories that emerge. For example, security, gate security, and stuff like that, which seems to be starting to percolate on the social web, could become a category, too. So that’s part of the engine’s updating process.

Dana Stanley: Do you sometimes get into arcane industries where maybe the client would have particular language that you’re incorporating as you go along?

Michael Tupanjanin: Some industries are more difficult than others. We’ve actually looked at, for example, one of our customers is a coffee machine manufacturer. And that’s a fairly simple, straightforward thing versus pharmaceuticals, where you start to get into some pretty arcane language around drugs and therapies, and that’s a lot more difficult. So I don’t know if we have all the answers for you. We’re looking at– pharmaceuticals, I think, will be a little bit of a tougher industry for us.

Dana Stanley: Interesting. And is it just English at this point?

Michael Tupanjanin: English, yes. We’ve done, now, tests in both Chinese and French. And interestingly enough, it’s taken about a day.

Dana Stanley: Wow.

Michael Tupanjanin: Yeah.

Dana Stanley: It took me longer than that to learn French.

Michael Tupanjanin: Well, what’s interesting about the technology, it’s not based on grammatical structure. It just needs to have a translation of all the words themselves, and then it can go out and train itself. So again, it’s a little bit different approach.

Dana Stanley: Interesting. So I have to ask, we’re here at the Net Promoter Conference, and by the time this interview is out, your release will have hit the wires. So tell me about this exciting initiative that you have going with Satmetrix.

Michael Tupanjanin: Well, from our perspective, it’s amazing on a couple of different levels. First, Satmetrix is clearly the leader in Net Promoter. They wrote the book on it. And they have established a very clear set of activities and workflows for people to actually improve their Net Promoter Scores. So they are the methodological geniuses and also the workflow geniuses for helping companies improve their Net Promoter Score. And they’ve tied that directly to revenues, which is also a really, really good thing.

I think, from our perspective, being able to provide people a Net Promoter Score like a stock ticker, real-time, is huge. The old model has been you get your survey results back. You work on them and see how you improve over the next quarter. Now, you have an opportunity to actually see how you’re improving every 10 minutes if you need to, which is a huge breakthrough. And this is not an easy thing to do or replicate. From our perspective as a text analytics company, the fact that we have such high accuracy rates and the fact that our machine is flexible enough to actually take somebody else’s methodology and apply that to the social web is huge. There are very few people who can actually do that.

So from our perspective, it’s great. It also makes the information a lot more actionable. One of the things that I think the industry suffers from is that people sit there and say, yes, this sentence is positive. This is negative. Baggage handling was poor in this airport. What are we going to do? Who’s going to get that information, and what are they going to do with it? Being able to tie that to some kind of a standardized score for a company, I think, is a really big deal.

Romi Mahajan: So Dana, in about 45 minutes from this interview, but of course before this interview is published, there’ll be a piece of press on the wire around what we christened the SparkScore, which is a social NPS gauge. And it’s taking the notion of NPS, which is an industry-proven powerful methodology for loyalty and profit driving and completing the picture. The panorama is now complete. It used to be about structured, episodic, survey-based loyalty. And now it’s about the constant here-and-now social web loyalty. So we believe it’s a huge breakthrough for the industry, and Metavana’s very happy to power the SparkScore with, of course, Satmetrix, being the methodology and software provider.

Dana Stanley: So if I’m a customer who’s accustomed to using a Net Promoter Score, what will change for me?

Romi Mahajan: So I think your world gets better, slightly more complex but better, because we’re not saying don’t do normal Net Promoter. There’s a certain value in getting episodic structured data, longitudinally and otherwise. There’s also a certain value in understanding what’s being said anyway, unprompted, every day, 24/7, 365 worldwide. And so when you munge the two, you actually look at your business 360 degrees, as opposed to just seeing one fraction of not only the expression but also the ways in which customers express how they feel.

Dana Stanley:  That’s great, very exciting. So for the traditional, for lack of a better word, market research community, what should they take from this announcement?

Romi Mahajan: Let me break it into two categories. One’s smaller, and one’s bigger. So for the market research people who are familiar with, espouse, or follow NPS, clearly this is going to be a breakthrough, because it’s taking a very proven, powerful methodology and making it 21st century. It’s NPS 2.0. So for the NPS followers, it’s huge.

For the non-NPS followers, we’re all familiar enough with market research to know that it’s grappling with the abundance of data and the abundance of content and the burgeoning importance of the social web. And this allows them to start getting data and data feeds from the social web to use in anything, predictive analytics, reports, analysis of any sort. And so we believe that market research is an incredibly important part of the organization and of the industry.

But we also believe that it’s extremely limited by the technology. And now, we’re opening new business for them. So it’s about reinventing the industry and reinventing ourselves as market researchers.

Dana Stanley: Great. And if people want to learn more about the SparkScore, what should they do?

Romi Mahajan: There’s a couple different things they can do if they’d like to learn more about the SparkScore. One is they can go to metavana.com. The second is to go to satmetrix.com. Those are the best places to learn about the SparkScore. We will very shortly have a website called spark-score.com, very shortly, so not yet, in which people can play around with this and enter stuff in, and see what their score is.

Michael Tupanjanin: It’s interesting because it almost becomes, in a way, like the Klout score for companies, right? So we’re actually going to be posting a website that actually lists out, front and center, what people’s SparkScore is.

Dana Stanley: Interesting.

Michael Tupanjanin: So anybody has access to it. Whether it’s the companies themselves or customers, they’ll be able to go in and look at their SparkScore. We’re starting by rolling out five industries right now. But we think it’ll actually be very much like a corporate Klout score.

Romi Mahajan: Dana, under your tutelage, one day we hope that Research Access has an sNPS ticker running across it, so every company can come up and say, how are we doing?

Dana Stanley: So almost like a stock ticker concept?

Michael Tupanjanin: It is absolutely a stock ticker concept.

Dana Stanley: Very cool. Well, Michael, Romi, thank you for your time today.

Michael Tupanjanin: Appreciate it.

Romi Mahajan: Dana, our pleasure.

About Dana Stanley

Dana is the Editor-in-Chief of Research Access.