Stop the press: an NLP model is only as good as the data you train it with. I think we can all generally agree on that statement. And when it comes to training intents with utterances, ultimately the best data is real data. However, you have to start somewhere, and to do so you need to bootstrap your NLP model by creating your own utterances.
Even with the best intentions in the world, synthetic utterances can introduce bias, often because we all have our own myopic view of the world. It is also fair to ask, “What makes a good utterance or training corpus?” So how might you kick-start training your NLP model?
Remember the basics
You don’t actually have to know much, or anything, about NLP so long as you follow some general guidelines. In my experience, these are the guidelines I try to follow:
- Focus on the essence of the intent and what makes it unique: remember the goal is not to “understand”, but to classify. Design your intents to be as distinct as possible; the utterances are what define that distinction. Think of the rings in the Olympic flag as intents: ideally they should not overlap at all (or as little as possible).
- As you create utterances, focus on what it is that makes this intent distinct from any other. That is the essence you want to capture in your utterances.
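One rough way to spot overlapping intents before you train anything is to compare the words their utterances share. The sketch below is only an illustration of that idea, not how any particular NLP engine classifies: the intent names, the sample utterances, and the helper functions (`tokens`, `jaccard`, `intent_overlap`) are all hypothetical, and word overlap is a crude stand-in for what a real model learns.

```python
import re
from itertools import combinations


def tokens(utterance):
    """Lower-case an utterance and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", utterance.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def intent_overlap(training_data):
    """Return the vocabulary overlap for every pair of intents."""
    vocabs = {
        intent: set().union(*(tokens(u) for u in utterances))
        for intent, utterances in training_data.items()
    }
    return {
        (x, y): jaccard(vocabs[x], vocabs[y])
        for x, y in combinations(sorted(vocabs), 2)
    }


# Hypothetical intents and utterances, for illustration only.
training_data = {
    "check_balance": ["what is my account balance", "how much money do I have"],
    "transfer_money": ["transfer money to my savings account", "send money to Bob"],
    "opening_hours": ["when are you open", "what time do you close"],
}

scores = intent_overlap(training_data)

# Print the most overlapping pairs first: these are the intents to review.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs near the top of the list share the most vocabulary, so they are good candidates for rewording utterances or rethinking the intent design. A score of zero does not guarantee the intents are distinct to a trained model; it only says the sample utterances share no words.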