Training your NLP model – best practices in writing utterances by Grant Ronald


Stop the press – an NLP model is only as good as the data you train it with. I think we can all generally agree on that statement. Also, when it comes to training intents with utterances, ultimately the best data is real data. However, you have to start somewhere, and to do so you need to bootstrap your NLP model by creating your own utterances.

Even with the best intentions in the world, synthetic utterances can introduce bias, often because we all have our own myopic view of the world. It is also fair to ask the question “what makes a good utterance or training corpus?” So how might you kick-start training your NLP model?

Remember the basics

You don’t actually have to know much, or anything, about NLP so long as you follow some general guidelines. In my experience, these are the guidelines I try to follow:

  • Focus on the essence of the intent and what makes it unique: remember the goal is not to “understand”, but to classify. You must ensure intents are designed to be as distinct as possible, and the utterances will define what makes those intents distinct. Think of the rings in the Olympic flag as intents; ideally the rings would not overlap at all (or as little as possible).
  • As you create utterances, focus on what it is that makes this intent distinct from any other – that is the essence you want to capture in your utterances. Read the complete article here.
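
One rough way to act on these guidelines is to check how much vocabulary two intents share before training. The sketch below is only an illustration, not a method from the article: the intent names, the sample utterances and the helper functions are all hypothetical, and word overlap (Jaccard similarity) is a crude stand-in for whatever your NLP engine actually measures. A high score simply suggests that two sets of utterances may not be capturing what makes each intent distinct.

```python
import re
from itertools import combinations

# Hypothetical bootstrap utterances keyed by made-up intent names.
TRAINING_DATA = {
    "check_balance": [
        "what is my account balance",
        "how much money do I have",
        "show me my balance",
    ],
    "transfer_money": [
        "send money to my savings account",
        "transfer funds to another account",
        "move money between accounts",
    ],
}


def vocabulary(utterances):
    """Return the set of lowercase word tokens used across the utterances."""
    words = set()
    for utterance in utterances:
        words.update(re.findall(r"[a-z']+", utterance.lower()))
    return words


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def intent_overlaps(data):
    """Score every pair of intents by how much vocabulary they share."""
    vocab = {intent: vocabulary(utts) for intent, utts in data.items()}
    return {
        (first, second): jaccard(vocab[first], vocab[second])
        for first, second in combinations(sorted(vocab), 2)
    }


# Print each pair of intents with its overlap score (0 = no shared words).
for pair, score in intent_overlaps(TRAINING_DATA).items():
    print(pair, round(score, 3))
```

Treat the score as a prompt to re-read the utterances rather than as a pass/fail threshold; common filler words will inflate it, so filtering those out first may give a more useful signal.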

