How to find part of speech.

This article explains how to find the part of speech of the words in a text.

You can find the part of speech in two ways: by enumerating the Lexemes in a text, or by enumerating Utterances with NLParser.

Here are the code samples showing how to parse text:

  1. How to check spelling with NLP for .NET
  2. How to implement English syntax parser

When enumerating through Lexemes, the syntax relations between the words are ignored, which results in high lexical ambiguity (many Words are associated with the same Lexeme).

Enumerating through Utterances takes syntax relations into account. It costs significantly more CPU resources, but the resulting syntax ambiguity (many syntax diagrams are associated with the same Utterance) is significantly smaller than the lexical ambiguity.

Word collocation.

Collocation typically uses statistical data to disambiguate a sequence of lexemes.

NLP for .NET relies on Reed-Kellogg grammar and does not use a statistical approach, because statistically correct parsers are in fact not always correct. Another general problem with collocation is that related words may be close to each other in a syntax graph but quite far apart in a lexeme sequence.
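The gap between graph distance and sequence distance can be illustrated with a small sketch that does not use NLP for .NET at all. The sentence, the hand-written edge list and the names DistanceDemo, SequenceDistance and GraphDistance are assumptions made for this illustration only; the edges roughly follow a Reed-Kellogg reading in which the subject tree and the predicate blooms are linked directly.

```csharp
using System;
using System.Collections.Generic;

public static class DistanceDemo
{
    // Distance between two lexemes counted in positions of the lexeme sequence.
    public static int SequenceDistance(string[] lexemes, string a, string b)
    {
        return Math.Abs(Array.IndexOf(lexemes, a) - Array.IndexOf(lexemes, b));
    }

    // Distance between two words counted in edges of an undirected syntax graph
    // (breadth-first search); returns -1 when the words are not connected.
    public static int GraphDistance(string[][] edges, string start, string goal)
    {
        var neighbours = new Dictionary<string, List<string>>();
        foreach (string[] edge in edges)
        {
            for (int i = 0; i < 2; i++)
            {
                string from = edge[i], to = edge[1 - i];
                if (!neighbours.ContainsKey(from)) neighbours[from] = new List<string>();
                neighbours[from].Add(to);
            }
        }

        var distance = new Dictionary<string, int> { { start, 0 } };
        var queue = new Queue<string>();
        queue.Enqueue(start);
        while (queue.Count > 0)
        {
            string node = queue.Dequeue();
            if (node == goal) return distance[node];
            if (!neighbours.ContainsKey(node)) continue;
            foreach (string next in neighbours[node])
            {
                if (distance.ContainsKey(next)) continue;
                distance[next] = distance[node] + 1;
                queue.Enqueue(next);
            }
        }
        return -1;
    }

    // Hypothetical example sentence, split into lexemes.
    public static readonly string[] Lexemes =
        "the tree that my grandfather planted long ago blooms".Split(' ');

    // Hand-written syntax relations (an assumption, not parser output).
    public static readonly string[][] Edges =
    {
        new[] { "tree", "blooms" },        // subject and predicate
        new[] { "tree", "the" },
        new[] { "tree", "planted" },       // relative clause modifies the subject
        new[] { "planted", "that" },
        new[] { "planted", "grandfather" },
        new[] { "grandfather", "my" },
        new[] { "planted", "ago" },
        new[] { "ago", "long" },
    };

    public static void Main()
    {
        Console.WriteLine("Sequence distance: " + SequenceDistance(Lexemes, "tree", "blooms"));
        Console.WriteLine("Graph distance: " + GraphDistance(Edges, "tree", "blooms"));
    }
}
```

In this sketch tree and blooms are seven positions apart in the lexeme sequence but only one edge apart in the graph, which is the situation a window-based collocation model tends to miss.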

The syntax ambiguity of an Utterance (the number of possible syntax graphs) is orders of magnitude smaller than its lexical ambiguity (the number of possible Word combinations).

Consider 2 phrases:
cherry tree
and
tree the cherry

The lexeme cherry may be used as an adjective or a noun, and the lexeme tree may be used as a noun or a verb. This gives 4 possible combinations of words for the phrase cherry tree, of which only one (cherry:adjective + tree:noun) is correct. The lexical ambiguity is 4, while the syntax ambiguity is 1. In tree the cherry the same two lexemes take different roles (tree:verb, cherry:noun).

// Parse the phrase and print the first syntax of each utterance.
NLParser parser = new NLParser();
foreach (Utterance utterance in parser.Text<Utterance>("cherry tree"))
{
     // Skip utterances that have no syntax.
     if ((null != utterance.Syntaxes) && (0 != utterance.Syntaxes.Length))
           Console.WriteLine(utterance.Syntaxes[0].ToString());
}

The longer an utterance is, the greater the difference between its lexical and syntax ambiguity.
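The lexical ambiguity of 4 for cherry tree can be reproduced without NLParser. The sketch below is only an illustration: the Lexicon dictionary is a hypothetical, hand-written stand-in for a real dictionary, and the names AmbiguityDemo, LexicalAmbiguity and Combinations are not part of NLP for .NET. It multiplies the number of candidate parts of speech per lexeme, which is why lexical ambiguity grows so quickly with the length of an utterance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AmbiguityDemo
{
    // Hypothetical mini-lexicon: the parts of speech each lexeme may take.
    static readonly Dictionary<string, string[]> Lexicon = new Dictionary<string, string[]>
    {
        { "cherry", new[] { "adjective", "noun" } },
        { "tree",   new[] { "noun", "verb" } },
    };

    static string[] SplitLexemes(string phrase)
    {
        return phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Lexical ambiguity: the product of the candidate counts of all lexemes.
    public static int LexicalAmbiguity(string phrase)
    {
        int total = 1;
        foreach (string lexeme in SplitLexemes(phrase))
            total *= Lexicon[lexeme].Length;
        return total;
    }

    // Lists every lexeme:part-of-speech combination, ignoring syntax relations.
    public static List<string> Combinations(string phrase)
    {
        var results = new List<string> { "" };
        foreach (string lexeme in SplitLexemes(phrase))
        {
            results = results
                .SelectMany(prefix => Lexicon[lexeme]
                    .Select(pos => (prefix + " " + lexeme + ":" + pos).Trim()))
                .ToList();
        }
        return results;
    }

    public static void Main()
    {
        foreach (string combination in Combinations("cherry tree"))
            Console.WriteLine(combination);
        Console.WriteLine("Lexical ambiguity: " + LexicalAmbiguity("cherry tree"));
    }
}
```

With this lexicon the program lists four combinations for cherry tree. Each additional lexeme with two candidates doubles the count, while the number of syntax graphs that survive the grammar is expected to stay far smaller.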

You don't have to care about collocation, because grammar seems to be a more powerful instrument than statistics in this case. If you have statistical data, you can use it to improve the syntax result further, although it may be more straightforward to apply semantic and pragmatic restrictions.