You can determine parts of speech in two ways: by enumerating the Lexemes
in a text, or by enumerating Utterances with NLParser.
The following code samples show how to parse text:
- How to check spelling with NLP for .NET
- How to implement English syntax parser
When you enumerate Lexemes, syntax relations between the words are
ignored, which results in high lexical ambiguity (many Words associated
with the same Lexeme).
Enumerating Utterances takes syntax relations into account. It costs
significantly more CPU resources, but the resulting syntax ambiguity (many
syntax diagrams associated with the same Utterance) is significantly smaller.
Word collocation.
Collocation typically uses statistical data to disambiguate a sequence of
lexemes.
NLP for .NET relies on Reed-Kellogg grammar and doesn't use a statistical
approach, because statistically correct parsers are in fact not always
correct. Another general problem with collocation is that related words may
be close to each other in a syntax graph but far apart in a lexeme sequence.
The syntax ambiguity of an Utterance (the number of possible syntax graphs)
is orders of magnitude smaller than its lexical ambiguity (the number of
possible Word combinations).
Consider two phrases: "cherry tree" and "tree the cherry".
The lexeme "cherry" may be used as an adjective or a noun; the lexeme "tree"
may be used as a noun or a verb. This gives 4 possible combinations of Words
for the phrase "cherry tree", of which only one (cherry:adjective + tree:noun)
is correct. The lexical ambiguity is 4, while the syntax ambiguity is 1.
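NLParser parser = new NLParser();
// Enumerate Utterances so that syntax relations are taken into account.
foreach (Utterance utterance in parser.Text<Utterance>("cherry tree"))
{
    // Print the first syntax graph, if the parser found any.
    if ((null != utterance.Syntaxes) && (0 != utterance.Syntaxes.Length))
        Console.WriteLine(utterance.Syntaxes[0].ToString());
}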
The longer an utterance is, the greater the difference between lexical and syntax ambiguity.
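A rough way to see why the gap widens is to count Word combinations the way the "cherry tree" example does, by multiplying the number of parts of speech each lexeme admits. The symbols n (number of lexemes), k_i (parts of speech admitted by the i-th lexeme) and A_lex (lexical ambiguity) are introduced here for illustration only; they are not NLP for .NET terminology.

```latex
A_{\mathrm{lex}} = \prod_{i=1}^{n} k_i ,
\qquad
\text{cherry tree: } A_{\mathrm{lex}} = 2 \times 2 = 4 ,
\qquad
k_i \ge 2 \ \text{for all } i \;\Rightarrow\; A_{\mathrm{lex}} \ge 2^{n}
```

Under this assumption, lexical ambiguity can grow exponentially with the number of ambiguous lexemes, while the grammar keeps only the combinations that form a valid syntax graph.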
You don't have to care about collocation, because grammar seems to be a more
powerful instrument than statistics in this case. If you have statistical
data, you can improve the syntax result further, although it may be more
straightforward to apply semantic and pragmatic restrictions.