Natural Language syntax parser (NLParser)

NLParser class description

Namespace: Nlp4Net.NlpLib
Assembly: NlpLib.dll

public class NLParser : IUserData


NLParser is a natural language parser that performs lexical and syntactic parsing of English text.

Note: converting plain text into Lexemes and Utterances.

The easiest way is to use the NLParser.Text<Lexeme>() or NLParser.Text<Utterance>() enumerators:

NLParser parser = new NLParser();
foreach(Lexeme lexeme in parser.Text<Lexeme>(@"c:\test.txt", Encoding.UTF8))
{
    if ((Lexeme.LexType.word == lexeme.LexemeType) && !lexeme.HasWords)
        Console.WriteLine(lexeme.Text);
}

Alternatively, you can use the Parse() method and subscribe to the NLParser.OnLexeme or NLParser.OnUtterance event. When parsing is complete, call the Flush() method; otherwise, some portion of the text may remain in internal buffers. You can continue parsing after calling Flush().

Flush() may also be used to force the end of an utterance.
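
As a minimal sketch of the push-style workflow described above, the following feeds a text file to Parse() line by line and calls Flush() once at the end. It uses only members named on this page. The file path is a placeholder, and the event subscription is left as a comment because the handler signatures for OnLexeme and OnUtterance are not shown here. Re-appending the line break that ReadLine() strips is an assumption, made so that words on adjacent lines are not fed to the parser as one unbroken run.

```csharp
using System.IO;
using System.Text;
using Nlp4Net.NlpLib;

class ParseAndFlushSketch
{
    static void Main()
    {
        NLParser parser = new NLParser();

        // Subscribe to parser.OnLexeme or parser.OnUtterance here to receive results.
        // The handler signature is not documented on this page, so it is omitted.

        // Placeholder path, the same one used in the sample above.
        using (StreamReader reader = new StreamReader(@"c:\test.txt", Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // ReadLine() drops the line terminator; put one back (assumption:
                // the parser treats it as ordinary whitespace between words).
                parser.Parse(line + "\n");
            }
        }

        // End of input: push out whatever is still waiting in internal buffers.
        parser.Flush();
    }
}
```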

Note: NLParser supports plain text or Lexemes as input

If you use other formats such as HTML, PDF or RTF, you need to convert them either into plain text or into Lexemes. You can include formatting or any other arbitrary information in an Utterance by using a Word of type Word.LexType.format. Such words are not processed but are included in the Utterance.

Plain text doesn't necessarily mean ASCII encoding. NLParser relies on the Unicode standard.

Note: performance

Syntax parsing is a CPU-expensive operation. Subscribing to NLParser.OnUtterance or calling NLParser.Text<Utterance>() switches the syntax parser on.

If you don't need syntax processing, use only the NLParser.OnLexeme event or the NLParser.Text<Lexeme>() enumerator. These require only the lexical parser and are significantly faster.
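
To illustrate the difference, here is a rough sketch that runs the same string through both enumerators. Only Text<Utterance>() switches the syntax parser on, so the first loop is the cheaper one. The sample sentence is made up, and it is an assumption that Lexeme and Utterance live in the Nlp4Net.NlpLib namespace alongside NLParser. Two separate parser instances are used as a precaution, since this page does not say whether the syntax parser can be switched off again once enabled.

```csharp
using System;
using Nlp4Net.NlpLib;

class LexemesVersusUtterances
{
    static void Main()
    {
        string text = "The quick brown fox jumps over the lazy dog. It was not amused.";

        // Lexical parsing only: nothing here switches the syntax parser on.
        NLParser lexicalOnly = new NLParser();
        int words = 0;
        foreach (Lexeme lexeme in lexicalOnly.Text<Lexeme>(text))
        {
            if (lexeme.LexemeType == Lexeme.LexType.word)
                words++;
        }
        Console.WriteLine("Word lexemes: " + words);

        // Requesting Utterances switches the syntax parser on (more CPU work).
        NLParser withSyntax = new NLParser();
        int utterances = 0;
        foreach (Utterance utterance in withSyntax.Text<Utterance>(text))
        {
            utterances++;
        }
        Console.WriteLine("Utterances: " + utterances);
    }
}
```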

More samples

  1. How to check spelling with NLP for .NET
  2. How to implement English syntax parser



Constructors

  NLParser()
    Creates an NLParser with default settings.

  NLParser(IDictionary configuration)
    Creates an NLParser with a custom configuration.



Methods

  IEnumerable<T> Text<T>(TextReader reader)
    Enumerates the Lexemes or Utterances in the passed stream. Use it in a foreach loop.
    T may be of type Lexeme, Utterance or object.
    The TextReader is not closed. You have to close it yourself to free resources.

  IEnumerable<T> Text<T>(string text)
    Enumerates the Lexemes or Utterances in the passed string. Use it in a foreach loop.
    T may be of type Lexeme, Utterance or object.

  IEnumerable<T> Text<T>(string file, Encoding encoding)
    Enumerates all Lexemes or Utterances in a text file. Use it in a foreach loop.
    T may be of type Lexeme, Utterance or object.

  Flush()
    Forces Utterance completion.
    NLParser may hold characters or Lexemes in an internal buffer while waiting for more text. By calling Flush() you indicate a sentence or phrase break. For example, call Flush() in a chat application when the user presses Enter.
    You can call Parse() and Flush() any number of times and in any order.
    You don't have to call Flush() if you don't know where the sentence breaks are. For example, if you read a text file line by line, call Parse() for each line and call Flush() only once, at the end of the file.

  Parse(object text)
    Submits text for processing.
    The parameter may be of type string, char[] or Lexeme. You can mix calls with string, char[] and Lexeme.
    To get results, subscribe to the OnLexeme or OnUtterance event.
    By feeding Lexemes to Parse(), you can use your own Words. For example, this helps if you need your own dictionary, or in OCR or speech recognition applications.
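
Because Text<T>(TextReader) leaves the reader open, wrapping the reader in a using block is one way to make sure it gets closed. A small sketch, assuming a UTF-8 file at a placeholder path and that Lexeme sits in the Nlp4Net.NlpLib namespace:

```csharp
using System;
using System.IO;
using System.Text;
using Nlp4Net.NlpLib;

class TextReaderSketch
{
    static void Main()
    {
        NLParser parser = new NLParser();

        // The using block disposes (and so closes) the reader once enumeration
        // is finished, since Text<T>(TextReader) does not close it.
        using (TextReader reader = new StreamReader(@"c:\test.txt", Encoding.UTF8))
        {
            foreach (Lexeme lexeme in parser.Text<Lexeme>(reader))
            {
                if (lexeme.LexemeType == Lexeme.LexType.word)
                    Console.WriteLine(lexeme.Text);
            }
        }
    }
}
```

The enumeration is consumed entirely inside the using block, so the reader is still open for as long as the parser needs it.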



Properties

  object IUserData.UserData
    Your arbitrary object.



Events

  OnLexeme
    Fired after the lexical parser but before the syntax parser, when a Lexeme is ready. You can use this event to alter the Lexeme. For example, if the Lexeme is unknown, you can look it up in your own dictionary and associate it with Words.

  OnUtterance
    Fired when the syntax parser has completed an Utterance.