Namespace: Nlp4Net.NlpLib
Assembly: NlpLib.dll
public class NLParser : IUserData
NLParser is a natural language parser that allows lexical and syntax parsing of English text.
Note: converting plain text into Lexemes and Utterances.
The easiest way is to use the NLParser.Text<Lexeme>() or NLParser.Text<Utterance>() enumerator:
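```csharp
using System;
using System.Text;
using Nlp4Net.NlpLib;

NLParser parser = new NLParser();
// Print word Lexemes that have no associated Words (not found in the dictionary).
foreach (Lexeme lexeme in parser.Text<Lexeme>(@"c:\test.txt", Encoding.UTF8))
{
    if (lexeme.LexemeType == Lexeme.LexType.word && !lexeme.HasWords)
        Console.WriteLine(lexeme.Text);
}
```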
Alternatively, you can call the Parse() method and subscribe to the NLParser.OnLexeme or
NLParser.OnUtterance event. When parsing is complete, call the Flush() method;
otherwise part of the text may remain in internal buffers. You can
continue parsing after calling Flush().
You can also call Flush() to force the end of an utterance.
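The Parse()/Flush() pattern described above can be sketched as follows for a text file read line by line. This is a minimal sketch that uses only the Parse(object) and Flush() members listed in this reference; the class and method names are hypothetical. Because the delegate types of OnLexeme and OnUtterance are not documented here, the sketch assumes the caller has already subscribed to one of those events before calling it.

```csharp
using System.IO;
using System.Text;
using Nlp4Net.NlpLib;

public static class LineByLineParsing
{
    // Hypothetical helper: submits a text file to NLParser one line at a time.
    // Results arrive through OnLexeme / OnUtterance, so the caller is expected
    // to subscribe to one of those events before calling this method.
    public static void ParseFileByLines(NLParser parser, string path)
    {
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // ReadLine() drops the line terminator; put one back so the last
                // word of one line is not joined to the first word of the next.
                parser.Parse(line + "\n");
            }
        }

        // A line break is not necessarily a sentence break, so Flush() is called
        // only once, at the end of the file, to empty the internal buffers.
        parser.Flush();
    }
}
```

Re-appending the newline is a design choice of this sketch: it assumes NLParser treats "\n" as ordinary whitespace, which keeps words from adjacent lines separate without forcing an utterance break.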
Note: NLParser supports plain text or Lexemes as input
If you use other formats such as HTML, PDF or RTF, you need to convert them into either plain text or Lexemes.
You can include formatting or any other arbitrary information in an Utterance by using
a Word of type Word.LexType.format. Such Words are not processed, but they are included in the Utterance.
Plain text does not necessarily mean ASCII encoding: NLParser relies on
the Unicode standard.
Note: performance
Syntax parsing is a CPU-expensive operation. Subscribing to
NLParser.OnUtterance or calling NLParser.Text<Utterance>() switches the syntax
parser on.
If you don't need syntax processing, use only the NLParser.OnLexeme event or the
NLParser.Text<Lexeme>() enumerator. They require only the lexical
parser, which is significantly faster.
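To illustrate the difference, the sketch below contrasts a lexical-only pass with a pass that switches the syntax parser on. It uses only the Text<T>(string) overload and the Lexeme members shown in the first sample; the class and method names are hypothetical, it assumes Lexeme and Utterance live in the same Nlp4Net.NlpLib namespace as NLParser, and Utterances are only counted because their members are not described in this reference.

```csharp
using Nlp4Net.NlpLib;

public static class ParsingCost
{
    // Lexical parser only: no Utterances are requested, so syntax parsing stays off.
    public static int CountWordLexemes(string text)
    {
        NLParser parser = new NLParser();
        int count = 0;
        foreach (Lexeme lexeme in parser.Text<Lexeme>(text))
        {
            if (lexeme.LexemeType == Lexeme.LexType.word)
                count++;
        }
        return count;
    }

    // Requesting Utterances switches the syntax parser on, which costs more CPU.
    public static int CountUtterances(string text)
    {
        NLParser parser = new NLParser();
        int count = 0;
        foreach (Utterance utterance in parser.Text<Utterance>(text))
            count++;
        return count;
    }
}
```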
More samples
- How to check spelling with NLP for .NET
- How to implement English syntax parser
| Constructors | Description |
|---|---|
| `NLParser()` | Creates an NLParser with default settings |
| `NLParser(IDictionary configuration)` | Creates an NLParser with a custom configuration |
| Methods | Description |
|---|---|
| `IEnumerable<T> Text<T>(TextReader reader)` | Enumerates the Lexemes or Utterances in the given stream. Use it in a foreach loop. T may be of type Lexeme, Utterance or object. The TextReader is not closed; you have to close it yourself to free its resources. |
| `IEnumerable<T> Text<T>(string text)` | Enumerates the Lexemes or Utterances in the given string. Use it in a foreach loop. T may be of type Lexeme, Utterance or object. |
| `IEnumerable<T> Text<T>(string file, Encoding encoding)` | Enumerates all Lexemes or Utterances in a text file. Use it in a foreach loop. T may be of type Lexeme, Utterance or object. |
| `Flush()` | Forces completion of the current Utterance. NLParser may hold characters or Lexemes in its internal buffer while waiting for more text; by calling Flush() you indicate a sentence or phrase break. For example, in a chat application, call Flush() when the user presses Enter. You can call Parse() and Flush() any number of times and in any order. You don't have to call Flush() if you don't know where the sentence breaks are: for example, if you read a text file line by line, call Parse() for each line and call Flush() only once, at the end of the file. |
| `Parse(object text)` | Submits text for processing. The parameter may be of type string, char[] or Lexeme, and you can mix calls with all three types. To get results, subscribe to the OnLexeme or OnUtterance event. By feeding Lexemes to Parse(), you can supply your own Words, for example if you need your own dictionary in OCS or speech recognition applications. |
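Because Text<T>(TextReader) leaves the reader open, a using block in the calling code is a natural way to release it. The sketch below is a hedged illustration built only from members shown on this page; the class and method names are hypothetical, and it assumes Lexeme.Text is a string and that Lexeme is in the Nlp4Net.NlpLib namespace. A StringReader stands in for any TextReader, such as a StreamReader over a network stream.

```csharp
using System.Collections.Generic;
using System.IO;
using Nlp4Net.NlpLib;

public static class ReaderParsing
{
    // Collects the text of word Lexemes that have no associated Words (not found
    // in the dictionary). The reader is left open, matching Text<T>(TextReader);
    // closing it is the caller's responsibility.
    public static List<string> UnknownWords(NLParser parser, TextReader reader)
    {
        List<string> unknown = new List<string>();
        foreach (Lexeme lexeme in parser.Text<Lexeme>(reader))
        {
            if (lexeme.LexemeType == Lexeme.LexType.word && !lexeme.HasWords)
                unknown.Add(lexeme.Text);
        }
        return unknown;
    }

    // Example caller: the using block disposes the reader once enumeration is done.
    public static List<string> UnknownWordsInString(string text)
    {
        using (TextReader reader = new StringReader(text))
        {
            return UnknownWords(new NLParser(), reader);
        }
    }
}
```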
| Events | Description |
|---|---|
| `OnLexeme` | Fired when a Lexeme is ready, after the lexical parser and before the syntax parser. You can use this event to alter the Lexeme: for example, if a Lexeme is unknown, you can look it up in your own dictionary and associate it with Words. |
| `OnUtterance` | Fired when the syntax parser has completed an Utterance. |