Namespace: Nlp4Net.NlpLib
Assembly: NlpLib.dll
Unique features.
A single SpellChecker instance can serve several purposes.
You can verify whether an exact word is present in the speller dictionary, get suggestions for a misspelled word, or perform auto-completion.
Forced suggestion is a unique option of SpellChecker.
It returns the nearest suggestions even when the verified word is correct.
For example, a user who has found an exactly matching product may also be interested in products with similar spellings.
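Auto-completion, one of the modes listed above, amounts to finding dictionary words that begin with the text typed so far. The sketch below is a hypothetical illustration of that idea over a plain sorted array; the class and method names are invented here, and this is not how NlpLib implements Options.AutoComplete.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only (not part of NlpLib): prefix completion
// over a word array sorted with StringComparer.Ordinal.
public static class PrefixCompleter
{
    // Returns up to maxResults words that start with prefix, in ordinal order.
    public static List<string> Complete(string[] sortedWords, string prefix, int maxResults)
    {
        var results = new List<string>();

        // BinarySearch returns the index of prefix, or the bitwise complement
        // of the index of the first element greater than prefix.
        int index = Array.BinarySearch(sortedWords, prefix, StringComparer.Ordinal);
        if (index < 0) index = ~index;

        // In ordinal order, all words sharing a prefix form one contiguous run
        // that starts at the insertion point, so a forward scan is enough.
        for (int i = index; i < sortedWords.Length && results.Count < maxResults; i++)
        {
            if (!sortedWords[i].StartsWith(prefix, StringComparison.Ordinal)) break;
            results.Add(sortedWords[i]);
        }
        return results;
    }
}
```

A binary search plus a short scan suits a dictionary that is loaded once; a trie is the usual alternative when words are added frequently.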
Full control over dictionary.
You are not forced to use a built-in lexicon; instead, you have full control over the speller dictionary.
Use the Add() and AddRange() methods to build your dictionary.
The dictionary may be a list of your products, names from an HR database, e-mail addresses, URLs, or a non-English language dictionary.
Once the dictionary is built, use the Save() and Load() methods to serialize the speller to a binary or XML file.
Depending on your application, you can also let users add custom words to the speller.
Through the ICollection<string> interface you can enumerate the dictionary.
You also have full control over word normalization.
The speller can be case-sensitive or case-insensitive, or you can implement your own normalization algorithm.
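The exact signature of INormalizer.Normalize() is not spelled out in this reference. Assuming it takes a word and returns its normalized form as a string, the hypothetical functions below show the kind of logic a custom normalizer could wrap. The first two roughly mirror StringNormalizer.ToLowerInvariant and StringNormalizer.ToLowerInvariantTrim; the third makes matching insensitive to punctuation and spacing, which can help with product names.

```csharp
using System.Text;

// Hypothetical normalization functions, not part of NlpLib. Assumption:
// INormalizer.Normalize takes and returns a string, so any of these could be
// wrapped in an INormalizer implementation.
public static class NormalizerSketches
{
    // Case-insensitive matching: fold to lower case.
    public static string LowerInvariant(string word) => word.ToLowerInvariant();

    // Same, but also trims leading and trailing white space.
    public static string LowerInvariantTrim(string word) => word.Trim().ToLowerInvariant();

    // Punctuation-insensitive matching: "Wi-Fi" and "wifi" map to the same key.
    public static string LettersAndDigitsLower(string word)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            // Keep letters and digits only, lower-cased; drop everything else.
            if (char.IsLetterOrDigit(c))
                sb.Append(char.ToLowerInvariant(c));
        }
        return sb.ToString();
    }
}
```

Whatever normalizer you choose is applied to both dictionary words and checked words, so two inputs are treated as the same word exactly when they normalize to the same string.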
Suggestion quality.
The Nlp4Net speller has high suggestion quality.
For a misspelled word there is at most one correct suggestion.
Suggestion quality is the ratio of correct suggestions to the total number of suggestions.
In most cases Nlp4Net SpellChecker suggests a single word, which makes it usable for automatic text processing where no human interaction is possible.
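The definition implies an upper bound on suggestion quality. Writing N for the number of misspelled words, S for the total number of suggestions, C for the number of correct ones, and k for a fixed number of suggestions per word (symbols introduced here for illustration), and using the fact that each misspelled word has at most one correct suggestion:

```latex
\begin{aligned}
Q &= \frac{C}{S}, \qquad C \le N \;\Longrightarrow\; Q \le \frac{N}{S} \\
S &= kN \;\Longrightarrow\; Q \le \frac{N}{kN} = \frac{1}{k}
\end{aligned}
```

So a checker that returns 5 suggestions for every misspelled word can score at most 0.2, while one that returns a single suggestion per word can approach 1.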
Serialization.
You can serialize SpellChecker into a binary or XML stream.
The Load() and Save() methods are shortcuts that serialize directly from and to a file.
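The methods documented further down include GetSchema(), ReadXml() and WriteXml(), which match the members of System.Xml.Serialization.IXmlSerializable, so XML support presumably follows that standard .NET contract. The hypothetical WordList class below (its XML layout is invented and unrelated to SpellChecker's format) shows the contract in miniature: GetSchema() returns null, WriteXml() writes only the content inside a wrapper element the caller has already opened, and ReadXml() consumes the wrapper element itself.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical stand-in used only to illustrate IXmlSerializable; not NlpLib code.
public sealed class WordList : IXmlSerializable
{
    public List<string> Words { get; } = new List<string>();

    // Like SpellChecker.GetSchema(), no schema is provided.
    public XmlSchema GetSchema() => null;

    // XmlSerializer has already written the <WordList> start tag.
    public void WriteXml(XmlWriter writer)
    {
        foreach (string word in Words)
            writer.WriteElementString("Word", word);
    }

    // The reader is positioned on <WordList>; consume it, its children and the end tag.
    public void ReadXml(XmlReader reader)
    {
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (isEmpty) return;
        while (reader.MoveToContent() == XmlNodeType.Element && reader.LocalName == "Word")
            Words.Add(reader.ReadElementContentAsString());
        reader.ReadEndElement();
    }

    // Serializes to an XML string and reads it back with XmlSerializer.
    public static WordList RoundTrip(WordList source)
    {
        var serializer = new XmlSerializer(typeof(WordList));
        using var writer = new StringWriter();
        serializer.Serialize(writer, source);
        using var reader = new StringReader(writer.ToString());
        return (WordList)serializer.Deserialize(reader);
    }
}
```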
Thread safety and performance.
SpellChecker is a high-performance speller designed for multithreaded applications.
An application gets the best performance if it loads or fills the SpellChecker first and afterwards performs only read operations.
Read operations such as SpellCheck() or GetEnumerator() do not alter the state of a SpellChecker
and can safely be called from multiple threads simultaneously. Once the speller is fully loaded, do not lock around read operations, for best performance.
You must handle synchronization yourself if you perform write operations such as Add(), Remove(), or Clear():
SpellChecker is not thread-safe for writing and does not take any locks internally.
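Loading the speller once and then only reading it is the simplest pattern. If words must occasionally be added while other threads keep calling SpellCheck(), one option that still keeps reads lock-free is copy-on-write: build a new instance, then publish it atomically. The generic helper below is a hypothetical sketch of that pattern, not part of NlpLib.

```csharp
using System;
using System.Threading;

// Hypothetical copy-on-write holder (not part of NlpLib). Readers never lock;
// writers never mutate the published instance, they replace it.
public sealed class Snapshot<T> where T : class
{
    private T _current;
    private readonly object _writeLock = new object();

    public Snapshot(T initial) =>
        _current = initial ?? throw new ArgumentNullException(nameof(initial));

    // Lock-free read: callers see either the old or the new instance,
    // never a half-updated one.
    public T Current => Volatile.Read(ref _current);

    // Writers are serialized by a lock. 'rebuild' must return a NEW object and
    // must not modify the one it receives, because readers may still be using it.
    public void Update(Func<T, T> rebuild)
    {
        lock (_writeLock)
        {
            T next = rebuild(_current);
            Volatile.Write(ref _current, next);
        }
    }
}
```

With SpellChecker, rebuild would presumably create a new instance with the same normalizer and call AddRange() with the old and new words; that is an untested assumption. Each update copies the whole dictionary, so this trade-off suits rare updates and frequent reads.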
Constructors.
SpellChecker()
Creates a SpellChecker with default settings:
SpellChecker.MaxSuggestions = 5;
SpellChecker.LCID = 1033; // en-US
SpellChecker.Normalizer = StringNormalizer.ToLowerInvariant; // case-insensitive spell-checking
NlpLib.SpellChecker.WordSeparator = " "; // single space
SpellChecker(INormalizer normalizer)
Creates a SpellChecker with default settings and the specified normalizer.
Before SpellChecker processes a string in methods such as Add(), AddRange(), Remove(), and SpellCheck(), it converts the string into a normalized form by calling INormalizer.Normalize().
By providing a normalizer you control normalization behavior. For example, converting strings to upper case makes the spell-checker case-insensitive.
If you pass a null normalizer, strings are processed as is, without modification.
You can use StringNormalizer.ToLowerInvariant, StringNormalizer.ToLowerInvariantTrim, or your own implementation.
StringNormalizer.ToLowerInvariant converts a string to lower case.
StringNormalizer.ToLowerInvariantTrim converts a string to lower case and trims spaces.
SpellChecker(IDictionary configuration)
Allows custom configuration. Supported properties are:
"NlpLib.SpellChecker.Normalizer"
"NlpLib.SpellChecker.LCID"
"NlpLib.SpellChecker.MaxSuggestions"
"NlpLib.SpellChecker.WordSeparator"
MaxSuggestions controls the maximum number of suggestions. The actual number of suggestions may be less than or equal to MaxSuggestions.
WordSeparator is a string inserted into suggestions when a misspelled word consists of several words. The default value is a single space: " ".
For example, SpellCheck("hotdog", out suggestions) results in the suggestion "hot dog" (if the SpellChecker contains the words "hot" and "dog").
If you configure WordSeparator="+", the suggestion will be "hot+dog".
WordSeparator affects suggestions only. SpellChecker never breaks incoming strings into words in Add(), AddRange(), Remove(), or SpellCheck().
SpellChecker(INormalizer normalizer, int LCID)
Creates a SpellChecker with default settings and the specified normalizer and LCID.
The SpellChecker.LCID property is intended to allow different spelling algorithms depending on the language.
Currently only en-US rules are implemented, and the property has no effect.
In most cases you can still apply SpellChecker to other languages, but it may not always work:
German and Japanese text have different word-breaking rules, which a speller must honor.
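The WordSeparator behavior described for SpellChecker(IDictionary configuration) can be illustrated with a hypothetical helper that tries to split a run-together word into two dictionary words and joins them with the configured separator. It is a deliberately simple stand-in (two parts only, no ranking), not NlpLib's word-breaking algorithm.

```csharp
using System.Collections.Generic;

// Hypothetical illustration (not NlpLib's algorithm): suggest two-word splits
// of a run-together word, joined with a configurable separator.
public static class WordBreakSketch
{
    public static List<string> SplitSuggestions(string word, ISet<string> dictionary, string separator)
    {
        var suggestions = new List<string>();
        // Try every split point that leaves a non-empty left and right part.
        for (int i = 1; i < word.Length; i++)
        {
            string left = word.Substring(0, i);
            string right = word.Substring(i);
            if (dictionary.Contains(left) && dictionary.Contains(right))
                suggestions.Add(left + separator + right);
        }
        return suggestions;
    }
}
```

With the words "hot" and "dog" in the set, "hotdog" yields "hot dog" for a space separator and "hot+dog" for "+", matching the example given for the configuration constructor.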
Methods.
void Add(string word)
Adds a word to the SpellChecker dictionary.
If SpellChecker.Normalizer is not null, INormalizer.Normalize() is called and the normalized word is added instead of the original word.
void AddRange(IEnumerable<string> words)
Similar to void Add(string word), but for a range of words.
You can pass a WordDictionary as the words parameter.
void Clear()
Removes all words from the SpellChecker.
bool Contains(string word)
Returns true if the word is in the dictionary.
The word is normalized first and then searched for in the dictionary.
void CopyTo(string[] array, int arrayIndex)
Copies the whole internal dictionary into a string array, starting at arrayIndex.
static SpellChecker Create(string fileName)
Reads a SpellChecker from a file.
If fileName has the .xml extension, XML format is used; otherwise, binary format is used.
IEnumerator<string> GetEnumerator()
Allows you to enumerate the dictionary with a foreach statement.
XmlSchema GetSchema()
Always returns null.
void ReadXml(XmlReader reader)
Loads the SpellChecker from an XML stream.
bool Remove(string word)
Returns true if the word was removed from the dictionary.
The word is normalized first, and then the normalized form is removed from the dictionary.
void Save(string fileName)
Saves the SpellChecker to a file.
If fileName has the .xml extension, XML format is used; otherwise, binary format is used.
virtual bool SpellCheck(string word)
Returns true if the word is correct; otherwise, false. A word is considered correct if it is found in the dictionary.
The word is normalized first, and then the normalized form is searched for in the dictionary.
virtual bool SpellCheck(string word, out IEnumerable<string> suggestions)
If the word is correct, returns true and empty suggestions.
If the word is incorrect, returns false and usually non-empty suggestions.
The word is normalized first, and then the normalized form is searched for in the dictionary.
virtual bool SpellCheck(string word, out IEnumerable<string> suggestions, Options options)
If the word is correct, returns true.
If the word is incorrect, returns false and usually non-empty suggestions.
The word is normalized first, and then the normalized form is searched for in the dictionary.
The options parameter controls SpellChecker behavior.
With Options.Default or Options.None, SpellChecker tries to correct a misspelled word.
With Options.AutoComplete, SpellChecker tries to complete the word, assuming it is a correct beginning of a word.
In that mode the method returns true if the word does not require completion, and false if it does; the suggestions are then possible completions.
Options.DisableWordBreak prevents splitting a misspelled word into multiple words. This is useful if you always expect a single word.
Options.ForceSuggestions makes SpellChecker fill in suggestions with the nearest spellings even if the word is correct and the method returns true.
With the Options.KeepCapitalization flag, SpellChecker tries to detect lower-case, UPPER-CASE, or Title-case capitalization in the word parameter and applies the same capitalization to suggestions.
This flag is included in Options.Default.
void WriteXml(XmlWriter writer)
Serializes the SpellChecker into an XML stream.
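The Options.KeepCapitalization behavior of SpellCheck() can be pictured with the hypothetical helper below, which classifies the input as lower-case, UPPER-CASE, or Title-case and re-cases a suggestion to match. The enum and method names are invented for illustration, and NlpLib's actual rules (for example, for mixed case such as "iPhone") are not documented here; this sketch leaves such words untouched.

```csharp
using System.Linq;

// Hypothetical sketch of the KeepCapitalization idea; not NlpLib code.
public enum CasePattern { Lower, Upper, Title, Other }

public static class CapitalizationSketch
{
    // Classifies the capitalization of a word; non-letters are ignored.
    public static CasePattern Detect(string word)
    {
        if (!word.Any(char.IsLetter)) return CasePattern.Other;
        if (word.All(c => !char.IsLetter(c) || char.IsLower(c))) return CasePattern.Lower;
        if (word.All(c => !char.IsLetter(c) || char.IsUpper(c))) return CasePattern.Upper;
        if (char.IsUpper(word[0]) && word.Skip(1).All(c => !char.IsLetter(c) || char.IsLower(c)))
            return CasePattern.Title;
        return CasePattern.Other; // mixed case: leave suggestions as they are
    }

    // Applies a detected pattern to a suggestion.
    public static string Apply(string suggestion, CasePattern pattern)
    {
        switch (pattern)
        {
            case CasePattern.Lower: return suggestion.ToLowerInvariant();
            case CasePattern.Upper: return suggestion.ToUpperInvariant();
            case CasePattern.Title:
                if (suggestion.Length == 0) return suggestion;
                return char.ToUpperInvariant(suggestion[0]) + suggestion.Substring(1).ToLowerInvariant();
            default: return suggestion;
        }
    }
}
```

For example, a suggestion "hello" for the input "HELO" becomes "HELLO", and for "Helo" it becomes "Hello".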
Properties.
int Count
Number of words in the SpellChecker.
bool IsReadOnly
Always returns false.
int LCID
Reserved for future use: applying different spelling algorithms depending on the language.
int MaxSuggestions
Maximum number of suggestions.
SpellChecker returns only the words nearest to the misspelled word.
If there are fewer nearest words than MaxSuggestions, only those are returned,
so the number of suggestions may be less than MaxSuggestions.
Set MaxSuggestions = 1 if you need to make corrections without human intervention.
INormalizer Normalizer
Performs word normalization before a word is processed.
The property can be null.
You cannot change Normalizer if the SpellChecker is not empty.
object IUserData.UserData
An arbitrary object of your own.
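The MaxSuggestions rule (only the nearest words, capped at MaxSuggestions) can be sketched as follows. The similarity measure is an assumption: this hypothetical helper uses Levenshtein edit distance, while NlpLib's own measure is not documented here. It keeps only dictionary words tied at the smallest distance, so the result can hold fewer than maxSuggestions entries even when the dictionary is large.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not NlpLib's algorithm. Levenshtein distance is an
// assumption; the library's similarity measure is not documented.
public static class NearestWords
{
    // Two-row dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions each cost 1).
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
            (previous, current) = (current, previous); // last computed row is now 'previous'
        }
        return previous[b.Length];
    }

    // Returns only the words tied at the smallest distance, capped at maxSuggestions.
    public static List<string> Suggest(IEnumerable<string> dictionary, string word, int maxSuggestions)
    {
        var scored = dictionary.Select(w => (Word: w, Distance: EditDistance(w, word))).ToList();
        if (scored.Count == 0 || maxSuggestions <= 0) return new List<string>();

        int best = scored.Min(s => s.Distance);
        return scored.Where(s => s.Distance == best)
                     .Select(s => s.Word)
                     .OrderBy(w => w, StringComparer.Ordinal) // deterministic order for ties
                     .Take(maxSuggestions)
                     .ToList();
    }
}
```

For example, with the words hell, hello, help and world, the input "helo" is one edit away from the first three, so maxSuggestions = 5 yields three suggestions and maxSuggestions = 1 yields one.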