How to check spelling with NLP for .NET.

This article explains how to check spelling with NLP for .NET.

Proofreading is one of the basic tasks in text processing. Searching for typos rarely requires cognitive activity. Moreover, such activity can even be counterproductive, because humans who read for meaning tend to correct mistakes unconsciously and never notice them. Computers can be more efficient than humans in this area; at the very least, running a draft proofreading pass in software saves a lot of time. Draft proofreading is relatively simple and is done at the lexical level.

Suggesting the correct spelling, finding a typo that is a valid word but is wrong in context, or locating atypical phrases ideally requires understanding of the text and common knowledge. That is why complete spell checking may be considered part of artificial intelligence: it requires processing at the semantic (or even pragmatic) level.

You can use NLP for .NET to create a draft spell checker or integrate proofing functionality into your application. The smallest sample requires 6 lines of code:

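// Requires: using System; using System.Text; using Nlp4Net.NlpLib;
NLParser parser = new NLParser();
foreach(Lexeme lexeme in parser.Text<Lexeme>(@"c:\test.txt", Encoding.UTF8))
{
    // A word lexeme with no dictionary words attached is treated as misspelled.
    if ((Lexeme.LexType.word == lexeme.LexemeType) && !lexeme.HasWords)
        Console.WriteLine(lexeme.Text);
}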

The spell-checking tool demonstrates an online implementation.

Similar code may be used to build a lexicon of the words used by an author or found in a corporate document store.
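As a rough illustration of the lexicon idea, the sketch below counts how often each word occurs. It is deliberately independent of NLP for .NET: LexiconBuilder and Build are hypothetical names introduced here, and the input is assumed to be a plain sequence of word strings, for example the lexeme.Text values of word lexemes collected in a loop like the sample above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of NLP for .NET.
public static class LexiconBuilder
{
    // Counts occurrences of each word, ignoring case.
    // 'words' is assumed to hold word strings only (no spaces or format lexemes).
    public static SortedDictionary<string, int> Build(IEnumerable<string> words)
    {
        SortedDictionary<string, int> lexicon =
            new SortedDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string word in words)
        {
            // Skip empty entries defensively.
            if (string.IsNullOrWhiteSpace(word))
                continue;

            int count;
            lexicon.TryGetValue(word, out count); // count stays 0 for a new word
            lexicon[word] = count + 1;
        }
        return lexicon;
    }
}
```

The sorted, case-insensitive dictionary keeps "Text" and "text" under one entry and yields the lexicon in alphabetical order, which is usually convenient for review. Whether case should be folded depends on the documents being processed.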

Below is a proofreading console application written in C#. You can copy, paste, compile, and run the code in Visual Studio. In the same way, you can process documents in bulk or entire websites.

Suggestions

If you need suggestions for a misspelled word, or you need your own dictionary, please use SpellChecker.
SpellChecker is designed specifically for proofreading.

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

// reference to NlpLib.dll is required
using Nlp4Net.NlpLib;

namespace spellcheck
{
    class Program
    {

    // Return value: 
    // 'true' if lexeme is found in custom dictionary,
    // otherwise 'false'.
    private static bool IsInCustomDictionary(string szLexeme)
    {
        // Look up a lexeme in your own dictionary here.
        return false;
    }

    static void Main(string[] args)
    {
        if (0 == args.Length)
        {
            Console.WriteLine("usage: spellcheck.exe <fileName> [encoding] (default encoding is utf-8)");
            Console.WriteLine("examples:\r\n");
            Console.WriteLine("spellcheck.exe   text.txt");
            Console.WriteLine("spellcheck.exe   text.txt    us-ascii");
            return;
        }

        string szFile = args[0];

        // make sure the file exists
        if (!File.Exists(szFile))
        {
            Console.WriteLine("Cannot find file: " + szFile);
            return;
        }

        Encoding encoding = Encoding.UTF8; // assume default encoding UTF8

        // is encoding specified explicitly?
        if (args.Length > 1)
            encoding = Encoding.GetEncoding(args[1]);

        NLParser parser = new NLParser();

        // counters
        long lngKnownWords = 0;
        long lngMisspelledWords = 0;

        // store each misspelled word with its occurrence count (case-insensitive keys)
        SortedList<string, int> lstMisspelledWords = new SortedList<string, int>(StringComparer.InvariantCultureIgnoreCase);

        using (StreamReader reader = new StreamReader(szFile, encoding))
        {
            // enumerate through all Lexemes in the text stream.
            foreach (Lexeme lexeme in parser.Text<Lexeme>(reader))
            {
                // For spell-check proofing only words are relevant.
                // Skip spaces (LexType.space) and format lexemes (LexType.format)
                if (Lexeme.LexType.word != lexeme.LexemeType)
                    continue;

                // OK: If Lexeme has words, it is found in dictionary.
                if (lexeme.HasWords)
                {
                    lngKnownWords++;
                    continue; // OK: Lexeme found in built-in dictionary.
                }

                // Misspelled word. Last check: look up in user dictionary:
                if (IsInCustomDictionary(lexeme.Text))
                {
                    lngKnownWords++;
                    continue; // OK, lexeme found in custom dictionary. 
                }

                lngMisspelledWords++;
                if (!lstMisspelledWords.ContainsKey(lexeme.Text))
                {
                    lstMisspelledWords[lexeme.Text] = 1;
                }
                else
                {
                    lstMisspelledWords[lexeme.Text] = lstMisspelledWords[lexeme.Text] + 1;
                }
            }
        }

        // show results
        long lngTotalWords = lngMisspelledWords + lngKnownWords;
        if (0 == lngTotalWords)
        {
            Console.WriteLine("No words found in the file: " + szFile);
        }
        else
        {

            Console.WriteLine(string.Format("{0} words in file: {1}", lngTotalWords, szFile));
            if (0 == lngMisspelledWords)
            {
                Console.ForegroundColor = ConsoleColor.Green;
                Console.WriteLine("All words are correct.");
            }
            else {

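                // Use decimal arithmetic: integer division would truncate the
                // percentage before Math.Round ever sees it.
                Console.WriteLine(string.Format("Misspelled words: {0} distinct ({1}% of all word occurrences)"
                    , lstMisspelledWords.Keys.Count
                    , Math.Round(lngMisspelledWords * 100m / lngTotalWords, 1)
                ));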

                Console.WriteLine("There are misspelled words:\r\n");
                Console.ForegroundColor = ConsoleColor.Red;
                foreach (string szMisspelledWord in lstMisspelledWords.Keys)
                {
                    Console.WriteLine(szMisspelledWord);
                }
            }
            Console.ResetColor(); // restore the console's original colors
        }    
    }
    } // class
} // namespace
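
The IsInCustomDictionary stub in the listing above always returns false. One possible way to back it, sketched below under stated assumptions, is a case-insensitive word set loaded from a plain-text file with one word per line. CustomDictionary, Load, and Contains are hypothetical names introduced for illustration; they are not part of NLP for .NET and this is not how SpellChecker handles custom dictionaries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of NLP for .NET.
public sealed class CustomDictionary
{
    private readonly HashSet<string> words =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Reads one word per line. Blank lines and lines starting with '#'
    // are skipped; this file format is an assumption of the sketch.
    public static CustomDictionary Load(TextReader reader)
    {
        CustomDictionary dictionary = new CustomDictionary();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string word = line.Trim();
            if (word.Length == 0 || word[0] == '#')
                continue;
            dictionary.words.Add(word);
        }
        return dictionary;
    }

    // Returns true if the lexeme is in the dictionary, ignoring case.
    public bool Contains(string lexeme)
    {
        return lexeme != null && words.Contains(lexeme);
    }
}
```

With such a class, the dictionary could be loaded once at startup from a StreamReader over a word-list file of your choosing, and IsInCustomDictionary could simply return the result of Contains(szLexeme).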