Keyword builder
demonstrates how a natural language parser can be used to build a list of keywords.
Web user interface.
In the main field, enter the text for which you want to build a list of keywords.
The Keywords count field limits the maximum number of keywords.
The Use verbs check box controls whether verbs can appear in the keyword list.
Verbs are essential because they characterize the topic of a document,
but in search queries verbs seem to be used less frequently than nouns.
The check box lets you control what kind of keywords to build.
How it works.
To extract keywords, Keyword builder
uses syntax relations between words in an utterance and per-word syntax
information such as part of speech and additional syntax tags (for example, proper).
The importance of a word in an utterance
may be determined by its syntax role. The subject, verb, or object typically conveys
the most important information, while adjective or adverb modifiers play a supporting role by adding detail to the core meaning.
This fact can be used when calculating the importance of a word in a text.
Keyword builder checks subject, verb, directObject,
indirectObject, and subjectComplement:
| Enumeration | Description |
| subject | The king gave Anne Boleyn his love. |
| verb | The king gave Anne Boleyn his love. |
| directObject | The king gave Anne Boleyn his love. |
| indirectObject | The king gave Anne Boleyn his love. |
| subjectComplementNoun | The king is Henry VIII. |
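As a rough illustration of the role check, the sketch below filters a hand-annotated parse of the example sentence down to the roles listed in the table. The `Token` class, the `core_words` helper, and the `modifier` role label are hypothetical names chosen for this example; they are not part of the parser's API, and a real application would take the roles from the parser's output instead of writing them by hand.

```python
from dataclasses import dataclass

# Role names copied from the table above; treating them as plain strings
# is an assumption made for this sketch.
CORE_ROLES = {
    "subject",
    "verb",
    "directObject",
    "indirectObject",
    "subjectComplementNoun",
}


@dataclass(frozen=True)
class Token:
    text: str
    role: str  # syntax role of the word (hand-written here, not parser output)


def core_words(tokens):
    """Return the words whose syntax role is one of the checked roles."""
    return [t.text for t in tokens if t.role in CORE_ROLES]


# Hand-annotated parse of "The king gave Anne Boleyn his love."
# "modifier" is a placeholder label for words outside the checked roles.
sentence = [
    Token("The", "modifier"),
    Token("king", "subject"),
    Token("gave", "verb"),
    Token("Anne Boleyn", "indirectObject"),
    Token("his", "modifier"),
    Token("love", "directObject"),
]

print(core_words(sentence))  # ['king', 'gave', 'Anne Boleyn', 'love']
```

Only the words in the checked roles survive the filter; the modifiers are dropped before any statistics are gathered.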
Additionally, the part of speech may be examined. For example, auxiliary verbs cannot describe
the topic of a document. Proper nouns describe the idea of a document better than common
nouns because they are more unique.
Pronouns are skipped because they substitute for nouns. (If a pronoun appears as a
subject or complement, you can increase the weight of the nouns it refers to.)
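The part-of-speech rules above could be expressed as a small weighting function. This is only a sketch: the function name `pos_weight`, the part-of-speech labels, and the numeric weights (0.0, 1.0, 2.0) are assumptions made for illustration, not values taken from Keyword builder.

```python
# Parts of speech that never become keywords, per the rules above.
# The label strings are hypothetical.
SKIPPED = {"auxiliaryVerb", "pronoun"}


def pos_weight(part_of_speech: str, is_proper: bool = False) -> float:
    """Return an illustrative weight for a word based on its part of speech."""
    if part_of_speech in SKIPPED:
        return 0.0  # auxiliary verbs and pronouns are skipped
    if part_of_speech == "noun" and is_proper:
        return 2.0  # proper nouns count for more than common nouns
    return 1.0  # everything else gets the baseline weight


print(pos_weight("pronoun"))               # 0.0
print(pos_weight("noun"))                  # 1.0
print(pos_weight("noun", is_proper=True))  # 2.0
```

A weight of zero removes the word from consideration, while the larger weight for proper nouns lets a name such as Anne Boleyn outrank a common noun that appears equally often.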
Syntactically important words that appear in the text more frequently are
suggested as keywords.
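The frequency step can be sketched with a simple counter. The function `build_keywords` and its parameters are hypothetical; `max_keywords` and `use_verbs` stand in for the Keywords count field and the Use verbs check box of the web interface, and the input is assumed to be the syntactically important words already extracted from the text, paired with a part-of-speech label.

```python
from collections import Counter


def build_keywords(important_words, max_keywords, use_verbs=True):
    """Rank syntactically important words by how often they occur.

    important_words: iterable of (word, part_of_speech) pairs.
    max_keywords:    upper limit on the number of keywords returned.
    use_verbs:       when False, verbs are left out of the result.
    """
    counts = Counter(
        word
        for word, pos in important_words
        if use_verbs or pos != "verb"
    )
    return [word for word, _ in counts.most_common(max_keywords)]


words = [
    ("king", "noun"), ("gave", "verb"), ("love", "noun"),
    ("king", "noun"), ("gave", "verb"), ("king", "noun"),
]

print(build_keywords(words, 2))                   # ['king', 'gave']
print(build_keywords(words, 2, use_verbs=False))  # ['king', 'love']
```

This version counts every occurrence equally; a fuller implementation could add a per-role or per-part-of-speech weight to each occurrence instead of 1.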
Demo doesn't use synonyms or word inflections.
For simplicity, Keyword builder doesn't handle word inflections. For example,
gave and give will appear as different words.
The demo doesn't use synonyms either. King and emperor will appear as different words.
This is not a limitation of the algorithm; in a real application you need to take
inflections and synonyms into account when gathering keyword statistics.
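One way a real application might account for inflections and synonyms is to normalize each word before counting it. The lookup tables `BASE_FORMS` and `CANONICAL` below are tiny hypothetical stand-ins for a real morphology dictionary and thesaurus, and `normalize` is an illustrative helper, not part of the demo.

```python
from collections import Counter

# Hypothetical lookup tables; a real application would use a proper
# morphological dictionary and a synonym database.
BASE_FORMS = {"gave": "give", "gives": "give", "given": "give"}
CANONICAL = {"emperor": "king"}


def normalize(word: str) -> str:
    """Map a word to its base form, then to a canonical synonym."""
    word = word.lower()
    word = BASE_FORMS.get(word, word)
    return CANONICAL.get(word, word)


words = ["King", "gave", "emperor", "give"]
counts = Counter(normalize(w) for w in words)

print(counts["king"])  # 2
print(counts["give"])  # 2
```

With normalization in place, gave and give are counted as one word, and so are king and emperor.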
Keyword builder relies on syntax information. It is more accurate than a purely
statistical approach or one that uses formatting information (such as headers).
It can be applied without a semantic database.
Keyword builder doesn't use word meanings. If you have a database of meanings, you can
replace the lexemes with your meanings, which could
give you much better results.
Depending on your objectives, a small database for your knowledge domain may be sufficient.
Keyword builder is more effective for monotonic (single-topic) documents, though this is a
general limitation of characterizing a text with a set of keywords.
For developers
How to build a list of keywords