Old, but pretty scary!

Inventor(s): Nelson; Douglas J., Columbia, MD
 Schone; Patrick John, Elkridge, MD
 Bates; Richard Michael, Greenbelt, MD

Applicant(s): The United States of America as represented by the National Security Agency, Washington, DC

Issued/Filed Dates: Aug. 10, 1999 / April 15, 1997

Application Number: US1997000834263

IPC Class: G06F 017/30;

Class: 707/531; 707/004; 707/532; 707/535; 707/512;

Field of Search: 704/010 707/512,532,535,531,3-5,7

Abstract: A method of automatically generating a topical description of text by receiving the text containing input words; stemming each input word to its root form; assigning a user-definable part-of-speech score to each input word; assigning a language salience score to each input word; assigning an input-word score to each input word; creating a tree structure under each input word, where each tree structure contains the definition of the corresponding input word; assigning a definition-word score to each definition word; collapsing each tree structure to a corresponding tree-word list; assigning a tree-word-list score to each entry in each tree-word list; combining the tree-word lists into a final word list; assigning each word in the final word list a final-word-list score; and choosing the top N scoring words in the final word list as the topic description of the input text. Document searching and sorting may be accomplished by performing the method described above on each document in a database and then comparing the similarity of the resulting topical descriptions.
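To make the steps in the abstract a bit more concrete, here is a rough Python sketch of the general idea. It is not the patented method: the abstract does not give the scoring formulas, so the toy lexicon, the part-of-speech and salience values, the crude suffix stripper, the `definition_weight` parameter, and the Jaccard comparison are all assumptions invented for illustration. It also expands each word only one level deep, where the patent describes a tree of definitions.

```python
import re
from collections import Counter

# Hypothetical toy data. None of these values come from the patent.
POS_SCORE = {"noun": 1.0, "verb": 0.5, "other": 0.1}   # user-definable part-of-speech scores
SALIENCE = {"the": 0.0, "a": 0.0}                      # language salience; unlisted words get 1.0
LEXICON = {                                            # stem -> (part of speech, definition words)
    "bank": ("noun", ["money", "institution"]),
    "loan": ("noun", ["money", "borrow"]),
    "pay": ("verb", ["money", "give"]),
}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Crude suffix stripper standing in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def topic_description(text, n=3, definition_weight=0.5):
    """Return the top-n scoring words for the text (simplified, one-level expansion)."""
    stems = [stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    final = Counter()
    for word, count in Counter(stems).items():
        pos, definition = LEXICON.get(word, ("other", []))
        # Assumed input-word score: part-of-speech score x salience x frequency.
        input_score = POS_SCORE[pos] * SALIENCE.get(word, 1.0) * count
        final[word] += input_score
        # Definition words inherit a discounted share of the input word's score.
        for def_word in definition:
            final[def_word] += input_score * definition_weight
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


def topic_similarity(topics_a, topics_b):
    """Jaccard overlap of two topic lists (an assumed similarity measure)."""
    a, b = set(topics_a), set(topics_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

The interesting part is that a word such as "money" can end up as the top topic even though it never appears in the input, because several input words share it in their definitions. Calling `topic_description` on each document and comparing the results with `topic_similarity` gives a toy version of the document searching and sorting mentioned at the end of the abstract.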

"The NSA patent, granted on 10 August, is for a system of automatic topic spotting and labelling of data.  The patent officially confirms for the first time that the NSA has been working on ways of automatically analysing human speech. The NSA's invention is intended automatically to sift through human speech transcripts in any language. The patent document specifically mentions "machine-transcribed speech" as a potential source

