Faced with a vast overload of potentially relevant information, users can also look to language technology for help in making their way through it. Summarization cuts down the bulk of longer texts while preserving, in understandable form, their major points and perhaps references to points of specific interest.
Amazingly, the most practical techniques for general summaries do not try to build up an "understanding" of texts at all. Instead they adopt the Salience approach, i.e., they use word frequency counts to identify and preserve the most significant sentences, where significance is measured against the background of the words used in the rest of the article. As a technique this has the advantage of being language-independent, and it also guarantees that the results are at least grammatical, since each sentence in the summary is copied unchanged from the original text.
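The salience idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular system: sentences are split naively on end punctuation, a small hypothetical stop-word list filters out function words, and each sentence is scored by the average frequency, across the whole article, of its remaining words. The `k` highest-scoring sentences are returned in their original order.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a
# fuller list for the target language.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "its", "that", "this", "for", "with", "as", "by",
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def summarize(text, k=2):
    """Return the k most salient sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole article serve as the background measure.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average rather than total frequency, so long sentences are not
        # automatically favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so ties keep their original sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

article = (
    "Flooding hit the river valley on Monday. "
    "The flooding closed roads across the valley. "
    "Officials said the weather had been unusual. "
    "Residents expect more flooding in the valley this week."
)
summary = summarize(article, k=2)
```

Because every output sentence is copied verbatim from the input, each one is grammatical on its own, and the language-specific parts of the sketch are confined to the tokenizer and the stop-word list.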
Another important application is building up a cumulative record of interesting patterns of events reported across a vast volume of free text. This is called Information Extraction: it might, for example, work through a set of casualty reports to create a database from which an analysis of trends or underlying causes could emerge. Here the best techniques employ a kind of template, trying to fit the activities described into a given pattern. It therefore makes sense to see these systems as having a rudimentary "understanding", since they distinguish, e.g., agents from victims or locations.
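Template-based extraction can likewise be sketched, here with a single regular expression standing in for the template. Everything specific below is an assumption for illustration: the event type, the trigger verbs (`attacked`, `shot`, `injured`), the slot names and the sample reports are all hypothetical, and practical systems rely on much richer linguistic analysis than one pattern. The named groups act as the template's slots, which is what lets the system tell the agent from the victim and the location; counting records per location is a first, very crude step towards trend analysis.

```python
import re
from collections import Counter

# Hypothetical template for one event type. Each named group is a slot, so the
# pattern itself encodes which phrase is the agent, the victim and the location.
ATTACK_TEMPLATE = re.compile(
    r"(?P<agent>[A-Z][\w ]*?) (?:attacked|shot|injured) "
    r"(?P<victim>[\w ]+?) in (?P<location>[A-Z]\w*)"
)

def extract_events(reports):
    """Return one filled template (a dict of slots) per match across all reports."""
    events = []
    for report in reports:
        for match in ATTACK_TEMPLATE.finditer(report):
            events.append(match.groupdict())
    return events

def incidents_by_location(events):
    """A very simple trend analysis over the extracted records."""
    return Counter(event["location"] for event in events)

# Hypothetical sample reports.
reports = [
    "A gunman attacked two guards in Leeds.",
    "Heavy rain fell in Leeds.",  # does not fit the template: nothing extracted
    "A driver injured a cyclist in York.",
    "A dog attacked a postman in Leeds.",
]
events = extract_events(reports)
counts = incidents_by_location(events)
```

Reports that do not fit the pattern, such as the weather sentence, simply yield no record; the template decides both what counts as an event and which role each phrase plays in it.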