Question

Is there any news group on syntactic analysis?

Answer

Text summarization has become an important and timely tool for helping readers interpret text information in today’s fast-growing information age. An abundance of text material is available on the Internet, but it usually provides more information than is needed. A twofold problem therefore arises: searching for relevant documents among an overwhelming number of available documents, and absorbing a large quantity of relevant information. Summarization is useful both for selecting relevant texts and for extracting the key points of each text. Some articles, such as academic papers, have accompanying abstracts that present their key points. News articles, however, have no such summaries, and their titles are not sufficient to convey their important points. A summarization tool for news articles would therefore be extremely useful, since for any given news topic or event a large number of articles are available from the various news agencies and newspapers.
Because news articles have a highly structured document form, important ideas can be obtained from the text simply by selecting sentences based on their attributes and locations in the article. We propose a machine learning approach that uses artificial neural networks to produce summaries of arbitrary length for news articles. A neural network is trained on a corpus of articles. The network is then modified, through feature fusion, to produce a summary consisting of the highly ranked sentences in the article. Through feature fusion, the network discovers the importance of the various features used to determine the summary-worthiness of each sentence. The input to the neural network can be either real or binary vectors.
There are two divergent approaches to automatic text summarization: 1) summarization based on abstraction, where the text has to be understood and the summary is produced from that understanding, and 2) summarization based on extraction, which involves selecting a number of important sentences from the source text. Summarization by abstraction is concerned with issues related to text understanding, semantic representation and modification, and natural language processing. A review of the abstraction approach can be found in . The first step in summarization by extraction is the identification of important features. There are two types of features: non-structured features and structured features. One group of researchers uses only non-structured features, while another group attempts to exploit structural relations between the units under consideration.
In our approach, we use a feature fusion technique to discover, without manual intervention, which of the available features are actually useful. There are three phases in our process: neural network training, feature fusion, and sentence selection. The first phase trains a neural network to recognize the type of sentences that should be included in the summary. The second phase, feature fusion, prunes the neural network and collapses the hidden layer unit activations into discrete values with identified frequencies. This phase generalizes the important features that must exist in the summary sentences by fusing the features and finding trends in the summary sentences. The third phase, sentence selection, uses the modified neural network to filter the text and select only the highly ranked sentences. This phase controls the selection of the summary sentences in terms of their importance. These three phases are explained in detail in the next sections.
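The training and sentence selection phases described above can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the three binary sentence features, the toy labels, the network size, the learning rate, and the names TinyNet and select_sentences are all assumptions made for the example, and the feature fusion phase is left out entirely.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Three-layer feedforward network: inputs -> hidden units -> one output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Every weight row ends with a bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Return (hidden activations, summary-worthiness score in (0, 1))."""
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train(self, samples, epochs=1000, lr=0.5):
        """Plain stochastic gradient descent on squared error."""
        for _ in range(epochs):
            for x, target in samples:
                hidden, out = self.forward(x)
                d_out = (out - target) * out * (1.0 - out)
                # Hidden deltas are computed before the output weights change.
                d_hidden = [d_out * self.w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
                for j, h in enumerate(hidden + [1.0]):
                    self.w_out[j] -= lr * d_out * h
                for j, row in enumerate(self.w_hidden):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hidden[j] * v


def select_sentences(net, features, k):
    """Return the indices of the k highest-scoring sentences, in text order."""
    scores = [net.forward(x)[1] for x in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical features: [first sentence of paragraph, in first paragraph,
# contains a title word]. The toy labels mark paragraph-initial sentences
# as summary sentences; real labels would come from a labeled corpus.
samples = [([a, b, c], a)
           for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)]
net = TinyNet(n_in=3, n_hidden=4)
net.train(samples)

article = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
chosen = select_sentences(net, article, k=2)
print(chosen)
```

Because the summary length k is just a cutoff on the ranked scores, the same trained network can produce summaries of any requested length.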
The first phase of the process involves training the neural network to learn the types of sentences that should be included in the summary. The neural network learns the patterns inherent in sentences that should be included in the summary and in those that should not. We use a three-layered feedforward neural network, a class of networks that has been proven to be a universal function approximator.
Once the network has learned the features that must exist in summary sentences, we need to discover the trends and relationships among the features that are inherent in the majority of sentences. This is accomplished by the feature fusion phase, which consists of two steps: 1) eliminating uncommon features, and 2) collapsing the effects of common features. Once the pruning step is complete, the network is retrained with the same dataset used in phase one to ensure that its recall accuracy has not diminished significantly. If the recall accuracy of the network drops by more than 2%, the pruned connections and neurons are restored and a stepwise pruning approach is pursued. In the stepwise approach, the incoming and outgoing connections of the hidden layer neurons are pruned, and the network is retrained and tested for recall accuracy, one hidden layer neuron at a time.
After the network has been pruned, the activation values of each hidden layer neuron are clustered using an adaptive (dynamic) clustering technique. Since dynamic clustering is order sensitive, the activation values are then re-clustered, with the radius of the new clusters set to one-half of the original radius. The benefits of re-clustering are twofold. First, because of the order sensitivity of dynamic clustering, some of the activation values may be misclassified; re-clustering alleviates this deficiency by placing the activation values in appropriate clusters. Second, re-clustering with one-half of the original radius eliminates any possible overlaps among clusters.
The combination of these two steps generalizes the effects of sentence features. Each cluster is identified by its centroid and frequency. The feature fusion phase provides control parameters that can be used for sentence ranking.
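The clustering step can be illustrated with a small sketch. The text does not specify the clustering algorithm, so the code below assumes a simple leader-style scheme: each activation value joins the nearest cluster whose centroid lies within the radius, or starts a new cluster otherwise. Seeding the second pass with the first-pass centroids is also an assumption, as are the function names dynamic_cluster and cluster_activations and the sample values.

```python
def dynamic_cluster(values, radius, seeds=()):
    """Order-sensitive clustering of 1-D activation values.

    Returns a list of (centroid, frequency) pairs. Optional seeds are
    starting centroids that hold no members until values are assigned.
    """
    clusters = [{"sum": 0.0, "count": 0, "centroid": s} for s in seeds]
    for v in values:
        best = None
        for c in clusters:
            d = abs(v - c["centroid"])
            if d <= radius and (best is None or d < abs(v - best["centroid"])):
                best = c
        if best is None:
            # No centroid is close enough, so this value starts a new cluster.
            best = {"sum": 0.0, "count": 0, "centroid": v}
            clusters.append(best)
        best["sum"] += v
        best["count"] += 1
        best["centroid"] = best["sum"] / best["count"]
    # Seeds that attracted no values are dropped.
    return [(c["centroid"], c["count"]) for c in clusters if c["count"] > 0]


def cluster_activations(values, radius):
    """Cluster once, then re-cluster with one-half of the original radius."""
    first_pass = dynamic_cluster(values, radius)
    seeds = [centroid for centroid, _ in first_pass]
    return dynamic_cluster(values, radius / 2.0, seeds)


# Made-up activation values of one hidden layer neuron.
activations = [0.1, 0.12, 0.9, 0.88, 0.11, 0.5]
clusters = cluster_activations(activations, radius=0.2)
for centroid, frequency in clusters:
    print(round(centroid, 3), frequency)
```

Each resulting pair is one discrete value for the hidden unit (the centroid) together with how often it occurs (the frequency), which is the form the text describes for the collapsed activations.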

— Source: Wikipedia (www.wikipedia.org)