 
            



  \documentstyle[11pt]{article}
  \setlength{\textwidth}{12.2cm}
  \setlength{\textheight}{19.5cm}
  \begin{document}
  \pagestyle{empty}
  \bibliographystyle{plain}


\title {      A Rigorous and Automated Analysis of Texts 
      Applied to a Scientific Abstract by Mark Sergot and Others}
\author{ Gregers Koch
\\       \\     Department of Computer Science (DIKU)
  \\                 Copenhagen University
    \\   Universitetsparken 1, DK-2100 Copenhagen, Denmark
    \\              gregers@diku.dk}
\date{}
\maketitle

\thispagestyle{empty}
\begin{abstract}
We discuss the automated construction of a translation mechanism 
capable of translating any given textual input into a preferred 
logical notation. By constructing a syntax tree for the input text, we can 
augment it with semantic features through a dataflow analysis, in such a way
that a parser, in the form of a Definite Clause Grammar (DCG) Prolog program, 
can be extracted immediately. Over the years, our methods, which build on 
these principles, have matured to the point where they now appear ready for 
handling ordinary complex texts intended for human communication. 
Scientific abstracts are particularly interesting in this respect. We
discuss the analysis of one particular scientific abstract, namely that of an
article by M. Sergot et al. \cite{MS86}. A highly 
tentative comparison is made
with several important alternative approaches known from the scientific
literature.
\end{abstract}

\section{The Method for Analysis of Texts}
\subsection{Introduction}

 As an activity combining logic programming and language technology,
 we are studying certain problems in computational linguistics,
 in particular problems concerning the automation of dataflow analysis
 leading to the automatic synthesis of a special kind of dataflow
 structure. This automated approach may be applied for inductive
 purposes. The approach is sometimes called logico-semantic induction,
 and it constitutes an efficient kind of automated program synthesis. 
    Here we discuss the automated construction of such a translation 
 mechanism, capable of translating any given textual input into a 
 preferred logical notation. Using a context-free grammar to construct 
 a syntax tree for the input text, we can augment the syntax tree with 
 semantic features through a dataflow analysis \cite{BK91}. 
 This can be done in such a way that a parser, in the form of a Definite
 Clause Grammar (DCG) Prolog program, can be extracted immediately. 


\subsection{A Small Example}


    Before turning to the text, we analyse at least one very simple
 example in order to present and motivate the approach recommended 
 here. The example is the following sentence, consisting of only
 three words:

\begin{verbatim}
     Peter kicks Pluto
\end{verbatim}

 This sentence can trivially be described by the following context-free
 grammar

\begin{verbatim}
      S -> NP VP. 
      NP -> Prop. 
      VP -> TV NP.
\end{verbatim}
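 As a purely illustrative aside (our own encoding, not part of the
 method itself), this grammar can be sketched in Python as a small
 recursive-descent parser that returns the syntax tree as a nested
 tuple; the lexical categories of the three words are assumed from the
 example sentence.

\begin{verbatim}
# Sketch: recursive-descent parser for the grammar
#   S -> NP VP,  NP -> Prop,  VP -> TV NP
# returning the syntax tree as nested tuples.

LEXICON = {"peter": "Prop", "pluto": "Prop", "kicks": "TV"}

def parse_np(tokens):
    # NP -> Prop: consume one proper noun.
    word, rest = tokens[0], tokens[1:]
    assert LEXICON.get(word) == "Prop", word
    return ("NP", ("Prop", word)), rest

def parse_vp(tokens):
    # VP -> TV NP: consume a transitive verb, then an NP.
    word, rest = tokens[0], tokens[1:]
    assert LEXICON.get(word) == "TV", word
    np_tree, rest = parse_np(rest)
    return ("VP", ("TV", word), np_tree), rest

def parse_s(tokens):
    # S -> NP VP: the whole input must be consumed.
    np_tree, rest = parse_np(tokens)
    vp_tree, rest = parse_vp(rest)
    assert rest == [], "unconsumed input"
    return ("S", np_tree, vp_tree)

print(parse_s(["peter", "kicks", "pluto"]))
# ('S', ('NP', ('Prop', 'peter')),
#       ('VP', ('TV', 'kicks'), ('NP', ('Prop', 'pluto'))))
\end{verbatim}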

    The syntax tree corresponding to the sentence may be augmented into
 a dataflow structure.


\begin{verbatim}
  S :                           [       ^ ]
                                        |
    NP :              [ O<--^ ,     ^ , O ]
                        |   |       |
         PROP :         | [ O ]     |
                        |           |
    VP :             [  V ,         ^ ]
                        |           |
           TV :      [  V , ^ , O ] |  
                            |   |   |
           NP :           [ ^ , V , O ]
                            |
               PROP :     [ O ]

\end{verbatim}
 This structure can easily be transformed into a small logic program

\begin{verbatim}
     s(Z) --> np(X,Y,Z), vp(X,Y).
     np(X,Y,Y) --> prop(X).
     vp(X,W) --> tv(X,Y,Z), np(Y,Z,W).
\end{verbatim}
 Supplied with suitable lexical information, for instance

\begin{verbatim}
     prop(peter) --> [peter].
     prop(pluto) --> [pluto].
     tv(X,Y,kicks(X,Y)) --> [kicks].
\end{verbatim}

 it constitutes an executable logic program, the parser.
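 For illustration only, the behaviour of this parser can be simulated
 in Python by treating each nonterminal as a function over the
 remaining token list; the names mirror the DCG, but the semantic
 threading of the extra arguments is realised here by ordinary function
 composition rather than by Prolog unification, so this sketch is our
 own encoding, not the authors' program.

\begin{verbatim}
# Sketch: the DCG parser simulated as Python functions.
# Each function consumes tokens from the front of the list and
# returns (semantic value, remaining tokens).

def prop(tokens):
    # prop(peter) --> [peter].  prop(pluto) --> [pluto].
    word, rest = tokens[0], tokens[1:]
    assert word in ("peter", "pluto"), word
    return word, rest

def tv(tokens):
    # tv(X,Y,kicks(X,Y)) --> [kicks]: return a term builder.
    word, rest = tokens[0], tokens[1:]
    assert word == "kicks", word
    return (lambda x, y: ("kicks", x, y)), rest

def np(tokens):
    # np(X,Y,Y) --> prop(X): contribute X, pass the rest through.
    return prop(tokens)

def vp(tokens, x):
    # vp(X,W) --> tv(X,Y,Z), np(Y,Z,W).
    build, rest = tv(tokens)
    y, rest = np(rest)
    return build(x, y), rest

def s(tokens):
    # s(Z) --> np(X,Y,Z), vp(X,Y).
    x, rest = np(tokens)
    z, rest = vp(rest, x)
    assert rest == [], "unconsumed input"
    return z

print(s(["peter", "kicks", "pluto"]))  # ('kicks', 'peter', 'pluto')
\end{verbatim}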
    If we execute this program with the given sentence as input, we 
 get the following matchings (the variables have been renamed apart 
 to avoid ambiguity)


\begin{verbatim}
                           s( Z )
                           |
          -------------------------------
          |                             |
         np( X , Y , Z )               vp( X , Y )
          |                             |
          |      Y = Z       ----------------------
          |                  |                    |
         prop( X )          tv( X , Y1 , Z1 )     np( Y1 , Z1 , Y )
          |                  |                    |
          | X = peter        | Z1 = kicks(X,Y1)   |       Z1 = Y
          |                  |                    |
          |                  |                   prop( Y1 )
          |                  |                    |
          |                  |                    | Y1 = pluto
          |                  |                    |
        peter              kicks                pluto

\end{verbatim}

 The matchings (unifications) obtained through the execution of the program
 result in the following equations

\begin{verbatim}
     Y = Z
     X = peter
     Z1 = kicks(X,Y1)
     Z1 = Y
     Y1 = pluto
\end{verbatim}
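 As an illustration of how such a system is solved mechanically, the
 five equations can be fed to a tiny substitution-based unifier in
 Python (again our own encoding, introduced only for this sketch):

\begin{verbatim}
# Sketch: solve the equation system by unification.
# Compound terms are tuples; everything else is a constant or,
# if listed in VARS, a variable.

VARS = {"X", "Y", "Z", "Z1", "Y1"}
bindings = {}

equations = [
    ("Y", "Z"),
    ("X", "peter"),
    ("Z1", ("kicks", "X", "Y1")),
    ("Z1", "Y"),
    ("Y1", "pluto"),
]

def resolve(term):
    # Chase variable bindings, then resolve inside compound terms.
    while term in VARS and term in bindings:
        term = bindings[term]
    if isinstance(term, tuple):
        return tuple(resolve(t) for t in term)
    return term

def unify(a, b):
    a, b = resolve(a), resolve(b)
    if a == b:
        return
    if a in VARS:
        bindings[a] = b
    elif b in VARS:
        bindings[b] = a
    else:
        raise ValueError(f"clash: {a} vs {b}")

for lhs, rhs in equations:
    unify(lhs, rhs)

for v in ("Z", "Y", "Z1"):
    print(v, "=", resolve(v))   # each resolves to kicks(peter,pluto)
\end{verbatim}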

 This small equational system has the following solution

\begin{verbatim}
     [peter kicks pluto]
           = Z
           = Y
           = Z1
           = kicks(X,Y1)
           = kicks(peter,pluto)
\end{verbatim}

 This gives us a precise suggestion for a semantic representation
 of the input sentence.
    Alternatively, we might have obtained the logic program from the 
 suggested input and output. This can be done automatically by means 
 of one of our inductive meta programs \cite{GK00},\cite{GK00a},\cite{GK00b}.
    The parser then needs test runs, both forward and backward. If all
 of these succeed, we have indeed scrutinised the construction in 
 considerable detail.
    Turning now to the control aspects, let us list the analytical 
 steps performed here. The steps are

\begin{verbatim}
     1. example
     2. syntax
     3. dataflow
     4. definite clause grammar
     5. lexical information
     6. execution
     7. equations
     8. solution
     9. induction
     10. forward test run
     11. backward test run

\end{verbatim}

 The most important description levels in this example seem to be:
 1) the text (input), 2) the formula (output or semantic representation),
 3) the dataflow, 4) the parser.


\subsection{Application to a Scientific Abstract}

    Over the years, our methods, which build on these principles, have 
 matured to the point where they now appear ready for handling 
 ordinary complex texts intended for human communication. Here we
 consider scientific abstracts to be highly interesting, mainly because 
 they are normally both concise and very complicated. In what follows we
 discuss the analysis of a particular scientific abstract, namely that of an
 article by M. Sergot et al. \cite{MS86}. The text reads
 
\begin{verbatim}
 "The formalization of legislation and the development
  of computer systems to assist with legal problem 
  solving provide a rich domain for developing and 
  testing artificial-intelligence technology".
\end{verbatim}

 The resulting semantic structure is the following


\begin{verbatim}
 result(empty(x,legislation(x)&
   the(y,of(formalization(y),x)&
    empty(x1,computer(systems(x1))
     &empty(w,legal(problem(solving(w)))&with(assist(x1),w))
     &the(y1,of(development(y1),x1)
        &empty(v,artificialintelligence(technology(v))
         &exists(z,for(rich(domain(z)),
               developingAndTesting(t,v))
             &provide(y&y1,z)))))))).
\end{verbatim}


 A comparison with some important alternative approaches, such as those 
 of Kamp and Reyle \cite{kamp},
 Schank \cite{RS}, and Kawaguchi 
 et al. \cite{EK97},\cite{SY00}, suggests the tentative conclusion that
 our approach is unique in its capability of handling the design of the
 representation and in its partial solutions to the problems of manual 
 programming and automated program synthesis.



\begin{thebibliography}{99}


\bibitem{BK91}
 C. G. Brown and G. Koch, eds., {\it Natural Language Understanding
        and Logic Programming, III}, (North-Holland, 1991).

\bibitem{kamp}
 H. Kamp and U. Reyle, {\it From Discourse to Logic}, Kluwer, 
 Amsterdam, 1993.


\bibitem{HK97}
 H. Kangassalo et al., eds., {\it Information Modelling and Knowledge
 Bases VIII}, IOS, 1997.

\bibitem{EK97}
 E. Kawaguchi et al., Toward development of multimedia database
 system for conversational natural language, pp. 69-84 in \cite{HK97}.

\bibitem{EK00}
 E. Kawaguchi et al., eds., {\it Information Modelling and Knowledge
 Bases XI}, IOS, 2000.

\bibitem{GK00}
 G. Koch, Some perspectives on induction in discourse representation, 
 318-327, in A. Gelbukh (ed.), {\it Proceedings of
      CICLing-2000}, Mexico City, Mexico, 2000.


\bibitem{GK00a}
 G. Koch, A method for making computational sense of Situation Semantics, 
 308-317, in A. Gelbukh (ed.), {\it Proceedings of
      CICLing-2000}, Mexico City, Mexico, 2000.


\bibitem{GK00b}
 G. Koch, A method of automated semantic parser generation with an
      application to language technology, pp. 103-108 in \cite{EK00}.

\bibitem{RS}
 R. C. Schank, {\it Dynamic Memory}, Cambridge University Press, 1982.

\bibitem{MS86}
 M. Sergot et al., The British Nationality Act as a logic program, 
 {\it Communications of the ACM}, vol. 29, no. 5, pp. 370-386, 1986.

\bibitem{SY00}
 S. Yoshihara et al., An experiment on Japanese sentence generation
 from SD-formed semantic data, pp. 205-221 in \cite{EK00}.


\end{thebibliography}

{\bf {\it Gregers Koch}} is a professor of computer science and
computational linguistics at the Department of Computer Science (DIKU)
at Copenhagen University, Universitetsparken 1, DK-2100 Copenhagen,
Denmark. He has published around 90 scientific articles. He can be reached
by email at gregers@diku.dk or by fax at (+45) 35321401.

\end{document} 


