Formal Grammars


The first endeavours to give a formal notation for linguistic strings were due to Axel Thue and Emil Post1, who adapted their theory to data processing in Turing machines. An epoch-making attempt at formalisation was launched by Y. Bar-Hillel, who devised a quasi-arithmetical notation for deciphering syntactic phrases.2 In the mid-1950s it developed into categorial grammars, which offered a recognition apparatus for evaluating the grammatical correctness of sentences and lexical strings. Bar-Hillel’s analysis of language structures arose as a by-product of the earliest research into rewriting systems designed for machine processing. It provided a decision-making counterpart to Noam Chomsky’s generative phrase-structure grammars, which made the greatest contribution to modern techniques of artificial intelligence. Chomsky’s results influenced generations of young researchers and became a cornerstone of theoretical computer science.


Formal Presuppositions of Semantic Description


Linguistic studies wend their way in two directions, one devoted to visible language form and the other to the invisible sphere of meaning-oriented semantics. Their interrelations are not linked by a strict mathematical homomorphism but allow us to speak informally about approximate mappings. Let there be a natural language L composed of its alphabet A, vocabulary V and the apparatus G of grammatical rules. Then it is possible to define algebraic semantics as a formal system dealing with their mapping f into the realm of semantic referents. The vocabulary is mapped into the set S of semantic meanings (sememes), while the grammatical apparatus G is projected upon the schematic layout C of logical and ontological categories.

                   f(words) = meanings                           f: V → S

                   f(grammar) = categories                      f: G → C

Natural languages involve much polysemy, so it is necessary to restrict the reference of words to the kernel vocabulary of primary literal meanings. This methodological step presupposes abstracting from the infinite varieties of figurative meanings implied by numerous secondary connotations. This is why algebraic semantics switches directly from linguistic form to the realm of meaning. For simplicity’s sake it treats words as basic sememes in their primary elementary sense. When dealing with the modal meaning of must, may, will, shall, it considers them right away as sememes and refrains from mentioning the irrelevant intricacies of their formal lexemes.
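The restriction of f to a kernel vocabulary can be sketched as a partial mapping. The following Python fragment is purely illustrative; the sememe labels (NECESSITY, POSSIBILITY and the rest) are assumed glosses, not an established inventory.

```python
# Illustrative sketch of the mapping f: V -> S restricted to a kernel
# vocabulary; the sememe labels are assumed glosses, not an
# established inventory.
KERNEL_SEMEMES = {
    "must": "NECESSITY",
    "may": "POSSIBILITY",
    "will": "FUTURITY",
    "shall": "OBLIGATION",
}

def f(word):
    """Return the primary literal sememe of a word, or None when the
    word falls outside the kernel vocabulary."""
    return KERNEL_SEMEMES.get(word)
```

Words outside the kernel vocabulary receive no sememe, which mirrors the deliberate abstraction from figurative and secondary meanings.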


The present state of human cognition may be summed up by concluding that theoretical logic and mathematics give an exact formalisation of the most essential fields of human thought but cover only a small part of semantic fields. Even if they give a precise logical treatment of basic elementary concepts, they do not attempt an integral description of the whole layout of a given semantic area. They are too engrossed in their special internal technicalities, which hinder them from joining their subtheories into an all-inclusive picture of the outer world. Algebraic semantics works with a less rigorous theoretical apparatus but relentlessly strives to ensure mutual convertibility between semantic, logical, mathematical and algebraic calculi.


Constituency and Dependency Grammars


Modern advances in formal grammars have produced two elementary types of formal linguistic analysis. One was based on Chomsky’s phrase-structure grammars and their close predecessor, the immediate constituent analysis proposed by Rulon Wells3. Both approaches treated linguistic structures as linear sequences of words made up from the vocabulary of a natural language and put forward useful methods for their hierarchical segmentation. Their chief weakness lay in low sensitivity to the mutual subordination of constituents. This drawback was partly removed by L. Tesnière’s project of dependency grammars4. His verb-centred system focused on semantic actants and syntactic pairs relating heads and dependents. Their mutual advantages are elucidated by the comparison5 of two ways of analysing the sentence We are trying to understand the difference given below.


Table 1. Dependency and constituency grammars

The chief asset of grammatical trees is that they give a vivid illustrative representation of syntactic structures for common lay observers, but this is offset by the difficulties they bring about in automatic word processing. Hence, a convenient remedy is provided by parenthetical and fractional grammars.



Parenthetical Grammar


In formal linguistics it is essential to realise that the laws of associativity hold neither in lexical nor in syntactic strings. Their absence provides a strong argument for parenthetisation. The structuring and inner hierarchy of the German and English expressions discussed below are much easier to understand from the use of parentheses.

   Parenthetical grammar is a formal rewriting system that applies parentheses for expressing the grammatical relations of dependency and semantic subordination. It provides the simplest method of syntactic parsing without requiring very demanding means of visual representation. It employs a simple apparatus of left brackets (‘{’, ‘[’ or ‘(’) in order to demarcate the initial boundary of linguistic expressions and right brackets (‘}’, ‘]’ or ‘)’) that delimit their end. As seen in the phrase a ladies’ dress, parenthetisation induces considerable differences in meaning:

a ladies’ dress = a (ladies’ dress) ≠ (a lady’s) dress = a lady’s dress .

The expression on the left describes a dress for ladies, whereas the phrase structure on the right refers to a particular lady’s garment.

A simple example of sentence analysis is given by the sentence Such an extremely long journey exhausted our energy. Its parenthetical articulation segments couples of heads and dependents into the ensuing hierarchy:

((Such (an ((extremely long) journey))) (exhausted (our energy))).

When rendered in terms of phrase structures, its decomposition proceeds as follows:

S → NP VP → ((AP NP) VP) → ((Adv AP NP) VP) → ((D A NP) VP) → ((D A NP) (V NP)) .

Another telling illustration is supplied by the string Little Red Riding-Hood went to her grandmother in another village:

((Little (Red Riding-Hood)) (went (to ((her grandmother) (in (another village)))))).
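Such bracketed hierarchies can also be read mechanically. A minimal sketch in Python, assuming round brackets only and whitespace-separated words, parses a parenthesised string into nested lists:

```python
def parse(expr):
    """Parse a parenthesised string into nested Python lists, treating
    '(' and ')' as phrase boundaries and whitespace as a separator."""
    tokens = expr.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])        # open a new phrase
        elif tok == ")":
            phrase = stack.pop()    # close the current phrase
            stack[-1].append(phrase)
        else:
            stack[-1].append(tok)   # an ordinary word
    return stack[0]

tree = parse("((Such (an ((extremely long) journey))) (exhausted (our energy)))")
```

The resulting list structure reproduces the hierarchy of heads and dependents without any graphical tree apparatus.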

The main reason for introducing such adjustments in syntactic theory is not only that they save space and simplify analysis. Their most important theoretical capacity consists in opening a second dimension of syntactic hierarchy. Parenthetical grammars turn linear sequences into 2D patterns embedding strings into a two-dimensional Cartesian space. The basic horizontal axis x depicts the linear sequencing of symbols, while the second vertical axis y plots strings with the scaled hierarchy of phrase structures according to different levels of syntactic validity.6
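This two-dimensional embedding can be sketched computationally: assuming the same parenthetical notation, each word is assigned a coordinate pair with x as its linear position and y as its nesting depth. The function name plot_2d is our own illustrative choice:

```python
def plot_2d(expr):
    """Assign every word a Cartesian coordinate: x is its linear
    position in the string, y its depth in the parenthetical
    hierarchy (an illustrative rendering of the 2D embedding)."""
    depth, x, points = 0, 0, []
    for tok in expr.replace("(", " ( ").replace(")", " ) ").split():
        if tok == "(":
            depth += 1              # descend one level of hierarchy
        elif tok == ")":
            depth -= 1              # ascend back
        else:
            points.append((tok, x, depth))
            x += 1
    return points
```

Applied to ((extremely long) journey), the dependents extremely and long lie one level deeper on the y axis than their head journey.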


Concatenative and Decatenative Grammar


In current string theory individual symbols and strings are treated as immediate constituents linked by the binary operation of concatenation. Informally speaking, it is a procedure joining two shorter strings into a concatenation whose length is the sum of both segments. Given two arbitrary strings S1 = x1...xn and S2 = y1...ym, their concatenation S1S2 is given by the following formula:


S1S2 = x1...xny1...ym .


If x and y are basic symbols, their concatenation may be written with different algebraic symbols such as

xy = x * y = x × y .

Bar-Hillel’s theoretical apparatus made use of analogies to arithmetical multiplication, division and cancellation, but such conventions represented only a formal and artificial apparatus. In fact, they have little to do with the properties of rational numbers featuring in arithmetical fractions. He might also have applied an additive formalism that renders the concatenation of strings as a sum of two addends. An elementary case of additive binary concatenation can be illustrated by joining two lexical strings composed of several letters, as in the formula below:

town + hall = townhall .

An inverse operation to concatenation may be denoted as decatenation and defined as unlinking chains into shorter fragments. A simple illustration of decatenative cancellation is provided by

townhall – hall = town .

Neither concatenation nor decatenation is a commutative operation. This means that the order of addends and subtrahends cannot be switched:

town + hall = townhall ≠ hall + town .

This inconvenience makes us introduce a special symbol Ø for left subtraction:

-town + townhall = town Ø townhall = hall .
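The three operations introduced above, concatenation, right decatenation and left subtraction Ø, can be sketched as ordinary string functions. This is an illustrative rendering, not part of Bar-Hillel's formalism; the function names are our own:

```python
def concat(s1, s2):
    """Concatenation: the length of the result is the sum of both parts."""
    return s1 + s2

def decat_right(s, segment):
    """Right decatenation (s - segment): remove a final segment."""
    if not s.endswith(segment):
        raise ValueError("not a right segment")
    return s[:len(s) - len(segment)]

def decat_left(segment, s):
    """Left subtraction (segment Ø s): remove an initial segment."""
    if not s.startswith(segment):
        raise ValueError("not a left segment")
    return s[len(segment):]
```

Since string concatenation is not commutative, decat_right and decat_left are genuinely distinct operations, which is exactly why the special symbol Ø is needed.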

The chief argument for giving preference to additive notation for concatenative strings is that the slash sign for right and left division can be employed for other purposes such as syntactic dependence.


Some theoretical contributions have developed the idea of ‘right cancellation’, conceived as a string operation that deletes a symbol at the right end of a string: “The right cancellation of a letter a from a string s is the removal of the first occurrence of the letter a in the string s, starting from the right hand side. The empty string is always cancellable: ε ÷ a = ε. Clearly, right cancellation and projection commute.”7 However, it cannot be regarded as identical to the concept of right decatenation.
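Right cancellation as quoted above can be sketched as follows. The behaviour when the letter does not occur at all (returning the string unchanged) is our assumption, consistent with the quoted rule for the empty string:

```python
def right_cancel(s, letter):
    """Right cancellation s ÷ a: delete the first occurrence of
    `letter` scanning from the right-hand side; an empty string, or
    one not containing the letter, is returned unchanged."""
    i = s.rfind(letter)
    return s if i == -1 else s[:i] + s[i + 1:]
```

Unlike right decatenation, which removes a whole final segment, right cancellation removes a single occurrence of a letter wherever it last appears.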




Fractional Grammars


The formal apparatus of parenthetical grammars shares many inadequacies encountered in immediate constituent analysis. It chains neighbouring words into pairs but does not specify the grammatical interrelations expressed by their mutual syntactic dependency. A convenient solution is offered by the so-called fractional grammars. They combine the convenient properties of constituency and dependency by indicating the subordinate position of dependents by the slash signs ‘/’ and ‘\’. This is how it is possible to analyse a simple sentence The extremely long journey exhausted our energy:


S → NP\VP → ((AP\NP)\VP) → (((Adv\A)\NP)\VP) → ((D\((Adv\A)\NP))\VP) → ((D\((Adv\A)\NP))\(V/(D\NP))) .

The right slash in V/NP means that in accusative object constructions the noun phrase NP functions as a dependent of the head V (verb). It is especially efficient in indicating the syntactic status of incongruent attributes following the governing nominal head. Its treatment of attribute constructions is illustrated by the phrase structure the flower of many colours:

(the\flower)/(of (many\colours)) .

NP → (D\N)/NP → (D\N)/(A\N) .
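The head-seeking convention of the slash notation, where X\Y has head Y and X/Y has head X, can be sketched with a small recursive function. The tuple encoding of phrases is our own illustrative device, not part of the fractional notation itself:

```python
def head(phrase):
    r"""Return the lexical head of a fractional-grammar phrase encoded
    as nested tuples (left, slash, right): in X\Y the head is the
    right-hand Y, in X/Y the head is the left-hand X."""
    if isinstance(phrase, str):
        return phrase
    left, slash, right = phrase
    return head(right) if slash == "\\" else head(left)

# (the\flower)/(of (many\colours)) as nested tuples:
flower = (("the", "\\", "flower"), "/", ("of", "\\", ("many", "\\", "colours")))
```

Descending through the slashes yields flower as the governing head of the whole phrase, as the notation intends.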

The replacement of cancellation by subtraction seems convenient since it permits exploiting slash marks for designating other important string operations. One possible usage might serve for designating relations of syntactic dependency. The inner structure of a word would be comprehensible if we combined dependency with parenthetisation. The afore-mentioned phrases would beam with clarity and explicitness if they were segmented neatly by parentheses determining the hierarchy of terms:

Rücksichtslosigkeit ≈ ‘inconsiderateness’ ,

((((Rück\sichts)\los)\ig)\keit) ≈ ‘(in\((consider)\ate)\ness)’ .

In such lexical derivations suffixes act as the governing head because they explicitly give the whole expression its categorial and part-of-speech standing. If a lexical root is preceded by a few prefixes and appended by several suffixes, we do not consider the order of its etymological composition but the hierarchy of syntactic values. Etymologically speaking, in ‘boldness’ the adjective ‘bold’ is primary but in lexical analysis it is secondary because the part-of-speech value of ‘boldness’ is determined by the suffix ‘-ness’. 






1 Emil Post: Recursive Unsolvability of a Problem of Thue. The Journal of Symbolic Logic, vol. 12, 1947: 1–11.

2 Y. Bar-Hillel: A quasi-arithmetical notation for syntactic description. Language, 29 (1), 1953: 47–58.

3 Rulon S. Wells: Immediate Constituents. Language, 23, 1947: 81–117.

4 L. Tesnière: Éléments de syntaxe structurale. Paris: Klincksieck, 1959.


6 Pavel Bělíček: Systematic Poetics III. Formal Poetics and Rhetoric. Prague 2017, 357 p., p. 36, 40.