IEML Grammar and STAR Syntax
| Messages | |
|---|---|
| andrew ::: Subject: IEML Grammar and STAR Syntax | |
| Posted: Monday, January 21, 2008, 17:25 | |
|
Hi all,

I was looking at the IEML Grammar and STAR Syntax document in order to extract some rules needed for a couple of services I am working on. It looks like the parser accepts both `*E:O:E:.**` and `*O:M:M:.**`, but the document states that such sequences are not supposed to happen. Also, does the parser allow multiple `*...**` constructs?

Best,
Andrew |
|
| Steve Newcomb ::: Title: IEML Grammar and STAR Syntax | |
| Posted: Wednesday, January 23, 2008, 21:28 | |
|
> Looks like the parser accepts both `*E:O:E:.**` and `*O:M:M:.**`, but the document states that such sequences are not supposed to happen.

That’s right; you observe correctly. Syntactically speaking, they are parsable. This parser reports what it sees, if it thinks it can see anything at all. Constraints that are not implicit in the nature of the parser’s approach to the problem of parsing the STAR syntax are not enforced. If the parser were to take it upon itself to enforce such rules, it might be less able to support some applications. (Consider, for example, the needs of editing applications.) We’d resist the idea of having the parser exit with a nonzero status just because the incoming text specifies a translator at level 2 (events) or 3 (relations). Moreover, in the case of `*E:O:E:.**`, the final E: is not even semantically wrong; it is merely redundant.

But this is not the real issue; the real issue is more fundamental. Modularity should be the governing goal here. The scope of the parser should be narrow, so that, within that narrow scope, it can be applied as generally as possible to everything else. We think there should be an additional module that performs various kinds of checks on the XML output of the parsing module. A rigorously modular approach brings many benefits for the development of IEML. The parser is best seen as an underlying processing module, not as a user interface. IEML applications may involve many things, and a STAR parser is only one of them. The fact that the parser isn’t doing some particular kind of checking is not a problem in the parser. The real problem is that the processing the user needs is not being provided by the application that employs the parser as one of its modules. If the output of the parser requires further checking, the application should do it, either by itself or by means of other modules that it also uses.
It’s not the parser’s job to process IEML. The parser’s only task is to parse a STAR expression and output an XML representation of it.

> Also, does the parser allow multiple `*...**` constructs?

If we understand your question correctly, you’re asking whether a single invocation of the parser can receive more than one IEML expression as its input and produce a corresponding number of XML documents as its output. The answer is "No". This parser (starparser.py) must be invoked once for each expression to be parsed. It always outputs a single XML document.

Steve Newcomb and Michel Biezunski |
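To make the proposed division of labour concrete, here is a minimal Python sketch of the kind of separate checking module Steve and Michel describe. It does not use starparser.py or its XML output, whose format is not shown in this thread; instead it assumes a hypothetical, already-parsed layer-2 event given as a (source, destination, translator) triple of primitive-set symbols, and it encodes the emptiness rules discussed in this thread (an empty source implies an empty destination and translator; an empty destination implies an empty translator; layer-2 translators are empty). The names `Event` and `check_event` are illustrative only, and every symbol other than E is treated as a full category, which ignores sets such as I: that include E.

```python
from typing import NamedTuple

EMPTY = "E"  # the empty primitive category

class Event(NamedTuple):
    """Hypothetical, already-parsed layer-2 event (not starparser.py's XML format)."""
    source: str
    destination: str
    translator: str

def check_event(event: Event) -> list[str]:
    """Return the emptiness-rule violations for one event; [] means none.

    This lives outside the parser: the parser accepts all of these events,
    and only applications that need the rules call this check.
    Assumption: any symbol other than "E" counts as a full category.
    """
    problems = []
    if event.source == EMPTY and (event.destination != EMPTY or event.translator != EMPTY):
        problems.append("empty source with a non-empty destination or translator")
    if event.destination == EMPTY and event.translator != EMPTY:
        problems.append("empty destination with a non-empty translator")
    if event.translator != EMPTY:
        problems.append("non-empty translator at layer 2")
    return problems

# The two expressions from Andrew's post, split by hand into their role players.
for star, event in [("*E:O:E:.**", Event("E", "O", "E")),
                    ("*O:M:M:.**", Event("O", "M", "M"))]:
    print(star, check_event(event) or "OK")
```

A real module of this kind would more likely read the parser's XML output, as suggested above; taking tuples as input simply keeps the sketch independent of that format, and an application that only wants "raw" parsing never calls it.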
|
| Pierre Lévy ::: Title: IEML Grammar and STAR Syntax | |
| Posted: Thursday, January 24, 2008, 15:17 | |
|
I recently had a voice discussion with Michel and Steve, and they told me that their extreme modularity approach is also inspired by their experience with SGML and XML, and by the fact that XML was a bigger success than SGML. The difference between the two (and the reason why XML is "extensible") is precisely a matter of parsing. As the Wikipedia article on SGML says: "XML is an application of SGML, designed so as to make the parser much easier to implement, compared to a full SGML parser. A consequence of the ease of implementation is that XML, rather than SGML, is nowadays widely used for deriving document specifications. Contributing to this is also the fact that few SGML-aware programs existed when XML was created. The number of XML applications today is large. XML also has a lightweight internationalization. XML is used for general-purpose applications, such as the Semantic Web, XHTML, SVG, RSS, Atom, XML-RPC and SOAP."

Another argument that Michel and Steve did not give in their response (but that they gave me verbally) is that some people may just want to do some "raw" checking of IEML-encoded data. In that case, adding the emptiness rules would only increase the processing time. So the syntax checked by starparser is limited to the very basics.
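A further module, which will probably be used by editing services, will check the emptiness rules that are not currently enforced by starparser:

1) If the source is empty, the destination and translator are empty.
2) If the destination is empty, the translator is empty.
3) At layers 2 (events) and 3 (relations), the translator is empty.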
The purpose of the last rule is to reduce the cognitive complexity of these layers for human users. Thanks to this rule, the number of second-layer flows is reduced to 31 events instead of 150, so they can be represented with the lowercase Latin alphabet, at least for the events with a full destination. It is also easier to learn to read and write third-layer flows in STAR syntax, or to represent them as icons in any visual syntax, when there are 931 relations (on the order of 10 power 3) rather than about 3 x 10 power 6. The reason for the first two rules (if the source is empty, the destination and translator are empty; and if the destination is empty, the translator is empty) is cognitive congruency: if the source of a flow is empty, it is hard to imagine the same flow having a full destination. In addition, these rules greatly simplify the notation, in particular by allowing as many empty role players as possible to be omitted from the end of an expression. But one could still invent an application in which some flows with an empty source and a full destination would be useful; who knows? |
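The figures of 31 events and 931 relations can be checked by brute-force enumeration under the three rules. In this sketch each role player is abstracted as either empty (`None`) or one of `n_full` full categories; it assumes five full primitives at layer 2 (the six primitives minus the empty E:), and the helper names `allowed` and `count_flows` are hypothetical.

```python
from itertools import product

def allowed(source, destination, translator, empty_translator_rule=True):
    """Emptiness rules from this thread; None stands for an empty role."""
    if source is None and (destination is not None or translator is not None):
        return False  # rule 1: empty source -> empty destination and translator
    if destination is None and translator is not None:
        return False  # rule 2: empty destination -> empty translator
    if empty_translator_rule and translator is not None:
        return False  # rule 3: translator always empty at layers 2 and 3
    return True

def count_flows(n_full, empty_translator_rule=True):
    """Count (source, destination, translator) triples that satisfy the rules,
    where each role is empty or one of n_full full categories."""
    players = [None, *range(n_full)]
    return sum(allowed(s, d, t, empty_translator_rule)
               for s, d, t in product(players, repeat=3))

events = count_flows(5)               # layer 2: 5 full primitives
relations = count_flows(events - 1)   # layer 3: every event except the empty one
print(events, relations)
print(count_flows(5, empty_translator_rule=False))  # layer 2 without rule 3
```

Without the third rule the same enumeration gives 156 events, close to the figure of 150 given above. With the third rule the count is 1 + n + n power 2 for n full players, which is where 931 = 1 + 30 + 900 comes from.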
|
| Steve Newcomb ::: Title: IEML Grammar and STAR Syntax | |
| Posted: Thursday, January 24, 2008, 16:59 | |
|
I would only add to Pierre’s message that perhaps XML’s most significant innovation is its notion of "well-formedness". It separated (modularized) the notion of parsability ("well-formedness") from the notion of conformance to a schema or "document type definition" (DTD). SGML documents weren’t necessarily even parsable if they didn’t conform to a DTD, and the DTD always had to be disclosed before the document could be parsed. XML documents, on the other hand, are parsable regardless of whether they conform to any DTD; DTDs and schemas are entirely optional in XML. It is widely believed (and it’s hard to argue with the belief) that XML succeeded where SGML failed precisely because XML was rigorously modular in exactly this way. Anyway, there are now several popular approaches to the problem of specifying constraints on XML documents, and each one presumably meets the needs of one or more distinct communities. And yet all of the communities that use XML use exactly the same XML syntax, and all *can* use the same bare-bones parser if they want to. This has been a clear win for XML, and a win for everybody who wants to use XML in his or her own peculiarly constrained way. The only losers are the people who mistook themselves for the center of the universe. |
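The split between well-formedness and constraint checking can be illustrated with Python's standard-library `xml.etree.ElementTree`, which is a non-validating parser: it checks only that a document is parsable and never consults a DTD or schema. In the sketch below, any further constraint lives in a separate, optional function. The element names `flow`, `source`, `destination` and `translator`, and the helper `every_element_has_children`, are hypothetical and are not taken from starparser.py's output.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Parsability only: ElementTree is non-validating, so no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

def every_element_has_children(text: str, tag: str, required: set[str]) -> bool:
    """A separate, optional constraint check (hypothetical): every <tag>
    element must contain at least the required child elements."""
    root = ET.fromstring(text)
    return all(required <= {child.tag for child in element}
               for element in root.iter(tag))

doc = "<flow><source/><destination/></flow>"
print(is_well_formed(doc))                       # True
print(is_well_formed("<flow><source></flow>"))   # False: mismatched tag
print(every_element_has_children(doc, "flow", {"source", "destination", "translator"}))  # False
```

The document is accepted by the first function and rejected only by the second, which a given community can swap for its own constraints without touching the parser.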
|
| Pierre Lévy ::: Title: IEML Grammar and STAR Syntax | |
| Posted: Friday, February 8, 2008, 22:35 | |
|
SYNTAX MODIFICATIONS

I have listened to the arguments of Michel and Steve, but I have also listened very carefully to Andrew’s concerns about the discrepancy between the document presenting the syntax on the one hand and the parser on the other. I have also recently given several public presentations of IEML, spent two half-days in discussion with Candide Kemmler, and remembered some objections raised by Sylvain Boucher, who worked for six months at the CI lab last year. The result of my reflections is the following:

1) There is no need to enshrine in the syntax the fact that there is currently no natural-language interpretation of events and relations with a full translator. In any case, the lowercase notation fosters the use of events and relations with empty translators.

2) Similarly, there is no need to enshrine in the syntax the fact that there is currently no interpretation in natural language of...
In any case, the notation shortcuts foster the use of empty translators when destinations are empty, and of empty destinations when sources are empty.

So, unless one of you finds some inconsistency in reducing the syntax to what is currently checked by the parser, I think that we should do it. It will simplify the syntax and add many new possibilities to the dictionary. I therefore propose the following changes to the syntax document, in order to align it with the parser.

Changes in 3.1

Delete: "The set of possibilities that I: represents may be restricted by the context in which the I: appears; see below."

Changes in 4.1

First change. Current paragraph: "The translator role may have no role player. If this is the case, the destination role may also be unplayed. The source cannot be unplayed if the destination role is played. When a role is unplayed, the role is empty."
New paragraph: "Any role may have no role player. When a role is unplayed, the role is empty."

Second change. Current paragraph: "However, a category may be an empty set; in this case the category is expressed in STAR-IEML either as the empty primitive category or as an empty generated category."
New paragraph: "However, a category may be an empty set; in this case the category is expressed in STAR-IEML either as the empty primitive category or as an empty category. An empty category is a category in which all the roles are empty."

Changes in 4.2

Table 4, new version:

| Layer | Category | Number of categories |
|---|---|---|
| 1 | primitive | 6 |
| 2 | event | 216 |
| 3 | relation | 10 power 6 |
| 4 | idea | 10 power 18 |
| 5 | phrase | 10 power 54 |
| 6 | seme | 10 power 162 |

Changes in 4.2.1 and 4.2.2

Simply delete sections 4.2.1 (empty generated categories) and 4.2.2 (special rules for layers 2 and 3).

Changes in 4.2.4 (special characters for layer 2: lowercase letters)

Current paragraph: "There are 31 single-event categories in IEML. In STAR-IEML, 25 of them can be represented by a specific lowercase letter, or by two lowercase letters. In the table below, one of the 31 categories is shown in each cell. In each cell, the first line shows the shortest way to notate the event in STAR-IEML, usually one of the 25 lowercase-letter codes. Below that is the descriptor in the English language. At the bottom of each cell, the event is notated in terms of its primitive role players."
New paragraph: "There are 6 power 3 (216) single-event categories in IEML. In STAR-IEML, 25 of them can be represented by a specific lowercase letter, or by two lowercase letters. In the table below, one of these 25 categories is shown in each cell. In each cell, the first line shows the shortest way to notate the event in STAR-IEML, usually one of the 25 lowercase-letter codes. Below that is the descriptor in the English language. At the bottom of each cell, the event is notated in terms of its primitive role players."

Change in Table 5 (lowercase event symbols in IEML)

Delete the leftmost column.

Change in 7

Current paragraph: "In the above example, the first undetermined subset is . Its parameter identifier is 1. Since O: is the set (U:|A:), the undetermined subset may be any one of the following:
- U:
- A:
- (U:|A:)
- E: (theoretically)

E: is not a valid subset for undetermined subset 1 because this subset plays the source role, and because undetermined subset 2, which plays the destination role in the same category, is F:, and F: does not include the possibility of E:. Therefore, if the player of the source role were E:, the category would be an invalid expression. See above."

New paragraph: "In the above example, the first undetermined subset is . Its parameter identifier is 1. Since O: is the set (U:|A:), the undetermined subset may be any one of the following:
- U:
- A:
- (U:|A:)"

Note: in any case, `*E:**` is not a possibility, because `*E:**` is not an element of `*O:**` (`*O**` is itself a subset of `*F**` that excludes `*E**`).

Change in 9.2.1

Delete section 9.2.1, "In translators at layers 2 and 3, I: means E:".

Change in 9.3

Delete the following paragraph: "At layers 2 and 3, where the translator role is always unplayed, the bang operator copies the source role player only to the destination role; the translator role remains unplayed."

Change in 10.1 (Flow)

Delete the last two sentences: "It always has a source, and it may have a destination. If it has a destination, it may have a translator." |
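Once the emptiness rules are dropped, each layer's categories are unconstrained (source, destination, translator) triples of the previous layer's categories, so the count is cubed at each step, as in the new Table 4. The sketch below computes the exact values, assuming the six primitives E, U, A, S, B, T as the starting set; the names `PRIMITIVES`, `LAYERS` and `categories_per_layer` are hypothetical. It suggests that the table's figures from layer 3 upward are rounded down: 216 power 3 = 10,077,696, which is about 10 power 7 rather than 10 power 6, and cubing from there gives roughly 10 power 21, 10 power 63 and 10 power 189 for ideas, phrases and semes.

```python
import math

# Assumption: the six IEML primitives, E being the empty one.
PRIMITIVES = ["E", "U", "A", "S", "B", "T"]
LAYERS = ["primitive", "event", "relation", "idea", "phrase", "seme"]

def categories_per_layer(n_layers: int = len(LAYERS)) -> list[int]:
    """Unconstrained counts: each category at layer k is any
    (source, destination, translator) triple of layer k-1 categories."""
    counts = [len(PRIMITIVES)]
    for _ in range(n_layers - 1):
        counts.append(counts[-1] ** 3)
    return counts

for layer, (name, count) in enumerate(zip(LAYERS, categories_per_layer()), start=1):
    # Python integers are exact, so only the printed exponent is rounded.
    print(f"{layer} {name:<9} about 10 power {math.log10(count):.2f}")
```

Keeping exact integers and only rounding for display avoids compounding the 10 power 6 approximation through the higher layers.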
|