An XML-Based Transforming Text Editor: Basic Principles

Dmitry Kirsanov

Some quarter-century ago, Richard M. Stallman was fascinated with Lisp, and he needed a good text editor. He went on to synchronize these two seemingly unrelated motives, and Emacs was born as a result. Nowadays there is a community of people fascinated with all things XML, and many of us are still in search of a perfect text editor. It is natural to try to envision a new editor to which XML is what Lisp is to Emacs. Hopefully, this editor may ultimately become every bit as powerful as Emacs — and even go further. This document is an attempt to work out the basic principles it could be built upon.

As shown below, XML paradigms allow many common editing tasks to be formulated in an elegant way, making the editor very usable and customizable. If nothing else, this editor may become a testbed for the (still maturing) XML technologies and a source of useful extensions that might one day be incorporated back into the corresponding standards. If the project gains popularity, it will serve to evangelize the XML mentality, approaches, and terminology to a wide audience.

Note that this paper describes a generic text editor using XML internally, not an editor specifically for XML. The flexible XML-based architecture described below is capable of handling texts of any degree of structuredness, from very structured to absolutely unstructured.

1 what’s wrong with Emacs, anyway?

2 the core

2.1 tree access

2.2 tree updates

2.3 XSLT/XPath extensions

2.4 validation

3 views and aspects

3.1 views: representing a document in more than one way

3.1.1 the hierarchy of views

3.1.2 creating views

3.1.3 transforming and updating views

3.1.4 views: conclusion

3.2 aspects: tracking data across representations

3.2.1 aspect propagation

3.2.2 inheritance of aspects

3.3 the principal views

3.3.1 chars

3.3.1.1 document and window positions

3.3.1.2 visualization

3.3.1.3 tracking marks

3.3.1.4 cursor position

3.3.2 lines

3.3.3 tokens

3.3.3.1 untagged data and marks

3.3.4 tex-pars

3.3.5 text

3.3.5.1 applicability of aspects

3.3.6 java-code

3.3.7 xml-code

3.3.8 xml-tree

3.3.8.1 handling badly-formed data

3.3.8.2 preserving lexical details

3.3.8.3 aspects and validation

3.3.8.4 namespaces in xml-tree

3.4 constructors and deconstructors

3.5 local views

3.6 compound views

3.7 deconstructor-only views

4 transforms

4.1 priority rules

4.2 execution

4.3 examples

4.3.1 basic operations

4.3.2 two transforms working together

4.3.3 reformatting a paragraph

4.3.4 syntax coloring

5 conclusion

1 what’s wrong with Emacs, anyway?

Emacs Lisp is a full-fledged programming language, which is not only usable without restrictions from within the editor’s macros and applications, but is also unusually well integrated with the editor’s fundamental concepts and surface controls. This allows for flexible and powerful programming solutions. So from the viewpoint of a Lisp programmer, there’s nothing really wrong with Emacs. (If there is, you just go ahead and program your solution, and the problem is gone.)

Unfortunately, not everyone is a Lisp programmer, and not all problems are best solved in Lisp. More importantly, for many problems, the language in which an algorithm is expressed matters less than the underlying data structure on which the algorithm is supposed to work. If you have your data specifically structured to the requirements of your task, the programming part often becomes trivial.

This is the main problem with Emacs Lisp and, in fact, with many programming environments of the pre-XML era. What a Lisp programmer faces when starting to code an extension is essentially a flat document, nothing more than a stream of characters, with no immediately obvious structure. Therefore, the main bulk of a Lisp mode or macro often consists of parsing that stream of characters in an attempt to find out where the interesting fragments of text start and end.

What’s even worse, the results of this parsing are, most often, not stored in structures that live long enough to be useful. This means that every single piece of code must do its own parsing over and over as the document changes. Syntax coloring does its own regexp parsing, so do cursor movement commands, indentation, sorting, extraction — almost every command has to search for some characteristic fragments in the document before it can do its useful work. The result is often inefficient, messy, prone to errors, and hard to debug.

The proposal presented below solves this problem by separating a parsing layer from everything that’s built on top of it. In this model, parsing is done once and reused many times. Any editor command can access, instead of a flat character stream, any of the well-defined structures that are automatically created and updated with the document itself.

The next section discusses the implementation options for storing, accessing, and manipulating XML trees within the editor. If you are more interested in how the strictly hierarchical XML can be used to accommodate the many overlapping and often incomplete structures of real-world documents, you can skip forward to the section on views.

2 the core

With its hierarchical meta-markup capabilities, XML is a natural choice for storing the data structures resulting from parsing the document. As these structures are primarily intended for representing the parsed document for as long as it is being edited, we’ll mostly have to deal with XML trees in memory rather than with serialized XML files stored on disk. Your documents will remain plain text, Java code, or whatever format they were created in; it’s only upon loading them into the editor that they are silently translated into a variety of XML representations that enable the editor’s software to efficiently manipulate them.

2.1 tree access

The most widely used standard governing representation of XML trees in memory is called DOM. However, for most editing applications, pure DOM may be too low level as it does not provide efficient means for node selection and tree traversal. Also, DOM is not very efficient because it is too general, so perhaps a better approach would be to use a specialized XML tree implementation, for example that used by the Saxon XSLT processor.

What we need to efficiently manipulate in-memory XML trees is the XPath language whose expressions enable arbitrary selection of nodes of an XML tree. XPath is already becoming one of the key XML standards widely used in other standards and applications; it is a powerful language, easy to use and learn, and easily extensible. Perhaps the only disadvantage to using XPath is that it is slower than direct tree access, and without proper care, it is possible to write an XPath expression that will take ages to evaluate even for a simple document.
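To make the idea of XPath-based node selection concrete, here is a minimal sketch in Python, using the limited XPath subset of the standard xml.etree.ElementTree module rather than a full XPath engine. The element names and the kind attribute are assumptions made for this illustration only; they are not part of any view defined in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialized fragment of a token-like view.
# The "kind" attribute is invented for this example.
SOURCE = """
<view>
  <token kind="keyword">public</token>
  <token kind="keyword">class</token>
  <token kind="name">XSLTRunner</token>
  <token kind="punct">{</token>
</view>
"""

tree = ET.fromstring(SOURCE)

# Select every token whose kind attribute equals "keyword".
keywords = [t.text for t in tree.findall("./token[@kind='keyword']")]

# Select by position: the third token child of the root element.
third = tree.find("./token[3]").text

print(keywords, third)
```

Both selections are one-line expressions; the equivalent code against a flat character stream would need its own scanning and matching logic.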

Thus, the core of the editor functionality is XML, an XML memory tree implementation, and XPath; everything that can be built on top of this core is not bound to any particular language or standard and can be implemented in any language, so long as it can access the XML tree and/or XPath. For direct tree access, perhaps the most natural choice is Java or C++. For access via XPath, currently the most mature option is XSLT, a template-driven tree transformation language that works natively with XPath. However, other languages such as Perl or Python could be plugged in just as well. For each new language, an interface module must be provided to connect it to the built-in XPath processor in the editor.

Applications that use direct tree access are likely to be the low level parts of the system, those that are performance-critical but not too complex in terms of the algorithms they implement. However, most higher level editor applications and user macros will be much more convenient to write using the XPath engine. It remains to be seen whether XSLT will be the dominant scripting language of the editor, or some other XPath-enabled language will assume this role. For the purposes of the examples in this paper, we will use XSLT with XPath, with certain extensions described below. In fact, XSLT’s template-based processing model is very relevant for a text editor, where most tasks are programmed in the form of “what to do if a certain key is pressed” or “what to do if a certain piece of text is encountered.”

Interestingly, it is widely held that the two most powerful features of Lisp are its facility for handling flat and hierarchical lists and its ability to treat code as data. From this viewpoint, XML/XSLT is probably the best successor to Lisp: First, its hierarchical representation of data is natively powerful, and second, XSLT code is valid XML, so XSLT transforms can easily generate other transforms just as they generate any XML data.

2.2 tree updates

Tree traversal and node selection are only a part of the problem (although, arguably, the most difficult part). After we have found the nodes we are interested in and processed their data, we need to update the memory tree. DOM provides some functionality for that, although it is rather awkward. XPath currently cannot update a document tree, only query it.

There may be two different approaches to updating an in-memory representation of an XML document. You can remove, add, or modify nodes of an existing tree; this is what the built-in DOM methods support. Alternatively, you can build a parallel tree from scratch, perhaps using some data from the original tree, and then discard the original tree. The second approach is obviously more costly in terms of speed and memory, and it only makes sense when you need a new tree that is really very different from the original.

XSLT implements the second approach; the template rules of an XSLT transform, triggered by the nodes of the original tree, are used to construct a parallel tree that goes to output. It is possible, however, to optimize this process in some cases. If most of the original tree nodes will go unchanged into the result tree — and it can be expected that the majority of editing transforms will satisfy this requirement because of the built-in template rule in use — you can create the new tree pretty quickly using references to the source tree nodes, without actually allocating new memory or copying these nodes over. This may significantly improve the XSLT performance in the editor.
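The two update strategies, and the node-sharing optimization for the second one, can be sketched as follows. This is only an illustration in Python with xml.etree.ElementTree, not a description of how any XSLT processor works internally; the function names are hypothetical. It relies on the fact that ElementTree elements keep no parent pointer, so the same node object may appear in two trees at once (a DOM node, by contrast, has a single parent).

```python
import xml.etree.ElementTree as ET

def update_in_place(root, index, new_text):
    """First approach: modify a node of the existing tree."""
    root[index].text = new_text
    return root

def rebuild_sharing_nodes(root, index, new_text):
    """Second approach: build a parallel tree, but reuse the unchanged
    child nodes by reference instead of copying them."""
    new_root = ET.Element(root.tag, root.attrib)
    for i, child in enumerate(root):
        if i == index:
            changed = ET.Element(child.tag, child.attrib)
            changed.text = new_text
            changed.tail = child.tail
            new_root.append(changed)
        else:
            new_root.append(child)  # same object, nothing is copied
    return new_root

old = ET.fromstring(
    "<view><line>one</line><line>two</line><line>three</line></view>")
new = rebuild_sharing_nodes(old, 1, "TWO")

print([l.text for l in old], [l.text for l in new])
```

After the call, the original tree is untouched, and only one new child node has been allocated for the result tree.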

Other programming languages used for writing editor applications can take one of several approaches to updating XML memory trees. First, if the chosen implementation supports this (e.g. DOM), they can use the native tree methods for this purpose. Second, a language-neutral wrapper layer may be written on top of the tree implementation in order to facilitate its use in applications and hide details that are irrelevant in the context of the editor. Finally, a set of XPath extensions can be supported by the editor’s XPath implementation that would provide basic update functionality through XPath expressions. The best way to go is a matter for discussion.

2.3 XSLT/XPath extensions

Unavoidably, both XSLT and XPath will have to be extended in order to be fully usable as a text editor’s scripting language. It seems likely that the following features will have to be added:

All the editor-specific extensions should be explicitly separated, e.g. by means of their namespace(s), from the language core, which should be kept compliant with the current XSLT and XPath specifications. In this paper, however, instead of writing “XSLT with editing extensions” every time, we will abbreviate it to “ET” (“Editing Transformations”), with a single namespace prefix for all of its components. Similarly, the extended XPath is referred to as “EPath.” As for the editor itself, it could be named “TE,” which stands for “Transforming Editor.”

2.4 validation

Transforming XML trees in memory according to some algorithm is only one part of what editor applications might want to do. As these applications become more and more sophisticated, they might need to exchange XML trees or tree fragments with other applications, allow the user to edit them (as opposed to editing the plain text document itself from which these trees are constructed), or read and write them to disk. For all these tasks, a validation layer might be needed that makes sure a (sub)tree satisfies all the restrictions imposed on this particular kind of tree.

So, just as the editor provides a built-in EPath engine used by applications, it might also be a good idea to include a validator in the editor’s core and provide an interface by which application languages (XSLT, Python, Perl, etc.) can access it to validate the XML data they work with. This validator can use XML Schema or a different schema language (for example, Schematron, which uses XPath expressions for validation). In the great majority of cases, XML data will be generated by the editor itself (more precisely, by the (de)constructors, described below) and thus won’t require validation. However, when (a part of) an XML tree is constructed from an outside XML file or has been edited by the user, the validation capability may be very useful.

3 views and aspects

3.1 views: representing a document in more than one way

A document being edited in TE is represented internally as one or more named views. This concept is fundamental to TE; the views are superficially similar to Emacs’ modes but are much more powerful.

Each view is an XML tree, stored in the editor’s memory (and normally never serialized, unless by user request). Usually, all of the elements in the tree belong to one namespace, specific to that view. (It is convenient to always map this namespace to a prefix which is the same as the name of the view.) Initially, only the chars view is active for a document that has just been loaded or created. Other views are activated automatically as soon as they are first accessed, and remain active until they are explicitly shut down or the corresponding document is closed. Any number of views may be active simultaneously for the same document.

3.1.1 the hierarchy of views

Views that represent the same document are not required to describe the same structure; each view may select its own pieces of content to mark up and do it in very different ways. Each view must be well-formed (an XML term meaning, among other things, that there must be no overlapping tag pairs), but no two views are obliged to be “mutually well-formed.” That is, their markup may “overlap” (if you compare them character by character).

For example, one view may represent the document as a sequence of words and sentences, while another view may divide the same document into screen lines and pages. You cannot store both structures in a single XML tree, because XML cannot contain overlapping elements. However, by storing the same document data in a number of copies, each one parsed and marked up differently, we can make efficient use of the XML formalisms and still be able to reflect the true multilayer complexity of a real document.

If one view’s structure is easier to construct from another view than from the raw document data, then it is natural to consider these two views as a parent/child pair. It is also natural to consider the raw character data as yet another view, although this view is special in that it has no parent. This view, called chars, is the root view of the views hierarchy; all other views have it as their ultimate ancestor.

For example, a useful view that marks up parts of speech (verbs, nouns, etc.) does not have to be constructed from scratch (that is, from chars). It can reuse an existing view where sentences and words are already parsed out, by becoming a child of that view. Similarly, before creating a view to support some programming language syntax, you may first want to write a simpler view that only divides the document into tokens using rules specific to this language. Then, by creating your language view as a child of the tokens view, you can conveniently code in terms of preparsed tokens and forget about low-level syntax details.

Note that even a child view is not necessarily “mutually well-formed” with its parent. The only thing that connects a parent view with its child is some algorithm that performs one-to-one mapping between these views; as long as this mapping is one-to-one, the parent and the child need have nothing else in common. (Of course in reality, they will often be quite similar, but this is not a requirement.)

3.1.2 creating views

Each view (except chars) is constructed from the XML tree of its parent view. This is done by a piece of code called the constructor of the view. A constructor may be implemented as an ET (XSLT) transform, or as a module in some other programming language that can access the editor’s in-memory XML tree (either directly or through EPath). Each view must also have a deconstructor which makes the reverse transformation, producing the parent view from the child view.

New views can be created by the user, who must supply the constructor and deconstructor transforms that link the new view to its parent. Obviously, the constructor and deconstructor must be proper reversals of each other, so a parent view transformed into the child view and then backwards must stay the same, without any loss or damage of information. However, no formal check is made to ensure this, as it is left to the responsibility of the developer of the view.

This requirement is not as scary as it sounds. Indeed, for an arbitrary tree transformation (e.g. an XSLT stylesheet) creating a proper reversal may be difficult or even impossible. However, in TE, most often going from a lower level view to a higher level view is little else but adding some extra start- and end-tags to the existing markup (that is, if we consider a serialized form of a view). If your view needs to remove too much of the parent view’s markup, perhaps you should just choose another parent for your view. As for character data, it is almost always completely preserved in all views. It is obvious that for such transformations, creating a proper reversal is not a problem at all.
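Since the editor makes no formal check, a view developer may want a small round-trip test of their own. The following Python sketch shows one way to do it; it is an assumption of this illustration, not a facility of TE. The parent view is modeled as a plain string and the child view as a list of paragraphs, and all names are hypothetical.

```python
def round_trip_failures(construct, deconstruct, parent_samples):
    """Return the samples x for which deconstruct(construct(x)) != x."""
    return [x for x in parent_samples if deconstruct(construct(x)) != x]

# Toy parent view: a plain string. Toy child view: a list of paragraphs.
def construct_pars(text):
    return text.split("\n\n")

def deconstruct_pars(pars):
    return "\n\n".join(pars)

# A lossy constructor that also trims each paragraph. It discards
# character data, so no deconstructor can be its proper reversal.
def construct_pars_lossy(text):
    return [p.strip() for p in text.split("\n\n")]

samples = ["", "one", "one\n\ntwo", " padded \n\n\n\nx"]

print(round_trip_failures(construct_pars, deconstruct_pars, samples))
print(round_trip_failures(construct_pars_lossy, deconstruct_pars, samples))
```

The first pair only adds structure and keeps every character, so it round-trips on all samples; the second pair fails as soon as a sample contains whitespace that the constructor throws away.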

Constructors and deconstructors can access information from any views other than the source view, including the original target view which they are going to replace, and even from other documents. This allows them to not only rearrange or parse information when creating a new view, but to add some new data as part of the abstraction of the view. When such an augmented view is deconstructed to its lower level parent view, this additional data is removed, but when it is updated from its parent, the constructor can optionally preserve the additional data by retrieving it from the target view before replacing it. This makes it possible to implement some quite interesting algorithms.

3.1.3 transforming and updating views

Each document editing operation is equivalent to a transformation of one of the views, whereby this view’s source tree is patched or simply replaced by a new tree in the same view. This transformation is done by some piece of code called a transform. While transforming one of the views, you can access (but not change) information from any other views, representing the same document or any other documents (in the editor’s memory, in the local filesystem, or elsewhere).

Once a transform has finished, all active views of the document are updated from the view which was transformed. For this, constructors and deconstructors of all active views are fired automatically by the editor, so that the change propagates both downstream (to the descendants of the changed view) and upstream (to its ancestors). No other change to any of the views is allowed until the entire hierarchy of active views is in sync again.
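The update cycle just described can be sketched in a few lines of Python. This is a simplified model under several assumptions: view data are plain Python values rather than XML trees, every ancestor of an active view is itself active, and the View class, the resync function, and the numbered and codepoints views are all invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class View:
    name: str
    construct: Optional[Callable] = None    # parent data -> this view's data
    deconstruct: Optional[Callable] = None  # this view's data -> parent data
    parent: Optional["View"] = None
    children: List["View"] = field(default_factory=list)
    data: object = None
    active: bool = True

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

def resync(changed):
    """Bring all active views back in sync after `changed` was transformed.
    Returns the names of the refreshed views, in the order of refreshing."""
    refreshed = []
    on_path = {changed.name}
    view = changed
    while view.parent is not None:           # upstream, towards chars
        view.parent.data = view.deconstruct(view.data)
        view = view.parent
        on_path.add(view.name)
        refreshed.append(view.name)

    def downstream(parent):
        for child in parent.children:
            if not child.active:
                continue
            if child.name not in on_path:    # views on the path are current
                child.data = child.construct(parent.data)
                refreshed.append(child.name)
            downstream(child)

    downstream(view)                         # `view` is now the root view
    return refreshed

chars = View("chars", data="a\nb")
lines = chars.add_child(View("lines", lambda s: s.split("\n"), "\n".join))
numbered = lines.add_child(View("numbered",
                                lambda ls: list(enumerate(ls, 1)),
                                lambda ps: [t for _, t in ps]))
codepoints = chars.add_child(View("codepoints",
                                  lambda s: [ord(c) for c in s],
                                  lambda cs: "".join(map(chr, cs))))

lines.data = ["a", "a", "b"]   # simulate a transform that duplicated line 1
refreshed = resync(lines)
print(refreshed, repr(chars.data))
```

The change made in lines first travels upstream to chars and is then pushed downstream into the descendant of lines and into the sibling branch.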

3.1.4 views: conclusion

The advantages of the many levels of abstraction which exist simultaneously are immense. Now the user whose needs vary from simple keyboard macros to complex document transformations does not need to write any parsing or searching code that is so prone to errors. All that is necessary is to choose an appropriate view and then apply the power of EPath and your favorite programming language to traverse the XML tree of that view, search, rearrange, or change the properties of elements. All such changes are automatically deconstructed all the way down to separate characters, to be at once reflected in the editing window. For example, if you want to treat lines of text as separate entities, work in the lines view; if end-of-line separators are not relevant, and any whitespace is to be treated on equal terms, use the tokens view, etc. No more messy regexps — no matter what your needs are, you always have well-defined information at your disposal!

Of course, if you want to create a view of your own, you’ll probably have to do some regexps and parsing work when writing your constructor and deconstructor. But this is only done once and reused many times, as opposed to the old approach where each editor macro must take care of its own parsing needs. Also, constructing a new view in TE is often made easier by the fact that you don’t have to go all the way up from basic characters, but can reuse the abstraction level of other views, by basing your view on a higher level parent rather than chars.

3.2 aspects: tracking data across representations

Although the elements in each view are usually in one common namespace, they can take attributes from many different namespaces. A group of related global (prefixed) attributes and global variables (whose names are in that namespace as well) is called an aspect. As with views, the prefix to which the namespace of an aspect is mapped is used as the aspect’s name.

Any aspects can be set for elements of any view, but it is the responsibility of constructors and deconstructors to copy or process them when creating the views. Moreover, in some cases (see untagged data and marks below) a (de)constructor may be obliged to supply values for certain aspects even if they are missing in the source tree. In the chars view, the root view of the system, some aspects are set automatically by the editor, and some are user-settable but have side effects. The main aspects are described below in the sections dealing with the views they apply to.

Thus, in the editor’s data model, aspects represent a second coordinate that is orthogonal to the views and allows certain data to be tracked across views. Users can create their own aspects, some of them intended for use in all views (including chars), others only for coordination between several related views.

3.2.1 aspect propagation

All (de)constructors must know how to treat all built-in aspects of the editor (for example, the mark aspect requires that its attributes’ values are calculated to reflect the position of the marked characters in the current element). For user-defined aspects, the default behavior of a (de)constructor is copying all the attributes from a source element to those target element(s) which correspond to it.

However, a user who introduces a new aspect can also supply a piece of code (e.g. an ET template or a Java method) whose purpose is to process the attributes of this aspect when constructing or deconstructing relevant views. When such an aspect propagator module exists for a specific aspect, all (de)constructors are obliged to call it when they encounter an attribute from this aspect, passing full context information (the source attribute, names of source and target views, source and target element nodes, etc.) as arguments. The propagator must then return the new value for the aspect attribute, which the (de)constructor must insert into the target tree.
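As an illustration of this calling convention, here is a minimal Python sketch of a propagator registry with the default copy behavior. Everything in it is hypothetical: the registry, the function signatures (which pass far less context than the full set described above), and the stat aspect with its stat:length attribute are invented for the example and are not part of TE.

```python
PROPAGATORS = {}  # aspect name -> propagator function (hypothetical registry)

def propagator(aspect):
    """Decorator that registers a propagator for one aspect."""
    def register(func):
        PROPAGATORS[aspect] = func
        return func
    return register

def propagate_attributes(source_attrs, source_view, target_view):
    """Compute the attributes of a target element from its source element.
    Attribute names are written as 'aspect:name'."""
    target = {}
    for name, value in source_attrs.items():
        aspect = name.split(":", 1)[0]
        handler = PROPAGATORS.get(aspect)
        if handler is None:
            target[name] = value  # default behavior: plain copy
        else:
            target[name] = handler(name, value, source_view, target_view)
    return target

@propagator("stat")
def stat_propagator(name, value, source_view, target_view):
    # Invented rule: a per-element character count cannot be copied
    # verbatim into chars, where every element holds one character.
    if name == "stat:length" and target_view == "chars":
        return "1"
    return value

attrs = {"vis:visible": "true", "stat:length": "6"}
print(propagate_attributes(attrs, "tokens", "chars"))
```

Attributes of aspects without a registered propagator are copied unchanged, while stat:length is rewritten only when the target view is chars.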

3.2.2 inheritance of aspects

There are two sides to the inheritance of aspect values in TE:

The editor itself acts on the aspect values only in the chars view, the root view of the hierarchy. All other views carry the same aspects, but they are only used by transforms and (de)constructors in those views and do not affect the editor display until they propagate down to the chars view. Therefore, in a transform you only have to set a particular aspect on one of the elements in a view; you don’t have to process all of its parent and child elements to see that they receive correct values. However, the deconstructor of this view (or the aspect propagator called by the deconstructor) must know what to do if different levels of element hierarchy in the view have different aspect values, and how to combine these values to make sure that the aspects in the chars view are set correctly.

3.3 the principal views

3.3.1 chars

The chars view is the most basic representation of a document. It breaks the text into separate characters. The XML tree of this view is flat, consisting of a series of empty-content char children of the view root element. If serialized, a chars tree might look like this:

<view
    xmlns:chars="http://tte.sourceforge.net/views/chars"
    xmlns:doc="http://tte.sourceforge.net/aspects/doc"
    xmlns:win="http://tte.sourceforge.net/aspects/win"
    xmlns:vis="http://tte.sourceforge.net/aspects/vis"
    xmlns:mark="http://tte.sourceforge.net/aspects/mark"
    document-url="document.txt">

  <!--...-->
  <chars:char
      chars:unicode="65"
      doc:line="5"
      doc:col="6"
      doc:num="12"
      win:line="2"
      win:col="0"
      vis:font-family="#default"
      vis:font-size="#default"
      vis:visible="true"
      mark:cursor="1"
  />
  <!--...-->

</view>

Attributes that belong to a view and are used to store some information in the tree of that view only (as opposed to aspects that are passed between views) are called native attributes of the view. Native attributes use the namespace of the view. In the chars view, the only native attribute is unicode, which stores the Unicode code point of the character.

3.3.1.1 document and window positions

The doc attributes in the chars view are added automatically, by counting the preceding siblings in the tree and the newline characters among those siblings. This aspect is not user settable; that is, if you write a transform working in the chars view (or a deconstructor which traverses from some other view to chars) which sets some arbitrary values of these attributes, these values will be ignored and re-set by TE upon completion of the transform.
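The counting rule for the doc attributes can be written down as a short Python sketch. It assumes that all three values are 1-based, that lines are separated by a single "\n" character, and that a newline character is counted as the last character of the line it terminates; the paper itself does not fix these details, and the function name is hypothetical.

```python
def doc_positions(text):
    """Yield (char, line, col, num) for each character of the document,
    with line, col, and num all counted from 1."""
    line, col = 1, 1
    for num, ch in enumerate(text, start=1):
        yield ch, line, col, num
        if ch == "\n":
            line, col = line + 1, 1   # next character starts a new line
        else:
            col += 1

# A document in which "A" is the 12th character overall,
# sitting in line 5, column 6.
positions = list(doc_positions("a\nb\n\n\nhelloA"))
print(positions[-1])
```

Because the values depend only on the preceding characters, the editor can always recompute them after a transform, which is why arbitrary values written by a transform can safely be discarded.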

In other views, the doc attributes of an element always apply to the first character of the element’s character data. In those higher level views, doc values are user-settable, although this can only be done if the view’s deconstructor knows how to meaningfully interpret these values.

The win aspect reflects the position of the character in the editing window. It is settable by a transform, but setting an attribute from this aspect may have side effects. For example, if you set win:line="1" and win:col="1" in a transform, this will cause the document to scroll so that the corresponding character is in the top left corner of the window. However, this will also change the win aspect of other characters whose visible positions are affected by the scroll. If a character is not visible, its win attributes are set to 0.

Similarly, in views other than chars, the win attributes can be used to control window scrolling with respect to higher-level structures rather than characters.

The char element in the above example represents a character “A” in line 5, column 6 of the document, which comes 12th from the document start, and is displayed in the window line 2 but not in a visible column (i.e. is horizontally scrolled so as to not be visible).

3.3.1.2 visualization

The vis aspect governs the visual presentation of the elements of a view. It might include a subset of attributes defined in XSL FO to control font faces, colors, and other similar properties. The default behavior for (de)constructors with regard to this aspect is to copy all attributes from a source element to the corresponding target element(s). The value space for the attributes should include a #default value which refers to the default text formatting set in the editor preferences (this is the default value, but it can be set explicitly to override values inherited from other views).

The vis:visible attribute controls visibility of elements. An element with vis:visible="false" is not only invisible on the screen, but must also be ignored by most editing transforms as if it did not exist. Only (de)constructors should treat invisible elements as regular elements, copying the value of vis:visible from a source element to the corresponding target element(s).

3.3.1.3 tracking marks

Attributes from the mark aspect are used to track certain points in the document across views. This aspect is built-in, but users can add their own attributes to it, and all (de)constructors are obliged to treat all attributes from this aspect uniformly, as described below.

Namely, if a mark attribute in a char element has a zero value, this means that this character is not marked (in the sense of this attribute); a non-zero value means it is marked. Generally, more than one character in the document may be marked by the same mark attribute, although some attributes may limit this. In views whose elements contain character data, a zero value of a mark attribute means that the corresponding mark is not within this element, while a non-zero value gives the ordinal number of the marked character within this element’s character data.

It is the responsibility of the constructors and deconstructors to properly set the mark values in all views. A constructor or deconstructor, having encountered a mark attribute, must determine in which target tree element the mark appears and what is the number of the marked character within that element’s character data. (This general rule has some exceptions, see untagged data and marks below.)
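The calculation a (de)constructor has to perform for a mark can be sketched as follows. The sketch assumes that each target element is described by the doc:num of its first character together with its character data, and that ordinals are 1-based; the function names are hypothetical.

```python
def place_mark(spans, marked_num):
    """Constructor direction. `spans` lists (first_doc_num, text) for each
    target element. Returns one mark value per element: 0 if the marked
    character is not in that element, else its 1-based ordinal there."""
    values = []
    for first, text in spans:
        offset = marked_num - first + 1
        values.append(offset if 1 <= offset <= len(text) else 0)
    return values

def mark_to_doc_num(spans, values):
    """Deconstructor direction. Returns the doc:num of the marked character,
    or None if no element carries the mark."""
    for (first, _text), value in zip(spans, values):
        if value != 0:
            return first + value - 1
    return None

# Two elements covering the text "public class"; the separating space
# (doc:num 7) belongs to neither element.
spans = [(1, "public"), (8, "class")]
print(place_mark(spans, 9), place_mark(spans, 7))
```

A mark on the 9th character lands in the second element as ordinal 2, while a mark on the separating space yields all zeros, which is exactly the exceptional case of untagged data.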

3.3.1.4 cursor position

In particular, the mark aspect is used to track the position of the cursor. The mark:cursor attribute is special only in that TE uses it to position the cursor when updating the window; in all other respects it is a regular mark attribute as described in the previous section. Many of the examples use the EPath expression [@mark:cursor!=0] to find the element that is currently under the cursor.

[To be added: a similar mechanism for persistent and non-persistent blocks (or “marks,” or “selections,” as various editors call them).]

An alternative way of accessing the cursor position is via the global variables in the doc and win aspects. Namely, $doc:line, $doc:col, and $doc:num define the position of the cursor in document coordinates, and $win:line and $win:col define the position of the cursor in window coordinates. When any of these variables is changed, TE updates other variables accordingly, then resets the mark:cursor attributes in the chars view and re-constructs all active views to update them.

3.3.2 lines

The lines view is a child of chars. It is another flat view which represents the document as follows (namespace prefixes and attributes stripped for brevity):

<!--...-->
<line>The lines view is a child of chars. It represents</line>
<line>the document as follows: </line>
<line/>
<!--...-->

where the line elements correspond to how the document is currently broken into lines. The deconstructor which goes from lines to chars must take into account the character(s) that are used to separate lines. The editor must allow the user to either set line separators to some fixed value, or to preserve the EOL characters of the file being edited.

Aspects applicable to this view are the same as those of the chars view. The doc aspect describes the first character in each line, so doc:line gives the line number while doc:col is always 1. The win aspect describes the first visible character in the editing window, so win:col may be different from 1 if the window is scrolled horizontally.

This view can be used, for example, for line duplication, highlighting the line under the cursor, and other line-oriented editing tasks.
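As a rough illustration of line duplication in this view, here is a Python sketch with a toy constructor and deconstructor for lines. It models chars as a plain string, ignores namespaces and aspects, and assumes a single fixed line separator; the function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def construct_lines(text, eol="\n"):
    """chars (modeled as a string) -> lines view."""
    root = ET.Element("view")
    for piece in text.split(eol):
        ET.SubElement(root, "line").text = piece
    return root

def deconstruct_lines(root, eol="\n"):
    """lines view -> chars, re-inserting the line separator."""
    return eol.join(line.text or "" for line in root.findall("line"))

def duplicate_line(root, index):
    """The transform itself: no parsing, just one tree operation."""
    root.insert(index + 1, copy.deepcopy(root[index]))

view = construct_lines("one\ntwo\n")
duplicate_line(view, 1)
result = deconstruct_lines(view)
print(repr(result))
```

All knowledge about separators lives in the (de)constructor pair, so the transform reduces to a single insertion into the tree.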

3.3.3tokens

The tokens view is also a child of chars. It is a flat view that breaks the document into tokens separated by whitespace, for example:

<!--...-->
<token>public</token> <token>class</token> <token>XSLTRunner</token> <token>{</token>

  <token>public</token> <token>void</token> <token>main(String[]</token> <token>args)</token> <token>{</token>
<!--...-->

which corresponds to the following plain text fragment:

...
public class XSLTRunner {

  public void main(String[] args) {
...

Note that the whitespace separating tokens is not lost; in the XML tree of the view, it is present as text nodes between the token element nodes. The deconstructor which goes from tokens to chars must process this untagged data as well as the tokens themselves, so that the original formatting of the document is preserved. In other views, untagged data may include not only whitespace but any character data that is not relevant to the abstraction implemented by this view.
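A minimal Python sketch of this pair, assuming tokens are simply maximal runs of non-whitespace characters (the function names are illustrative, and aspects are left out entirely). The whitespace ends up in text nodes between the token elements, so the deconstructor only has to concatenate all character data.

```python
import re
import xml.etree.ElementTree as ET

def construct_tokens(text):
    """Wrap every run of non-whitespace in <token>; keep whitespace as untagged text."""
    root = ET.Element("tokens")
    last = None          # most recently created token element
    pos = 0              # end of the previous token in the source text
    for m in re.finditer(r"\S+", text):
        gap = text[pos:m.start()]
        if last is None:
            root.text = gap      # whitespace before the first token
        else:
            last.tail = gap      # whitespace between two tokens
        last = ET.SubElement(root, "token")
        last.text = m.group()
        pos = m.end()
    rest = text[pos:]
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return root

def deconstruct_tokens(root):
    """Strip the markup: tagged and untagged character data in document order."""
    return "".join(root.itertext())
```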

3.3.3.1untagged data and marks

Untagged data, being in no element except the root element of the view, cannot track marks because there is no element to add mark attributes to (they cannot be added to the root element of a view). This means that if a view which allows untagged data has been transformed, the deconstructor should pass mark attributes to its parents only if they are present in this view (i.e. if the corresponding marks are within non-root elements of this view). Otherwise, the deconstructor must preserve the mark aspect of the parent view. This makes sense, because if a view does not tag something, it generally means it is not concerned with this portion of the document at all, so it won’t need to place or move any marks there. Thus, in the tokens view, what matters is only the non-whitespace tokens, and if this view is used e.g. for cursor movements, you’ll probably want to jump to first or last characters of tokens, ignoring the separating whitespace.

Suppose the cursor is within some whitespace which is untagged in the tokens view and you perform an operation which is tokens-specific, such as swapping the two tokens before and after the cursor. The deconstructor from tokens to chars, lacking the cursor information in the tokens tree, will have to keep mark:cursor either at the character with the same doc:num or at the character with the same win:line and win:col as in the original chars tree. Either approach appears reasonable in this situation.

Conversely, if the cursor is within one token and you swap this token with its neighbor, then the cursor position in the tokens view gets swapped also, and the cursor jumps to follow the new position of the token — again, the behavior which is to be expected in this situation. Note that in this case, you don’t need to do anything special in your transform to achieve that effect; simply copy the element node with all of its children and attributes, and the cursor position will be updated automatically.
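The token swap just described can be mimicked in Python to show why nothing special is needed for the cursor. The sketch makes several assumptions: the namespace URI is made up, the value of mark:cursor is arbitrary, and the function name is mine. Only the elements are exchanged; the untagged whitespace stays where it was.

```python
import xml.etree.ElementTree as ET

MARK_CURSOR = "{urn:example:mark}cursor"  # hypothetical namespace URI for the mark aspect

def swap_with_next(root, index):
    """Swap the token at index with its right neighbor, leaving whitespace in place."""
    a, b = root[index], root[index + 1]
    # ElementTree stores the text following an element in its .tail, so the
    # tails are exchanged first to keep the untagged whitespace unmoved.
    a.tail, b.tail = b.tail, a.tail
    root[index], root[index + 1] = b, a
```

Because the attribute travels with its element, a deconstructor that reads mark:cursor from the swapped tree finds it on the token at its new position.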

3.3.4tex-pars

Still another flat view, tex-pars simply breaks the document into fragments separated by empty lines (i.e. by whitespace sequences containing two or more newlines). Each fragment is in a tex-par element node. Empty lines are used in TeX to separate paragraphs (hence the name of the view), and are also frequently used in plain text for the same purpose. For an example of using this view to reformat a paragraph, see below.

To make things easier to remember, the name of each flat view with elements of one type is the plural of its element type name.

3.3.5text

This view parses the document as an English text (similar views for other languages can be created), attempting to identify words, punctuation, sentences, and other relevant constructs. Here is a serialized example (newlines inserted for readability):

<!--...-->
<par><sentence><punc type="open quote" closes-at="id023" id="id022">`</punc><word>Imperial</word>
<word>fiddlestick</word><punc type="excl">!</punc><punc type="close quote" opens-at="id022" id="id023">'</punc> 
<word>said</word> <word>the</word> <word>King</word><punc type="comma">,</punc> 
<!--...--> </sentence> <!--...--> </par>
<!--...-->

which corresponds to the following plain text fragment:

...
`Imperial fiddlestick!' said the King, ...
...

This view is not flat, as it has a hierarchical tree of element nodes. Note that only the metadata is stored in the element and attribute nodes of the view; the text itself is in character data, so if we strip away all the markup, we’ll get exactly the original unmarked document. This is a general rule to be followed by all views (with the exceptions of chars, where the characters are stored as attribute nodes and no character data is allowed, and xml-tree, where part of the character data within element tags becomes element and attribute nodes rather than character data nodes).

If applied to a document which is more than just text, for example text marked up in XML, the view must try to recognize the non-text fragments and leave them in untagged data. Failing to do so may cause all kinds of problems, from messed up syntax coloring to wrong word counts. However, enabling the text constructor to recognize all possible sorts of non-text content may be difficult or even infeasible, and compound views may be a better solution for this problem.

An intelligent descendant of the text view may do some more exciting tricks, such as marking parts of speech, doing a grammatical analysis of sentences, etc. This is one example of an XML-based environment where pieces of code written by different people with different objectives not merely coexist, but actively help each other, making life easier for developers and users alike.

3.3.5.1applicability of aspects

The applicability of aspects in text, as well as in most other higher-level views, largely depends on what transforms you plan to implement for it. For example, if you write a transform which does something to a word depending on the document coordinates of its first character (not likely), you must enable the text constructor to add the doc aspect to the view; otherwise you may leave it out. Most probably you’ll need the vis aspect for syntax coloring and the mark aspect for text-specific cursor movement and selections, so the constructor/deconstructor of the view must process at least these aspects, passing their information back and forth between the text view and its parent.

3.3.6java-code

This view is described here as an example of a language-specific view; views with similar properties can be defined for any non-XML language. This view parses the document and marks up certain Java-specific fragments:

<!--...-->
<keyword>public</keyword> <keyword>class</keyword> <identifier>Allfiles</identifier> <open-delimiter id="id022" closes-at="id023">{</open-delimiter>

<comment><comment-delimiter type="eol">//</comment-delimiter> Returns the list of all files in a given folder as a nodeset object</comment>
<!--...-->

which corresponds to the Java source fragment of:

...
public class Allfiles {

// Returns the list of all files in a given folder as a nodeset object
...

In addition to providing many operations useful for Java editing, this view can be the basis for numerous higher-level views working with Java code. For example, a user can create a view that augments java-code by adding higher-level markup for functions, classes, etc., and tracks dependencies between these objects.

The java-code view is flat, while the higher-level views will likely be hierarchical. In the java-code view, only some basic syntax checking is performed, such as matching delimiters. More complex Java-specific checks, such as searching for objects’ declarations, can be done in higher-level views.
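Delimiter matching of the kind mentioned above is a classic stack exercise. The following Python sketch assumes the input is already a list of token strings, and it ignores comments and string literals, which a real java-code constructor would have to skip. The names are illustrative; the index pairs it returns are what the id and closes-at attributes in the example would encode.

```python
PAIRS = {"{": "}", "(": ")", "[": "]"}
OPENER_OF = {close: open_ for open_, close in PAIRS.items()}

def match_delimiters(tokens):
    """Return (matches, errors): matches maps opener index to closer index,
    errors lists the indexes of delimiters that have no partner."""
    stack, matches, errors = [], {}, []
    for i, tok in enumerate(tokens):
        if tok in PAIRS:
            stack.append(i)
        elif tok in OPENER_OF:
            if stack and tokens[stack[-1]] == OPENER_OF[tok]:
                matches[stack.pop()] = i
            else:
                errors.append(i)      # closer without a matching opener
    errors.extend(stack)              # openers that were never closed
    return matches, sorted(errors)
```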

3.3.7xml-code

The xml-code view interprets the document as XML. However, it does not do actual XML parsing; that is, it does not convert the XML markup in the document into a corresponding tree (it is xml-tree that does this). This is a “low level” XML view which simply recognizes various parts of an XML document and marks them up correspondingly. For example (newlines inserted for readability):

<!--...-->
<element-tag-delimiter type="open-start" id="id922" matches="id923">&lt;</element-tag-delimiter>
<element-type-name>greeting</element-type-name> 
<attr-name>type</attr-name><attr-equals>=</attr-equals>
<attr-quote type="double" id="id022" matches="id023">"</attr-quote>
<attr-value>warm</attr-value>
<attr-quote type="close" id="id023" matches="id022">"</attr-quote>
<element-tag-delimiter type="close" id="id923" matches="id922">&gt;</element-tag-delimiter>
Hello!
<element-tag-delimiter type="open-end" id="id924" matches="id925">&lt;/</element-tag-delimiter>
<element-type-name>greeting</element-type-name>
<element-tag-delimiter type="close" id="id925" matches="id924">&gt;</element-tag-delimiter>
<!--...-->

is a serialized view tree for the following plain text fragment:

...
<greeting type="warm">Hello!</greeting>
...

This view can be used to simplify the most basic XML editing tasks, such as automatic closing of tag delimiters, jumping to matching attribute quotes, and generic syntax coloring for atoms of the markup. Moreover, since this view does not do actual XML parsing, it will work even for documents which are incomplete and therefore not yet valid or even well-formed. It is very useful to have some XML-specific transforms available right from the start, before your document becomes real XML.

So, the xml-code view is similar to language-specific views such as java-code described above in that it implements an “atomic” view on the document, recognizing the most basic components of the language’s syntax but not building larger constructs from these components (which may not be in place yet in a document being worked on). Such “pre-parsing” views not only make useful editing tools available even for incomplete documents, but also greatly facilitate the work of creating higher-level views, which no longer need to do all the parsing from the level of characters. Such higher-level views can therefore be easily created by users who have subject-matter expertise but only minimal programming expertise.

3.3.8xml-tree

The xml-tree view, a child of xml-code, attempts to parse the document as XML and build the corresponding tree, so the constructor of this view must implement a reasonably complete validating XML parser. In the process, it uses the readily available xml-code view tree, rather than parsing the document from characters as most parsers do.

The transforms working in the xml-tree view will provide a variety of tools for making XML editing easier (such as displaying a context for the element under cursor, providing a list of possible child elements based on a DTD/schema, etc.). A syntax coloring transform in this view could change formatting for character data of elements, for example, depending on their element-type names (as opposed to xml-code where character data is untagged, and only parts of XML tags can be syntax-colored).

Naturally, all the views that work with specific XML vocabularies (such as XHTML, XSLT, or ET) can be implemented as children of this view. A constructor for such a child view can be very simple, as it won’t have to perform any tree transformation (the tree constructed by xml-tree is usable no matter what XML vocabulary is used), only to add some native and aspect attributes to enable the transforms defined for that view.

3.3.8.1handling badly-formed data

To be useful in an editing environment, the xml-tree parser must have some special features. For example, it should attempt to silently add missing end-tags at the end of a document, so a document which is unfinished but “almost well-formed” can still use transforms specific to this view. However, in this case the constructor must clearly indicate to the user that the document is not actually well-formed and should not be expected to pass for one once saved to disk.

If all attempts to “patch the holes” in the document fail to make it well-formed (for example, if it contains overlapping elements), then the constructor of the view must express its displeasure by creating an appropriate output tree (rather than by complaining with error messages). Such a tree will leave all of the document untagged except for those fragments which caused the parser to fail, which will be marked up by special elements holding additional information about the error in their native attributes.

This will allow the syntax coloring transform to paint these fragments bright red, and the cursor movement transforms to jump to these fragments. Still another transform (and not the constructor itself) can report the number of errors and other relevant details in the status line. As soon as the user performs some editing operations (necessarily in a different view, probably xml-code, because most editing transforms of xml-tree are disabled in the broken document), a new attempt to parse the document as XML is triggered as the change propagates through the hierarchy of active views.
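The idea of answering a parse failure with a tree rather than an error message can be sketched with Python's standard parser. This is a simplification under several assumptions: the element name parse-error and its message attribute are made up, the standard parser stops at the first error (so only one marker is produced), and the marker is an empty element at the reported position rather than a wrapper around the offending fragment.

```python
import xml.etree.ElementTree as ET

def construct_xml_tree(source):
    """Return the parsed tree, or an 'error tree' that keeps the text untagged."""
    try:
        return ET.fromstring(source)
    except ET.ParseError as err:
        line, col = err.position          # 1-based line, 0-based column
        lines = source.splitlines(keepends=True)
        # Treating the column as a character offset is an approximation
        # for non-ASCII input; the result is clamped to stay in range.
        offset = min(len(source), sum(len(l) for l in lines[:line - 1]) + col)
        root = ET.Element("xml-tree")
        root.text = source[:offset]       # untagged data before the error
        marker = ET.SubElement(root, "parse-error", message=str(err))
        marker.tail = source[offset:]     # untagged data after the error
        return root
```

Stripping the markup from the error tree still yields the original document, so the general rule for views holds even in the failure case, and a coloring or cursor movement transform can locate the problem by matching the marker element.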

A similar approach to handling badly formed documents may be used by other views which expect well-defined data, such as higher-level programming language views.

3.3.8.2preserving lexical details

The untagged data in the xml-tree view is what is left untagged in the document itself. However, this view is special in that some of the lexical details of the document, such as extra whitespace within the element-tags, would be lost if you simply parse your document into an XML tree and then serialize it back. An XML tree cannot store information about how its tags are formatted simply because it stores nodes, not tags.

In TE, this problem is solved by adding a special aspect for storing an XML document’s lexical information in xml-tree and passing it between this view and its descendants. By default, the xml-tree constructor (parser) creates lex aspect attributes to store such details as the order of attributes on each element, intra-tag whitespace, presentation of empty elements, coordinates of CDATA sections, use of entities, etc.

Having all this information available allows the deconstructor to exactly reproduce the original document after a change was made to its xml-tree view (or its descendant). This also makes it easy to write transforms that modify the lexical representation of the XML document in some meaningful way (for example, rearrange attributes, normalize whitespace, use CDATA sections for the character data of certain elements, etc.).

Other transforms that work on the content elements and attributes (i.e. those present in the original document) will not conflict with the lex aspect data because of namespace separation. Also, it is easy to produce a child view of xml-tree whose constructor completely strips away all aspects, enabling you to work with a logical tree fully equivalent to the source XML. Such an aspect-less view would probably have to be read-only, but it may still be useful for purposes such as building a graphical tree of the document or gathering statistics.

As mentioned above, child views of xml-tree also get the lex attributes from xml-tree and can make changes to them, which are then passed back to xml-tree and used by its deconstructor when serializing the tree. Each child view may have its own preferences in this area, however, so some xml-tree-based views may, upon activation, immediately enforce a certain normalization of the document by unifying the lex attributes in some way specific to that view. Of course, which normalizations are performed (if any) should be fully configurable by the user by setting corresponding parameters of the view’s constructor and transforms.

3.3.8.3aspects and validation

As in any view, elements in the xml-tree view will have some native attributes (used to store metadata), aspect attributes, as well as the attributes that are actually present in the XML document being edited. Therefore, the validating parser in the constructor of this view must be aware that attributes in certain namespaces (those of the view and its applicable aspects) must not interfere with validation. For example, if a DTD or a schema declares no attributes for a certain element type, such an element must still be considered valid even though it has some aspect attributes added.
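One simple way to keep aspect attributes out of the validator's sight is to filter them by namespace before the attribute list is checked. The Python sketch below assumes made-up namespace URIs for two aspects; the function name is also illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs of the view's applicable aspects.
ASPECT_NAMESPACES = {"urn:example:mark", "urn:example:vis"}

def content_attributes(element):
    """Return only the attributes that belong to the edited document itself."""
    result = {}
    for name, value in element.attrib.items():
        # ElementTree spells namespaced names as '{uri}local'.
        ns = name[1:].split("}", 1)[0] if name.startswith("{") else ""
        if ns not in ASPECT_NAMESPACES:
            result[name] = value
    return result
```

A validator that is handed `content_attributes(element)` instead of the raw attribute list sees the element exactly as it appears in the document on disk.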

3.3.8.4namespaces in xml-tree

Usually, elements in the xml-tree view will be in arbitrary namespaces, as declared in the XML document they originate from. (The view still has its own namespace, but it is not used for elements, only for native attributes.) Transforms defined for this view must therefore be applicable to an arbitrary mix of namespaces.

3.4constructors and deconstructors

For performance reasons, most constructors and deconstructors are likely to be implemented in some compiled language capable of accessing the in-memory XML tree directly. However, any constructor or deconstructor can also be expressed in pure ET or another EPath-enabled language. This means that the user who wants to create a new view can do so using the convenient EPath tools, and it will work. Only if that view slows down the editor considerably will it be necessary to rewrite critical parts of, or even all of, the view’s constructor and/or deconstructor in a compiled language with direct tree access.

Generally, as we go up in the views hierarchy, constructors become easier to write using EPath and the need for direct tree access becomes less pronounced. For example, a constructor going from chars to tokens is not likely to be very elegant or efficient if written with EPath (although this is possible). On the other hand, a constructor to traverse from java-code to a view supporting some higher-level Java constructs looks like a perfect application for ET/EPath.

Thus, ET has the construct and deconstruct instructions. Like a stylesheet in XSLT, each of them is a container for template rules, variable declarations, and other top-level elements:

<et:construct source-view="chars" target-view="tokens"
                          xmlns:chars="http://tte.sourceforge.net/views/chars"
                          xmlns:tokens="http://tte.sourceforge.net/views/tokens"
>
  <!--...-->
  <et:template match="chars:char[...]"> <!--...--> </et:template>
  <!--...-->
</et:construct>

<et:deconstruct source-view="tokens" target-view="chars"
                          xmlns:chars="http://tte.sourceforge.net/views/chars"
                          xmlns:tokens="http://tte.sourceforge.net/views/tokens"
>
  <!--...-->
  <et:template match="tokens:token[...]"> <!--...--> </et:template>
  <!--...-->
</et:deconstruct>

3.5local views

In addition to global views that represent the entire document, you can use local views that are enabled only for a fragment of a document which is a separate element node in one of the views. For example, a local view can be used to parse a line into tokens using the standard tokens constructor. This can be done in a transform using the EPath function view. Its first argument is either a node or a literal text fragment; the second is the name of the view to apply to that node or fragment. This function searches for the shortest path through the views hierarchy from the view of the source node to the specified target view. So, if you work with lines but want to access the second token in certain lines, write

<et:transform target-view="lines">
  <et:template match="line[...]">
    <!--...-->
    <et:value-of select="et:view(., 'tokens')/token[2]"/>   
    <!--...-->
  </et:template>
</et:transform>

In this example, the shortest path from lines to tokens is through chars, so the view function first deconstructs the current line into a chars tree and then constructs a tokens tree from it. If we wanted to traverse between two views both of which are based on tokens, then the view function would only deconstruct the source view as far as tokens and not all the way down to chars.
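Since the views form a tree, the shortest path between two views runs through their nearest common ancestor. A Python sketch of that search follows; the function names are mine, and the parent table is supplied by the caller (in the test data below, two views based on tokens are invented purely for illustration).

```python
def ancestors(view, parent):
    """The view itself, then its parent, and so on up to the root of the hierarchy."""
    chain = []
    while view is not None:
        chain.append(view)
        view = parent[view]
    return chain

def view_path(source, target, parent):
    """Return (deconstruct_steps, construct_steps) as lists of (from_view, to_view)."""
    up = ancestors(source, parent)
    down = ancestors(target, parent)
    common = next(v for v in up if v in down)        # nearest common ancestor
    going_up = up[:up.index(common)]
    going_down = down[:down.index(common)][::-1]
    return ([(v, parent[v]) for v in going_up],
            [(parent[v], v) for v in going_down])
```

For lines and tokens the common ancestor is chars, so the result is one deconstructor step followed by one constructor step; for two views that share tokens as an ancestor, the path never descends below tokens.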

The view function returns a complete tree compliant with the schema of the target view, because the corresponding constructor acts as if the fragment it got as input is a complete document.

3.6compound views

The view function can be used not only in transforms but in constructors as well. This means that a constructor can limit itself to some high-level operations on the source tree, such as separating the document into large chunks, then call other views to process each chunk, and finally include the resulting tree fragments into its own target tree (probably with some modifications).

For example, if you edit a document which includes some Java code examples in a text marked up in XML, you can construct a view which uses xml-tree to parse the document, then selects those elements which are supposed to contain Java code and calls a java-code view to parse their contents. Then it could apply the text view to those elements which are supposed to contain text. The resulting tree in this case will contain elements specific to all the views involved, with their corresponding namespaces.

If view A calls view B and includes elements from B into its own tree, then all transforms specific to the view B will also work in A. The transforms’ default behavior of copying everything to the output will ensure that, if activated, such a transform will only apply to (some of) the elements in its native namespace and leave all others untouched. There may also exist transforms native to the compound view itself, which may affect elements in many namespaces.

For complex documents, such compound views have many advantages. The default behavior of several views being applied to the entire document simultaneously can be messy: for example, you don’t want your cursor movement and selection macros to work the same way in Java code as they are supposed to work in textual fragments, and you don’t want the Java syntax coloring to affect words in your plain text that happen to be Java keywords. Compound views eliminate these problems.

3.7deconstructor-only views

It is also possible to envision views that are linked to their parents only through deconstructors but not constructors. This means that such a view does not get any data from lower-level ancestor views but only passes its own data down to the ancestors. Instead, the XML tree of such a view is constructed from some outside information. One example of such a deconstructor-only view is a specialized file manager view, similar to dired in Emacs.

All the information displayed by the editor when this view is active comes from outside of the editor, so the view only needs to properly deconstruct its tree to a lower-level view to enable the editor to display the view’s interface. With such a one-way link to the parent, changes from any lower-level transforms will not reach the deconstructor-only view, so that view should shadow such transforms and provide its own substitutes instead (for example, in a vertical file listing, left and right keyboard arrows can be disabled, and up and down arrows will move a file mark rather than the cursor).

4transforms

A transform is TE’s name for a piece of code which is run in certain circumstances. It can be a keyboard macro, a function called by several keyboard macros, or a function triggered by some system event (this list is not exhaustive). In ET, a transform is a complete program (roughly equivalent to a stylesheet in XSLT) that converts the tree of one of the views into a different tree in the same view. A difference from XSLT is that an et:transform is not the root element of a document but can be part of larger structures.

Another difference is that the default built-in template rule within et:transform copies all nodes, including element nodes, text nodes, attribute nodes, etc., to the output tree. That is, instead of the built-in template rules of XSLT, the following single built-in template rule is in effect:

<et:template match="node()">
  <et:copy><et:apply-templates select="node()"/></et:copy>
</et:template>

Also, transforms in ET never strip whitespace-only nodes by default; this is important for preserving untagged data. When doing serialization, ET only supports the xml output method and always has indent set to no.

Note that these rules can also improve performance of ET transforms, because in an average transform, most nodes in the tree won’t need to be processed or even copied from a source tree to target tree. Applying the built-in template rule to a node means that you can simply reuse the original node object by putting a reference to it into the tree you are constructing.
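The copy-by-default behavior can be imitated in Python to show how little a typical transform has to say. In this sketch a rule is a pair of functions (a match test and a builder), which is my own simplification of a template rule; the namespace URI is made up, and for clarity every node is copied rather than reused by reference as suggested above.

```python
import xml.etree.ElementTree as ET

VIS_COLOR = "{urn:example:vis}color"  # hypothetical namespace URI for the vis aspect

def copy_node(node, rules, extra=None):
    """Copy an element, optionally adding attributes, and process its children."""
    out = ET.Element(node.tag, {**node.attrib, **(extra or {})})
    out.text, out.tail = node.text, node.tail     # untagged data is never dropped
    for child in node:
        out.append(apply_templates(child, rules))
    return out

def apply_templates(node, rules):
    """Use the first rule that matches; otherwise fall back to the built-in copy."""
    for matches, build in rules:
        if matches(node):
            return build(node, rules)
    return copy_node(node, rules)

# A one-rule "transform": give every keyword element a color.
color_keywords = [
    (lambda n: n.tag == "keyword",
     lambda n, rules: copy_node(n, rules, {VIS_COLOR: "#white"})),
]
```

Everything the single rule does not mention, including the whitespace between elements, arrives in the output unchanged.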

A number of built-in transforms are supplied for TE’s built-in views. Users can add their own transforms for any views. Each transform is applicable to only one view, specified in ET as follows:

<et:transform target-view="chars"> 
  <!--...-->
</et:transform> 

The target-view attribute of a transform does not specify the namespace of the elements it affects. This namespace must be declared separately in each transform, and some transforms can work on elements from more than one namespace (e.g. transforms for compound views and for xml-tree). The target-view only tells TE when to run this transform (namely, it can only be run if the specified view is active or when that view is called from a constructor of an active view).

Generally speaking, a view is more than just its constructor and deconstructor. A set of useful transforms which work on the tree of the view is likely to be distributed as a package together with the constructor/deconstructor pair. So, besides the meaning of “XML tree produced by the corresponding constructor,” the term view can also mean a package consisting of the constructor/deconstructor, a number of transforms, and related documentation (including a schema for the view).

4.1priority rules

Most basic editing operations (single-character cursor movements, typing text, del, backspace, etc.) are built-in transforms in the chars view. These transforms are bound to the most often used keys and to common system events. Generally, descendant views should avoid defining new transforms for these keys or events. However, they can do so if they have a reason, in which case it is said that the transforms of the lower-level view are shadowed.

It is natural to postulate that whenever two views have transforms for the same event, the one from the view which implements a higher-level abstraction (i.e. is farther away from the root of the hierarchy) will have priority. It is not yet clear how to handle situations where two conflicting views are on the same level (explicit priority setting? the one defined last wins?), or even whether this situation is going to be common. Also, it could be useful to be able to designate some of the transforms as additive, meaning they can run after some other transform bound to the same event has finished (normally only one transform per event is allowed).
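The basic shadowing rule is easy to state in code. In the Python sketch below, the binding table, the event names, and the depth numbers are all invented for illustration; a tie between views at the same depth is resolved by the order of the active views, which is just one of the possible answers to the open question above.

```python
def pick_transform(event, active_views, bindings, depth):
    """Return (view, transform) from the deepest active view bound to the event, or None."""
    candidates = [v for v in active_views if (v, event) in bindings]
    if not candidates:
        return None
    best = max(candidates, key=lambda v: depth[v])   # first one wins on a tie
    return best, bindings[(best, event)]
```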

4.2execution

Only one transform in one view may be processed at a time, and no constructors or deconstructors may be activated until a transform on their source view is finished. However, constructors and deconstructors may work in parallel if they do not conflict in the hierarchy of the views. So as soon as a transform on a view is over, one deconstructor to its parent and several constructors to its independent children may be started simultaneously, to speed things up.

4.3examples

This section is necessarily very incomplete, as extensive examples can only be written after the basic principles outlined above are more or less stable and there is at least a preliminary implementation of these principles. So, more examples may be added to this section in the future. Still, the examples below can give you some idea of TE’s approach to managing editing tasks.

Namespace declarations are stripped for brevity; we assume that the et prefix is bound to the ET namespace and all other prefixes refer to the namespaces of corresponding views or aspects.

4.3.1basic operations

4.3.2two transforms working together

Imagine you need to highlight all lines containing the word “Thus” at the beginning of a sentence. For this, you need to transform two different views:

<et:transform target-view="text"> 
  <et:template match="word[text()='Thus'][parent::sentence][not(preceding-sibling::word)]">
    <et:copy>
      <et:attribute name="mark:mymark">1</et:attribute>
      <et:apply-templates select="node()"/>
    </et:copy>
  </et:template>
</et:transform> 
<et:transform target-view="lines"> 
  <et:template match="line[@mark:mymark != 0]">
    <et:copy>
      <et:attribute name="vis:color">#bright-yellow</et:attribute>
      <et:apply-templates select="node()[not(self::attribute() and self::mark:mymark)]"/>
    </et:copy>
  </et:template>
</et:transform> 

This editing operation consists of two transforms, one immediately following the other. However, as soon as the first transform, in text, has finished, TE updates the view hierarchy, and the new mark:mymark attribute propagates into all active views. This is why the second transform, in lines, can rely on having that attribute in the line that needs to be highlighted. After that, the mark:mymark attribute is removed, by specifically excluding that attribute from the node set recursively processed by et:apply-templates.

4.3.3reformatting a paragraph

When editing texts, one often needs to fit a piece of text into a column of given width, adding or removing newlines appropriately. In TE, the following simple transform will reformat (or “fill,” in Emacs terminology) the current paragraph:

<et:transform target-view="tex-pars"> 
  <et:variable name="w" select="0"/>
  <et:template match="tex-par[@mark:cursor != 0]">
    <et:copy>
      <et:apply-templates select="@*"/>
      <et:for-each select="et:view(., 'tokens')/*">
        <et:assign name="w" select="$w + length(.) + 1"/>
        <et:value-of select="."/>
        <et:choose>
          <et:when test="$w &gt; $par-width">
<et:text>
</et:text>
            <et:assign name="w" select="0"/>
          </et:when>
          <et:otherwise>
<et:text> </et:text>
          </et:otherwise>
        </et:choose>
      </et:for-each>
    </et:copy>
  </et:template>
</et:transform> 

We assume that the target width of the paragraph is stored in the global variable $par-width. The transform constructs a tokens tree out of the current tex-par (paragraph separated by empty lines) and traverses all of its element nodes, thereby ignoring the untagged whitespace. For each token, the counter variable $w is incremented by the length of the token plus 1, then the token itself is output, and finally either a newline or a space is output depending on whether $w exceeded $par-width or not. Thus, this transform not only reformats the paragraph but normalizes space within it, i.e. replaces multiple adjacent whitespace characters by a single space or newline.
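For readers who prefer to trace the algorithm outside of ET, here is the same logic as a Python function. The names `fill` and `par_width` are mine; splitting on whitespace plays the role of walking the token elements and ignoring untagged data.

```python
def fill(paragraph, par_width):
    """Re-break a paragraph, following the counter logic of the transform above."""
    out = []
    w = 0                                  # width used on the current line
    for token in paragraph.split():        # tokens only; original whitespace is dropped
        w += len(token) + 1
        out.append(token)
        if w > par_width:
            out.append("\n")               # start a new line and reset the counter
            w = 0
        else:
            out.append(" ")
    return "".join(out)
```

For example, `fill("aa  bb\ncc dd", 5)` gives `"aa bb\ncc dd\n"`. Because the width test runs after the token has been output, a line can overshoot the target width by up to the length of its last token; testing before output would avoid that at the cost of a slightly longer transform.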

4.3.4syntax coloring

Syntax coloring is a transform that converts a view’s tree into itself, adding or modifying the vis aspect attributes along the way. After the transform has finished, this modified aspect propagates down to the chars view, which is where TE takes the vis values from to repaint its editing window.

A fragment of a simple syntax coloring transform could look like this:

<et:transform target-view="java-code"> 
  <!--...-->
  <et:template match="keyword">
    <et:copy>
      <et:attribute name="vis:color">#white</et:attribute>
      <et:apply-templates select="node()"/>
    </et:copy>
  </et:template>
  <!--...-->
</et:transform>

A generic syntax coloring transform, applicable to many views, could be part of the editor’s system core. Such a generic transform could take its table of correspondences between element type names and their formatting parameters from some external XML repository. So for the user, the task of modifying a color scheme would amount to editing a simple XML file rather than rewriting the ET code.

Just as views in TE form a hierarchy, syntax coloring can be hierarchical, too. As a rule, higher-level views have priority in setting the vis aspect, but they don’t need to completely obscure the coloring done by their ancestors. Instead of setting some fixed values, a transform can use the original vis values (inherited from the parent view) in calculating the new values for those attributes.

For example, a lower-level java-code view could add its own coloring for keywords, identifiers, delimiters, etc. Now suppose a higher-level view is used to highlight, say, all static function definitions. Instead of setting one uniform color for the entire definition, its transform could simply make all colors within those definitions 10% brighter. With this approach, a higher-level construct (a function definition) becomes easily recognizable among its siblings, but all lower-level constructs (keywords etc.) also remain discernible. A similar approach could be used for font sizes (using relative changes rather than absolute values) and even font faces (using generic font families in one view and font varieties, such as bold or italic, in another).
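A relative adjustment of this kind could look roughly as follows in Python. The sketch assumes that colors are stored as #rrggbb strings and that "10% brighter" means scaling each RGB channel by 1.1 and clamping at 255; both are assumptions for the example, and brighten is a hypothetical helper rather than anything TE defines.

```python
def brighten(color: str, factor: float = 1.10) -> str:
    """Scale each channel of a '#rrggbb' color by factor, clamped to 255.

    Assumption: brightness is adjusted per RGB channel; a real editor
    might prefer to work in another color space.
    """
    channels = (int(color[i:i + 2], 16) for i in (1, 3, 5))
    scaled = (min(255, round(c * factor)) for c in channels)
    return "#{:02x}{:02x}{:02x}".format(*scaled)


if __name__ == "__main__":
    # The inherited color of a keyword, as set by the lower-level view.
    print(brighten("#646464"))
```

Because the new value is computed from the inherited one, two keywords that differed in color before the higher-level transform still differ afterwards. Note that plain multiplicative scaling leaves pure black unchanged and saturates colors that are already near white.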

5conclusion

Experience shows that starting a new project from scratch is often preferable to trying to build upon a huge old codebase with a plethora of historical quirks, doubtful design decisions, and plain weird code. This rule, in my opinion, applies perfectly to the situation with Emacs. There are more than a few problems with Emacs which, although solvable to some extent, would be much easier to deal with on fresh soil.

One such problem is Unicode support. Most flavors of Emacs have a hard time properly supporting this standard, which is new relative to Emacs itself. A new editor based on XML will be much more straightforward in this regard, because XML included Unicode right from the start.

So far, very little has been done for TE apart from the present paper. I need your support to get the project rolling. I have set up a page at SourceForge that you can use to discuss the project and to contribute your ideas and critique. Most importantly, the project is in need of developers willing to create at least a proof-of-concept demo that would make it possible to estimate how the above ideas perform in practice. If this works out, the next step is an open source implementation of the editor’s basic architecture.

Last updated: Sat Apr 27 13:34:52 GMT-04:00 2002