XPS Documentation

Syntax vs. Semantics

An Old Debate

From the earliest days of true markup languages (GML, SGML, XML), there has been discussion about separating form from content. Programmers have a similar discussion about separating data from function. For document processing and markup, separating form from content is crucial. That is, there needs to be a distinction between the structure, content, and information in a document and the way in which that information and structure are presented to a human. Furthermore, the content of a document must stand alone, since the use of the document may have nothing to do with rendering or presentation. Documents may need to be stored in a database, queried, or analyzed to determine facts. Embedding formatting and rendering instructions in a document makes processing it for other purposes very difficult.

Fortunately, the people who have dealt with markup languages for over 30 years recognized this need, and also recognized that the complexity of SGML was getting in the way of wide adoption. The result was the XML set of standards, which defines a simple and regular syntax for markup languages, makes structured document processing available to a wide audience, and makes it easier to implement in software systems. The XML family of standards also defines ways to transform content into new forms through the Extensible Stylesheet Language (XSL) and XSL Transformations (XSLT). This makes it possible to render the same content in multiple styles. XPS exploits these standardized capabilities to support programming in a powerful new paradigm.

The concept of separating form from content also applies to programming languages. The syntax of a language can be compared to the form of a document, and its semantics to the content. That is, the syntax is merely an expression or rendering of the underlying semantics, which make up the content.
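To make the comparison concrete, here is a minimal sketch in Python. It assumes a tiny arithmetic "program" stored as XML and shows the same content rendered in two syntactic forms and evaluated directly. The element names (add, mul, num) and the function names are hypothetical, chosen only for illustration; they are not taken from any XPS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; not an actual XPS vocabulary.
PROGRAM = "<add><num>1</num><mul><num>2</num><num>3</num></mul></add>"

SYMBOLS = {"add": "+", "mul": "*"}


def render_infix(node):
    """Render the tree in a C-like infix form."""
    if node.tag == "num":
        return node.text
    left, right = node  # each operator element has exactly two children
    return f"({render_infix(left)} {SYMBOLS[node.tag]} {render_infix(right)})"


def render_prefix(node):
    """Render the same tree in a Lisp-like prefix form."""
    if node.tag == "num":
        return node.text
    args = " ".join(render_prefix(child) for child in node)
    return f"({SYMBOLS[node.tag]} {args})"


def evaluate(node):
    """Process the content directly, ignoring any human-facing form."""
    if node.tag == "num":
        return int(node.text)
    left, right = (evaluate(child) for child in node)
    return left + right if node.tag == "add" else left * right


tree = ET.fromstring(PROGRAM)
print(render_infix(tree))   # (1 + (2 * 3))
print(render_prefix(tree))  # (+ 1 (* 2 3))
print(evaluate(tree))       # 7
```

Both renderings are produced from the same XML content, and the evaluator never sees either of them. In this toy setting, that is the separation described above.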

Programs Are No Different

Let me make one point very clear: programs are no different; they are documents. Their content should be separated from their form. The systems that compile and interpret programs should not need to worry about the forms in which humans manipulate programs. They should just obtain the raw content of the program and process it. There needs to be a degree of separation between the language (or other mechanism) humans use to instruct the computer and the input that is compiled to machine code. With the advent of XML, XSL, and XSLT, there is now a way to accomplish this. Furthermore, XML, although verbose and unforgiving, is quite readable and writable by humans should an appropriate rendering system for XPS programs be lacking.

If you've done any significant amount of programming in large settings, you have undoubtedly encountered a phenomenon known as the "language war". That is, zealots for a new programming language rise in defiance against the reigning establishment of the existing programming language. Many of the language wars occurred because it was not possible to use the languages together (even though they do essentially the same thing), so most large organizations required that only one language be used, for standardization purposes. In my experience, the language wars happened when C++ took over from Object Pascal, and when Java superseded C++. Undoubtedly there are camps dedicated to Eiffel, Ada, Lisp, and Prolog. Each camp brings valid points to the table from the particular perspective of its programming needs. There's nothing wrong with a better idea, and there's nothing wrong with using the language or tool that best matches the job at hand. The problem lies in the incompatibility of the languages in question, especially at the semantic level.

By separating form from content, we first eliminate the syntactic part of the language war. In the XPS, all languages use the syntax of XML. Period. End of discussion. While XML is not particularly convenient or friendly for programmers, it is expected that the programming community will eventually develop translators from a given language syntax (such as Java or FORTRAN) into a corresponding XML-based language which can then be translated for execution by the XPS. Thus, the use of XML as the base syntax for the XPS allows any syntax style to be used since the XML syntax is highly regular and simple (which makes translating to it simple).
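As a rough illustration of such a translator, the sketch below uses Python's standard `ast` module to parse a small arithmetic expression and emit an XML tree. Python's own expression grammar stands in for the "given language syntax" here. The element names (add, sub, mul, div, num, var) and the helpers `to_xml` and `translate` are assumptions made for this example, not part of XPS.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical mapping from parsed operators to XML element names.
OPERATORS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def to_xml(node):
    """Convert a parsed expression node into an XML element."""
    if isinstance(node, ast.Expression):
        return to_xml(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        element = ET.Element(OPERATORS[type(node.op)])
        element.append(to_xml(node.left))
        element.append(to_xml(node.right))
        return element
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        element = ET.Element("num")
        element.text = str(node.value)
        return element
    if isinstance(node, ast.Name):
        return ET.Element("var", name=node.id)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def translate(source):
    """Translate expression source text into an XML string."""
    tree = ast.parse(source, mode="eval")
    return ET.tostring(to_xml(tree), encoding="unicode")


print(translate("1 + 2 * 3"))
# <add><num>1</num><mul><num>2</num><num>3</num></mul></add>
```

Because the target XML is so regular, the translator is little more than a walk over the parse tree. Operator precedence is resolved by the parser, and the nesting of the XML records the result, so no parentheses or precedence rules survive in the content.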

A similar mechanism, encapsulation, is used to eliminate the issue of varying runtime environments. XPS encapsulates the details of a specific machine and operating system behind an interface to an abstract virtual machine (XVM). Java and other bytecode-based languages use a similar technique to hide the platform-dependent issues of their runtime environments. However, the XVM compiles to native code and thereby eliminates the inefficiencies of bytecode interpretation. Although a level of indirection exists between the XVM and the underlying platform, it is no more costly than the runtime libraries supported by existing programming languages.

So, now that we have eliminated the syntax and portability issues, we are left with language semantics. Since there is no concrete way to determine whether one semantic concept is better than another, XPS adopts the stance that no semantic concept is a bad one: all semantics are on equal footing. For example, if one programmer likes to program in a declarative logic programming style (as in Prolog) while another prefers a procedural style (as in C), XPS simply and quietly says "you're both right" and allows programs in either semantic style to be expressed, compiled, and executed. Indeed, one of the goals of the XPS is to provide an infrastructure in which programming languages, styles, and semantics that have not yet been dreamt of can come to fruition quickly and compatibly with existing technologies.

Once you give up your passion for your favorite language (semantics and syntax), then you begin to realize that:

  • All programs are highly structured specifications that lend themselves naturally to expression as XML documents.
  • An extensible programming language needs a simple and standardized syntactic and structural language (XML in this case) that forms the basis for interpretation of programs.
  • Use of XML does not preclude the creation of programs in alternative syntactic styles because it is relatively simple to translate an arbitrary language into XML.
  • An extensible programming language should support all semantic styles of programming, including those that have not yet been invented.
  • Programmers should not be dealing with programs at the level of syntax. They should only have a semantic model of what a program can do and a way to translate their semantic ideas into the syntax required by the programming system (XML in this case).