Syntax vs. Semantics
An Old Debate
From the very early days of true markup languages (GML, SGML, XML),
there has always been discussion about separating form from
content. The discussion is similar to the one programmers have about
separating data from function. For document processing and markup,
separating the form from the content is crucial. That is, the
structure, content, and information in a document must be kept
distinct from the way in which that information and structure are
presented to a human. Furthermore, it is crucial that the content of a
document stand alone since the use of the document may not have
anything to do with rendering or presentation. Documents may need to
be stored in a database, queried, analyzed to determine facts,
etc. Embedding formatting and rendering instructions into a document
makes processing it for other purposes very difficult.
Fortunately, the people who have dealt with markup languages for
over 30 years recognized this need and also recognized that the
complexity of SGML was getting in the way of markup languages being
adopted widely. The result was the creation of the XML set of
standards which defined a simple and regular syntax for markup
languages and made structured document processing available to wide
audiences and easier to implement in software systems. The XML family
of standards also defines ways in which content can be transformed
into new forms of content through the Extensible Stylesheet Language
(XSL) and XSL Transformations (XSLT). This makes it possible to render the same
content in multiple styles. These standardized capabilities are
exploited by XPS to support programming in a powerful new paradigm.
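To make the idea of rendering the same content in multiple styles concrete, here is a minimal sketch. It is not XSLT: Python's standard library has no XSLT processor, so two plain Python functions stand in for two stylesheets. The `memo`, `title`, and `item` element names are invented for this illustration and are not part of any standard or of XPS.

```python
import xml.etree.ElementTree as ET

# Content only: no fonts, colors, or layout instructions.
# The element names are hypothetical, chosen for this example.
DOC = (
    "<memo>"
    "<title>Build status</title>"
    "<item>Parser passes all tests</item>"
    "<item>Code generator needs review</item>"
    "</memo>"
)

def render_text(root):
    """Render the memo as plain text (stands in for one stylesheet)."""
    lines = [root.findtext("title").upper()]
    lines += ["* " + item.text for item in root.findall("item")]
    return "\n".join(lines)

def render_html(root):
    """Render the same memo as an HTML fragment (a second stylesheet)."""
    items = "".join("<li>" + item.text + "</li>" for item in root.findall("item"))
    return "<h1>" + root.findtext("title") + "</h1><ul>" + items + "</ul>"

root = ET.fromstring(DOC)
print(render_text(root))
print(render_html(root))
```

The document itself never changes. Only the function applied to it does, which is the property that stylesheets provide in the XML world.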
The concept of separating form from content also applies to
programming languages. The syntax of a language can be compared to the
form of a document. The semantics of a language can be compared to the
content of a document. That is, the syntax is merely an expression or
rendering of the underlying semantics, which make up the content.
Programs Are No Different
Let me make one point very clear: Programs are no different; they
are documents. Their content should be separated from their
form. The systems that compile and interpret programs should not need to worry
about the forms in which humans manipulate programs. They should just
obtain the raw content of the program and process it. There needs to
be a degree of separation between the language (or other mechanism)
humans use to instruct the computer, and the input that is compiled to
machine code. With the advent of XML, XSL, and XSLT there is now a way
in which this can be accomplished. Furthermore, XML, although verbose
and unforgiving, is quite readable and writable by humans if
no appropriate rendering system for XPS programs is
available.
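As a rough illustration of a program treated as a document, the sketch below stores a tiny arithmetic program as XML and evaluates it directly from the parsed tree, without ever looking at a human-oriented syntax. The `int`, `add`, and `mul` elements are assumptions made up for this example. They are not the actual XPS vocabulary, and a real system would compile rather than interpret.

```python
import xml.etree.ElementTree as ET

# Hypothetical program document representing (2 + 3) * 4.
PROGRAM = "<mul><add><int>2</int><int>3</int></add><int>4</int></mul>"

def evaluate(node):
    """Evaluate one node of the hypothetical XML program tree."""
    if node.tag == "int":
        return int(node.text)
    # Every operator element in this sketch has exactly two operands.
    left, right = (evaluate(child) for child in node)
    if node.tag == "add":
        return left + right
    if node.tag == "mul":
        return left * right
    raise ValueError("unknown element: " + node.tag)

result = evaluate(ET.fromstring(PROGRAM))
print(result)  # 20
```

The evaluator sees only structure and content. How a human chose to write or view the expression is no concern of the code that processes it.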
If you've done any significant amount of programming in large
settings, you will have undoubtedly encountered a phenomenon known as
the "language war". That is, certain zealots for a new programming
language will rise in defiance against the reigning establishment of
the existing programming language. Many language wars occur
because the languages cannot be used together (even though
they do essentially the same thing), and so most large organizations
require, for the sake of standardization, that only one language be
used. In my experience, language wars broke out when C++ took
over from Object Pascal, and when Java superseded C++. Undoubtedly
there are camps dedicated to Eiffel, Ada, Lisp, and Prolog. Each camp
brings valid points to the table from the particular
perspective of its programming needs. There's nothing wrong with a
better idea. And there's nothing wrong with using the language or
tool that best matches the job at hand. The problem lies in the
incompatibility of the languages in question, especially at the
semantic level.
By separating form from content, we first eliminate the syntactic part
of the language war. In the XPS, all languages use the syntax of XML.
Period. End of discussion. While XML is not particularly convenient
or friendly for programmers, it is expected that the programming
community will eventually develop translators from a given language
syntax (such as Java or FORTRAN) into a corresponding XML-based language
which can then be translated for execution by the XPS. Thus, the use
of XML as the base syntax for the XPS allows any syntax style to be
used since the XML syntax is highly regular and simple (which makes
translating to it simple).
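A translator of the kind described above might look like the following sketch. It borrows Python's own expression parser (the standard `ast` module) as the front end for a familiar infix syntax and emits an XML form of the same expression. The target element names (`int`, `add`, `sub`, `mul`, `div`) are hypothetical and do not represent a real XPS language. Only integer constants and the four binary arithmetic operators are handled.

```python
import ast
import xml.etree.ElementTree as ET

# Map Python operator node types to hypothetical XML element names.
TAGS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}

def to_xml(node):
    """Convert an arithmetic expression AST into an XML element tree."""
    if isinstance(node, ast.Expression):
        return to_xml(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        elem = ET.Element("int")
        elem.text = str(node.value)
        return elem
    if isinstance(node, ast.BinOp) and type(node.op) in TAGS:
        elem = ET.Element(TAGS[type(node.op)])
        elem.append(to_xml(node.left))
        elem.append(to_xml(node.right))
        return elem
    raise ValueError("unsupported syntax: " + type(node).__name__)

def translate(source):
    """Translate infix source text into an XML string."""
    tree = ast.parse(source, mode="eval")
    return ET.tostring(to_xml(tree), encoding="unicode")

xml_text = translate("(2 + 3) * 4")
print(xml_text)
# <mul><add><int>2</int><int>3</int></add><int>4</int></mul>
```

Because the XML target is so regular, the translator is little more than a walk over the parse tree. The parentheses of the source syntax disappear entirely, since nesting in the XML carries the same information.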
A similar mechanism, encapsulation, is used to eliminate the problem
of varying runtime environments. XPS encapsulates the details of a
specific machine and operating system through an interface to an
abstract virtual machine (XVM). Java and other bytecode-based
languages use a similar technique to hide the platform-dependent
issues of their runtime environments. However, the XVM compiles to
native code and thereby
eliminates the inefficiencies of bytecode interpretation. Although a
level of indirection exists between the XVM and the underlying
platform, this is no more costly than the runtime libraries supported
by existing programming languages.
So, now that we have eliminated the syntax and portability issues,
we are left with language semantics. Since there is no concrete way
to determine whether one semantic concept is better than another, XPS
adopts the stance that no semantic concept is a bad one: all semantics
are on equal footing. For example, if one programmer likes to program
using a declarative logic programming style (such as in Prolog) while
another prefers a procedural style (such as C), XPS simply and quietly
says "you're both right" and allows programs in either semantic style
to be expressed, compiled, and executed. Indeed, one of the goals of
the XPS is to provide an infrastructure in which programming
languages, styles, and semantics that have not yet been dreamt of come
to fruition quickly and compatibly with existing technologies.
Once you give up your passion for your favorite language (semantics
and syntax), you begin to realize that:
- All programs are highly structured specifications that lend themselves
naturally to expression as XML documents.
- An extensible programming language needs a simple and standardized
syntactic and structural language (XML in this case) that forms the basis
for interpretation of programs.
- Use of XML does not preclude the creation of programs in alternative
syntactic styles because it is relatively simple to translate an arbitrary
language into XML.
- An extensible programming language should support all semantic styles
of programming, including those that have not yet been invented.
- Programmers should not be dealing with programs at the level of syntax.
They should only have a semantic model of what a program can do and a way to
translate their semantic ideas into the syntax required by the programming
system (XML in this case).