Open Node Syntax

Written By: Seairth Jacobs
Document Status: Draft
Version: 0.6.9
Last Updated: June 17, 2005

Introduction

Welcome to the Open Node Syntax (ONX, pronounced "onyx"). ONX was created as an alternative to the markup languages in use today. In particular, ONX grew out of my experiences with XML and the problems that I and others encountered along the way. ONX is designed to be data-oriented instead of document-oriented and is intended for platform-independent transfer of data over distributed systems, though it can be used for non-networked applications just as effectively. While it has similarities to other markup languages like XML, it is designed from the ground up with the following goals in mind:

To get a quick feel for what ONX looks like, here are a few examples (more thorough ones are given later in the specification):

:onx{
    :calendar{
        :entry{
            :date["2003" "1" "1"]
            :type["event"]
            :note["Happy New Year!"]
        }
        :entry{
            :date["2003" "3" "8"]
            :type["birthday"]
            :note["Buy self a present..."]
        }
    }calendar
}onx

:onx{
    :fields{
        :field["ID" "integer"]
        :field["city" "string"]
        :field["state" "string"]
    }
    :data{
        :record["1" "Norfolk" "VA"]
        :record["2" "Salem" "MA"]
    }
}onx

:onx{:request{:host["www.seairth.com"]:resource["/web/onx/onx.html"]}}onx

NOTE: It is important to point out here that ONX is not intended to be a universal replacement for XML, SGML, or any other markup language. It contains many features that are similar to other languages. In many cases, ONX could be used as an effective alternative. There are also some features that are not commonly found in other markup languages which make ONX more ideal for some uses. Take a look at the specifications. Figure out what your needs are and see if they match with ONX's capabilities. In many cases, you will be quite surprised to find how effective a solution ONX will be for you.

Specification

This is the entire set of rules for ONX (in EBNF):

  1. RootNode ::= ':onx{' (S | Node)* '}onx'
  2. Node ::= ValueNode | ContainerNode
  3. ContainerNode ::= ':' Name '{' (S | Node)* '}' (Name)?
  4. ValueNode ::= ':' Name '[' (S | Value)* ']' (Name)?
  5. Name ::= (Letter | '_') (NameChar)?
  6. NameChar ::= (Letter | Digit | '_' )+
  7. Letter ::= ['A'-'Z'] | ['a'-'z'] | [0xC0-0xD6] | [0xD8-0xF6] | [0xF8-0xFF]
  8. Digit ::= ['0'-'9']
  9. Value ::= '"' [0x00-0xFF]* '"'
  10. S ::= (0x09 | 0x0A | 0x0D | 0x20)+

ONX is organized into Information Blocks, or "infoblocks". An infoblock is defined as the RootNode and its contents. Inside the RootNode, there may be zero or more additional "nodes". Since the RootNode is used to identify the beginning and end of the infoblock, it is possible to have multiple infoblocks in a single stream without ambiguity.

The most basic structure in ONX is the "Node". Each Node starts with a Name. Node names have a standard format and are unlimited in length. However, any Name that starts with “ONX” or any case variation (e.g. Onx, onx, onX, etc.) is reserved for current and future standardization purposes. This means that user-defined nodes should never use this string to start a Name, since it is possible that some future standard would use that same name as a reserved node name.
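As a rough illustration of Rules 5 through 8 and the reserved-prefix rule, the following Python sketch checks whether a string is a well-formed Name and whether it falls into the reserved "onx" space. The function names are hypothetical and are not part of ONX; the letter ranges are copied from Rule 7 and treated as Latin-1 code points, which is an assumption.

```python
import re

# Rule 7 letter ranges, written as Latin-1 code points (an assumption).
_LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"

# Rules 5 and 6: a Letter or '_' followed by any number of Letters, Digits, or '_'.
NAME_RE = re.compile("[%s_][%s0-9_]*" % (_LETTER, _LETTER))


def is_valid_name(name: str) -> bool:
    """Return True if the whole string matches the Name production."""
    return NAME_RE.fullmatch(name) is not None


def is_reserved_name(name: str) -> bool:
    """Return True if the name starts with 'onx' in any case variation."""
    return name[:3].lower() == "onx"
```

A writer of user-defined nodes might reject any name for which is_valid_name is false or is_reserved_name is true before emitting it.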

Before continuing in detail about node types, note the use of whitespace [Rule 10] in ONX. This is entirely optional and is only intended as an aid when a human must directly read an ONX infoblock. As a result, arbitrarily inserting whitespace for readability does not change the meaning of the data. However, in production situations, it is more likely that all whitespace will be left out of the infoblock to reduce size and processing time.

Hint: ONX parsers should first test for the brace, bracket, quote, etc. characters before testing for whitespace. In production, this should reduce the number of tests performed, again assuming that a production system will not generate whitespace.
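Because whitespace outside of values carries no meaning, a readable infoblock can be reduced to its compact production form without changing the data. The Python sketch below is one hedged way to do that; the function name minify is hypothetical. It treats the length inside a "\[N]" sequence as hexadecimal, which is inferred from the samples in this document rather than stated by a rule, and it follows the hint above by testing for the quote character before testing for whitespace.

```python
def minify(text: str) -> str:
    """Drop whitespace that sits outside of values; values are copied verbatim."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Scan to the closing quote, stepping over escape sequences.
            j = i + 1
            while j < n and text[j] != '"':
                if text[j] == "\\" and text.startswith("[", j + 1):
                    # "\[N]" announces N raw characters (N assumed hexadecimal).
                    close = text.index("]", j + 2)
                    j = close + 1 + int(text[j + 2:close], 16)
                elif text[j] == "\\":
                    j += 2  # skip the escape character and the one after it
                else:
                    j += 1
            if j >= n:
                raise ValueError("unterminated value")
            out.append(text[i:j + 1])
            i = j + 1
        elif ch in " \t\r\n":
            i += 1  # Rule 10 whitespace between nodes or values: skip it
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Whitespace inside a value is part of the data, so the sketch never touches anything between a pair of quotes.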

There are two Node types in an ONX infoblock:

ValueNode

The ValueNode is the next most basic type of Node. Its purpose is to convey Name/Value pairs. The name can be associated with no value, a single value, or a list of values, optionally separated by whitespace [Rule 4]. The contents of a value can be anything, including binary data. Each value is always enclosed in a pair of quote symbols. Examples of a ValueNode are:

:Node[]
:Node["This is a value of this ValueNode"]
:Node["This is a value of this ValueNode" "This is also a value in the same ValueNode"]

The ValueNode can end in one of two formats:

Since the beginning and end of a value is delimited by the quote symbol, this means that the quote symbol cannot be in the value without additional consideration. To solve this problem, values can contain Escape Sequences. An escape sequence in ONX is similar to the C/C++ notion of escape sequences. Each escape sequence starts with the escaping character "\". There are a total of five allowable escape sequences:

Examples of escape sequences in ValueNodes are:


:phrase["The word \"test\" is used here."]
:string["First Line\x0D\x0ASecond Line."]
:sample["Showing Escape Sequence \\x0D\\x0A"]
:data["\[9]\"As-Is\" and \"Not As-Is\""]

The decoded values of the above examples would look like:


The word "test" is used here.
First Line
Second Line.
Showing Escape Sequence \x0D\x0A
\"As-Is\" and "Not As-Is"
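The decoding shown in the examples above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it handles only the escape forms that appear in this document's examples (an escaped quote, an escaped escape character, a two-digit hexadecimal character code, and a length-prefixed raw run), and it rejects anything else. The function name decode_value is hypothetical, and reading the length in "\[N]" as hexadecimal is an assumption based on the samples later in this document.

```python
def decode_value(raw: str) -> str:
    """Decode the text found between the quotes of a single value."""
    out = []
    i = 0
    n = len(raw)
    while i < n:
        ch = raw[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("escape character at end of value")
        nxt = raw[i + 1]
        if nxt in ('"', "\\"):
            # \" and \\ stand for a literal quote and a literal backslash.
            out.append(nxt)
            i += 2
        elif nxt == "x":
            # \xHH stands for the character with hexadecimal code HH.
            digits = raw[i + 2:i + 4]
            if len(digits) != 2:
                raise ValueError("incomplete \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        elif nxt == "[":
            # \[N] is followed by N characters that are taken as-is.
            close = raw.index("]", i + 2)
            length = int(raw[i + 2:close], 16)
            start = close + 1
            if start + length > n:
                raise ValueError("raw run is longer than the value")
            out.append(raw[start:start + length])
            i = start + length
        else:
            raise ValueError("escape sequence not handled by this sketch")
    return "".join(out)
```

Applied to the four example values above, this sketch produces the four decoded results listed.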

ContainerNode

The ContainerNode is the most complex Node type. The purpose of a ContainerNode is to contain other Nodes. The contained nodes can be of any node type. This allows Nodes to be arranged or organized in a hierarchical manner. There is no limit to the depth to which ContainerNodes may be nested. As a result, both simple and complex data structures can be represented as needed.

Note: It is also important to point out here that the RootNode is a special-purpose ContainerNode. As a result, it may be possible for developers to take advantage of this characteristic when writing or using ONX parsers/processors.

The ContainerNode can end in one of two formats:

Sample ONX Infoblock(s)

Here are a few samples of complete ONX infoblocks:

:onx{
    :Messages{
        :Message{
            :From["seairth@seairth.com"]
            :To["seairth@seairth.com"]
            :Subject["Concerning ONX..."]
            :Body["This is a simple ONX sample."]
        }Message
        :Message{
            :From["seairth@seairth.com"]
            :To["seairth@seairth.com"]
            :Subject["A Slightly Bigger Message"]
            :Body["In this case, each value in this ValueNode might indicate"
                "each \"line\" of this message body."
                ""
                "Seairth"
                "seairth@seairth.com"
            ]Body
        }Message
        :Message{
            :From["seairth@seairth.com"]
            :To["seairth@seairth.com"]
            :Subject["Whitespace in a value"]
            :Body["As was stated above, the value can contain anything that we want,
which means that the CR/LF combination at the end of the prior line is actually part of the value. In the next value, the \\x0D\\x0A are escaped versions of the CR/LF pair and are also considered to be embedded whitespace (once they are evaluated, anyhow)."
                "\x0D\x0ASeairth seairth@seairth.com"
            ]Body
        }Message
    }Messages
}onx

:onx{:Request{:Name["GetPopulation"]:Parameters["US" "Virginia" "Norfolk"]:ReturnAs["Number"]}}onx

:onx{
    :Database{
        :Name["Inventory"]
        :Tables{
            :Table{
                :Name["Items"]
                :Header{
                    :Field{
                        :Name["id"]
                        :Type["unsigned integer"]
                        :AutoIncrement[]
                        :PrimaryKey[]
                    }Field
                    :Field{
                        :Name["itemnumber"]
                        :Type["string"]
                        :Length["10"]
                        :DefaultValue["New Item"]
                    }Field
                }Header
                :Records{
                    :Record["1" "ABC123"]
                    :Record["2" "XYZ789"]
                }Records
            }Table
        }Tables
    }Database
}onx

:onx{
    :install-file{
        :name["Super Application 1.0"]
        :platforms{
            :platform["windows" "target1"]
            :platform["linux" "target2"]
        }platforms
        :target1{
            :default-path["c:\\program files\\superapp\\"]
            :run-post-install["sa_setup.exe"]
            :data["\[113C7F](imagine 1,129,599 bytes of raw binary data here)"]
        }target1
        :target2{
            :default-path["/bin/superapp/"]
            :run-post-install["sa_setup"]
            :data["\[E6DD6](imagine 945,622 bytes of raw binary data here)"]
        }target2
    }install-file
}onx

What's Missing/Different

Attributes

For those who are familiar with XML, attributes are Name/Value pairs associated with an element. One of the common disputes among XML developers is when to use attributes or child elements. In ONX, this is not an issue. There are only ValueNodes. ValueNodes can represent attributes or content (which can be thought of as just another attribute).

Comments

Many markup languages allow comments. While this may be fine for hand-coded document-oriented markup, comments do not fit into the ONX goals stated above. If comments are required for a given implementation, they can be implemented as a ValueNode.

Document Type Declaration/Definition

This has intentionally been left out of ONX primarily due to the lesson(s) learned from XML. XML comes with a DTD mechanism. However, many have found it to be inadequate. As a result, multiple XML-based schema languages have been created (e.g. W3C XML Schema, RELAX-NG, Schematron, etc.). It is likely that the same would happen for ONX. As a result, a DTD mechanism has been left out. I intend to create a schema language for ONX, but welcome any other implementations as well.

Special-Purpose Attributes

XML, SGML, and possibly other markup languages have special-purpose attribute types. For instance, XML has id, idref, idrefs, etc. If one or more of these special-purpose attributes are needed in an ONX implementation, then each can be implemented as a ValueNode that is recognized within the particular application that needs it.

Whitespace

When developing applications that use XML, it is common to use whitespace characters (0x09, 0x0A, 0x0D, 0x20) to make it easier to read and debug the XML documents. However, it can also be used as the content of a given element. In ONX, the whitespace serves no purpose other than making it easier to develop/debug infoblocks. Under ideal circumstances, whitespace would be left out altogether. However, this would make development and debugging more difficult.

What's Left

Unicode/ISO Character Sets

One of the next things that is apparent in the above specification is the lack of support for multi-byte character sets. There is a simple reason for this: I do not have enough experience with multi-byte character sets to implement this aspect myself. In order to get international support for ONX, Unicode support needs to be added. Anyone want to take a crack at it? If so, e-mail me.

ONX Parser(s)

Well, there is plenty to do here. I have one parser which I quickly wrote as a starting point. It is a simple event-generating parser similar to XML's SAX or expat.
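To give a feel for what an event-generating parser could look like, here is a minimal Python sketch. It is not the parser mentioned above; the function name parse_events and the event names "start", "value", and "end" are hypothetical. It makes three assumptions that the rules do not spell out: a closing Name, when present, must match the opening Name; the length in "\[N]" is hexadecimal; and values are reported in their raw, still-escaped form.

```python
import re
from typing import Iterator, Tuple

# Rules 5-8: a Letter or '_' followed by Letters, Digits, or '_'.
_LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"
NAME = re.compile("[%s_][%s0-9_]*" % (_LETTER, _LETTER))


def _find_closing_quote(text: str, pos: int) -> int:
    """Return the index of the quote that ends the value starting at pos."""
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return pos
        if ch == "\\" and text.startswith("[", pos + 1):
            close = text.index("]", pos + 2)  # "\[N]": skip N raw characters
            pos = close + 1 + int(text[pos + 2:close], 16)
        elif ch == "\\":
            pos += 2  # skip the escape character and the one after it
        else:
            pos += 1
    raise ValueError("unterminated value")


def parse_events(text: str) -> Iterator[Tuple[str, str]]:
    """Yield ("start", name), ("value", raw), and ("end", name) events."""
    if not text.startswith(":onx{"):
        raise ValueError("an infoblock must start with ':onx{'")
    yield ("start", "onx")
    stack = [("onx", "}")]  # open nodes, each with its expected closer
    pos = len(":onx{")
    while stack:
        if pos >= len(text):
            raise ValueError("unexpected end of infoblock")
        ch = text[pos]
        if ch == ":":  # structural characters are tested before whitespace
            m = NAME.match(text, pos + 1)
            if stack[-1][1] == "]" or not m or text[m.end():m.end() + 1] not in ("{", "["):
                raise ValueError("malformed node at offset %d" % pos)
            closer = "}" if text[m.end()] == "{" else "]"
            stack.append((m.group(0), closer))
            yield ("start", m.group(0))
            pos = m.end() + 1
        elif ch == '"':
            if stack[-1][1] != "]":
                raise ValueError("value outside of a ValueNode at offset %d" % pos)
            end = _find_closing_quote(text, pos + 1)
            yield ("value", text[pos + 1:end])
            pos = end + 1
        elif ch in "}]":
            name, closer = stack.pop()
            if ch != closer:
                raise ValueError("mismatched closer at offset %d" % pos)
            pos += 1
            m = NAME.match(text, pos)  # optional closing Name
            if (m and m.group(0) != name) or (not m and not stack):
                raise ValueError("bad closing name for '%s'" % name)
            if m:
                pos = m.end()
            yield ("end", name)
        elif ch in " \t\r\n":
            pos += 1
        else:
            raise ValueError("unexpected character at offset %d" % pos)
```

A caller would iterate over parse_events(text) and react to each event, much as a SAX handler reacts to element and character callbacks.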

ONX Representation

As with XML, it makes sense to have some sort of object model for ONX infoblock representation. However, since ONX is not document-oriented, the use of the Document Object Model would be confusing. So instead I suggest the object model be called the Infoblock Object Model (IOM). As for the implementation of such a model, this still has to be determined. If I come up with something workable, I will make it available online. In the meantime, any suggestions?
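Purely as a starting point for discussion, one possible shape for such a model is sketched below in Python. Nothing here is a settled IOM design: the class names mirror the two Node types from the specification, while the field names and the to_onx and infoblock helpers are hypothetical. The writer emits the compact form with no whitespace and no closing Names, and it escapes only the quote and the escape character.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ValueNode:
    name: str
    values: List[str] = field(default_factory=list)


@dataclass
class ContainerNode:
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[ValueNode, ContainerNode]


def escape_value(value: str) -> str:
    """Escape the escape character first, then the quote symbol."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_onx(node: Node) -> str:
    """Serialize one node (and everything below it) in compact form."""
    if isinstance(node, ValueNode):
        body = "".join('"%s"' % escape_value(v) for v in node.values)
        return ":%s[%s]" % (node.name, body)
    return ":%s{%s}" % (node.name, "".join(to_onx(c) for c in node.children))


def infoblock(children: List[Node]) -> str:
    """Wrap top-level nodes in the RootNode."""
    return ":onx{%s}onx" % "".join(to_onx(c) for c in children)
```

For example, a ContainerNode named "request" holding two ValueNodes serializes to the same single-line form as the request example in the Introduction.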

Versioning

Currently, ONX does not have a way to indicate what version it is. I have a few thoughts on this:

Schema Language(s)

ONX is only concerned with well-formedness and not validity. This is where schema languages come in. What we need are some usable schema languages that allow ONX infoblocks to be validated. My only suggestion here is that it is important to keep the schema language as simple as possible. I frequently hear how large and complex the W3C XML Schema language is. ONX is meant to be simple. I would like to see a schema language to match.

Namespacing Mechanism

I have mixed feelings about adding this functionality to ONX, either directly or indirectly. Namespaces have been one of the most contentious issues for XML and I would expect no less than that for ONX. I tend to see namespaces as being connected more to the issues of validation than to the issues of well-formedness. As a result, it makes more sense to me to address this issue at the schema level, not the markup language level. Of course, lots of people seem to be very opinionated about this matter and I welcome those opinions (as long as they are constructive).

Collaborative Development

The current version of ONX was written with only a little input and feedback, much of which I got indirectly from conversations, articles, and other resources concerning markup languages and XML in particular. The current draft has had some additional direct feedback. However, in order for ONX to move towards its potential, I know that it must become a truly open and collaborative effort. To that end, I would like to work towards putting together a group of people who would be willing to take the time and effort necessary to help promote and improve ONX. With the immense popularity of XML these days, this is no small task. But I feel that it is a task worth taking on. If you are someone who would like to help improve ONX now and in the future, e-mail me. Also, I would like some suggestions on how to go about providing the tools for the collaborative effort online. I know that there are many projects at SourceForge, but wondered if there was anything else out there, etc. I would even consider setting up a more formal location for ONX... but I feel that I am getting ahead of myself here. Let's start off simple and see where we go.

Much, Much More...

This is only the beginning of a new markup language. I feel that it has some good potential and can be used to help solve some of the problems we face today. As this markup language matures, there will be plenty more to add, improve upon, etc. And my hope is that anyone who is interested will help out. I think that, together, we can make ONX as popular as XML (if not more so).

Miscellaneous

Specification Version Number

The specification/document version at the top of this document is indicated by three numbers separated by periods. The first number is the major revision number. This is used to indicate major changes in the specification such as the addition of new features. The second number is the minor revision number and indicates updates to the specification due to errata such as ambiguities or typographical errors. These changes can change the actual meaning of the specification, but not the intended meaning. The third number is the document revision and indicates changes to the documentation that do not affect the specification, such as spelling and grammatical corrections, adding new examples, rewording for increased clarity, etc.
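The numbering scheme just described can be illustrated with a small Python sketch. The names SpecVersion, parse_spec_version, and same_specification are hypothetical. The comparison rests on the reading that only the first two numbers reflect changes to the specification itself, while the third reflects documentation-only changes.

```python
from typing import NamedTuple


class SpecVersion(NamedTuple):
    major: int     # major changes, such as new features
    minor: int     # errata that change the actual (not intended) meaning
    document: int  # documentation-only changes


def parse_spec_version(text: str) -> SpecVersion:
    """Split a 'major.minor.document' string into its three numbers."""
    major, minor, document = (int(part) for part in text.split("."))
    return SpecVersion(major, minor, document)


def same_specification(a: SpecVersion, b: SpecVersion) -> bool:
    """Two versions describe the same rules if major and minor agree."""
    return a[:2] == b[:2]
```

Under this reading, version 0.6.9 of this document and a hypothetical 0.6.12 would describe the same specification, whereas a hypothetical 0.7.0 would not.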

License

This document is licensed under Creative Commons Attribution-ShareAlike (version 2.5). Specific details can be found at http://creativecommons.org/licenses/by-sa/2.5/.