XML

XML stands for eXtensible Markup Language. It is used to store data, and over the years it has become a major format for transporting data. XML is supported by almost every programming language, from Java and C to PHP and JavaScript. An XML document is easy for humans to read and organizes its data as a tree-structured hierarchy.
Although XML looks very similar to HTML, it is not used to display data, but rather to store and carry data between applications.

- XML is a W3C Recommendation.
- XML tags are not predefined. You must define your own tags.
- XML is more or less self-descriptive.
- XML documents are tree-structured.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<orders>
<order id="123456" >
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
</order>
</orders>
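
To make the tree structure concrete, here is a minimal sketch in Python that parses the example above with the standard xml.etree.ElementTree module and prints one indented line per element. The helper name describe is only an illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET

# The example document from above, unchanged.
ORDERS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<orders>
<order id="123456" >
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
</order>
</orders>"""


def describe(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(dict(element.attrib))
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(ORDERS_XML)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

The root element comes first, and every level of indentation is one step down from parent to child.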

XML syntax rules

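- An XML document must have exactly one element that is the parent of all other elements, called the ‘root’ element. The other elements relate to each other as ‘parent’ and ‘child’ elements.
- Every XML element must have a closing tag (an empty element may be written as a single self-closing tag, e.g. <drink/>).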
- XML tags are case sensitive.
- XML elements must be properly nested.
- Escape special characters in text, or you will end up with parsing errors: write the entity reference &lt; instead of < and &amp; instead of &.
- Comments look just like in HTML: <!-- This is a comment -->
- XML attribute values must be quoted.
- Errors in XML will stop your application: a parser must stop processing as soon as it finds a document that is not well formed.
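
The rules above can be tried out with a short Python sketch. It feeds a few deliberately broken snippets to the standard xml.etree.ElementTree parser and then repairs one of them with xml.sax.saxutils.escape. The helper is_well_formed and the sample snippets are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "good": "<order><food>Pizza</food></order>",
    "wrong case": "<order><food>Pizza</Food></order>",    # closing tag differs in case
    "bad nesting": "<order><food>Pizza</order></food>",   # elements overlap
    "unquoted attribute": "<order id=123456></order>",    # attribute value not quoted
    "raw special char": "<note>price < 10</note>",        # unescaped '<' in text
}
results = {name: is_well_formed(doc) for name, doc in samples.items()}

# escape() replaces &, < and > in text with their entity references.
safe_text = escape("price < 10 & rising")
fixed = "<note>" + safe_text + "</note>"
round_trip = ET.fromstring(fixed).text  # the parser turns the entities back into characters

for name, ok in results.items():
    print(name, "->", "well formed" if ok else "rejected")
```

Only the first sample is accepted. The parser does not try to recover from the others, which is why such errors stop further processing.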

Attributes provide additional data about the element. In some cases attributes are used as references to elements (e.g. id="123456"). Such data is called metadata, since it has no value to the user, but your XML application needs these attributes to identify different elements.
Another example would be: <file type="jpeg">photo1.jpg</file>.

In general it is better to use elements rather than attributes. Compared to elements, attributes cannot contain multiple values, cannot form tree structures and are not easily extendable.
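
The difference shows up as soon as the data is read back. The following Python sketch (standard library only, with sample data invented for the illustration) reads an attribute as a single string, reads a repeated element as a list, and shows that repeating an attribute on the same element is rejected by the parser.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    '<order id="123456">'
    "<name>John Hungry</name>"
    "<drink>Orange Juice</drink>"
    "<drink>Bottled Water</drink>"
    "</order>"
)

# An attribute holds exactly one string value per name.
order_id = order.get("id")

# An element can simply be repeated to hold several values.
drinks = [drink.text for drink in order.findall("drink")]

# Repeating an attribute on one element is a well-formedness error.
try:
    ET.fromstring('<order id="1" id="2"></order>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(order_id, drinks, duplicate_rejected)
```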

Displaying XML

XML documents are plain text files and can be opened with any text editor. Web browsers can display them too, but don’t expect to see a page like you would with HTML.
Some browsers (IE, Firefox...) display XML as text with plus (+) and minus (-) signs, so you can expand or collapse the tree structure of the displayed document. Other browsers (Safari...) display only the text between the start and end tags. To see the complete XML document, choose the 'View Source' option in the web browser.

There is still a way to display XML as HTML: transform it with XSLT. Google around to find more information.

XML validation

An important property of XML is that element names are custom and can be defined in a separate file. This file is used to validate whether an XML document conforms to certain rules. A 'valid' XML document is a 'well formed' XML document that is also structured according to the rules defined in that file. Two kinds of definitions are commonly used to describe the structure of XML: DTD and XSD.

DTD

A Document Type Definition defines the XML structure with all its elements and attributes. A DTD can be declared within an XML document or in an external file.

Internal DTD

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE orders [
<!ELEMENT orders (order+) >
<!ELEMENT order (name, food, drink) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT food (#PCDATA) >
<!ELEMENT drink (#PCDATA) >
]>

<orders>
<order>
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
</order>
</orders>

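External DTD (save this in a file named order.dtd in the same directory as order.xml)

<!ELEMENT orders (order+) >
<!ELEMENT order (name, food, drink) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT food (#PCDATA) >
<!ELEMENT drink (#PCDATA) >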

The order.xml file would look like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE orders SYSTEM "./order.dtd">
<orders>
<order>
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
</order>
</orders>

XML Schema

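An alternative to DTD is XML Schema. It does the same job as a DTD: it defines the elements, attributes and overall structure of an XML document, and it can also assign data types such as xs:string to them. A schema is stored in a file with the .xsd extension.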

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="orders">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="order">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="food" type="xs:string" maxOccurs="unbounded"/>
            <xs:element name="drink" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>

Reference the schema from the XML document

<?xml version="1.0" encoding="UTF-8"?>
<orders xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./order.xsd">
<order>
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
<drink>Bottled Water</drink>
</order>
</orders>

Many development tools (Eclipse, NetBeans, XMLSpy...) include an XML validation feature. It helps developers avoid mistakes when implementing XML for data transfer. Play with the examples: remove a line from the XML document and you should see an error.

Many APIs and libraries for parsing XML data also include validation functionality. When your application receives XML data, it should first validate it against the known schema to be sure that the format can be processed further.
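
The "validate first, process later" idea can be sketched in Python. Note the assumption: the Python standard library does not ship an XSD validator, so the function below is only a hand-written stand-in that mirrors the rules of the order.xsd schema shown earlier (one order, containing a name, then one or more food elements, then one or more drink elements). A real application would more likely hand the schema file to a schema-aware library. The names check_orders and ORDER_CHILDREN are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Allowed child sequence of <order>, written as a pattern over tag names:
# one name, then at least one food, then at least one drink.
ORDER_CHILDREN = re.compile(r"name( food)+( drink)+")


def check_orders(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed: %s" % exc]

    if root.tag != "orders":
        return ["root element must be <orders>"]

    problems = []
    if [child.tag for child in root] != ["order"]:
        problems.append("<orders> must contain exactly one <order>")
    for order in root.findall("order"):
        children = " ".join(child.tag for child in order)
        if not ORDER_CHILDREN.fullmatch(children):
            problems.append("unexpected children in <order>: " + (children or "(none)"))
    return problems


received = (
    "<orders><order><name>John Hungry</name><food>Pizza</food>"
    "<drink>Orange Juice</drink><drink>Bottled Water</drink></order></orders>"
)
problems = check_orders(received)
if not problems:
    print("document accepted, safe to process")
else:
    print("document rejected:", problems)
```

The application touches the data only when the returned list is empty. Removing the food element from the sample, for instance, makes the check report a problem instead of letting incomplete data through.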