XML
XML stands for eXtensible Markup Language. XML is used to
store data. Over the years it has become a major format for
transporting data. XML libraries exist for almost every programming
language, from Java and C to PHP and JavaScript. An XML document
is easily readable by humans and forms a tree-structured
data hierarchy.
Although XML looks very similar to HTML, it is not used to
display data, but rather to store and carry data between
applications.
- XML is a W3C Recommendation.
- XML tags are not predefined. You must define your own tags.
- XML is more or less self-descriptive.
- XML documents are tree-structured.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<orders>
<order id="123456" >
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
</order>
</orders>
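To make the tree structure concrete, here is a minimal sketch that parses the document above with Python's standard xml.etree.ElementTree module and prints one line per element, indented by depth. The helper name describe is only an illustration and not part of any library.

```python
import xml.etree.ElementTree as ET

ORDERS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <order id="123456">
    <name>John Hungry</name>
    <food>Pizza</food>
    <drink>Orange Juice</drink>
  </order>
</orders>"""

# Parse from bytes so the parser applies the declared encoding itself.
root = ET.fromstring(ORDERS_XML.encode("utf-8"))


def describe(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(root)
print("\n".join(tree_lines))
```

The root element orders ends up at depth 0, order at depth 1, and name, food and drink at depth 2, which mirrors the nesting of the tags.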
XML syntax rules
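- An XML document must have exactly one element that contains all other elements, called the ‘root’ element. An element that contains other elements is their ‘parent’; the contained elements are its ‘children’.
- Every XML element must have a closing tag (an empty element may be written as a single self-closing tag such as <item/>).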
- XML tags are case sensitive.
- XML elements must be properly nested.
- Special characters must be escaped in text, otherwise you end up with parsing errors (instead of < use its entity reference &lt; and instead of & use &amp;).
- Comments look just like in HTML: <!-- This is a comment -->
- XML attribute values must be quoted.
- A well-formedness error makes the parser reject the whole document, so your application cannot read any of its data.
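The rules above can be tried out with a small sketch. It assumes Python's standard xml.etree.ElementTree parser and the escape function from xml.sax.saxutils; the helper is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "good": "<order><name>John Hungry</name></order>",
    # closing tags in the wrong order: elements are not properly nested
    "bad_nesting": "<order><name>John Hungry</order></name>",
    # tags are case sensitive: <Name> is not closed by </name>
    "bad_case": "<order><Name>John Hungry</name></order>",
    # attribute value without quotes
    "bad_quote": "<order id=123456></order>",
    # a raw '<' inside text
    "raw_text": "<order><note>price < 10</note></order>",
}

# escape() replaces &, < and > in text with their entity references.
escaped = escape("price < 10")
samples["escaped_text"] = "<order><note>" + escaped + "</note></order>"

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well formed" if ok else "rejected")
```

Only the first and the last sample are accepted. The escaped variant parses because the text now contains &lt; instead of a raw < character, and the parser turns it back into < when the text is read.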
Attributes provide additional data about an element.
In some cases attributes serve as identifiers that let an application refer to elements
(e.g. id="123456"). Such data is called metadata: it has little
value to the user, but your XML application needs these attributes
to tell elements apart.
Another example would be: <file type="jpeg">photo1.jpg</file>.
In general it is better to use elements rather than attributes. Compared to elements, attributes cannot contain multiple values, cannot form tree structures and are not easily extended.
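As a rough illustration of attributes used as identifiers, the sketch below looks up an order by its id attribute and reads the type attribute of the file example. It uses Python's standard xml.etree.ElementTree; the second order (id 123457, Jane Thirsty) is invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<orders>"
    '<order id="123456"><name>John Hungry</name><food>Pizza</food></order>'
    # hypothetical second order, added only to have something to choose from
    '<order id="123457"><name>Jane Thirsty</name><drink>Tea</drink></order>'
    "</orders>"
)

# Attributes are read with get(); element content is read with findtext().
orders_by_id = {order.get("id"): order for order in root.findall("order")}
selected = orders_by_id["123457"]
customer = selected.findtext("name")
print(customer)

# The attribute describes the element, the text is the actual data.
file_elem = ET.fromstring('<file type="jpeg">photo1.jpg</file>')
print(file_elem.get("type"), file_elem.text)
```

The id never needs to be shown to the user; it only lets the application pick the right order element out of the tree.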
Displaying XML
XML documents are plain text files and can be opened with any text
editor. Web browsers can display them too, but don’t expect
to see a page like you would with HTML.
Some browsers (IE, Firefox...) display XML as text with plus (+) and
minus (-) signs, so you can expand or collapse the tree structure of
the displayed XML document. Other browsers (Safari...) display only
the text inside the start and end tags. To see the complete XML document, choose
the 'View Source' option in the web browser.
XML can still be displayed as HTML by transforming it with XSLT (Extensible Stylesheet Language Transformations), a W3C language for converting XML documents into other formats.
XML validation
An important property of XML is that element names are custom, so the set of allowed elements can be defined in a separate file. That file is used to validate an XML document, that is, to check that it conforms to certain rules. A 'valid' XML document is a 'well-formed' XML document that is also structured according to the rules defined in that file. Two kinds of definitions are commonly used to describe the structure of XML: DTD and XSD.
DTD
A Document Type Definition (DTD) defines the XML structure with all its elements and attributes. A DTD can be declared within an XML document or in an external file.
Internal DTD
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE orders [
<!ELEMENT orders (order+) >
<!ELEMENT order (name, food, drink) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT food (#PCDATA) >
<!ELEMENT drink (#PCDATA) >
]>
<orders>
<order>
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
</order>
</orders>
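External DTD (save this in a file named order.dtd in the same directory as order.xml)
<!ELEMENT orders (order+) >
<!ELEMENT order (name, food, drink) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT food (#PCDATA) >
<!ELEMENT drink (#PCDATA) >
The order.xml file would look like: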
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE orders SYSTEM "./order.dtd">
<orders>
<order>
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
</order>
</orders>
XML Schema
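An alternative to DTD is XML Schema (XSD). Like a DTD, it defines the elements, attributes and overall structure of an XML document, and it can additionally constrain data types (e.g. xs:string). A schema is stored in a file with the .xsd extension, here order.xsd.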
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="orders">
<xs:complexType>
<xs:sequence>
<xs:element name="order" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="food" type="xs:string" maxOccurs="unbounded"/>
<xs:element name="drink" type="xs:string" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Reference the schema from the XML document
<?xml version="1.0" encoding="UTF-8"?>
<orders xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./order.xsd">
<order>
<name>John Hungry</name>
<food>Pizza</food>
<drink>Orange Juice</drink>
<drink>Bottled Water</drink>
</order>
</orders>
Many development tools (Eclipse, NetBeans, XMLSpy...) have XML validation built in. It helps developers avoid mistakes when implementing XML for data transfer. Play with the examples: remove a line from the XML document and you should see an error.
Many APIs and libraries for parsing XML data also include validation functionality. When your application receives XML data, it should first validate it against the known schema to be sure that the data has the expected format before processing it further.
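The validate-before-processing idea can be sketched without a schema library. The parser in Python's standard xml.etree.ElementTree module only checks well-formedness, so the example below is a hand-written structure check and not real XSD or DTD validation. It assumes the rules from the schema above (an order holds one name, then one or more food elements, then one or more drink elements); check_orders and CONTENT_MODEL are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Allowed child sequence of <order>, taken from the schema: name, food+, drink+
CONTENT_MODEL = re.compile(r"name( food)+( drink)+")


def check_orders(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well formed: %s" % err]
    if root.tag != "orders":
        return ["root element is <%s>, expected <orders>" % root.tag]
    problems = []
    if len(root) == 0:
        problems.append("<orders> contains no <order> element")
    for position, child in enumerate(root, start=1):
        if child.tag != "order":
            problems.append("child %d is <%s>, expected <order>" % (position, child.tag))
            continue
        tags = " ".join(grandchild.tag for grandchild in child)
        if not CONTENT_MODEL.fullmatch(tags):
            problems.append(
                "order %d has children [%s], expected name, food+, drink+" % (position, tags)
            )
    return problems


received = (
    "<orders><order><name>John Hungry</name><food>Pizza</food>"
    "<drink>Orange Juice</drink><drink>Bottled Water</drink></order></orders>"
)
incomplete = "<orders><order><name>John Hungry</name><food>Pizza</food></order></orders>"

for document in (received, incomplete):
    problems = check_orders(document)
    print("accepted" if not problems else problems)
```

An application would go on to process only documents for which the returned list is empty. For anything beyond a toy format, a validating library driven by the actual .xsd or .dtd file is the better choice, since a hand-written check can drift away from the schema.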