Working with DTDs
Understanding Document Type Definitions (DTDs)
A Document Type Definition (DTD) is a file that describes the structure of a
group of documents by means of declarations written in the formal notation
defined in the W3C XML specification. Think of it as the grammar of an XML
document.
If you will be working regularly with XML, it's a good idea to bookmark the
URL of the World Wide Web Consortium (W3C) standards organization
(http://www.w3.org/XML/) to stay abreast of critical developments and to refer
to current standards.
The DTD is at
the heart of your XML or SGML implementation. It defines the types of data you
want to collect, process, and present, as well as the relationships between
data elements. Data elements defined in the DTD are the building blocks of your
entire customization.
Every XML file
you edit with XMetaL must be associated with either a schema or a DTD. SGML
files can only be associated with DTDs. These are typically provided to
developers by systems analysts and designers. There are also numerous proven
DTDs available without charge from reputable bodies on the Web.
Defining your
DTD is critical to your project's workflow design. Before you begin any coding,
you should have a carefully considered DTD already in place. Even so, DTDs will
usually require tweaking or maintenance as the project develops.
This lesson
assumes you are familiar with XML. However, we will review some core concepts
as a refresher, and also as a way of introducing you to some important rules
about XMetaL's treatment of DTDs, rules files, and schemas.
This lesson
will take about 30 minutes to complete.
Understanding Schema and Rules Files
A schema is the definition of a
structural model for a group of XML files. Schemas are typically used for
machine validation of XML document structure. If you work with a software
application or client/server application that receives a large quantity of XML
files from a variety of sources, you may have to work with Schemas.
Human beings
use schema in the broad sense all the time in their daily lives. Imagine a
postal address. You'll know the structure of an address immediately from your
mental schema of an address - it has a name, a street address, a city, a
country, and a postal code or zip code.
XML Schemas (with a capital S) most often define what is invalid, through what
are known as "constraints". In the XML world, Schemas are regularly used to
ensure that things like address elements don't contain two instances of
<city>, or a <country> value that contains only integers.
The W3C also defines the rules for Schemas, and if you work with Schemas, you
should visit its site (http://www.w3.org/XML/Schema). There you'll find
definitions and standards, as well as valuable tools for validating schema
files.
One example of
a schema with wide industry support is the RIXML schema specification, which
provides a common language to improve the value of broker investment research.
Support for RIXML schemas can be developed using XML.
Schema files have the file extension ".xsd", for XML Schema Definition. If
properly constructed, an XML Schema can replace a standard DTD and add
functionality at the same time.
XML Schemas
have certain advantages over DTDs. They:
- are extensible to
future additions
- are richer and
more useful than DTDs
- are written in XML
- support data types
- support
namespaces.
There are
certain limitations to the use of schemas with XMetaL:
- Identity-constraint definitions are ignored
- Wildcards are ignored
- The <redefine> tag is not supported
- The instance attributes xsi:nil and xsi:type are ignored, and cannot be
edited in Normal or Tags On view.
- Checking an XML
Schema (.xsd file) for errors is limited in XMetaL. We recommend the use of
third-party tools, such as those available from W3C
(http://www.w3.org/XML/Schema).
XMetaL can use DTDs or Schemas in two formats: as text files, or as rules
files. A rules file is a DTD or Schema that has been compiled into binary
format. When you open or create a document that uses a DTD for which there is
no corresponding rules file, XMetaL automatically compiles a rules (.rlx) file
that encodes the information in the DTD. Rules files for DTDs have a ".rlx"
extension, while rules files for Schemas have a ".rld" extension.
Viewing and Modifying DTDs and Schemas
You can view DTDs and Schemas from XMetaL Developer by double-clicking their
file names in the Solution Explorer.
Try
double-clicking on the "Meeting.dtd" file in the
MeetingMinutes customization. You'll
see a list of all elements in the project, with their properties listed in the
Properties window. You'll find that
you will often reference this file when building XML solutions.
To edit a DTD
or Schema, use the Visual Studio .NET XSD Viewer/Editor. For further
information, see the VS .NET online documentation.
Understanding DOCTYPE Declarations
An XML
document starts with a declaration called a document type declaration
(DOCTYPE). The DOCTYPE associates the document with a DTD or rules file by
means of an external identifier.
Every XML
document you create will contain a DOCTYPE declaration, and unless its syntax
is correct, your document will not be valid, so it's a good idea to become
familiar with DOCTYPEs.
Here is an example of a DOCTYPE declaration:
<!DOCTYPE BOOK PUBLIC "-//Blast Radius//Book v1.0//EN" "book.dtd">
Following the
DOCTYPE keyword is the document type
name. In the above example, the document type name is BOOK. By default,
this is the top-level element in the DTD or rules file. However, as you are
editing a document, XMetaL changes this to the current top-level element in the
document.
Following the
document type name is an external
identifier. An external identifier consists of the keyword SYSTEM or
PUBLIC, followed by a string of characters inside double quotes that indicate
the location of the DTD or Schema. If the external identifier starts with
SYSTEM, it has only a system identifier; if it starts with PUBLIC, it has a
public identifier followed by a system identifier.
The
system identifier is generally the
filename or URL of the DTD or rules file. The
public identifier is an arbitrary
identifier, usually one agreed upon by various organizations that use the DTD.
Certain DTDs used by a large number of organizations have a standard public
identifier.
Here are two examples of DOCTYPE declarations, one with a PUBLIC keyword and
one with a SYSTEM keyword, that could be used to refer to the same DTD:
<!DOCTYPE BOOK PUBLIC "-//Blast Radius//Book v1.0//EN" "book.dtd">
The keyword
PUBLIC indicates that the first string in quotes that follows it is the public
identifier, and the second string in quotes that follows it is the system
identifier. This DOCTYPE refers to a DTD that has the public identifier
-//Blast Radius//Book v1.0//EN and
the system identifier book.dtd.
Now let's look at another reference to the same file:
<!DOCTYPE BOOK SYSTEM "book.dtd">
The keyword
SYSTEM indicates that the identifier that follows it is the system identifier.
If the external identifier starts with SYSTEM, there cannot be a public
identifier. This DOCTYPE refers to a DTD that has the system identifier
book.dtd.
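The two DOCTYPE forms above can be inspected programmatically. The following is a minimal sketch, not part of XMetaL, that uses the expat parser from the Python standard library to report the document type name, public identifier, and system identifier. The helper name read_doctype and the empty <BOOK/> element are assumptions made for illustration.

```python
from xml.parsers import expat


def read_doctype(xml_text):
    """Return (name, public_id, system_id) from the DOCTYPE of xml_text."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat passes the system identifier before the public identifier.
        found["doctype"] = (name, public_id, system_id)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found.get("doctype")


# The <BOOK/> element is a placeholder so that each sample is a complete document.
public_form = '<!DOCTYPE BOOK PUBLIC "-//Blast Radius//Book v1.0//EN" "book.dtd"><BOOK/>'
system_form = '<!DOCTYPE BOOK SYSTEM "book.dtd"><BOOK/>'

print(read_doctype(public_form))
print(read_doctype(system_form))
```

For the SYSTEM form the public identifier comes back as None, which mirrors the rule that a SYSTEM external identifier carries no public identifier.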
Understanding the Internal Subset of a DOCTYPE
Instead of, or
in addition to, the external identifier, the DOCTYPE declaration can have an
internal subset containing further declarations. An external DTD file is known
as the "external subset", while similar definitions inside an XML document are
called the "internal subset". Both work together to create a document type
definition.
For example:
<!DOCTYPE Article SYSTEM "journalist.dtd" [
<!ENTITY Title "Weasel populations in a forest in Poland">
...
]>
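Take a look at the ENTITY declaration above. It declares a general entity
named Title whose replacement text is the quoted string; wherever the document
contains the reference &Title;, that text is substituted. (ENTITY is also the
name of an XML attribute type, which is a separate use of the keyword.)
Attributes are additional information associated with an element type,
intended mainly for text and markup interpretation by a software application.
All attribute values must be in quotes.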
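To see what an ENTITY declaration in the internal subset does in practice, here is a small sketch using Python's standard xml.etree.ElementTree module, which expands internally declared entities while parsing. It is an illustration only: the Heading element is hypothetical, and the external identifier is left out so the sample is self-contained.

```python
import xml.etree.ElementTree as ET

# An internal subset that declares one general entity, then a reference to it.
# The Heading element is a made-up name used only for this illustration.
document = """<!DOCTYPE Article [
  <!ENTITY Title "Weasel populations in a forest in Poland">
]>
<Article><Heading>&Title;</Heading></Article>"""

root = ET.fromstring(document)

# The parser has already replaced &Title; with the declared text.
print(root.find("Heading").text)
```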
The internal subset can contain declarations such as ELEMENT, ATTLIST, and
ENTITY declarations. Declarations in the internal subset are read before
declarations in the external DTD or rules file, and therefore they override
any external declarations of the same attribute or entity.
ATTLIST declarations identify which element types may have attributes, what
types of attributes they may be, and what their default values are. ATTLIST
declarations specifying different attributes of the same element are combined,
but if the same attribute is specified both internally and externally, the
specification in the internal subset takes precedence.
Duplicate
ELEMENT declarations are not allowed and result in an error message.
A DOCTYPE declaration can omit the external identifier, so that the document's
DTD is internal (contained completely in the internal subset). Note that XML
keywords are case-sensitive and must be written in uppercase. For example:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Article [
<!ELEMENT Article (Title, Sect1+)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Sect1 (Title, Para+)>
<!ELEMENT Para (#PCDATA)>
<!ATTLIST Article Id ID #IMPLIED>
]>
<Article> ... </Article>
The internal subset can refer to an external DTD using a parameter entity
reference:
<?xml version="1.0"?>
<!DOCTYPE Article [
<!ENTITY % journalist.dtd SYSTEM "journalist.dtd">
%journalist.dtd;
]>
<Article> ... </Article>
When users create an entity with any of the entity-creation commands in XMetaL
Author's Tools menu, the entity declarations are placed in the internal
subset. However, if the internal subset contains any declarations other than
ENTITY declarations, it is read-only from the Tags On and Normal views, and
the entity-creation commands are unavailable.
Mapping External Identifiers to Files
XMetaL uses the OASIS catalog mechanism to associate the external identifier in
a DOCTYPE or in an external entity declaration with the name and location of a
DTD, rules file, or entity file.
You would
typically use this mechanism only in the following situations:
- If the document's
DOCTYPE contains only a public identifier.
- If the DTD or
rules file is not stored in the Rules folder.
- If the system
identifier in the DOCTYPE does not match the DTD or rules file that you want to
use.
If the catalog
mechanism does not provide a result, XMetaL tries to resolve the external
identifier using the following methods, in the order given, until a result is
obtained.
- The external
identifier map file (extid.map). This mechanism is provided for backward
compatibility with previous versions of XMetaL, and can be disabled.
- Attempting to
retrieve the system identifier as a URL (relative URLs are relative to the
document instance).
- Attempting to
retrieve the system identifier as a file path (relative paths are relative to
the document instance).
For the complete, formal specification of the catalog mechanism, see OASIS
Technical Resolution 9401:1997 (http://www.oasis-open.org/specs/a401.htm).
Understanding the External Identifier Map File
XMetaL provides a backup mechanism called "the external identifier map file"
for mapping the external identifier in a DOCTYPE to the name and location of a
DTD or rules file. XMetaL uses this mechanism if the catalog mechanism does not
resolve the external identifier.
Note: You can disable the external
identifier map mechanism by setting use_extid_mapping to false in the
xmetal45.ini file.
You would
typically use this mechanism only in the following situations:
- If the document's
DOCTYPE contains only a public identifier.
- If the DTD,
Schema, or rules file is not stored in the Rules folder.
- If the system
identifier in the DOCTYPE does not match the DTD or rules file that you want to
use.
- If you want to use
patterns (regular expressions) to match a set of public or system identifiers
and map them on to a set of filenames.
The external
identifier map file is, by default, the file "extid.map" in the top-level
XMetaL folder. You can use a different file by specifying a value for extid_map
in the xmetal45.ini file.
The external
identifier map file consists of lines in this form:
public-id system-id DTD/rulesfile
The first two values are strings or patterns that match the public and system
identifiers respectively. The third value is the name of the DTD or rules file
that these identifiers refer to. Here is an example:
"-//Blast Radius//Book v1.0//EN" ! book.dtd
If you open a
file whose DOCTYPE contains the public identifier
-//Blast Radius//Book v1.0//EN,
XMetaL scans the external identifier map file until it comes to the line in the
example. It sees that the two identifiers match, and therefore it looks for the
DTD "book.dtd". The exclamation mark (!) is a special character that means
"match any identifier", so in this example it does not matter what the system
identifier is, or if one is present.
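The matching rule just described can be sketched in a few lines of Python. This is an illustration of the idea only, not XMetaL's implementation: it assumes one entry per line, compares identifiers literally (patterns are not handled here), and the function names parse_map_line and lookup are invented for the example.

```python
import re


def parse_map_line(line):
    """Split one map line into (public, system, target), dropping the quotes."""
    tokens = re.findall(r'"[^"]*"|\S+', line)
    return tuple(token.strip('"') for token in tokens)


def lookup(entries, public_id, system_id):
    """Return the target of the first entry whose two identifiers both match."""
    for public_pat, system_pat, target in entries:
        # "!" is the special value meaning "match any identifier".
        public_ok = public_pat == "!" or public_pat == public_id
        system_ok = system_pat == "!" or system_pat == system_id
        if public_ok and system_ok:
            return target
    return None


entries = [parse_map_line('"-//Blast Radius//Book v1.0//EN" ! book.dtd')]

# The system identifier is irrelevant here, and may even be absent.
print(lookup(entries, "-//Blast Radius//Book v1.0//EN", "anything.dtd"))
print(lookup(entries, "-//Blast Radius//Book v1.0//EN", None))
# A different public identifier does not match the entry.
print(lookup(entries, "-//Other//DTD//EN", "book.dtd"))
```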
Setting up the External Identifier Map file
XMetaL needs
to refer to the external identifier map file (extid.map) only when the DOCTYPE
in a document does not have a system identifier that is the same as the
filename of a DTD or rules file stored in the Rules folder. Let's look at some
examples of this:
Using an alternative DTD/rules location
If you store your DTD, Schema, or rules file somewhere other than the Rules
folder, there are two ways to tell XMetaL the location.
You can put the file location in the DOCTYPE explicitly:
<!DOCTYPE BOOK SYSTEM "C:\DTDs\book.dtd">
Or, you can use the extid.map file to point to the location of the DTD or
rules file, with either of these entries:
"-//Blast Radius//Book v1.0//EN" ! "C:/DTDs/book.dtd"
! "book.dtd" "C:/DTDs/book.dtd"
The first entry maps a public identifier to a DTD; the second maps a system
identifier to a DTD. Either form is valid.
Mapping one system identifier to another
By default, if
the system identifier specifies "dtdname.dtd", XMetaL automatically looks for
the rules file "dtdname.rlx". If the system identifier does NOT correspond to
the desired DTD or rules file in this regular way, you must create an entry in
the external identifier map file.
The system identifier in the DOCTYPE may specify a DTD name, as in this
example:
<!DOCTYPE BOOK SYSTEM "book.dtd">
If you want to
use the rules file realbook.rlx, instead of book.rlx, you can either change the
DOCTYPE to refer to the rules file, or create an entry in the external
identifier map file that tells XMetaL which rules file corresponds to the DTD
name.
Note: If the DOCTYPE contains a reference
to a rules file (instead of a DTD), the DOCTYPE no longer adheres to the XML
specification.
To map the system identifier "book.dtd" to the rules file "realbook.rlx", use
an entry like this example:
! "book.dtd" "realbook.rlx"
If you use
several rules files, and there is a regular correspondence between DTD names
and rules file names (other than the default correspondence between .dtd and
.rlx files), you can map them all using one entry.
For example,
if you use names of the form "anything.dtd" for all your DTD file names, and
call the corresponding rules files "anything.rules", the following line in the
external identifier map tells XMetaL to use the .rules file corresponding to
the DTD (no matter whether the public identifier is present, or what it
is):
! (.*)\.dtd \1.rules
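The pattern entry above captures the part of the system identifier before ".dtd" and reuses it in the target name. The sketch below reproduces that substitution with Python's re module. It is only an approximation: XMetaL's pattern dialect may differ from Python's, and matching the whole identifier (rather than part of it) is an assumption, as is the function name map_system_id.

```python
import re


def map_system_id(system_pattern, target_template, system_id):
    """Apply one pattern entry; return the mapped file name or None."""
    # Assumption: the pattern must match the entire system identifier.
    match = re.fullmatch(system_pattern, system_id)
    if match is None:
        return None
    # \1 in the template is replaced by the first captured group.
    return match.expand(target_template)


print(map_system_id(r"(.*)\.dtd", r"\1.rules", "book.dtd"))
print(map_system_id(r"(.*)\.dtd", r"\1.rules", "journalist.dtd"))
print(map_system_id(r"(.*)\.dtd", r"\1.rules", "book.xsd"))
```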
Using Catalogs
"Catalogs"
allow XML processing tools like XMetaL to use a local copy or fragment of a DTD
or Schema if it is available, even if your local XML document refers to a DTD
at an external URL.
XML Catalogs
are anchored in the root catalog (usually /etc/xml/catalog or defined by the
user). Catalogs are a tree of XML documents defining the mappings between the
canonical naming space and the local installed ones, in a static cache
structure. When XMetaL is asked to process a resource, it will automatically
test for a locally available version in the catalog, starting from the root
catalog, and possibly fetching sub-catalog resources, until it finds (or does
not find) that the catalog has that resource.
If the catalog
can't help XMetaL locate a resource locally, it will look to the Web, allowing
in most cases for a recovery from a catalog miss. This gives the document
considerable platform independence.
XMetaL can use catalog files to help identify external references. Let's look
at some examples:
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" "isolat1.ent"
SYSTEM "sqdoc.dtd" "sqdoc-xml.dtd"
ENTITY face1 "c:\project1\smallfaces\face1.gif"
The PUBLIC entry in the first line above associates the public identifier "ISO
8879-1986//ENTITIES Added Latin 1//EN" with the filename "isolat1.ent". This
entry could resolve the following declaration in a DTD file:
<!ENTITY % isolat1 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN">
%isolat1;
When XMetaL encounters the "%isolat1;" parameter entity reference, it scans
the declaration of the isolat1 entity, and finds a public identifier. It then
looks in the catalog file for a PUBLIC entry matching the same identifier. The
filename (isolat1.ent) specified in this entry is then used as the replacement
for the entity reference.
The SYSTEM entry associates the system identifier "sqdoc.dtd" with the
filename "sqdoc-xml.dtd". This entry could resolve the following DOCTYPE
declaration:
<!DOCTYPE DOC SYSTEM "sqdoc.dtd">
When XMetaL
reads this declaration at the top of an XML document, it finds the system
identifier "sqdoc.dtd" and then looks in the catalog file for a SYSTEM entry
matching that identifier. The filename (sqdoc-xml.dtd) found in this entry is
then used as the DTD file for the document.
The ENTITY
line associates the entity name "face1" with the filename
"c:\project1\smallfaces\face1.gif". When XMetaL encounters a reference to the
external entity "face1", it scans the declaration of face1 for a system and/or
public identifier. It then reads the catalog file, looking for SYSTEM and/or
PUBLIC entries specifying these identifiers. If it does not find a matching
entry, it then looks for an ENTITY entry that matches the entity name in
question. In the example above, the file "c:\project1\smallfaces\face1.gif"
would be used as the replacement for the entity reference.
Note: Filenames can be absolute or relative
paths, or URLs. Relative filenames in a catalog are interpreted as relative to
the location of the catalog file, unless the catalog file contains a BASE
entry. It is best to use backslashes with Windows file paths, but forward
slashes are accepted.
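As a rough illustration of how the three kinds of catalog entry discussed above can be read and looked up, here is a short Python sketch. It assumes one entry per line with exactly three tokens and ignores every other catalog keyword; parse_catalog is a name invented for this example, not an XMetaL function.

```python
import re

# The same three entries used in the example above.
CATALOG_TEXT = r'''
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" "isolat1.ent"
SYSTEM "sqdoc.dtd" "sqdoc-xml.dtd"
ENTITY face1 "c:\project1\smallfaces\face1.gif"
'''


def parse_catalog(text):
    """Return {(keyword, key): filename} for PUBLIC, SYSTEM and ENTITY entries."""
    entries = {}
    for line in text.splitlines():
        tokens = [t.strip('"') for t in re.findall(r'"[^"]*"|\S+', line)]
        if len(tokens) == 3 and tokens[0] in ("PUBLIC", "SYSTEM", "ENTITY"):
            keyword, key, filename = tokens
            entries[(keyword, key)] = filename
    return entries


catalog = parse_catalog(CATALOG_TEXT)

print(catalog[("PUBLIC", "ISO 8879-1986//ENTITIES Added Latin 1//EN")])
print(catalog[("SYSTEM", "sqdoc.dtd")])
print(catalog[("ENTITY", "face1")])
```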
Locating Catalog Files
Let's say for
the sake of illustration that our current document is called "docname.xml", and
it's located in the folder "docfldr". If XMetaL needs to resolve a reference to
an external entity by referring to a catalog file, it searches for the files
listed below, in the order given, until it finds the catalog file that
references it.
The file
referencing the needed resource is the root catalog, and it may have links to
other catalog files via the CATALOG and DELEGATE keywords (see below), which
may in turn have their own links, and so on. XMetaL looks for matches only in
the root catalog file and its linked files (at all levels of linking).
- docfldr\docname.soc (a file in the same folder as the XML document, whose
name is the same as the document except for the .soc file extension)
- docfldr\catalog (a
file called catalog in the same folder as the XML document)
- docfldr\catalog.soc (a file
called catalog.soc in the same folder as the XML document)
- Rules\catalog (a
file called catalog in the XMetaL Rules folder)
- Rules\catalog.soc
(a file called catalog.soc in the XMetaL Rules folder).
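The search order in the list above can be written down as a small helper that builds the candidate paths for a given document. This is a sketch for illustration: the function name catalog_candidates is hypothetical, and the Rules folder is passed in as a plain relative name rather than XMetaL's real installation path.

```python
from pathlib import PureWindowsPath


def catalog_candidates(document_path, rules_folder):
    """List the catalog files to try, in the order given in the text."""
    doc = PureWindowsPath(document_path)
    rules = PureWindowsPath(rules_folder)
    return [
        str(doc.with_suffix(".soc")),      # same name as the document, .soc extension
        str(doc.parent / "catalog"),       # "catalog" next to the document
        str(doc.parent / "catalog.soc"),   # "catalog.soc" next to the document
        str(rules / "catalog"),            # "catalog" in the Rules folder
        str(rules / "catalog.soc"),        # "catalog.soc" in the Rules folder
    ]


for candidate in catalog_candidates(r"docfldr\docname.xml", "Rules"):
    print(candidate)
```

PureWindowsPath is used so that the sketch produces backslash-separated paths on any platform without touching the file system.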
There are two
ways to specify alternate catalog files from within a catalog file:
A catalog file entry such as:
CATALOG "catalog2"
specifies an
alternate catalog file. If XMetaL encounters such an entry, it continues
reading the current catalog file, and if it does not find a matching entry, it
reads the alternate file. If no matching entry is found, XMetaL continues with
the next catalog file in the normal sequence. A catalog file can contain
several CATALOG entries.
A
catalog file entry of the form
DELEGATE public-id-prefix catalog-file
can be used if
XMetaL is currently attempting to match a public identifier (though PUBLIC
entries take precedence). If XMetaL encounters one or more DELEGATE lines (in a
single catalog file) in which the public-id-prefix matches a substring of the
public identifier in question (starting at the first character) then XMetaL
looks for matching entries in the catalog files specified by the DELEGATE
entries. It does not return to the normal sequence of catalog files.
Using Catalogs to give Priority to Identifiers
The system
identifier (if there is one) in an external entity declaration is generally the
real name of the file represented by the entity. Sometimes, however, this may
not be the case, and the catalog mechanism provides the option of using other
means to obtain the filename:
If the catalog
file contains a SYSTEM entry matching the system identifier in question, then
the filename specified in that entry is used to resolve the entity reference.
If the catalog
file contains the entry OVERRIDE YES and there is no matching SYSTEM entry,
then
- If the entity
declaration contains a public identifier, and a matching PUBLIC entry is found,
then the filename specified in that entry is used to resolve the entity
reference.
- If a matching
ENTITY entry is found, then the filename specified in that entry is used to
resolve the entity reference.
- Otherwise, the
system identifier is used to resolve the entity reference.
If the catalog file contains the entry OVERRIDE NO and there is no matching
SYSTEM entry, then the system identifier is used to resolve the entity
reference. In this case XMetaL does not attempt to match the public identifier
or entity name.
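The precedence rules described above can be summarized as a short decision function. The sketch below is one reading of those rules, not XMetaL's code: it treats OVERRIDE as a single flag for the whole lookup (a real catalog can switch it part-way through a file), and the function name, the catalog dictionary layout, and the path "local/cals.dtd" are all assumptions made for illustration.

```python
def resolve_entity(system_id, public_id, entity_name, catalog, override=True):
    """Pick the file for an external entity.

    catalog maps ("SYSTEM" | "PUBLIC" | "ENTITY", key) to a filename.
    """
    # A matching SYSTEM entry is used regardless of the OVERRIDE mode.
    if system_id is not None and ("SYSTEM", system_id) in catalog:
        return catalog[("SYSTEM", system_id)]
    if override:
        # OVERRIDE YES: try the public identifier, then the entity name.
        if public_id is not None and ("PUBLIC", public_id) in catalog:
            return catalog[("PUBLIC", public_id)]
        if ("ENTITY", entity_name) in catalog:
            return catalog[("ENTITY", entity_name)]
    # OVERRIDE NO, or nothing matched: fall back to the system identifier.
    return system_id


# Hypothetical catalog with a single PUBLIC entry.
catalog = {("PUBLIC", "CALS Table DTD"): "local/cals.dtd"}

print(resolve_entity("dtds/cals.dtd", "CALS Table DTD", "calsdtd", catalog, override=True))
print(resolve_entity("dtds/cals.dtd", "CALS Table DTD", "calsdtd", catalog, override=False))
```

With the flag on, the PUBLIC entry wins; with it off, the system identifier from the declaration is used unchanged.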
An OVERRIDE
YES or OVERRIDE NO entry is in effect until the end of the current catalog
file, or until an OVERRIDE entry with the opposite setting is encountered.
The default
mode (YES or NO) is set using the "OASIS_override" setting in the
"xmetal45.ini" file. The default setting is true (YES).
Setting up a DTD
In the
simplest case, a DTD consists of only a single file. Often, however, several
files are involved:
- The main DTD file
- DTD fragments
referred to in the main DTD file
- Files of entity
declarations referred to in the main DTD file or a DTD fragment
- An attribute
description file.
In order for your DTD to be read correctly by XMetaL, any required DTD
fragments and entity files must be at the locations specified by the system
identifier used to refer to them. For example, a DTD fragment may be
referenced in the following entity declaration in the DTD:
<!ENTITY % calsdtd PUBLIC "CALS Table DTD" "dtds/cals.dtd">
In this case, the required DTD fragment should be in the file "cals.dtd",
located in the folder "dtds", which should be in the same folder as the main
DTD file.
The attribute
description file should be located in the same folder as the main DTD file; if
the DTD is named "dtdname.dtd", the attribute description file should be named
"dtdname.att".
Note: When you open or create a document
that uses a DTD for which there is no corresponding rules file, XMetaL compiles
a rules (.rlx) file that encodes the information in the DTD. XMetaL then uses
the rules file instead of the DTD. If the DTD is changed, the rules file must
be deleted so that XMetaL can automatically recompile a new rules file.
Understanding the Attribute Description File
An attribute
description file provides help strings for the Attribute Inspector. This file
contains descriptions of attributes, which are displayed at the bottom of the
Attribute Inspector when you click an attribute name. The attribute description
file consists of entries of the form:
Element Attribute "Help String"
This example
supplies a help string for the SECURITY attribute of PARA:
Para Security "Security level"
Attribute
description files can be used with DTDs or compiled rules files. The attribute
description file for a DTD must have the same name as the DTD, but with the
file extension changed to .att; it should be in the same folder as the DTD (by
default, the folder Rules). If you are compiling a rules file, you can choose
the attribute description file from the XMetaL Rules Maker interface.
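For readers who generate or check attribute description files with a script, the entry format shown earlier is simple enough to parse with one regular expression. The following Python sketch assumes one entry per line and help strings without embedded double quotes; parse_att_file is an invented name, and nothing here reflects how XMetaL itself reads the file.

```python
import re


def parse_att_file(text):
    """Return {(element, attribute): help_string} from .att file text."""
    help_strings = {}
    for line in text.splitlines():
        # Element name, attribute name, then a double-quoted help string.
        match = re.fullmatch(r'\s*(\S+)\s+(\S+)\s+"([^"]*)"\s*', line)
        if match:
            element, attribute, help_string = match.groups()
            help_strings[(element, attribute)] = help_string
    return help_strings


sample = 'Para Security "Security level"\n'
print(parse_att_file(sample))
```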
Understanding Content Types
XMetaL follows
the content types set out in W3C XML Specifications. Content types that are
accepted by XMetaL are:
- Mixed content - Can contain a mixture of elements and character data.
- Element content - Contains only elements.
- Character data (CDATA) - Contains only CDATA.
- Parsed character data (PCDATA) - Contains only PCDATA.
- Any content - Can contain any or none of the content types above.
- Empty content - Must be empty, i.e., <element/>.
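To connect these content types to actual ELEMENT declarations, the sketch below asks the expat parser in the Python standard library how it classifies each declaration in a small internal DTD. The element names are hypothetical, the labels in KINDS are this example's own wording, and expat reports a (#PCDATA) model under the same category as mixed content. CDATA as an element content type does not occur in XML DTDs, so it is not shown.

```python
from xml.parsers import expat

# Map expat's content-model type codes to short labels (labels are ours).
KINDS = {
    expat.model.XML_CTYPE_EMPTY: "empty",
    expat.model.XML_CTYPE_ANY: "any",
    expat.model.XML_CTYPE_MIXED: "mixed or PCDATA",
    expat.model.XML_CTYPE_SEQ: "element content (sequence)",
    expat.model.XML_CTYPE_CHOICE: "element content (choice)",
}


def content_kinds(xml_text):
    """Return {element_name: label} for every ELEMENT declaration seen."""
    kinds = {}

    def on_element_decl(name, content_model):
        # content_model is a tuple; its first item is the type code.
        kinds[name] = KINDS.get(content_model[0], "other")

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return kinds


# Hypothetical element names, declared in an internal subset.
document = """<!DOCTYPE Article [
  <!ELEMENT Article (Title, Para+)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Para (#PCDATA | Break)*>
  <!ELEMENT Break EMPTY>
  <!ELEMENT Note ANY>
]>
<Article><Title>t</Title><Para>p</Para></Article>"""

for name, kind in content_kinds(document).items():
    print(name, "->", kind)
```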
Last modified: Friday, May 21, 2004 4:26:43 PM