3.4 DTD-Handling
The DTD (Document Type Definition) is a separate entity in sgml2pl that can be created, freed, defined and inspected. Like the parser itself, it is filled by opening it as a Prolog output stream and sending data to it. This section summarises the predicates for handling the DTD.
- new_dtd(+DocType, -DTD)
- Creates an empty DTD for the named DocType. The returned DTD-reference is an opaque term that can be used in the other predicates of this package.
- free_dtd(+DTD)
- Deallocate all resources associated with the DTD. Further use of DTD is invalid.
- load_dtd(+DTD, +File)
- Define the DTD by loading the SGML-DTD file File. Same as load_dtd/3 with empty option list.
- load_dtd(+DTD, +File, +Options)
- Define the DTD by loading File. Defined options are the dialect option from open_dtd/3 and the encoding option from open/4. Notably, the dialect option must match the dialect used for subsequent parsing using this DTD.
- open_dtd(+DTD, +Options, -OutStream)
- Open a DTD as an output stream. See load_dtd/2 for an example. Defined options are:
- dialect(Dialect)
- Define the DTD dialect. Default is sgml. Using xml or xmlns processes the DTD case-sensitively.
- dtd(+DocType, -DTD)
- Find the DTD representing the indicated doctype. This predicate uses a cache of DTD objects. If a doctype has no associated DTD, it searches for a file using the file search path dtd using the call:
    ...,
    absolute_file_name(dtd(Type), [ extensions([dtd]), access(read) ], DtdFile),
    ...
Note that DTD objects may be modified while processing erroneous documents. For example, loading an SGML document starting with <?xml ...?> switches the DTD to XML mode, and encountering unknown elements adds these elements to the DTD object. Re-using a DTD object to parse multiple documents should be restricted to situations where the documents processed are known to be error-free.
The DTD html is handled separately. The Prolog flag html_dialect specifies the default HTML dialect, which is either html4 or html5 (default). (Note that HTML5 has no DTD. The loaded DTD is an informal DTD that includes most of the HTML5 extensions (http://www.cs.tut.fi/~jkorpela/html5-dtd.html).) In addition, the parser sets the dialect flag of the DTD object. This is used by the parser to accept HTML extensions. Next, the corresponding DTD is loaded.
- dtd_property(+DTD, ?Property)
- This predicate is used to examine the content of a DTD. Property is one of:
- doctype(DocType)
- An atom representing the document-type defined by this DTD.
- elements(ListOfElements)
- A list of atoms representing the names of the elements in this DTD.
- element(Name, Omit, Content)
- The DTD contains an element with the given name. Omit is a term of the format omit(OmitOpen, OmitClose), where both arguments are booleans (true or false) representing whether the open- or close-tag may be omitted. Content is the content-model of the element represented as a Prolog term. This term takes the following form:
- empty
- The element has no content.
- cdata
- The element contains non-parsed character data. All data up to the matching end-tag is included in the data (declared content).
- rcdata
- As cdata, but entity-references are expanded.
- any
- The element may contain any number of any element from the DTD in any order.
- #pcdata
- The element contains parsed character data.
- element(A)
- An element with this name.
- *(SubModel)
- 0 or more appearances.
- ?(SubModel)
- 0 or one appearance.
- +(SubModel)
- 1 or more appearances.
- ,(SubModel1, SubModel2)
- SubModel1 followed by SubModel2.
- &(SubModel1, SubModel2)
- SubModel1 and SubModel2 in any order.
- |(SubModel1, SubModel2)
- SubModel1 or SubModel2.
- attributes(Element, ListOfAttributes)
- ListOfAttributes is a list of atoms representing the attributes of the element Element.
- attribute(Element, Attribute, Type, Default)
- Query an element. Type is one of cdata, entity, id, idref, name, nmtoken, notation, number or nutoken. For DTD types that allow for a list, the notation list(Type) is used. Finally, the DTD construct (a|b|...) is mapped to the term nameof(ListOfValues). Default describes the SGML default. It is one of required, current, conref or implied. If a real default is present, it is one of default(Value) or fixed(Value).
- entities(ListOfEntities)
- ListOfEntities is a list of atoms representing the names of the defined entities.
- entity(Name, Value)
- Name is the name of an entity with given value. Value is one of
- Atom
- If the value is atomic, it represents the literal value of the entity.
- system(Url)
- Url is the URL of the system external entity.
- public(Id, Url)
- For external public entities, Id is the identifier. If a URL is provided, it is returned in Url. Otherwise this argument is unbound.
- notations(ListOfNotations)
- Returns a list holding the names of all NOTATION declarations.
- notation(Name, Decl)
- Unify Decl with a list of system(+File) and/or public(+PublicId).
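The predicates above can be combined as in the following minimal sketch. It assumes library(sgml) is available; the doctype note, its declarations and the helper names make_note_dtd/1 and show_note_dtd/0 are invented for illustration and are not part of the package. The DTD is filled by writing declarations to the stream from open_dtd/3 and then queried with dtd_property/2.

```prolog
:- use_module(library(sgml)).

% make_note_dtd(-DTD): build a small XML DTD in memory.
% The doctype `note` and its declarations are hypothetical examples.
make_note_dtd(DTD) :-
    new_dtd(note, DTD),
    Decls = [ "<!ELEMENT note (to, body)>",
              "<!ELEMENT to (#PCDATA)>",
              "<!ELEMENT body (#PCDATA)>",
              "<!ATTLIST note lang CDATA #IMPLIED>"
            ],
    setup_call_cleanup(
        open_dtd(DTD, [dialect(xml)], Out),
        % Each declaration is sent to the DTD as ordinary stream output.
        forall(member(Decl, Decls), format(Out, "~w~n", [Decl])),
        close(Out)).

% show_note_dtd: print a few properties, then release the DTD.
show_note_dtd :-
    make_note_dtd(DTD),
    forall(member(Property,
                  [ doctype(_),
                    elements(_),
                    element(note, _, _),
                    attributes(note, _),
                    attribute(note, lang, _, _)
                  ]),
           (   dtd_property(DTD, Property)
           ->  print(Property), nl
           ;   format("no answer for ~p~n", [Property])
           )),
    free_dtd(DTD).
```

Calling show_note_dtd prints one term per property. Because the DTD was opened with dialect(xml), a parser that later uses it should be given the same dialect.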
3.4.1 The DOCTYPE declaration
As this parser allows for processing partial documents and processing the DTD separately, the DOCTYPE declaration plays a special role.
If a document has no DOCTYPE declaration, the parser returns a list holding all elements and CDATA found. If the document has a DOCTYPE declaration, the parser will open the element defined in the DOCTYPE as soon as the first real data is encountered.
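One way to observe the difference is to parse two small inputs and compare the printed results. This is a sketch only: the helper names parse_text/3 and doctype_demo/0, the sample texts and the note doctype are assumptions made for this example. It uses load_structure/3 from the same package on a string stream. The first input has no DOCTYPE; the second declares one with an internal subset in which both tags of note may be omitted.

```prolog
:- use_module(library(sgml)).

% parse_text(+Text, +Options, -Content): parse Text with load_structure/3.
% Helper invented for this example.
parse_text(Text, Options, Content) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content, Options),
        close(In)).

doctype_demo :-
    % No DOCTYPE: parsed as a partial document.
    parse_text("<b>bold</b> and plain text", [dialect(xml)], Without),
    print(Without), nl,
    % DOCTYPE naming `note`; the SGML dialect is used because
    % tag omission (O O) is an SGML feature.
    parse_text("<!DOCTYPE note [<!ELEMENT note O O (#PCDATA)>]>\nhello",
               [dialect(sgml)], With),
    print(With), nl.
```

Running doctype_demo prints both content lists, so the effect of the DOCTYPE on the top-level structure can be inspected directly.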