XML
Probably someone should look at Jonathan Barman's article on XML in Vector
17.4 - conversion of arbitrary arrays to and from XML which is probably how
they should be passed around. This code is very portable. Is this our VML parser?
Also, Mark Osborne's digital image facility uses a very simple XML file format.
Let's start working with this.
Also see Bill Parke's function "ConvertHTMLtoXHTML" and Eric Lescasse's function "ConvertToXHTML".
XML Tools for APL+Win (excerpts), by Davin Church, Creative Software Design, October 2004
Introduction
As XML plays an increasing role in the computer industry, APL developers
increasingly need a fast and simple way to process data in this
array-oriented environment. This document describes some APL utility
functions that can make that job easier.
XML Primer
If you have not yet learned XML, a brief introduction to its concepts,
terminology, and structure is in order. Those already experienced with
XML may wish to skip this section.
XML is technically a data representation. It is a way of describing data
in a standard text-based format so that different computer programs (and
even humans) can easily read and write it. XML is not particularly good
for use inside a single program; it is instead designed to transfer data
between programs. Thus it is said to be useful "only at the borders" of
applications and machines.
XML uses plain text (Unicode, though often just ASCII) to represent its data (even numbers are written
as text). This text may be stored in any convenient form, but is usually
found as a file on disk. It may also be sent as a data stream over the
Internet or held in memory during processing. In APL, raw XML may be kept
as a simple character vector in a variable. (See the section below for a
way of storing it structurally in APL.)
XML text contains data, of course, but it also contains a structural
representation for that data. In many ways, this is similar to a nested
array in APL where data has a size and shape, and each item of that data
may itself contain more structured data. This is done in XML in a similar
way. XML holds data in something called elements, and each element may
itself contain other elements as needed.
While data in XML is stored as simple text, XML structural information is
set off from that data by enclosing it in a pair of angle-bracket
characters (you know these as the less-than and greater-than symbols, "<"
and ">"). Anything enclosed in these symbols is known as an XML tag, and is
processed by XML-aware programs to provide structure. Anything outside
these symbols is data. An example tag is:
XML tags normally come in two forms: a start-tag and an end-tag. In XML,
these always come in pairs and surround the data they describe. The first
word in a start-tag is called the tag name (also element name, see below)
and its matching end-tag has the same name preceded by a slash ("/")
character. So, if a start tag was then its matching end-tag would
be .
The combination of a start-tag, some optional contents, and an end-tag is
called an element in XML. While elements always have a matching start-tag
and end-tag, the tag/element is named by the XML developer (similarly to
naming a variable). Thus, while all XML files are structurally similar,
they all look different because they all name their data differently. Of
course, any programs sharing such a file would have to agree on what
names to use. Here's an example XML element: Dream Park
If an element has no contents, then the start-tag may be immediately
followed by the end-tag. Or, you may use a special-case composite tag to
represent a null element by placing a slash ("/") at the end of the sole
tag (combining both the start & end tags into one), as in: .
An element's start-tag may also contain extra information, usually used
to describe the element contents in some way. These extra bits of
information are called attributes, and they are each composed of an
attribute-name (arbitrarily named), an "=" separator, and an
attribute-value (which is simple text, always enclosed in quotation
marks). The attributes are listed inside the start-tag, separated by
spaces. End-tags never have attributes. For example: Dream Park
Multiple XML elements are simply listed one after the other, though they
are often placed on separate lines for human readability. For example:
_Dream Park
_The Barsoom Project
_The California Voodoo Game
Of course, this doesn't represent very complex data. Usually, XML data is
nested, where an element may contain one or more other elements. (It is
acceptable, but not often done, that an element may contain both data and
sub-elements in any combination.) For instance, if you wanted to track
the book's title and author, it might be done in this way:
_
__Dream Park
__Larry Niven
__Steven Barnes
_
Notice that the element and sub-elements were shown listed on separate
lines and indented. This is purely for readability and is ignored when
processing the XML data. The element content itself may also be listed on
a separate line, if preferred, as in:
_
__Dream Park
_
All XML files must contain exactly one main (top-level, outside) element,
called the root element. All the actual data in the XML file is contained
somewhere within the root element, often as a list of parallel
(identically-named) sub-elements.
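As a rough illustration of the primer so far, the nested book example can be read with a general-purpose XML parser. The sketch below uses Python's standard library rather than APL, and the element names book, title, and author are assumptions chosen for illustration; only the titles and author names come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the book data comes from the text above.
TEXT = """
<book>
    <title>Dream Park</title>
    <author>Larry Niven</author>
    <author>Steven Barnes</author>
</book>
"""

def summarize(xml_text):
    """Return the root tag and a list of (child tag, child text) pairs."""
    root = ET.fromstring(xml_text.strip())
    return root.tag, [(child.tag, child.text) for child in root]

root_tag, children = summarize(TEXT)
print(root_tag)
for tag, text in children:
    print(tag, "=", text)
```

The line breaks and indentation inside the root element do not change the list of sub-elements, which matches the point that such layout is only for readability.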
Just in case you'd like to use them, comments may also be included
anywhere in the XML text for human readability (they are not processed by
applications). Comments are defined as a standalone tag that starts with
the text "<!--" and ends with the text " -->"
(including the leading space, or at least a character that's not another
dash). Any other text may be contained within the comment tag (except the
end marker text), including XML tags & data (which are ignored because
they're in a comment). So be sure to end any comments exactly right.
Comments may not be nested.
Well, that's the basic idea. To learn about the more complicated aspects
of XML, numerous books and on-line resources on the subject are
available.
An XML Data Structure for APL
Naturally, programs need to be able to read and write XML. It is possible
to do this character-by-character, but this process is extremely clumsy
(especially for reading). To make this easier for programmers, generic
XML processing programs have been written (in and for various languages
and operating systems). For instance, Microsoft has created one such
program (/library/object) for Windows called MSXML. However, they wrote
it primarily for use by scalar languages like VB and C++. We can
certainly use this in APL, but it is slow, awkward, complicated, and
requires lots of looping and arcane commands. Wouldn't it be easier just
to store the XML in a variable using a nested data structure so we can
just process it with APL functions and primitives (particularly "Each")
as a whole? Here's one way to do just that.
Note: The following description contains some seemingly complex technical
details. Not all readers will wish to know the internal structures
described here (especially if they are only writing XML and not
interpreting it) and may prefer to skip over this section until it is
needed.
This structure is quite simple in concept, but can be deeply nested and
the variety of options may be confusing at first, so be patient when
reading (and re-reading if necessary) the description below.
The content of this XML data structure is inherently a character (text)
vector. If it contains no XML coding, then it is a simple (unnested) text
vector. If an XML element is found, then the entire element (start-tag,
contents, and end-tag) is coded, APL-enclosed into a nested scalar, and
substituted into the vector as if it were a single character. For
instance, the following XML fragment:
_... the book Dream Park is about ...
would produce a vector that looks something like:
_... the book @ is about ...
where the "@" in the above text is actually a nested scalar. (See below
for the structure of the nested item.)
Multiple consecutive elements without any text data around them simply
produce a vector of these nested items without any normal characters.
Since a valid XML file must contain exactly one root element (and rarely
has anything special [of interest] outside it), then decoding such a file
would typically produce a vector of length one containing the nested
version of the sole root element.
Does that make sense so far? If not, try reviewing the discussion above
once more. Otherwise, you are likely to become more confused when we
re-use this same concept again below.
Ready? Now for the complicated part... Any XML element will be a nested
scalar (as an item in the above vector) containing a three-item vector,
as follows:
_[1] The element name
_[2] The element's attributes (described below)
_[3] The element content
All XML elements will have exactly those three parts, and each part is
(of course) itself nested to contain the above information.
The first item is the simplest and just contains the name of the XML
element as a character vector. Use this, especially with "first-each"
(↑¨), to locate any elements that you would like to process in parallel.
The second item is the most complicated, but fortunately it is the
least used. It contains a list of all the attributes given in the
element's start-tag. If there are no attributes given (which is quite
common), then this will be an empty vector. If there are attributes, then
this is a nested vector containing one item per attribute. Each such
attribute item is itself a (nested) two-item vector, containing (nested
again) the name of the attribute and its value. The attribute name and
value are simple character vectors (text strings). Since attributes are
always two-item vectors, it would normally make sense to structure them
as a two-column matrix. Unfortunately, this makes it more difficult for
APL code to process lists of elements and their attributes with "each" (¨). Since
this is likely to happen often with attributes, the structure is instead
defined as a more deeply nested vector (but see below).
The third item in the element vector is the element's content. This is
defined to be precisely the same as the top level of the data structure
as described above! Thus, this could be called a recursively-defined
structure. So an XML element that contains only text (the data content)
would have a simple (unnested) character vector here. An empty (null)
element would have an empty vector. And an element with only one or more
sub-elements would have here a nested vector with that many items in it,
one for each sub-element and each one defined as above. A mixture of text
(data) and sub-elements is also possible, though rarely used in practice.
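To make the recursive definition concrete, here is a minimal sketch of the same three-part structure in Python rather than APL. It is not the APL utility itself: tuples stand in for nested vectors, the standard-library parser does the tokenizing, and the function names to_nested and parse_nested are invented for this illustration.

```python
import xml.etree.ElementTree as ET

def to_nested(elem):
    """Encode one element as a (name, attributes, content) triple."""
    # Attributes become a list of (name, value) pairs; empty if there are none.
    attrs = [(key, value) for key, value in elem.attrib.items()]
    text = (elem.text or "").strip()
    children = list(elem)
    if not children:
        # Text-only (or empty) element: content is a plain string.
        return (elem.tag, attrs, text)
    # Otherwise content is a list mixing text fragments and nested triples.
    content = []
    if text:
        content.append(text)
    for child in children:
        content.append(to_nested(child))
        tail = (child.tail or "").strip()  # text following the child element
        if tail:
            content.append(tail)
    return (elem.tag, attrs, content)

def parse_nested(xml_text):
    """Return a one-item list holding the encoded root element."""
    return [to_nested(ET.fromstring(xml_text))]

sample = '<book id="7"><title>Dream Park</title><author>Larry Niven</author></book>'
print(parse_nested(sample))
```

The element names and the id attribute in the sample are hypothetical. Note that this sketch trims white space around every text fragment, so spacing next to a sub-element in mixed content is not preserved.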
This data structure can become quite deep, depending on the source XML,
but processing it is usually rather easy. Most XML files are simply a
list of parallel elements. In APL, this is represented by a vector of
(nested singleton) items, each of which is one element. Such an element
list can be easily processed in sequence by calling a function (to
process one element) with "Each" (¨), or by looping through them with
":FOR". If those elements contain sub-element lists, then they too can be
processed in the same manner. Also, if you have a list of different kinds
(names) of elements, then "First-Each" (↑¨) will extract the names of
those elements. Those names can be examined with simple APL and the
vector compressed (/) to select only the desired elements for further
processing. The "2-Pick-Each" (2⊃¨) on the vector will return only the
elements' attributes for examination, and a "First-Each" (↑¨) on those
will return just the names of the attributes for each element. The
"3-Pick-Each" (3⊃¨) will, of course, return all the elements' contents
without their names or attributes.
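The selections described above have close analogues in other array-friendly notations. The following Python sketch (not APL) applies them to a small hand-built element list; the element names and the role attribute are hypothetical, and list comprehensions stand in for the Each-based idioms.

```python
# A list of parallel elements, each a (name, attributes, content) triple.
elements = [
    ("title", [], "Dream Park"),
    ("author", [("role", "lead")], "Larry Niven"),
    ("author", [], "Steven Barnes"),
]

# Analogue of First-Each: the name of every element.
names = [element[0] for element in elements]

# Analogue of 2-Pick-Each: the attribute list of every element.
attributes = [element[1] for element in elements]

# Names of the attributes, element by element.
attribute_names = [[pair[0] for pair in attrs] for attrs in attributes]

# Analogue of 3-Pick-Each: the content of every element.
contents = [element[2] for element in elements]

# Analogue of compressing by a boolean mask built from the names.
authors = [element for element in elements if element[0] == "author"]

print(names)
print(contents)
print(authors)
```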
Processing most incoming XML is usually relatively simple because the
expected structure is known in advance. However, if you need to detect
the actual parent-child structure, then examining the depth of an
element's content will tell you whether it is plain (text) data or
whether it still contains sub-elements. Or if you're more comfortable
with matrices, change an (element-only) vector into a three-column matrix
with "disclose" (⊃): the names will be in [;1], the attributes in [;2],
and the content (possibly nested further) in [;3]. The attributes column
could even get a further "disclose-each" (⊃¨) to turn each of them into a
two-column matrix, if that is preferred.
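Both ideas in this paragraph, testing whether content is plain text and viewing an element list by columns, can be sketched briefly. This is a Python analogue under the same tuple-based assumptions as before, with a type check standing in for an APL depth test; the credits element is hypothetical.

```python
elements = [
    ("title", [], "Dream Park"),
    ("credits", [], [
        ("author", [], "Larry Niven"),
        ("author", [], "Steven Barnes"),
    ]),
]

def is_text_only(element):
    """True when the element's content is plain text rather than sub-elements."""
    return isinstance(element[2], str)

# Column view: one tuple per column, the analogue of a three-column matrix.
names, attributes, contents = zip(*elements)

leaf_flags = [is_text_only(element) for element in elements]
print(names)
print(leaf_flags)
```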
For example, the following XML:
_
__Dream Park
__Larry Niven
__Steven Barnes
_
...would be structured in APL code (with lines wrapped & indentation added
for readability) as:
????????????????????????
_?????????????????????
__???????????????????????????????????
__????????????????????????????????????
__?????????????????????????????????????
_?
?
Finally, there is one additional structure that may occur occasionally.
Instead of the 3-item "element" structure shown above (which always has a
relative depth of at least 2), a nested item could instead be a simple
(relative depth 1) character vector. This might occur when a "symbolic
name" is included in the XML content. There are several standard symbolic
names in XML and these are usually handled automatically for you. But in
the cases where the XML document author has invented new symbolic names
(with entity declarations), these are not automatically converted into
their equivalent values. In such cases, they are identified by being
enclosed as an independent, singly-nested item in the coded vector. For
instance, the following XML fragment:
_... the book &booktitle; is about ...
would produce a vector that looks something like:
_... the book @ is about ...
where the "@" in the above text is actually a nested scalar containing
only the character vector "&booktitle;" within it. Your program would
then have to know what to do with such a nested scalar if it is
encountered.
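One way to picture this handling of symbolic names is the Python sketch below. It is an illustration only, not the APL implementation: the function name decode_text is invented here, and a one-item tuple stands in for the singly nested item that carries an unexpanded name.

```python
import re

# The five symbolic names predefined by XML.
STANDARD = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches hex references, decimal references, or a symbolic name.
ENTITY = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][\w.\-]*);")

def decode_text(text):
    """Split text into plain strings and one-item tuples for unknown names."""
    items, buffer, position = [], [], 0
    for match in ENTITY.finditer(text):
        buffer.append(text[position:match.start()])
        reference = match.group(1)
        if reference.startswith("#x"):
            buffer.append(chr(int(reference[2:], 16)))
        elif reference.startswith("#"):
            buffer.append(chr(int(reference[1:])))
        elif reference in STANDARD:
            buffer.append(STANDARD[reference])
        else:
            # Unknown name: flush the text so far, then keep the name intact.
            if "".join(buffer):
                items.append("".join(buffer))
            items.append((match.group(0),))
            buffer = []
        position = match.end()
    buffer.append(text[position:])
    if "".join(buffer):
        items.append("".join(buffer))
    return items

print(decode_text("the book &booktitle; is about A &amp; B"))
```

A caller can then tell the cases apart by type: plain strings are ordinary text, while a one-item tuple marks a name the application must resolve itself.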
Parsing (Decoding) XML
When your application is given XML to process, it is necessary to
interpret it logically. This is difficult while it is still in plain text
form. Rather than using external programs to perform the interpretation
for you, use the ParseXML utility function to turn the text into the
nested APL data structure described above.
This function does not use Microsoft's MSXML library. One reason for this
is that MSXML is not guaranteed to be available on any particular
machine, and even if it is there, its version is in doubt (and the
version matters). Another reason is that MSXML has a reputation for being
quite slow. ParseXML was written to be a standalone function and to run as
fast as APL allows.
Once the data structure has been created, it may be processed using
loops, subroutine calls with "Each", or just straight-line code to handle
whole vectors at once or individual scalar pieces. "First-Each"
(↑¨) and "Pick-Each" (⊃¨) are extremely useful in this regard, as noted
in the structural description above. However, this style can be quite
imposing for many APL programmers and usually produces less-than-readable
code. To this end, an additional utility function is available to help
process data in this form. It is called ??????? and uses a syntax similar
to XML's standard XPath language. The entire XML structure (or any legal
subset of it) is passed to ??????? along with the element name(s) or
other information to select from it and the matching subset of the
structured XML is returned as a result, ready to be processed.
So, a simple example for processing the previous XML example would be:
??????????????????????????????
????????????????????????????????
??????????????????????
???
??????????????????????????????????????????????????????
??????????????????????????????????????
???
????????????????????????????????????
??????????????????????????????????????????????????
?????????????????????????????????????????????
???????????????????????
?????
??????????????????????????????????
????????????????????????????????????????
????????????????????????????????????
???
???????????????????????????????
??????????????????????????????????????
???
???????????????????????????????????????????????????
?????
Constructing (Encoding) XML
For simple XML, it is quite reasonable for your application to produce
formatted text directly. However, there are many details that still need
to be handled and it can often be rather unwieldy. For one thing, you'll
usually want to create the XML in a variable before disposing of it by
writing it to disk or sending it over the Internet or out via email. But
if you're producing a large XML output, this can become very slow due to
repeated copying of the data during memory management. Also, you will
often wish to produce properly indented lines for good human readability,
but keeping track of this is tedious and any changes to the indentation
depth (especially if adding a new level at the top) can be particularly
frustrating. Large and complex structures are also very error-prone in
several different ways. Many other issues are likely to be encountered as
well, so an alternative mechanism is in order here.
To deal with all of these problems, utility functions have been written
to simplify XML-creation coding and make it faster and more readable. The
same data structure described above is also used for output. Once the
structure is created, the BuildXML function is used to turn the whole
thing into plain text for final output. To assist in creating the
structured data, a function named XMLItem is provided that produces a
nested singleton containing an entire XML element encoded as described
above. For most needs, this is as simple as providing the element name as
a left argument and the element contents as the right argument. Here is
an example function to create the sample XML shown above:
??????????????????????????
????????????????????????????????????????????????
???????????????????????????????????????
??????????????????????????????????????????????
????????????????????????????????????????????????
?????????????????????????????????????????????????????????????
???
????????????????????????????????????????????????????????????????
??????????????????????????????
???
????????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????????????
????????????????????????????????
????
????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????
??????????????????????
?????
For more complicated situations, subroutines could be called in loops or
with "Each" (¨) to produce the output in pieces, and then assemble them
together for the final output.
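The build-then-format approach can be sketched in a few lines. The Python below is a loose analogue rather than the APL utilities themselves: xml_item and build_xml are names invented for this illustration, their arguments are assumptions, and the element names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

def xml_item(name, content="", attrs=()):
    """Encode one element as a (name, attributes, content) triple."""
    return (name, list(attrs), content)

def build_xml(items, indent=4, level=0):
    """Format a list of text strings and element triples as indented XML."""
    pad = " " * (indent * level)
    lines = []
    for item in items:
        if isinstance(item, str):
            lines.append(pad + escape(item))
            continue
        name, attrs, content = item
        attr_text = "".join(f" {key}={quoteattr(value)}" for key, value in attrs)
        if content == "" or content == []:
            # Empty element: use the combined start/end tag form.
            lines.append(f"{pad}<{name}{attr_text}/>")
        elif isinstance(content, str):
            # Leaf element: tags and text share a single line.
            lines.append(f"{pad}<{name}{attr_text}>{escape(content)}</{name}>")
        else:
            lines.append(f"{pad}<{name}{attr_text}>")
            lines.append(build_xml(content, indent, level + 1))
            lines.append(f"{pad}</{name}>")
    return "\n".join(lines)

book = xml_item("book", [
    xml_item("title", "Dream Park"),
    xml_item("author", "Larry Niven"),
    xml_item("author", "Steven Barnes"),
])
print(build_xml([book]))
```

Because the indentation is computed from the nesting level at format time, adding a new outer element only means wrapping the existing structure in one more item.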
Now, this can still be a bit tedious if you have lots of data or a
complex structure. And since we use an array-based language, it might be
assumed that we have our data to be encoded already available in a nested
array. So here's an alternate way to do the same thing by using a data
array and the XMLItems function (designed to work with entire arrays at
once):
????????????????????????????????
?????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????
?????????????????????
??????????????????????????????????????????????????????????????????
???
?????????????????????????????????????????????????????
??????????????????????????????????????????????????????????
???
?????????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????????????
??????????????????????????????????????
?????????????????????????????????????????????????????????????????????????
????
???????????????????????????????????????????????????????????????????
??????????????????????
?????
So it's quite easy to handle regularly structured data, and methods and
pieces of data can be combined together as desired. XMLItem (and
XMLItems) can also deal with element attributes and unusual elements like
comments or declarations. And as you might have noticed, ParseXML and
BuildXML are approximate inverses of one another.
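The whole-array idea can be illustrated with a short Python sketch. The helper name xml_items and its argument order are assumptions made for this example, not the signature of the APL function, and the position field is invented simply to give each row a second column.

```python
def xml_items(row_name, field_names, rows):
    """Encode each data row as one element with a sub-element per field."""
    return [
        (row_name, [], [
            (field, [], str(value))
            for field, value in zip(field_names, row)
        ])
        for row in rows
    ]

# One row per book; "position" is just the order in which the titles are listed.
rows = [
    ("Dream Park", 1),
    ("The Barsoom Project", 2),
    ("The California Voodoo Game", 3),
]

books = xml_items("book", ["title", "position"], rows)
for book in books:
    print(book)
```

Each row becomes one (name, attributes, content) triple, so the result can be catenated with individually built items before formatting.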
Other XML Operations
Validation
While ParseXML is a quick and easy way to get XML text converted into a
useful form, it does not also validate the incoming XML at the same time.
Validation is a special term used by XML to mean (in general) that the
XML conforms to an agreed-upon naming and construction convention. This
is not to be confused with another XML term: well-formed. XML that is
well-formed obeys the syntactical construction of XML, with angle
brackets around tags, matching and nested start and end tags, quotes
around attribute values, etc. All XML processing routines (including
ParseXML) verify well-formedness, making sure that they're looking at
legal XML encoding. But validation goes a step further to verify things.
For example, one valid element name is and that one or more of
them must be (and can only be) contained within a parent
element.
XML standards allow XML processing engines to be designated as either
validating or non-validating processors. ParseXML is a non-validating
processor. So this means that while it correctly deconstructs the XML, it
does not confirm that it was what you were expecting. That is, in
general, left up to your application to detect and either ignore or
respond to it however you feel is appropriate. This is usually fine as
your code is normally already going to have to know what to expect and
what to do with it, and you'll have already determined that the source of
your XML is producing correct code for you. However, in some cases you
may not be too sure about your source data and you'd like it checked more
thoroughly. In that case, you can call Microsoft's MSXML validating
processor to do a complete analysis for you and report any problems. A
cover function has been written to do this for you (since it's not used
as often, a standalone solution isn't as necessary). The
utility function is named ValidateXML. Call it with the source XML text
and it will return a nested vector of error information. The first item
of this vector is either empty ('') if the XML is valid or the text of
an error message otherwise.
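The distinction between well-formed and valid can be shown with a small Python sketch. It does not call MSXML or use a DTD or XSD; instead a single hand-written rule (every book element must contain at least one author element) stands in for a real schema, and both function names are invented for this illustration. The empty-string-or-message result mirrors the convention described above.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return '' if the text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return ""

def check_books(xml_text):
    """Return '' if the text also obeys the assumed book/author rule."""
    message = check_well_formed(xml_text)
    if message:
        return message
    root = ET.fromstring(xml_text)
    # iter() visits the root itself as well as all descendants.
    for book in root.iter("book"):
        if book.find("author") is None:
            return "book element has no author"
    return ""

print(repr(check_well_formed("<a><b></a>")))                  # not well-formed
print(repr(check_books("<book><title>x</title></book>")))     # well-formed, not valid
print(repr(check_books("<book><author>n</author></book>")))   # passes both checks
```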
Transformation
There are many generic things that can be done with XML, and one of the
most common is called transformation. This is an operation that takes an
XML file as input data, along with a second XML file that describes the
transformation to be performed, and it produces a new output using those
transformation rules. This second file is designated as an XSL or XSLT
file and is called a stylesheet or template. A transformation could
potentially produce nearly any kind of text file as output, but its two
most common forms are to produce either (1) a new XML file in a different
structure, or (2) an HTML file that can be displayed on a browser. In
fact, this second usage is so popular that today's modern browsers know
enough to recognize an XML file being returned from a web site and will
perform the transformation to HTML internally so it can be displayed to
the user already formatted.
Sometimes, you will want to perform this transformation yourself under
program control. In that case, call the TransformXML utility function
with your XML (data) as the right argument and the XSL
(stylesheet/template) as the left argument. Microsoft's MSXML library
will be invoked to perform the transformation for you and the text output
will be returned as the result.
SOAP
A relatively new use for XML is to pass processing requests and results
back and forth (usually across the Internet) between otherwise
unconnected machines. This can be used to implement certain types of
Remote Procedure Call (RPC) facilities or similar functionality. An
example of this might be a travel reservations web site that accepts a
ticketing request in SOAP form, processes it, and returns a SOAP
confirmation to the requestor. SOAP is a standardized protocol that uses
XML-formatted data to exchange this information.
Since we now have tools for reading and writing XML easily in APL,
handling SOAP becomes a simple matter. SOAP requests are just
application-defined XML data wrapped in a specific structure known as a
SOAP Envelope (which is itself also structured as XML). This is normally
a very simple process; you may just add the SOAP Envelope yourself when
creating SOAP messages and strip it off when reading the results. But
building a SOAP Envelope makes a good sample program for demonstrating a
simple use of these XML tools and it's also a useful utility in its own
right. So the SOAPEnvelope function can be passed a nested XML data
structure (produced by XMLItem[s]) and it will wrap a SOAP envelope
structure around it.
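Wrapping a payload in an envelope amounts to adding two or three enclosing elements. The Python sketch below shows the idea on the tuple-based structure used for illustration; the function name soap_envelope, the soap: prefix, and the GetQuote payload are assumptions, while the namespace URI is the one defined for SOAP 1.1.

```python
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def soap_envelope(body, headers=None):
    """Wrap body items (and optional header items) in an Envelope element."""
    parts = []
    if headers:
        # The Header element is only included when header blocks are supplied.
        parts.append(("soap:Header", [], list(headers)))
    parts.append(("soap:Body", [], list(body)))
    return [("soap:Envelope", [("xmlns:soap", SOAP_NS)], parts)]

# Hypothetical application payload as a (name, attributes, content) triple.
request = [("GetQuote", [], "ABC")]
envelope = soap_envelope(request)
print(envelope)
```

The result has the same nested form as any other element list, so it can be passed on to whatever routine formats the structure as text.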
Syntax Descriptions
BuildXML - Convert a nested structure into text
Syntax
textxml ← [indent [pack]] BuildXML xml
General Information
BuildXML is the final stage in creating XML output. It takes the nested
data structure created by XMLItem and XMLItems and converts it into a
formatted text vector with proper syntax. It handles line breaks and
indentation for neat and easy human viewing.
BuildXML is the approximate inverse of ParseXML.
Right argument
This is a vector of nested XML data, constructed in the form produced by
XMLItem and XMLItems. If there are multiple results to be formatted
together, they should simply be catenated together into a (longer)
vector.
Left argument
The function's left argument is used to specify how line breaks and
indentation are to be applied. It is optional and defaults to a common
formatting style. The left argument may have up to two numeric items:
[1]_Indentation spacing
The number of spaces to use for each level of element indentation. The
default is 4 (spaces). There are three special values that may be used
here:
??_All lines start at the left margin.
??_Use single ⎕TCHT (Tab) characters to indent lines instead of spaces
(which uses fewer bytes).
??_Put entire XML result in a single line (no ⎕TCNL's).
[2]_Leaf node packing levels
Usually, the lowest-level (leaf) nodes of an XML structure are listed
with the start-tag, content, and end-tag on the same line of text (no
⎕TCNL's between them). The default value of 1 performs this function.
Supplying a 0 here suppresses this behavior and will place the element
tags and content on separate lines of output. A value larger than 1 will
pack more than one level of elements together onto a single line.
Result
The result of this function is a character (text) vector, usually with
embedded ⎕TCNL (new line) characters, of the formatted XML information.
Examples
????????????????????????????????????????
????????????????????????????????????????????
ParseXML - Convert text XML into a nested structure
Syntax
xml ← ParseXML textxml
General Information
Given an XML data stream as a character (text) vector, decompose it into
its constituent elements and produce a nested APL vector containing the
same data in an array-friendly form. Most XML "files" are composed of
exactly one "root" element at their outer level. From such an input,
ParseXML will produce a nested vector of length 1 as a result.
Processing notes
* Element names effectively have their "<" & ">" brackets removed and
their attributes and contents separated out. All these parts of the
element are then enclosed in an APL singleton within the returned result.
Elements within content are similarly extracted and enclosed at a deeper
level.
* All leading and trailing white space (including spaces, tabs, and
newlines) is removed from the element content at all levels.
* All prologue and epilog information (anything outside the root element)
is removed.
* DTD or XSD validation is not performed (and is removed).
* Comments, special declarations, and processing instructions (those
terms beginning with "<!" or "<?") are ignored and
removed.
* <![CDATA[ ]]> strings are decomposed into actual raw data (ready for
use).
* Symbolic names (those beginning with "&") are converted to raw data
(ready for APL processing) if they are decimal or hex character
references, or are one of the 5 standard symbols (&amp;, &lt;, &gt;,
&quot;, or &apos;). Other
symbolic names (which are not commonly encountered) are not expanded and
are instead nested as a depth+1 text string for your application to
examine.
* Improperly-formed XML is reported with an APL error.
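Several of these processing notes describe behavior that most non-validating parsers share. As a point of comparison only, the Python sketch below runs the standard-library parser (not the APL function) over a small hypothetical document containing a prologue, comments, a CDATA section, and character references.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<!-- prologue comment -->
<note>
    <!-- inner comment -->
    <body><![CDATA[1 < 2]]> &amp; 3 &gt; 2 &#65;</body>
</note>"""

root = ET.fromstring(SAMPLE)

# Comments are dropped by the default tree builder, so only <body> remains.
child_tags = [child.tag for child in root]

# The CDATA section and the references arrive as ordinary decoded text.
body_text = root.find("body").text

def parses(xml_text):
    """Report whether the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(child_tags)
print(body_text)
print(parses("<note><body></note>"))
```

Unlike the behavior listed above, this parser keeps the white space around element content, so trimming would be a separate step.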
Processing feedback for very large parsing tasks (such as displaying a
progress bar) can be provided by writing an optional custom external
feedback function named ParseXMLStatus. It should accept as its right
argument the current decoding "step" and should return a boolean to
indicate whether to continue (0) or abort (1) the processing. Further
details on the feedback function can be found in the ParseXML comments.
ParseXML is the approximate inverse of BuildXML.
Right argument
Character (text) vector containing valid XML text. This often comes from
either reading a text file or downloading data across the Internet. Both
newlines (⎕TCNL) and linefeeds (⎕TCLF) are treated as white space (and
thus ignored), though in APL variables it is common to have only newline
characters (as line separators) and not linefeeds.
Left argument
None.
Result
The result of parsing XML is a deeply nested vector containing all the
XML data represented in a hierarchical structure. This structure is
described in detail in an earlier section of this document, but it is
fundamentally a text vector with nested XML elements taking the place in
the vector of a single character.
Example
??????????????????????????????????
SOAPEnvelope - Wrap a SOAP Envelope around nested XML
Syntax
soap ← [headers] SOAPEnvelope body
General Information
The functionality provided by this routine is minimal, and may need to be
customized for particular needs, but it serves as a coding example as
well.
SOAP messages are built of XML data surrounded by a SOAP Envelope. This
"envelope" is used by SOAP processing programs to identify and handle the
message contents. It is a simple matter for your application to add the
necessary XML elements to enclose your data in a SOAP wrapper, but this
function does just that if you'd prefer to use it.
Right argument
This is the main content of the SOAP message. It should be an
application-specific, nested XML data structure of XMLItem(s) of the SOAP
content to be transmitted. This XML data will be enclosed in a SOAP
Body element and included in the result.
Left argument
The left argument is optional; omitting it indicates that no headers are present.
Simple SOAP applications usually require no SOAP headers. If it is
provided, it should be a vector of XMLItem(s) to be used as one or more
SOAP header blocks and will be enclosed in a SOAP Header element. SOAP
header block contents, if needed, are defined by the SOAP application's
protocol and needs.
Result
The result of the function is a nested XML data structure. The SOAP
header, if any (after being enclosed in its element) is prepended to the
SOAP body (after being enclosed in its element). These objects are then
enclosed in an Envelope element wrapper to complete the SOAP
Envelope. A standard prefix is also added as a convenience and
the entirety is returned as a nested XML data structure, ready to be
formatted by BuildXML.
Example
??????????????????????????????????????????????????????????
TransformXML - Convert text XML into another form
Syntax
output ← textxsl [var]... TransformXML textxml [var]...
General Information
XSL stylesheets (templates) are used to convert XML data into another
form. Usually this new form is either a different XML structure or is
HTML suitable for displaying the data to human readers. This function
accepts XML input data and an XSL stylesheet/template and returns the
transformed data. This allows such transformations to be done easily
under program control. The work is done by Microsoft's MSXML library. All
work is done in memory; temporary disk files are not used.
Note: When transforming to HTML, MSXML forces the output to be in the
UTF-16 character set. Therefore, trying to set it to use an alternate
output character set (like "windows-1252" or "iso-8859-1") will not be
successful.
Right argument
This should be the XML data to be transformed. It should be supplied in
text (character vector) form.
Advanced feature: XSL "parameters" may be supplied as extra items in
either argument. They should be specified as (enclosed) name-value pairs
and appended to the (enclosed) argument data. These can be referenced
inside the XSL to vary the processing being performed.
Left argument
This should be the XSL (stylesheet/template) to control the
transformation. It should be supplied in text (character vector) form.
Advanced feature: XSL "parameters" may be supplied as extra items in
either argument. They should be specified as (enclosed) name-value pairs
and appended to the (enclosed) argument data. These can be referenced
inside the XSL to vary the processing being performed.
Result
The result is the text output from the transformation, as generated by
applying the stylesheet/template to the text XML data.
Example
?????????????????????????????????????????????????????????????????????????
ValidateXML - Verify that text XML follows conventions
Syntax
error ← ValidateXML textxml
General Information
All XML has to follow the basic XML syntax, which is always verified. But
beyond that, XML data should also conform to a naming and structural
convention agreed upon by the sender and receiver of that data. This
convention is often formalized in a separate document using one of two
descriptor languages, either a DTD or an XSD, and referred to by the XML
document itself. In such cases, the logical structure of the XML document
can then be checked against this description to verify that it follows
those conventions. This process is called "validating" the XML.
Normally, the ParseXML utility does no validating of incoming XML. This
saves time and is usually not necessary. But for those cases where the
XML data needs to be more thoroughly checked, ValidateXML provides an
interface to Microsoft's MSXML library where a complete validation can be
done on the data to ensure that it's in proper form.
Right argument
The text (character vector) containing the XML to be checked. Normally,
this will include within the text a reference to the DTD or XSD document
containing the structural definition to be followed.
Left argument
None.
Result
The result is a nested vector of error information. The first item of the
result is the most reliable indication of an error. If it's an empty
character vector (''), then no error has occurred. Here are the items
being returned:
[1] Text error message, or '' if no error was found.
[2] Numeric error code, or 0 if no error was found. (This doesn't seem to
    be reliable on some systems.)
[3] Text of XML source line that caused the error.
[4] Byte position in the XML text where the error occurred.
[5] Line number in the XML text where the error occurred.
[6] Character position of the error within the failing line of XML.
[7] URL of file containing the error (if the error occurred in an
    external file).
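The "test the first item" convention can be illustrated with a small Python sketch. It is not the MSXML interface: it uses Python's standard parser, checks well-formedness only (no DTD or XSD validation), and returns a shorter, hypothetical four-item result. The function name check_xml is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

def check_xml(text):
    """Return (message, code, line, column); message is '' when no error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position       # line is 1-based, column 0-based
        return (str(err), err.code, line, column)
    return ("", 0, 0, 0)

ok = check_xml("<a><b/></a>")
bad = check_xml("<a><b></a>")             # <b> is never closed
if bad[0]:                                # test the first item, as advised
    print("error:", bad[0], "at line", bad[2])
```

As in the result described above, an empty message is the signal that nothing went wrong; the numeric code is returned but not relied upon.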
Example
????????????????????????????????????????????????
XMLItem - Create an XML element as a nested singleton
Syntax
xml ← element [attributes] XMLItem contents
encodedtext ← XMLItem rawtext
General Information
The purpose of this function is to build any single item of the nested
XML structure that was described earlier in this document. It takes as
arguments the name of the element to create (and optionally any
attributes) and the content to be placed inside that element. It returns
a one-item nested vector (the one item representing the single element
being created) which internally contains the three-part structure
defining that element. If more than one of these resulting elements is
created, then they (the one-item vectors returned by each call to
XMLItem) should simply be catenated together to form a vector of the same
length as the number of sequential elements. If the resulting element(s)
are to be enclosed in another element, then use this result directly (or
catenated with additional elements) as the right argument to a further
call to XMLItem to create the additional level of element nesting. The
result of this function may also be catenated with ordinary text if a
non-homogeneous structure is desired.
XMLItem can also perform a secondary utilitarian function. Since
non-printing and reserved symbols may not be directly included as XML
data, they must be specified using an entity-encoding scheme. XMLItem,
when used monadically, can perform this encoding for you. Any time you're
enclosing potentially unknown text (such as that entered by a user) in an
XML element, you should first make sure that any special characters (such
as ampersands or angle brackets) are correctly encoded. So pass that text
(with an extra call) to XMLItem monadically before passing it on to the
usual XMLItem or XMLItems call to create the element vector. For example:
_????????????????????????????????
This will ensure that all special characters are converted before being
included in the element. Only perform this operation on simple text
(character vectors), and not on anything already returned from any call
to XMLItem or XMLItems, or the data will be re-encoded incorrectly. This
operation is done automatically for attribute values (which can only be
text), so it only needs to be done manually for unknown element contents.
It does not need to be performed at all on text which you are sure does
not contain any special characters (such as typical constant text from
your application).
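The re-encoding hazard mentioned above is easy to demonstrate. The following Python sketch uses the standard library's escape function as a stand-in for monadic XMLItem; it handles only ampersands and angle brackets, whereas the APL function is described as also covering non-printing characters. The sample text is hypothetical.

```python
from xml.sax.saxutils import escape

raw = "Fish & Chips <large>"
encoded = escape(raw)          # replaces &, < and > with entities
print(encoded)                 # Fish &amp; Chips &lt;large&gt;

# Encoding already-encoded text corrupts it: the & that starts each
# entity is itself encoded a second time.
twice = escape(encoded)
print(twice)                   # Fish &amp;amp; Chips &amp;lt;large&amp;gt;
```

This is why the encoding step belongs on raw text only, exactly once, before the text is enclosed in an element.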
Right argument
The right argument is the data to be "enclosed" in this XML element. This
should either be plain text data or the (concatenated) results of one or
more XMLItem or XMLItems calls. (Mixtures of these are uncommon but
permitted.) Empty elements should just supply this value as an empty
vector (''). Numeric values are also permitted for coding ease and are
formatted (⍕) before use, but watch out for formatting problems (like
negative numbers, limited precision, etc.) and format them in advance if
you have any special requirements.
Any data argument that might contain special characters (those that need
entity-encoding, like: newlines as data, ampersands, less-than or
greater-than symbols, non-ASCII characters, etc.) must be
character-encoded prior to enclosing them with XMLItem. (This would
generally apply to any data that has been typed in by a user. Any
constant text that you know does not have such characters can be used
directly.) This can be done by using XMLItem monadically; see "General
Information" above.
Left argument
The left argument is the name of the XML element with which to "enclose"
the data/contents. This may be a simple name, or it may optionally
include attributes to be inserted into the start-tag of the element. Just
a simple element name is the most commonly used form, and the easiest to
specify as an argument (just a simple character vector). If attributes
are specified, they may be given in any of several forms for maximum
programming comfort and flexibility, so just choose the form that you
like best. (Most of these forms are just an attempt to handle how you
would expect it to "just work", so try not to be intimidated by their
somewhat detailed descriptions.) Here are the different ways that the
element name and attributes may be specified.
* Simple element name (or as a nested singleton):
Just a simple character (text) vector containing the name to use for the
element.
Example: ??????
Produces:
* Element name + nested vector of attributes:
A two-item nested vector:
[1] Simple element name, as above (but nested here).
[2] A nested vector of zero or more attribute name-value pairs.
    Each attribute is a two-item nested vector of its name & value:
    [1] Attribute name, text vector.
    [2] Attribute value, text vector (or numbers will be formatted with ⍕).
Note: Be careful with the depth of nesting needed here.
Example: ?????????????????????????????????
Produces:_
* Element name + nested matrix of attributes:
Similar to the nested vector of attributes, but the attributes are listed
in a two-column nested matrix rather than a vector of vectors, as in:
[1] Simple element name, as above (but nested here).
[2] A nested matrix of zero or more attribute name-value pairs.
    Each attribute is given on a row of the matrix:
    [;1] Attribute name, text vector.
    [;2] Attribute value, text vector (or numbers will be formatted with ⍕).
Example: ?????????????????????????????????
Produces:_
* Element name followed by a list of attributes:
Similar to the nested vector of attributes, but in a more relaxed form.
Rather than the attributes all being nested into a single item, they are
allowed to be separated as their own items of the left argument, as in:
[1] Simple element name, as above (but nested here).
[2] Attribute #1 (nested name-value pair, as above).
[3] Attribute #2 (nested name-value pair, as above).
[4+] (etc.)
Example: ???????????????????????????????
Produces:_
Special kinds of element names are also supported, including:
* If the element name begins with a ???:
Encode as a PI entity (both pseudo-content and pseudo-attributes are
allowed).
Example: ???????????????????????????????????
Produces:_
* If the element name begins with ????, or is only ??? or ???:
Encode contents argument as a comment (attributes not allowed).
Example: ?????????????????????????
Produces:_
* If the element name otherwise begins with ???:
Encode contents argument as a special section or declaration.
Example: ????????????????????????????????????????????????
Produces:_
Result
The result of XMLItem is a one-item nested vector. This single item can be
concatenated into a vector of other similar items (or ordinary text) to
produce a longer vector of items. If this item contains other (nested)
items, then it may be APL-nested very deeply; this is quite normal.
The concept of returning a one-item nested (singleton) vector is that an
XML element and all of its contents represent a single logical entity in a
data stream. It is therefore being treated in the same way that a single
text character would be treated in such a data stream.
In general, this nested result can be displayed, but it's not very
readable by itself. To see what you're creating (while debugging), use
BuildXML to display it.
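The singleton idea can be sketched in Python, where a one-item list plays the role of the one-item nested vector. Everything here is hypothetical illustration: the names item and render, the (name, attributes, content) ordering of the triple, and the sample data are assumptions, and render merely stands in for what BuildXML is described as doing.

```python
from xml.sax.saxutils import quoteattr

def item(name, content, attrs=()):
    """Return a one-item list holding a (name, attrs, content) triple."""
    return [(name, list(attrs), content)]

def render(nodes):
    """Serialize a list of triples and/or plain (already encoded) strings."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
            continue
        name, attrs, content = node
        attr_text = "".join(f" {k}={quoteattr(str(v))}" for k, v in attrs)
        inner = content if isinstance(content, str) else render(content)
        out.append(f"<{name}{attr_text}>{inner}</{name}>")
    return "".join(out)

# Each call yields a one-item list, so siblings are formed by concatenation
# and nesting is formed by passing that list on as content.
siblings = item("first", "Ada") + item("last", "Lovelace")
person = item("person", siblings, [("id", 7)])
print(len(siblings), len(person))
print(render(person))
```

Because every element is a list of length one, a sequence of elements and a single element have the same shape, which is what lets results be catenated freely.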
Examples
???????????????????????????
??????????????????????????????????????????????????????
??????????????????????????????????????????????????????
????????????????????????????????????????????????????
??????????????????????????????????????????????????
?????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????????????????
XMLItems - Call XMLItem on an array
Syntax
xml ← elements XMLItems contents
General Information
The purpose of XMLItems is to take multiple pieces of data (such as a
multi-dimensional array) and run them all through XMLItem. The result is
concatenated together into a single valid XMLItem-like result (usually
with more than one item in the vector). This is functionally the same as
calling XMLItem on each data item and then appropriately catenating the
results together and recursing while managing the resulting rank and
depth, but this single function is much easier to use.
The right argument is given as a nested array of any rank and the left
argument is a vector of element names (each item is allowed to be a valid
left argument for XMLItem). These left-argument element names are then
paired with the right-argument data and XMLItem is called on each one.
However, this correlation is not one-for-one. Each item of the left
argument (from right to left) is paired with each dimension of the right
argument (from last to first), and that one element name is used to
enclose all the items along the matching data dimension.
This process converts the right-argument rank into XML-depth. You may
supply more element names than the rank of the data if desired. In that
case, the right-most element names are used to enclose the data
dimensions and any additional (leading, left-most) element names are then
used to contain the resultant single data item.
Note: If fewer element names are given in the left argument than there
are dimensions of data, then only the last dimensions will be enclosed in the
given elements and the result will be returned as an array of separate
results rather than a simple, valid XMLItem result value. The rank of
such an array will be the rank of the right argument less the number of
items in the left argument that were used to reduce it (i.e.
?????????????????????????????). In this case, these items may not simply
be appended into a coded XML vector. Instead, it is recommended that the
data be used as an argument to an additional call to XMLItems until
enough element names have been provided to reduce out its entire rank and
return a valid XMLItem result vector.
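The conversion of rank into XML depth can be sketched in Python, using regular nested lists in place of APL arrays. This is an illustration of the pairing rule only, not the APL implementation: the names rank and wrap and the element names are hypothetical, the output is plain text rather than a nested structure, and the "too few names" case is simply rejected.

```python
def rank(data):
    """Nesting depth of a regular, non-empty nested list (0 for a scalar)."""
    return 1 + rank(data[0]) if isinstance(data, list) else 0

def wrap(names, data):
    """Enclose data in elements, one name per dimension (rightmost = last)."""
    r = rank(data)
    if len(names) < r:
        raise ValueError("sketch requires at least one name per dimension")
    extra, per_dim = names[:len(names) - r], names[len(names) - r:]

    def build(level, x):
        if level == r:                     # reached a scalar
            return str(x)
        tag = per_dim[level]
        return "".join(f"<{tag}>{build(level + 1, y)}</{tag}>" for y in x)

    xml = build(0, data)
    for tag in reversed(extra):            # leading names wrap the whole result
        xml = f"<{tag}>{xml}</{tag}>"
    return xml

matrix = [[1, 2], [3, 4]]
print(wrap(["row", "cell"], matrix))
print(wrap(["table", "row", "cell"], matrix))
```

With two names, each row becomes a row element holding two cell elements; the third, leading name adds a single outer wrapper around the finished result.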
Right argument
The right argument should be one or more content (data) items to be
"enclosed" as XML element(s) by repeatedly using the XMLItem function.
The right argument may be a nested scalar, vector, matrix, or array of
any rank. Note that a single piece of data must be a nested singleton
(e.g. ??????? or ?????) to be properly encoded.
Left argument
The left argument is a nested vector of one or more element names with
which to "enclose" the data (contents) array using the XMLItem function.
Normally there will be one item (element name) in the left argument for
each dimension of the right argument data. The first item (element name)
is applied to the first dimension of the data and the last item is
applied to the last dimension. If there are too few or too many element
names, then they are applied in right-to-left order. See note above for
more details about this situation.
Advanced usage
Each element name in the left argument is usually a simple XML element
name. However, more complicated structures may optionally be provided to
perform more advanced tasks. Each element name (for a given dimension of
data) may be structured in any of the following ways:
* A simple name:
This produces unadorned, consistent element names for each item of data
it is enclosing.
* A two-item nested vector of an element name with attributes:
This structure is a form acceptable for XMLItem, where the first sub-item
is the element name and the second sub-item contains element attributes.
Multiple attributes may be specified in any form allowed by XMLItem.
* An n-by-1 (one-column) nested matrix of multiple (different) element names:
This is used to enclose each content (data) item (along the corresponding
dimension) within a different element. For instance:
_?????????????????????????????????????????????????????
may be used to encode 10 sets of first name, last name, and phone number
into their respective elements and then enclose each of those sets in a
-element, yielding a ??? vector of s.
Note: The height of the element-name matrix must equal the length along
the corresponding dimension of the data being encoded with that
element-list.
A nested vector may be turned into a one-column matrix with ??????, if
desired. Or, as a programming convenience, this list may also be
specified as a simple text matrix rather than a one-column, nested matrix
(e.g. ???????????????????????????).
* An n-by-2 (two-column) nested matrix of multiple element names and attributes:
This encodes each data item into a different element name (as described
above), but also allows for a specification of attributes to be supplied
(in the second column) for each one (in any form acceptable for XMLItem).
Result
The result is a nested vector of the form described earlier in this
document and also used by XMLItem. However, XMLItem always returns a
single-item vector and XMLItems may return a multi-item vector (of the
same type). This result can be used in the same way as results from
XMLItem, including catenating them together (or with ordinary text) or
using them as further input to XMLItem or XMLItems.
Note: If the number (rho) of element names supplied in the left argument
is less than the rank of the data supplied in the right argument, then
the XML encapsulation process is incomplete. It cannot be used as a
finished XMLItem object as described above until it is reprocessed with
XMLItems to provide enough element names. See "General Information" above
for more details.
Examples
???????????????????
Returns a ?? vector of 4 XMLItem results, each item of which is a single
-element containing a number from 1 to 4. Note that XMLItem
normally returns a ?? vector, so this is just 4 such vectors that are
simply catenated together. This is equivalent to:
_????????????????????????
or just:
_?????????????????????????????????????????????????????????????
which would result in:
_1
_2
_3
_4
??????????????????????????
Returns a ?? vector of a -element that contains the ?? vector from
the example above. Additional prefix element names in the left argument
would simply add a single element wrapper for each one. This is
equivalent to:
_??????????????????????????????????
which would result in:
_
__1
__2
__3
__4
_
????????????????????????
Returns a ?? vector of separately-nested ?? vectors of -elements,
which are not logically joined together and are only valid structures
separately. Further use of XMLItems is needed.
???????????????????????????????
Returns a ?? vector of -elements, each of which contains 4
-elements (each of which contains a number). This is a valid
structure, suitable for catenating to an XMLItem-vector. It would result
in:
t>1234
t>5678
count>9101112
?????????????????????????????????????
Returns a ?? vector of a -element containing a ?? vector of
-elements, as above.
XMLPath - Extract selected information from nested XML
Syntax
subset(s) ← path [path]… XMLPath xml
General Information
When extracting specific information out of a nested XML vector (the
result of ParseXML), you may use your usual APL techniques to get what
you want. These include Compression (/), Pick (⊃), First (↑), and
especially Each (¨). But for complicated structures this can get to be
tiring and difficult to read. In order to simplify this process, XMLPath
has been written to help locate and extract particular elements from the
XML vector.
XMLPath was designed to use a syntax similar to a simplified version of
the standard XPath language. Whether you know this standard XML-related
syntax or not, you should find the syntax for XMLPath reasonably easy to
use.
Note: XMLPath can return either elements or low-level data as directed.
Any time that whole elements are returned, they are always in a valid
form for nested (parsed) XML data, and thus the result is suitable for
further use with XMLPath. This may sometimes cause confusion when a
one-item nested vector is returned for a single item rather than disclosing
it to reveal its contents. But this is necessary for consistency; just use
First (↑) to obtain a single result when needed.
Note: When processing the result of XMLPath elements (or indeed the
originally-parsed vector), it is often convenient to use ":FOR" or call a
subroutine with Each (¨). However, doing this causes an implicit Disclose
(⊃) of each item, changing the structure into one that is no longer valid
for use with XMLPath. Simply re-enclosing each item (either
outside or inside the loop) will restore the proper structure and avoid
much confusion.
Right argument
The right argument to XMLPath should be a validly nested XML data
structure vector (as described earlier in this document). Such a vector
is returned from ParseXML, XMLItem/XMLItems, and from some uses of
XMLPath.
Left argument
The left argument provides one or more Paths (character vectors) to
indicate the data that should be selected. If more than one Path is
provided, then each Path is completely independent of one another,
processed as if Each (¨) were used, and multiple results are returned.
The rest of this description assumes that only one Path is provided.
A Path has zero or more Terms, each separated by the ??? character. Each
Term beyond the first (in a Path) conceptually "discloses" the XML by one
level. If you're familiar with XPath, it's rather like each Term is of
axis "child::" (except for special cases like "attribute::"). An explicit
axis operator ("::") is therefore not supported.
All remaining contents (after "diving" through Terms/children) are
returned. Empty vectors are returned if no matches are found. If multiple
matches are made at parent levels, then the children are joined together
as a single group (with ???) before proceeding.
If attribute pairs or values (??? or ???), element-names-only (???), or
plain text or numeric (??? or ???) results are selected, then the
returned result is not a valid XML data structure. As such, these can
only be used as final Terms in the Path. The final-only Terms are
returned as a (nested) vector of items of the same length as the number
of parent nodes from which they were extracted. All other returned values
are valid data XML node structures (a vector of 0 or more elements) and
can be further processed with XMLPath (and its family of XML functions).
Each Term may use one of the following syntax choices:
????
Select only elements with that name.
?????
Select only elements without that name.
(Empty) Select all/only contents at that level (both ??? and ???). This
is often used last in a Path by terminating the Path with a ???.
?
Select all/only child nodes.
The following items may only be used as final Terms:
?
Select all/only text contents.
?
Exactly as ???, but convert to numeric result (or ?? if not numeric).
?
Select all/only element attributes. Each element returns a nested vector
of attribute name-value pairs.
?????????
Select the value(s) of the named attribute (? if none exists).
?????????
Select the value(s) of the named attribute as above (? if none exists),
but convert to a numeric result (or ?? if not numeric).
?
Return only the names of (child) elements. Equivalent to ?? of a ???
Term.
Terms that select nodes (whole elements) may be suffixed by a filtering
criterion within square brackets (????) to further restrict which nodes
are returned. At present, only one type of filtering is supported, but
new filtering syntax (similar to XPath's) is planned for future
enhancement. The following types of filtering are supported:
??
Numeric constant(s) select only those nodes by their sequential
positions.
Examples of Paths
????
Keep all top-level elements and discard any others.
?????????
Get all s? s (elements).
?
Select all children (both text and elements).
??????
All children not named .
??????
First child encountered.
?????????
Children of the first .
?????????
All child element names in .
??????
Text of elements.
??????
All of the s? attributes.
??????????
The lang= attribute of each .
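For readers who want to compare with a path facility they can run, the kinds of selection listed above (elements by name, positional filter, child names, text, numeric conversion, attribute values) have rough counterparts in Python's standard ElementTree module. This sketch uses ElementTree's own limited XPath syntax, not XMLPath's Term syntax, and the sample document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book lang='en'><title>Dune</title><pages>412</pages></book>"
    "<book lang='fr'><title>Candide</title><pages>129</pages></book>"
    "</library>"
)

titles = [t.text for t in doc.findall("book/title")]       # text of each title
first = doc.findall("book[1]")                             # positional filter
langs = [b.get("lang") for b in doc.findall("book")]       # attribute values
pages = [int(p.text) for p in doc.findall("book/pages")]   # numeric conversion
names = [child.tag for child in doc.findall("book[1]/*")]  # child element names
print(titles, langs, pages, names)
```

Note that the element-returning queries (such as the positional filter) yield lists that can be searched again, while the text, attribute, and name results are plain data, which parallels the distinction XMLPath draws between node-selecting Terms and final-only Terms.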
Result
The result of XMLPath depends upon the extraction request specified in
the left argument. If whole elements are being returned (such as when
using ???), then the result is still a valid XML data structure (as
described earlier in this document). If names (?), text content (? or ?),
or attributes (? or ?) are being returned, then those are just data and
cannot be further processed by XMLPath.
See "Left argument" above for more details on what is returned for
different requests.
If multiple extraction requests are made by supplying more than one left
argument, then multiple results are nested and returned as if
Each-Enclose (⊂¨) were used in the call to XMLPath.
Examples
??????????????????????????????????????
?????????????????????????????????????????
??????????????????????????????????????
?????????????????????????????????????????
_? 23 ?