Tags
A Scripting Language for Text
Introduction
Tags is a scripting language for processing text. You can write simple Tags
scripts to process plain and delimited text. You can also easily extract
information from HTML and XML documents obtained from websites, use ODBC to
support SQL queries, and use simple commands to manipulate folders and text
files.
You can write a valid Tags script in a single line of text. But you could
also write a Tags script that spans many files to implement, for example, a
complex document and software generation library. I know because I have.
Here is the traditional "Hello World" written as a Tags script:
<hello>
Hello World.
</hello>
A Tags script is always embedded within an XML document as text (a text node,
in XML language). A trivial Tags script is simply the text contained in the
document element of the script document - in this case, the text within the
hello element. Tags' default action is to output the text it finds as it
processes the script to the standard output. If you replaced the "Hello
World." text with the text of a book, Tags would output the entire text of
the book. But a Tags script can do much more.
Here are some simple sample scripting snippets:
* Load the Top Stories RSS document from the CNN website into a Tags
variable, and then save it in a file.
<topstories script="/topstories/text()">
$#in topstories(http://rss.cnn.com/rss/cnn_topstories.rss)
$#out topstories(topstories.xml)
</topstories>
Tags commands are identified by the leading "$#" character sequence (but
there can be leading spaces as in the sample). The Tags language also
supports variables, and in this example, topstories is a Tags variable. The
$#in command reads the document identified by the URL in the parentheses
into the topstories variable. The $#out command writes the contents of the
topstories variable to the file named topstories.xml.
* Read text lines from a file, and write them to the standard output.
<copy script="/copy/text()">
$#forEach line(myfile.txt)
$?forEach$$
$#end
</copy>
The line-variant of the $#forEach command (there are several other
variants as you will see later) reads each text line from myfile.txt,
placing the text line in the forEach variable where you can reference it in
the part of the script between the $#forEach and the following $#end
commands. In the sample script, the line following the $#forEach command
references the forEach variable.
Tags variables are referenced by preceding the variable name with the "$?"
character sequence, and following the variable name with the "$$" character
sequence. (You can change the characters that Tags will expect within the
script using the marks attribute in the document element, but it's probably
not worth doing.) In this example, the $?forEach$$ reference causes the
contents of the forEach variable to replace the variable reference and to be
written to the standard Tags output file.
* Select records from a database, and write them to the standard output.
<select script="/select/text()">
$#set dsn(customerDSN)
$#set username(myusername)
$#set password(mypassword)
$#forEach SQL(select * from customer)
$?forEach$$
$#end
</select>
Tags uses the ODBC interface to support the SQL query. To run this script,
you need a database table, called customer, and you need to have defined a
DSN (Data Source Name), called customerDSN, to provide the interface
information to the ODBC driver. You must also preset the dsn variable to the
DSN name. You may also need to set the username and password variables if
they are needed. The SQL-variant of the $#forEach command issues the SQL
select statement to the ODBC driver, and then places each resulting record
in the forEach variable, where you can reference it. In this sample, as in
the previous sample, the $?forEach$$ reference causes the contents of the
forEach variable to be written to the standard Tags output file.
The Tags scripting language supports XPath and regular expressions to allow
considerable scripting power. And its simple but comprehensive command set is
easy for anyone with scripting or programming experience to learn and use. If
you aren't familiar with XPath or regular expressions, an introductory
tutorial on each is a good place to start.
How to Execute a Tags Script
You can execute a Tags script from the command line, from within a batch
file, from a program, or from a WSH script (JavaScript or VBScript). The Tags
command line takes the following parameters:
- The name of the Tags script file to execute,
- Any parameters needed by the Tags script.
There are also several pre-defined flag-parameters that you can use:
-V
Plays the ok.wav file on success,
and the error.wav file on failure,
if the files are available.
-X
Displays this manual in the default browser, if both are available.
-Z
Saves variables to files when they are loaded with the $#in command (for debugging).
-n
(n is a number) Adjusts the time Tags sleeps to share CPU cycles between
commands. Not usually required for short runs. Mostly useful for running a
Tags script in the "background".
Example:
> tags hello.xml -v >hello.txt
This command causes Tags to execute using one of the sample files included in
this release. On completion, it plays the ok.wav file if successful, or the
error.wav file if not successful (assuming that the wav-files are present).
Several sample scripts are included with the release.
When you install the Tags files by downloading and unzipping the tags.zip file from http://paul.medlock.com/tags.zip,
you should also add an environment variable, called tagsPath, and set it to contain the path
to the folder where you installed Tags.
Some Basics
The elements, attributes, and text of a script file are wholly determined by
the application. Since you make up the element and attribute names, along
with the structure of the script file, to fit your application, there is no
DTD or schema that describes a valid Tags script.
The text in the Tags script is free-form and can contain any ordinary text
and special characters except for the standard five XML predefined
characters:
- instead of "<", use &lt;
- instead of ">", use &gt;
- instead of "&", use &amp;
- instead of "'" (apostrophe), use &apos;
- instead of """ (quote), use &quot;
If your text contains any of these characters, you may need to convert them
to the equivalent XML entity reference.
On the other hand, you can choose to embed your text in CDATA-sections
instead. You can use a CDATA-section anywhere you could write text, and you
can even mix them together, since Tags treats CDATA-sections as if they were
text. A CDATA-section begins with the string "<![CDATA[" and ends with
"]]>". Here is an example:
<element><![CDATA[
put your <marked> up text & commands here
]]></element>
The text may also contain white-space: viz., spaces, tabs, and new-lines.
Since these characters are preserved in the text, you will find that they
will frequently appear in the output of your script unless you control their
use.
Here's a useful idea: If you aren't using the CDATA option and you choose to
convert the special characters to entity references with search-and-replace,
be sure to replace the ampersands with &amp; first. Otherwise, you will not
be able to tell the original ampersands from those that begin the entity
references you have already inserted.
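To make the ordering point concrete, here is a minimal Python sketch of the search-and-replace approach. It is not part of Tags; the function names are mine, chosen only for illustration. The first function replaces the ampersand before anything else, and the second shows what goes wrong when the ampersand is handled last.

```python
def escape_for_xml(text: str) -> str:
    """Escape the five XML predefined characters, ampersand first."""
    replacements = [
        ("&", "&amp;"),   # must come first
        ("<", "&lt;"),
        (">", "&gt;"),
        ("'", "&apos;"),
        ('"', "&quot;"),
    ]
    for raw, entity in replacements:
        text = text.replace(raw, entity)
    return text


def escape_wrong_order(text: str) -> str:
    """Replace '<' before '&': the '&' of the new '&lt;' gets escaped again."""
    for raw, entity in [("<", "&lt;"), ("&", "&amp;")]:
        text = text.replace(raw, entity)
    return text


print(escape_for_xml("put your <marked> up text & commands here"))
print(escape_wrong_order("a < b"))
```

The second call prints a doubly escaped "&amp;lt;", which is exactly the kind of damage that is hard to find and undo afterwards.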
The examples in this manual may not use the XML entities when they should so
that they are easier to read. But don't forget that you will have to deal
with that issue before you can use your script in Tags. The characters that
Tags uses for markup were chosen so as not to infringe on XML's markup.
Here is another useful idea: You can check that an XML document is
well-formed using Internet Explorer 5+, Netscape 6+, Mozilla, SeaMonkey,
Firefox, etc. To use IE, for example, just drag the name of the file you
want to check onto the IE shortcut on your desktop. IE will recognize the
XML file name extension and display the document. If the document contains
an error, your browser will report the line and column numbers where the
error was detected. Of course, if you use a different file name extension,
e.g., myscript.tags, the browser may not recognize the file as XML.
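If you would rather not depend on a browser, any XML parser can perform the same check. The following is a small sketch using Python's standard xml.etree.ElementTree module; it is an alternative I am suggesting, not something Tags provides, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return None if xml_text is well-formed XML.

    Otherwise return the (line, column) position that the parser
    reports for the first error it detects.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position
    return None


good = '<hello script="/hello/text()">\nHello world.\n</hello>'
bad = "<hello>\nHello & goodbye\n</hello>"   # bare ampersand is not allowed

print(check_well_formed(good))   # None
print(check_well_formed(bad))    # a (line, column) tuple
```

Because the check works on the text rather than the file name, it does not matter what extension your script file uses.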
XML is case-sensitive, and, consequently, XPath expressions are
case-sensitive. Tags is partly case-sensitive. Command names are not, but
variable names are.
A Tags script file must be a well-formed XML document. Usually the bulk of
the file is the text that you want in the output. Here is the Hello.xml
example again:
<hello script="/hello/text()">
Hello world.
</hello>
and you can run it with the command line
> tags hello.xml >hello.txt
The document element of a Tags script document should contain the script
attribute, which uses an XPath expression to tell the Tags interpreter where
the script is within the document. In the example, the value of the script
attribute is "/hello/text()". This is an absolute XPath expression. It's a
good idea to always use an absolute XPath expression to locate the script.
The script attribute is optional, but only if the Tags script is the sole
occupant of the document element, as in this case. We need the script
attribute in more complex script documents, since the script probably will
not be in such an obvious place, so you are probably better off getting in
the habit of using it.
By default, Tags writes the text generated by the script to the standard
output file, but at least one of the sample scripts we have already discussed
demonstrates how to direct Tags output to other files.
About those pesky whitespace characters: if you look carefully at the
contents of the output file from the Tags run above, you will notice that
there is a blank line, followed by the "Hello world." line. This blank line
results from the newline that follows the opening <hello> tag; the "Hello
world." text is on the next line down. You can remove that extraneous line
from the output in two ways. You could rewrite the script as
<hello script="/hello/text()">Hello world.
</hello>
or you could use a join-command:
<hello script="/hello/text()">$\j
Hello world.
</hello>
The join-command ($\j) joins with the next line, and is one of several
special text output control commands. Another command is the
newline-command, which breaks lines, and is written as $\n. It causes the
text of the line that follows the command to be written as the next line. In
the following line of text, the newline-command causes the one line to be
output as two lines.
this is the first line$\nthis is the second line
There are other output control commands, but I'll explain them later in the
manual.
As we saw in the first sample script, you can redirect output to files other
than the standard output using the $#out command. Let's modify the hello.xml
file by using the $#out command to redirect its output to another file:
<hello script="/hello/text()">$\j
Hello world.
$#out (hello.txt)
</hello>
After you run this example, you will find the output of the script in
hello.txt. Note that this version of the $#out command does not identify a
variable as the output source as did the RSS document load sample in the
first section of this manual. A variable name is not needed because Tags can
emit text to a default variable (its name is output, if you want to
reference it), and the $#out command in this example writes the text from
the default variable to the file. (Note: the $#out command flushes the
default variable as a side effect.)
In most programming languages, the text information is usually marked off
from the other elements of the language with special marks, such as quotation
marks, etc., while the language commands are not marked. In Tags, it's the
other way around: text is written simply as text. It is the special Tags
commands that are marked.
There are two kinds of Tags symbols: commands and referencers. Commands occupy a single line
of text, and are identified by a $-sign followed by a #-sign followed by the
command name. Spaces are not allowed to separate these three parts, but
commands do not have to start at the beginning of the line: there may be
leading spaces. Lines that begin with the "$#" identifier that are followed
by a space or do not have a recognized command name are considered comments
and are ignored.
You use referencers to modify the
outputs that your Tags script generates. You can reference the text and
attributes of the Tags script document, other XML documents that you load,
and variables whose values you set. Referencers begin with a $-sign followed
by either an exclamation point ("!") or a question mark ("?") followed by an
expression of some kind, followed by two $-signs. Referencers can appear
pretty much anywhere within your text as you need them, but they must be
complete on the same line on which they start. On the other hand, their
resolved value may span as many lines as desired. The file copy and the ODBC
samples both used the $?forEach$$
variable referencer.
Commands and referencers will be discussed in more detail in subsequent
sections, but here are some examples:
Tags commands:
$#out (myfile.txt)
$#text class(myclass)
$#if (true)
$#end
$#debug (on)
$#get objectname(Enter the name of the new object:)
$# this is a comment (because of the space after the $#-prefix)
Tags referencers:
$!/model/help/text()$$
prompt="$!@prompt$$"$\j
<map name="Action" value="$?line{1}$$" info="$?line{2}$$"/>
The effect that these commands and referencers might have on the output of a
script depends on the context in which they operate. Different data at the
locations specified by the referencer expressions will result in different
outputs. And, since there is no difference between data and program in Tags,
any referencer could obtain text that contains commands and referencers that
Tags would also process in a recursive fashion. That's how Tags provides
something akin to the subroutine paradigm that programmers are familiar with,
though not exactly, since Tags does not provide a facility for passing
parameters to "subroutines".
More Samples
Here are a couple of sample scripts of more complex activities you can
implement in a few Tags script lines:
* Query a database table, called customer, to obtain customer information,
write the information into a text file, and then display the results in
notepad. The script assumes that a DSN, called customerDSN, has been
created for the database table access. Check the link given earlier for
information about ODBC.
<db2text script="/db2text/text()">
$#set dsn(customerDSN) assumes that the DSN customerDSN was previously declared
$#set sqlcolumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
$#forEach SQL(select * from customer) {
$?CustNo$$,$?Name$$,$?Street$$,$?City$$,$?State/Prov$$,$?ZipCode$$,$?Country$$,$?Phone$$
$#end }
$#out (cust.txt)
$#exec (notepad cust.txt)
</db2text>
If you use the sqlcolumns variable to tell the Tags interpreter the names you
want to assign to the columns in the result records, you can access the
columns as variables by name, which I do in this example. In order
to use ODBC, you must first have set up the ODBC link for the specific
database, as I described earlier. It is beyond the scope of this manual to
explain that, but you can get more help by following this sequence of steps
in Windows XP: Windows Start -> Control Panel -> Administrative Tools
-> Data Sources (ODBC). Ok, now you are on your own.
* Here is another script that builds on the earlier script to read the same
RSS document from the CNN news site, extract features from the document to
create an HTML document, and then display it using your default internet
browser. Note the use of the CDATA-section to escape all the HTML tags.
<rss2html script="/rss2html/text()"><![CDATA[
$#in contextNode(http://rss.cnn.com/rss/cnn_topstories.rss)
<html>
<head>
<h2>$!/rss/channel/title/text()$$</h2>
</head>
<body>
$#forEach node($!/rss/channel/item$$) { list all the items in the feed
$#set contextNode($?forEach$$)
<h3>$!title/text()$$</h3>
<p>$!description/text()$$</p>
<p><a href="$!link/text()$$">Link</a></p>
$#end }
</body>
</html>
$#out ($?currentPath$$/topstories.htm)
$#open (file://$?currentPath$$/topstories.htm)
]]></rss2html>
This example obtains the Top Stories RSS document from CNN, as in the
earlier sample, and creates an HTML document using the <item> objects
in the document, writes the result to a file called topstories.htm,
and then opens the default browser to display the file. Note that the URL in
the $#in command must begin with "http://" so that Tags will know to look for the
object on the web.
Referencers
Tags referencers may be coded virtually anywhere within the text of the
script, and have the form
$reftype symbol { subscriptor } $$
There are two reference types, distinguished by the single-character reftype:
! (exclamation-mark) indicates an
XPath expression. In most circumstances, Tags replaces the referencer with
the value obtained by evaluating the XPath expression.
? (question-mark) indicates a
variable reference. Tags replaces the referencer with the value of the
variable specified by the symbol. Variables are discussed later.
Referencers may be used anywhere within the script where they make sense. A
referencer may also contain referencers, and may result in text that contains
other referencers, which are also resolved until only unmarked text is left.
As already mentioned, a referencer cannot be split across two or more lines:
it must lie wholly within a single text line.
Examples:
$!//config/tag$$
An XPath expression reference that identifies all the <tag> elements in all
<config> elements in an XML document.
$?forEach$$
A reference to the Tags variable that contains the local value within a
$#forEach statement.
$?3$$
A reference to the third parameter on the command line.
receiver->SetSource("$!@source$$");
An XPath referencer to the source attribute, embedded in some text in the
script document. (Note that the quotes are part of the output, not part of
the referencer.)
$?$?index$$$$
A reference to the variable identified by the value of the referenced index
variable (a nested reference).
$#set x($!$x+1$$)
A Tags $#set command using an XPath expression reference to increment the
variable x by one. (Note that you can reference a Tags variable within an
XPath expression using only the $-leadin character as documented in the
XPath specification. Writing ($!$?x$$+1$$) would also work, except that the
Tags interpreter resolves the reference instead of the XPath interpreter, so
you may have to place it in quotes if it resolves to a string constant.)
$@script!/myscript/mysubroutine$$
This special form of the XPath referencer allows you to specify a node to
use as a reference point when processing the XPath expression. The example
uses the script variable, which is initialized by Tags to the document
element of the script document itself. If no context node is specified, Tags
uses the contents of the contextNode variable as the reference point.
$#forEach node($!/dep/mod[match(@name, "$?forEach$$.[cC]")]/ref/@name$$)
This example demonstrates an XPath referencer that contains a variable
referencer.
Subscriptors
When the type of a resolved referencer is a string, a list of strings, or a
nodeset (an XPath object), you can use an optional trailing subscriptor to
obtain a portion of the resolved referencer value. A subscriptor is annotated
as an open curly-brace, followed by a number, followed by a close
curly-brace, and is appended to the end of the referencer before the trailing
dual markers (the $$ tail). The subscriptor, itself, can also incorporate one
or more referencers, but it must resolve to a positive integer. If the
resolved value of the subscriptor is zero or less, the value of the resolved
subscripted referencer is left unchanged (i.e., not subscripted; subscripts
start at one).
If the resolved value of the subscripted referencer is a string, Tags assumes
that the string is a series of fields, each preceded by a delimiting
character. Any character can act as the delimiting character, and it is (by
definition) identified as the first character of the string. If the first
character is a comma, the delimiting character is a comma. If the first
character is the letter "A", the delimiting character is the letter "A".
(Notice in the example below that the string is prefixed with a comma to
identify the comma as the delimiting character.) In this manual, a string
delimited in this way is referred to as a delimited string, or as a string
record.
,Lincoln,Abraham,Springfield,Illinois
If the value of the resolved referencer is a nodeset, Tags obtains the node
in the nodeset corresponding to the subscriptor value, counting the first
node as node one. I.e., the first node in a nodeset variable is identified as
$?nodeSet{1}$$, where nodeSet is the name of the variable.
If the value of the subscriptor is larger than the number of objects (fields,
strings, or nodes), then the value of the subscripted referencer is empty. If
the type of the resolved subscripted referencer is not a string, string list,
or a nodeset, the subscriptor is ignored.
Example of using the subscriptor notation:
$#text pres(,Lincoln,Abraham,Springfield,Illinois)
$#text city($?pres{3}$$) sets city to "Springfield"
$#set record(,$!@xyz$$) note the leading comma
$#set field($?record{5}$$) sets field to the value of the fifth field in the xyz attribute
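The subscriptor rules for delimited strings can be modelled in a few lines of Python. This is only my sketch of the behavior described above, not the interpreter's actual code; the function name subscript is hypothetical, and the handling of an empty string is an assumption.

```python
def subscript(value: str, index: int) -> str:
    """Model $?var{index}$$ for a delimited string."""
    if index <= 0 or not value:
        return value                  # zero or negative: left unchanged
    delimiter = value[0]              # the first character is the delimiter
    fields = value[1:].split(delimiter)
    if index > len(fields):
        return ""                     # past the last field: empty
    return fields[index - 1]          # subscripts start at one


pres = ",Lincoln,Abraham,Springfield,Illinois"
print(subscript(pres, 3))    # Springfield
print(subscript(pres, 9))    # empty string
print(subscript(pres, 0))    # the whole record, unchanged
```

The same function works for any delimiter, because the delimiter is read from the record itself rather than fixed in the code.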
Variables
Tags supports variables that can be referenced and assigned values. Each
variable has a name and a value. Unless it violates some other Tags rule, any
alphanumeric string can be a variable name. Values may be of any Tags type
(as described in the next section), or they may be empty. Variable names are
case-sensitive. You set the value of a variable using one of several Tags
commands, and you obtain the value by using the $?varname$$ referencer form.
Tags provides several variables that contain information about the processing
environment of the script. For example, the command-line parameters are
available as variables whose names are the numbers corresponding to the
positions of the parameters that they contain. For example, the first
parameter is available in the variable referenced as "$?1$$", the second
parameter is available in the "$?2$$" variable, and so on. In the example
command line given earlier (tags hello.xml -v), $?0$$ contains "Tags", and
$?1$$ contains "hello.xml".
Tags also allows you to access the command-line flag-parameters (annotated in
the command-line using the form -letter{letter}). Examples of command-line
flag-parameters are -D, -C, -a, etc. Flag-parameters are preserved as Tags
variables having the letter as both their name and their value. The names are
always capitalized, regardless of whether the flag-parameter is or not.
Variables named "$?a$$" and "$?A$$" are different
variables, and only the second could represent a flag-parameter. The
flag-parameter variables make it easy for the user to communicate special
conditions to the script. By the way, notice that there is no provision for
referencing numeric flag parameters as Tags variables.
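Here is a rough Python model of how the command line ends up as variables, based only on the rules just described: positional parameters are numbered, each letter of a flag-parameter becomes a capitalized variable whose value is its own name, and numeric flags get no variable. The function name and the dictionary representation are my own assumptions, and the sketch only recognizes the minus sign as the flag prefix.

```python
def parse_command_line(args):
    """Map a command line to a dict of Tags-style variable names and values."""
    variables = {}
    position = 0
    for arg in args:
        if arg.startswith("-") and len(arg) > 1:
            # Flag group: every ASCII letter becomes its own variable.
            for ch in arg[1:]:
                if ch.isascii() and ch.isalpha():
                    variables[ch.upper()] = ch.upper()
            continue                  # flags are not counted as parameters
        variables[str(position)] = arg
        position += 1
    return variables


print(parse_command_line(["Tags", "hello.xml", "-v"]))
print(parse_command_line(["Tags", "-AbC", "input.txt"]))
```

In the first call the -v flag appears as the variable V, never as v, which is why a script should test $?V$$ rather than $?v$$.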
You can also reference an environment variable by appending its name to
"env.". If you reference an environment variable, such as PATH, as a
variable (e.g., as in $?env.path$$), the value of the environment variable
is returned. Tags does not currently change the values of environment
variables; it only allows you to access their values in your script. This
might change.
Tags pre-defines a number of variables to provide a means of communicating
between the Tags interpreter and your Tags script. Some of these variables
are associated with specific Tags commands. But there are several which have
meaningful values for the duration of the execution of a script. Following is
a list of Tags variables that have special meaning in the Tags language:
columns, getColumns, inputColumns, SQLColumns,
regXColumns
Used by various variants of the forEach command to parse the forEach input
into fields.
command line flags
Command line flags are referenced by their letter value using the notation
$?x$$, where x is the actual upper-case letter value of the flag. Tags
interprets any command-line parameter that is immediately preceded by either
a minus sign or a slash as a command line flag group. Each letter in the
group is a flag. Only letters
can be used as flags in Tags. The value of a flag variable is the name of the
variable. For example, if you code -AbC on the command line, Tags will create
three variables called $?A$$, $?B$$, and $?C$$, with respective values of
"A", "B", and "C".
command line parameters
Command line parameters are referenced by their position using the notation
$?n$$, where
n is the index
of the parameter in question. The first parameter is indexed as one.
Parameters are always strings. Command line flags as described above are not
counted and are handled in their own way.
contextNode
Used by XPath references to identify the default root of an XPath search
(string). Set by Tags during initialization to reference the root element of
your Tags script. You set it according to need. Tags provides an enhanced
form for an XPath referencer expression that allows you to use any variable
as the context node for the expression. The form is $@var!xpath$$. Note that
$@contextNode!expression$$ is the same as $!expression$$. The variable should
contain an XML node.
currentPath
Contains the absolute path to the current directory. Tags sets this to the
directory from which you are running your Tags script.
date and time
Tags provides date and time information to your Tags script through several
variables, which are updated before the interpreter processes each script
command:
- $?time$$ (string - format is hh:mm:ss)
- $?day$$ (number - day of the month)
- $?dayOfWeek$$ (string - name of the week day)
- $?dayOfYear$$ (number - Julian day)
- $?month$$ (number - month of the year)
- $?monthName$$ (string - name of the month)
- $?year$$ (number - all four digits)
dsn
ODBC data source name used by SQL interface (string). You must set this
before using the SQL-variant of the $#forEach command.
empty
Convenience variable set by Tags to contain absolutely nothing. Use it to
clear other variables to empty as in $#set var($?empty$$).
environment variables
A variable whose name starts with "env." is interpreted as an environment
variable, and Tags will attempt to return the value of the corresponding
environment variable, if defined. Otherwise, the value of the referencer is
empty. Note that you cannot change the value of an environment variable in a
Tags script. Note also that, unlike other Tags variables, environment
variable names are not case-sensitive. For example, reference the Path
environment variable as $?env.path$$.
error
Set by Tags as the result of the $#exec and $#open commands. It contains the
value returned by the executed program.
file variables
$?fileDrive$$ (drive:), $?fileName$$, $?filePath$$ (path\), $?fileInfo$$. These variables are set by
the file-variant of the $#forEach
command, which is described later in this document.
HTTP variables
TBD: $?HTTPHeaders$$ and $?HTTPResponseHeaders$$.
grep
Set this with a regular expression before using a $#forEach command to
provide a filter in selecting objects to present in the $?forEach$$
variable. It is not required for proper forEach operation, but it can
improve the performance of your script in many cases. Even when you set the
variable outside the $#forEach loop, it appears empty inside the loop. But,
once set, it retains its value outside the loop. This means that, unless you
change its value yourself, it will have the same value for two consecutive
$#forEach loops, which might not be what you want. So you should set it or
clear it as needed before each loop. $#forEach variants that apply the grep
variable are the Field, Line, Lineb, Str, Strb, and the default variants.
Regular expressions in Tags are compatible with the rules of Perl 5, and are
implemented using the PCRE software.
last
Set by the $#forEach command to the
index of the last object in the object set being processed by the command
(number). The value is not known in some variants of the $#forEach command, and is set to zero in
those cases.
output
Container in which Tags collects output text that is otherwise undirected,
and is automatically dumped to the standard output if not otherwise used. If
it is copied or appended to another variable, it is flushed.
password
ODBC password used by the SQL interface (string). Not all database accesses
require this, but when they do, you must set the value before using the
SQL-variant of the $#forEach
command.
position
Set by the $#forEach command to the
index of the current forEach value (number). The first object is indexed as
one.
regex variables
After performing a $#match or regex version of an $#if or $#ifn, a set of
variables contains the matching substrings. The $?regXCount$$ variable
specifies the number of matched substrings, and the $?regXi$$ variables
contain the matched substrings; e.g., the third matched substring is in the
variable named $?regX3$$ while the original matched string is in the
variable named $?regX0$$. TBD: $?regXColumns$$.
script
Set by Tags during initialization to the root of the Tags script (XPath
node). Use this to implement subroutines by writing XPath expressions
referencing other elements within the same Tags script, as in the following
example: $@script!/myscript/mysubroutine/text()$$.
sqlcolumns
Set this before using the SQL-version of the $#forEach command to define the fields of
the record set you expect to obtain via your select statement. See also the
ODBC example given earlier.
tagsPath
Set by Tags to the value of the tagsPath
environment variable if present. Otherwise set to the path of the Tags
executable (string).
userName
ODBC user name used by the SQL interface (string). Not all database accesses
require this, but when they do, you must set the value before using the
SQL-variant of the $#forEach command.
You should remember that, except for environment variables, your script can
set the value of any variable, and you can lose valuable information by
overwriting the values of certain variables. For example, you will lose the
value of the variable $?script$$ by setting
it to some other value. On the other hand, you may well overwrite the value
of $?contextNode$$
frequently when you are using XPath expressions.
Properties of Variables
When a Tags variable is defined, it has a value, and it also has three
additional properties that can be ascertained using the $?#, $?%, and $??
prefixes. Note that these are the regular $? prefix with an additional #, %
and ? appended, respectively.
Number of Fields
Use the $?# prefix to obtain the number of fields in the contents of the
specified variable. Note that this value only makes sense when the variable
contains a string.
Length
Use the $?% prefix to obtain the length of the contents of a specified
variable.
Type
Use the $?? prefix to obtain the type of the contents of the specified
variable. There are a number of types that Tags values might have. Here is a
list of the types along with the meaning of the length property for that type
in parentheses.
- string (number of characters)
- string_list (number of strings)
- node_list (number of nodes)
- element_node (1)
- attribute_node (1)
- text_node (1)
- cdata_section_node (1)
- entity_reference_node (1)
- entity_node (1)
- processing_instruction_node (1)
- comment_node (1)
- document_node (1)
- document_fragment_node (1)
- notation_node (1)
Value types must be compatible with the context. An XPath node or XPath
nodeset value resulting from the resolution of a Tags referencer discovered
in ordinary text is converted to text, or may be an error. String type values
are acceptable everywhere. When XPath expressions obtain boolean or numeric
values, Tags converts them to strings.
$#set x( this is a string)
$#set t($??x$$) t is set to "string"
$#set f($?#x$$) f is set to 4
$#set i($?%x$$) i is set to 17
After the four $#set commands are
processed, x contains " this is a string", t contains "string", f contains
the number of fields in x, which is four, and i contains the length of x,
which is 17. If x was set to a node list, its length is taken as the number
of nodes in the list and the number of fields is set to zero. If x was set to
a string list, its length is defined as the number of strings in the list.
And so on.
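The three properties in the example above can be modelled with a short Python
sketch. This is only an illustration, not how Tags is implemented. It assumes
that the first character of a string is its field separator (as described for
string records elsewhere in this manual), and the function name
tags_properties is made up for the example.

```python
def tags_properties(value: str) -> dict:
    """Model the type, field-count, and length properties of a string value."""
    # Assumption: the first character of the string is the field separator,
    # and each separator occurrence after it starts a new field.
    fields = value[1:].split(value[0]) if value else []
    return {"type": "string", "fields": len(fields), "length": len(value)}


props = tags_properties(" this is a string")
print(props)
```

Under that assumption the leading space in " this is a string" acts as the
separator, which is why the field count is four while the length is 17.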
Commands
Here are some general comments about Tags commands.
A Tags command may be coded virtually anywhere within the text of the script,
but must be the sole occupant of the text line. Tags commands have the
following form:
$#commandName argument1 (argument2 ) commentable area to the end of the line
Unlike variable names, the commandName is not case-sensitive. While all
commands have a commandName, not all commands have argument1 and argument2,
and no command has argument1 without having argument2. In all commands that
have argument2, the parentheses are required.
In most cases where it is used, argument1 is processed differently than
argument2. Argument1 is usually resolved to a string, while argument2 is
resolved only as far as needed. For example, argument2 can resolve to a
nodelist, or to a SQL result set in the forEach command. This should be
fairly intuitive in each case. (Yeah, right. I'll try to clarify this more as
I work more on the manual.)
When Tags parses a command, it must be able to isolate the two arguments.
This can occasionally conflict with the characters that the two arguments
must use. Specifically, Tags uses the following characters to parse a
command:
- quote ("""),
- apostrophe ("'"),
- open parenthesis ("("), and
- close parenthesis (")")
If these characters are paired within the command arguments, then Tags should
have no trouble. But if they are not paired, Tags will fail to understand the
command. You can help Tags out by "hiding" unmatched characters by
immediately preceding each one with the backward-apostrophe, or back-quote
(`), the character found on the tilde (~) key. (By the way, it is harmless,
though unnecessary, to hide any character in a command argument in this way.)
Here is an example:
$#match $?s$$(.*() will fail to parse, but
$#match $?s$$(.*`() will work fine
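The pairing rule can be pictured with a small Python check. This is a sketch
of the rule as described above, not the actual Tags parser. It assumes that
parentheses must nest, that quotes and apostrophes must each occur an even
number of times, and that a back-quote hides exactly one following character.
The function name is_paired is hypothetical.

```python
def is_paired(argument: str) -> bool:
    """Return True if the parse characters in a command argument are paired."""
    depth = 0                      # open parentheses still waiting for a close
    quotes = {'"': 0, "'": 0}      # how many of each quote character were seen
    chars = iter(argument)
    for ch in chars:
        if ch == "`":
            next(chars, None)      # assumption: the back-quote hides one character
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a close parenthesis with no matching open
                return False
        elif ch in quotes:
            quotes[ch] += 1
    return depth == 0 and all(count % 2 == 0 for count in quotes.values())


print(is_paired(".*("))    # the argument from the failing example
print(is_paired(".*`("))   # the same argument with the parenthesis hidden
```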
There are three basic categories of commands:
- Conditional commands
- The forEach command
- Additional commands
Conditional commands perform the same function they do in any scripting or
programming language: they let the script make decisions and vary its
behaviour according to the conditions it encounters.
The forEach command provides the
ability to repeat specified functionality over a set of objects, such as
nodes in a nodeset, text lines in a file, inputs from a user, fields in a
text record, etc.
A number of commands that I don't categorize further fall into the additional commands group. These include
several debugging commands, an output director command, several variable
setters and a variable loader, an include command, and a number of others. A
bit of a hodge-podge.
Conditional Commands
Tags provides a set of commands that conditionally control the inclusion or
exclusion of text and/or other commands.
$#if (expression)
Is false if the expression evaluates to empty, to the value zero (0), or to
the string "false" (case ignored), and is true otherwise.
$#ifn (expression)
Is true if the expression evaluates to either empty, to the value zero (0),
or to the string "false" (case ignored), and is false otherwise.
$#elif (expression)
Is false if the expression evaluates to empty, to the value zero (0), or to
the string "false" (case ignored), or if a previous conditional command was
true, and is true otherwise.
$#elifn (expression)
Is false if the expression evaluates to a value that is non-empty, is not
zero (0), and is not the string "false" (case ignored), or if a previous
conditional command was true. Is true otherwise.
$#else
Is false if a previous conditional command was true, and is true otherwise.
$#end
Required to terminate a conditional command sequence. Also required to
terminate a forEach command, discussed below.
Expressions must resolve to strings to be properly evaluated. Tags
automatically converts XPath boolean and numeric results into strings, so
boolean true and false are converted to their string
equivalents. XPath and variable expression results that are nodes or nodesets
are converted into strings before they are evaluated according to these
rules.
These expression values are recognized as false:
- the value is empty
- the value is zero (0)
- the value is "false"
- the value is "off"
- the value is "no".
All other values are taken as true.
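The list above can be summarised as a small truth test. The Python sketch
below is an illustration only. It assumes that "zero" means the literal
string "0" and that "off" and "no" are compared without regard to case in the
same way as "false"; neither detail is confirmed by this manual. The name
is_true is made up for the example.

```python
FALSE_WORDS = {"false", "off", "no"}


def is_true(value: str) -> bool:
    """Model how a resolved expression string is judged true or false."""
    # Empty and zero are false. Assumption: zero means the literal "0".
    if value == "" or value == "0":
        return False
    # Assumption: the false words are matched with case ignored.
    return value.lower() not in FALSE_WORDS


for sample in ["", "0", "False", "off", "no", "yes", "1"]:
    print(repr(sample), is_true(sample))
```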
Examples:
$#if ($!$?position$$ = $?last$$$$)
"$!text()$$"
$#else
"$!text()$$",
$#end
This example shows an $#if command as it might be coded within a $#forEach
loop. It tests whether the last object is being processed to decide whether
to terminate the line with a comma. The $#forEach command is explained in
some detail below.
$#if ($?A$$)
do something big deal here...
$#end
The second example tests to determine if the command-line flag A is present
by testing if the variable, named "A", contains a value other than empty.
Additional commands that depend on boolean values also evaluate expressions
according to the same rules as the conditional commands.
Regular Expressions
Scripting in Tags sometimes requires regular expressions. Four of the
conditional commands have additional forms that support the use of regular
expressions in decision making.
$#if string(regular-expression)
Is true if the string matches the regular-expression, and is
false otherwise. If true, subsequent $#elif and $#elifn
statements are ignored.
$#ifn string(regular-expression)
Is true if the string does not match the regular-expression,
and is false otherwise. If true, subsequent $#elif and $#elifn
statements are ignored.
$#elif string(regular-expression)
If evaluated, is true if the string matches the
regular-expression, and is false otherwise. If evaluated and true,
subsequent $#elif and $#elifn statements are ignored.
$#elifn string(regular-expression)
If evaluated, is true if the string does not match the
regular-expression, and is false otherwise. If evaluated and true,
subsequent $#elif and $#elifn statements are ignored.
Each conditional command matches the regular-expression against the string.
(Note that it MUST be a string. Anything else will fail.) If the
regular-expression matches the string, and it contains sub-match expressions
(i.e., expressions coded within parentheses in the regular expression), Tags
sets variables to the matched portions of the string. These variables have
names that correspond to the positions of the sub-match expressions within
the regular-expression. The sub-match variable names have the form
$?regXi$$, where i is the index of the sub-match expression that corresponds
to the variable.
Here is an example:
$#if $?date$$(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
$#set date($?regX2$$/$?regX3$$/$?regX1$$ $?regX4$$:$?regX5$$:$?regX6$$)
$#end
This fragment reformats a date from y-m-d h:m:s to m/d/y h:m:s.
(Just a reminder: Note that the parentheses are all paired in this example,
so that Tags can find the beginning of the expression by matching the pairs.
If the parentheses do not match, you must use the back-quote character (`) to
escape the unmatched parentheses.) If the value of the date variable is
"2005-12-09 14:21:15", then the match generates the following six sub-match
variables:
$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15
One additional variable, called $?regXCount$$, contains the number of
sub-matched expressions. In the example, its value is six.
If a conditional command evaluates to false, the $?regXCount$$ variable is
set to zero. If a conditional command results in fewer sub-match variables
than the last match, only the variables for the sub-matches of the latest
match survive. Sub-match variables are not managed in any other way.
This is not an explanation of regular expressions. You can find out more in
any regular-expression reference.
See also the $#match command, which is described later.
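For readers who know regular expressions from another language, the date
example can be reproduced in Python. This is a sketch for comparison only. It
assumes that a Tags match behaves like Python's re.search, which this manual
does not state, and the helper name match_with_submatches is hypothetical.

```python
import re

PATTERN = r"^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*)"


def match_with_submatches(string: str, pattern: str) -> dict:
    """Return regX1..regXN and regXCount, modelled on the description above."""
    variables = {"regXCount": 0}           # stays zero when nothing matches
    found = re.search(pattern, string)     # assumption: search semantics
    if found:
        for index, text in enumerate(found.groups(), start=1):
            variables[f"regX{index}"] = text
        variables["regXCount"] = len(found.groups())
    return variables


v = match_with_submatches("2005-12-09 14:21:15", PATTERN)
date = f"{v['regX2']}/{v['regX3']}/{v['regX1']} {v['regX4']}:{v['regX5']}:{v['regX6']}"
print(date)
```

The final f-string plays the role of the $#set date(...) line in the Tags
fragment: it reorders the year, month, and day sub-matches.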
The $#forEach Command
$#forEach type(argument)
Processes the commands and text that fall between the $#forEach command and
its matching $#end command once for each object identified in the $#forEach
argument. For each object, the variable $?forEach$$ is set to contain the
object to allow the text within the loop to reference the object, while the
$?position$$ variable is set to its index. Note that Tags handles $#forEach
nesting so that the $?forEach$$ variable is maintained according to its
context.
While processing the $#forEach loop, the variable $?last$$ is set to the
index of the last object to be processed in the loop.
The type can be either empty or it can be "count", "field", "get", "input",
or "SQL". If empty, the $#forEach argument should resolve into a nodeSet
containing zero or more nodes, a single node, or zero or more text lines
(each line is taken as a $#forEach object).
Count
If the type field specifies "count", then the $#forEach argument must resolve into a
number. The $#forEach logic performs
the loop once for each value from one to the argument value, incrementing by
one for each pass.
Field
If the type field specifies "field",
then the $#forEach argument should
resolve to a string record, with its first character identifying the field
separator character. The $#forEach logic loops for each field in the
argument, setting the $?forEach$$
variable to each field in turn.
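The field loop can be pictured with a Python generator. This is a minimal
sketch of the behaviour described above, not the Tags implementation. It
assumes that $?position$$ and $?last$$ are both available during a field
loop, and the name for_each_field is made up for the example.

```python
def for_each_field(record: str):
    """Yield (position, last, field) for each field of a string record."""
    if not record:
        return
    # The first character of a string record identifies the separator.
    separator, body = record[0], record[1:]
    fields = body.split(separator)
    for position, field in enumerate(fields, start=1):
        yield position, len(fields), field


rows = list(for_each_field(",1,CREATE TABLE"))
for position, last, field in rows:
    print(position, last, field)
```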
File
If the type field specifies "file",
then the $#forEach argument should
resolve to a string record having the form |directory|mask|type, where
directory is the path to the directory of interest, mask is a filename
expression, and type may be any "sum" of dir, tree, data, or any. Combine
them using the plus-sign (+). The mask expression can use the plus-sign (+)
and the minus-sign (-) to include or exclude ambiguous or absolute file
names. For example, *.cpp+*.h-s* includes cpp files and header files except
those that start with the letter s. The $?forEach$$ variable contains the
full pathname of the file that the forEach command finds on each iteration.
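The mask expression can be read as a series of set operations. The Python
sketch below illustrates that reading with the standard fnmatch module. It
assumes the expression is evaluated left to right and that file names
themselves contain no plus or minus signs; the function name select_files and
the sample file names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def select_files(names, mask):
    """Apply a mask such as '*.cpp+*.h-s*' to a list of file names."""
    selected = set()
    # Split the mask into (sign, pattern) pairs; the first pattern has no sign.
    for sign, pattern in re.findall(r"([+-]?)([^+-]+)", mask):
        matching = {name for name in names if fnmatchcase(name, pattern)}
        if sign == "-":
            selected -= matching      # minus-sign: exclude these names
        else:
            selected |= matching      # plus-sign or first pattern: include
    return [name for name in names if name in selected]


names = ["main.cpp", "util.h", "stdafx.h", "setup.cpp", "readme.txt"]
chosen = select_files(names, "*.cpp+*.h-s*")
print(chosen)
```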
Get
If the type field specifies "get", then the $#forEach argument should resolve
to a prompt string that is displayed in the console window. User input is
accepted, and when the user presses the Enter-key, the $#forEach loop is
performed. During the pass, the user response is available in the
$?forEach$$ variable. The $#forEach loop is terminated when the user presses
the Esc-key.
Line, Lineb
If the type field specifies "line" or "lineb", then the $#forEach argument
should resolve to a file name. The $?forEach$$ variable contains the text of
each consecutive text line in the specified file.
The Tags interpreter opens the file, and then performs the loop once for each
text object it finds in the file. The
$?position$$ variable is incremented to
reflect which object is being processed. Since the number of objects within
the file is not known during the loop, the
$?last$$ variable is not valid.
You can use the Lineb variant to ignore blank text lines.
There are two kinds of text objects that Tags recognizes: XML elements, and
simple lines of text terminated by either a newline or a return, or both, in
any combination.
If the first non-whitespace character in a line is a "<", then the object
is assumed to be a valid XML element. The Tags interpreter locates the
end-tag for the element, and then loads the element into a DOM and stores its
reference in $?forEach$$. If Tags is unable to load the document into
the DOM, then Tags quits with prejudice.
If the first non-whitespace character is not a "<", then the text line is
read and loaded into $?forEach$$. Tags can handle text lines as long
as 4095 characters. Any longer than that, and Tags terminates. This variant
of the $#forEach command provides the ability to convert each text
line into a set of variables through the use of the columns variable,
which is discussed in the next section.
When all objects in the file have been processed, the file is closed.
Example:
The file input.txt contains a list of numbers followed by names that are
associated with the numbers. Here are a few lines from that file:
0,UNKNOWN
1,CREATE TABLE
2,INSERT
3,SELECT
Suppose the problem is to reformat each line so that the output looks like
this:
<map value="0" name="UNKNOWN"/>
<map value="1" name="CREATE TABLE"/>
<map value="2" name="INSERT"/>
<map value="3" name="SELECT"/>
The reform script in the file input.xml uses the line type of the $#forEach
command to accomplish this:
<reform><![CDATA[$\j
$#forEach line(input.txt)
$#text line(,$?forEach$$)
<map value="$?line{1}$$" name="$?line{2}$$"/>
$#end
]]></reform>
This example uses subscripting to obtain the individual fields in each line.
Notice the comma in the $#text command. The comma is combined with the text
line to form a value of, for example, ",1,CREATE TABLE", which is stored in
the line variable. The leading comma informs the Tags parser that the fields
are separated by commas.
These files are included in this release. Use the following command line to
run this example:
> Tags input.xml >map.xml
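For comparison, here is the same transformation sketched in Python. It is not
part of the Tags release and only mirrors the steps of the reform script:
prefix each line with a comma, split it into fields, and emit a map element.
The function name reform is made up, and no XML escaping is attempted.

```python
def reform(lines):
    """Turn 'number,name' lines into <map .../> elements."""
    output = []
    for text in lines:
        record = "," + text                      # mirrors $#text line(,$?forEach$$)
        fields = record[1:].split(record[0])     # leading character is the separator
        output.append(f'<map value="{fields[0]}" name="{fields[1]}"/>')
    return output


result = reform(["0,UNKNOWN", "1,CREATE TABLE", "2,INSERT", "3,SELECT"])
print("\n".join(result))
```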
Node
If the type field specifies "Node", then the $#forEach argument must resolve to a node
list, and the $#forEach loop is
performed once for each node in the node list. The $?forEach$$ variable will contain each
node in turn.
SQL
If the type field specifies "SQL",
then the $#forEach argument must
resolve to a SQL query, which is performed against the DSN named in the $?dsn$$ variable. The $#forEach loop is performed once for each
row in the result set of the query, with the $?forEach$$ variable containing each row
in turn. Because of the relative complexity of this $#forEach option, it is discussed in more
detail under its own heading below.
XML
If the type field specifies "XML", then the $#forEach argument must resolve to a file
containing a list of one or more XML documents. The $#forEach loop is performed once for each
XML document in the file, with the type of the $?forEach$$ variable being
"document_node". The $?forEach$$
variable can be accessed using XPath expressions.
Variables Associated with the forEach Command
These variables have a special relationship with the $#forEach
command. As the command initializes, it saves the value of the variables, and
restores their values at the end of the loop. Note that some variables are
inputs to the $#forEach command while others are output by the $#forEach
command.
columns, getColumns, inputColumns, SQLColumns
Set these variables to cause the $#forEach command to parse the value
of the forEach variable into a set of variables containing its fields.
If the columns variable is not empty, the parse is applied whenever
the forEach variable is a string. This can happen for the
input-type, the SQL-type, and for the default-type of the
$#forEach command. For all types except the SQL-type, the
format of the columns variable can have one of two forms:
1. ,name1,name2,...,nameN
2. ,name1{size1},name2{size2},...,nameN{sizeN}
Use the first form when the forEach value is a string record, and use
the second form if the forEach value is a record comprised of a set of
fixed-length fields. If a name is omitted, the field is skipped and no
variable is created for that field. While the forms shown above use the comma
as the field delimiter, any special character is acceptable.
In the first form, if the forEach value is not a proper string record,
i.e., does not start with a non-alphanumeric character, the field delimiter
of the columns variable is assumed to be appropriate for the
forEach value as well.
- columns is used by the Str/Strb and the anonymous types of forEach.
- getColumns is used by the Get type.
- inputColumns is used by the Line/Lineb type.
- SQLColumns is used by the SQL type.
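The two forms of the columns variable can be illustrated with a Python
sketch. It is a model of the rules described above, not the Tags parser, and
it rests on several assumptions: a columns value is treated as the second
form only when every entry carries a {size}, fixed-length fields are consumed
left to right, and an entry written as {size} with no name skips that many
characters. The name parse_with_columns is hypothetical.

```python
import re

SIZED = re.compile(r"^(\w*)\{(\d+)\}$")   # matches name{size}; the name may be empty


def parse_with_columns(columns: str, value: str) -> dict:
    """Split value into named variables according to a columns specification."""
    delimiter, specs = columns[0], columns[1:].split(columns[0])
    sized = [SIZED.match(spec) for spec in specs]
    variables = {}
    if all(sized):
        # Second form: fixed-length fields.
        offset = 0
        for match in sized:
            name, size = match.group(1), int(match.group(2))
            if name:                       # an omitted name skips the field
                variables[name] = value[offset:offset + size]
            offset += size
        return variables
    # First form: a delimited record. A proper string record names its own
    # separator in its first character; otherwise the columns separator is used.
    if value and not value[0].isalnum():
        delimiter, value = value[0], value[1:]
    for name, field in zip(specs, value.split(delimiter)):
        if name:
            variables[name] = field
    return variables


print(parse_with_columns(",CustNo,Name", ",7,Ann"))
print(parse_with_columns(",CustNo,,City", "7,Ann,Rome"))
print(parse_with_columns(",id{3},name{5}", "001Alice"))
```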
forEach
Variable set by the $#forEach
command to contain each object, in turn, that is contained in the forEach
argument. For example, if the forEach argument is a nodeset, then the $?forEach$$ variable will contain a node.
When Tags begins, it initializes $?forEach$$ to reference the script text.
position
Variable set to the index of the current object processed by the $#forEach
command. (The position of the first object is one, the second object is two,
etc.) When Tags begins, it initializes $?position$$ to zero.
last
Variable set to the index of the last object processed by the $#forEach
command. This variable is not valid during an input-type or SQL-type
$#forEach loop. When Tags begins, it initializes $?last$$ to zero.
contextNode
Unless you use the $@var!xpathExpression$$ form, you must set this variable
before using any XPath expression to search any subtree of an XML document.
When Tags begins, it initializes $?contextNode$$ to reference the script
document.
Example:
Here is an example using the variables provided by the $#forEach command:
$#forEach ($!//event$$)
$#set contextNode($?forEach$$)
$#if ($!$?position$$ = $?last$$$$)
"$!text()$$",
$#else
"$!@name$$"
$#end
$#end
$# At this point, after the above forEach command is
$# processed, the value of both the forEach and
$# the contextNode variables revert to the values held before
$# the forEach command was encountered.
Here, the XPath expression "@name" is to be applied to each of the <event>
elements in the script document. In this example, the script writer has set
the $?contextNode$$ variable to contain the current <event> element object,
which lets Tags know where to look for the text() and the name="" attribute.
Note that the $?contextNode$$ variable is not set automatically.
The values of the $?forEach$$, $?position$$, $?last$$, and $?contextNode$$
variables are saved before processing a $#forEach loop and, at the completion
of the $#forEach loop, are reset to their saved values. Note that while you
generally would not $#set the $?forEach$$, $?position$$, and $?last$$
variables, you should $#set the $?contextNode$$ variable to control the
context of your XPath search expressions within the $#forEach context.
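The save-and-restore behaviour, including nesting, can be modelled in a few
lines of Python. This is only a sketch of the rule described above. The
dictionary stands in for the Tags variable table, the strings "script text"
and "script document" stand in for the initial references, and the function
name for_each is made up for the example.

```python
SAVED_NAMES = ("forEach", "position", "last", "contextNode")


def for_each(variables, objects, body):
    """Run body once per object, then restore the four loop variables."""
    saved = {name: variables.get(name) for name in SAVED_NAMES}
    try:
        for position, obj in enumerate(objects, start=1):
            variables["forEach"] = obj
            variables["position"] = position
            variables["last"] = len(objects)
            body(variables)
    finally:
        variables.update(saved)        # revert to the values held before the loop


variables = {"forEach": "script text", "position": 0, "last": 0,
             "contextNode": "script document"}
seen = []


def inner_body(v):
    seen.append((v["forEach"], v["position"], v["last"]))


def outer_body(v):
    v["contextNode"] = v["forEach"]    # set explicitly; it is not set automatically
    for_each(v, ["x", "y"], inner_body)
    # After the nested loop, the outer loop's values are back in place.
    seen.append((v["forEach"], v["position"], v["last"]))


for_each(variables, ["a", "b", "c"], outer_body)
print(variables)
```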
Another example:
Assuming that the following Tags script is stored in a file, called letter.xml, it can be processed with the
following command line:
> tags letter.xml >letter.txt
Tags script in the file, letter.xml:
<letter script="/letter/body/text()">
<body>
$!/letter/data/salute/text()$$$\j
$!/letter/data/firstname/text()$$$\j
$!/letter/data/lastname/text()$$
$!/letter/data/street/text()$$
$!/letter/data/city/text()$$,$\j
$!/letter/data/state/text()$$$\j
Dear $!/letter/data/salute/text()$$:
I am looking for fresh wood for my sawmill. I am especially
looking for Eastern hardwoods. Do you have any on hand? I will
be happy to remove it and pay you a fair price for the opportunity.
Sincerely,
Paul B.
</body>
<data>
<salute>Mr</salute>
<firstname>George</firstname>
<lastname>Washington</lastname>
<street>123 Cherry Lane</street>
<city>Mt Vernon</city>
<state>Virginia</state>
</data>
</letter>
There are several variables associated with the SQL Query interface, which
are discussed below under Using the forEach SQL Query Interface.
Using the forEach File Interface
The form of the forEach argument is
|directoryName|fileMask|searchType
The directory name can be any ambiguous or non-ambiguous path given the value
of the $?currentPath$$ variable. The
file mask can be a logical expression comprised of ambiguous and
non-ambiguous file names concatenated with either the plus sign (implements
union) or the minus sign (implements difference). The valid searchTypes can
be one from the set { root | tree } and one from the set { data | dir | any },
where the defaults are root and data.
The variables that the $#forEach command sets are:
- $?fileInfo$$ is a string record having the form
  |fileName|createDate|createTime|createSecs|modificationDate|modificationTime|modificationSecs|size|"dir" or "data",
- $?fileDrive$$ is the drive letter followed by a colon,
- $?filePath$$ is the path followed by a forward-slash, and
- $?fileName$$ is the file name and extension, if any.
Note that file paths can use the forward-slash or backward-slash.
Using the forEach SQL Query Interface
Tags provides a SQL query interface through the SQL variant of the $#forEach
command. For example, assuming that there is an accessible table, called
name-and-address, on your computer, the following $#forEach command
implements a simple query to that table:
$#forEach SQL(select name, street, city, state, zipcode from name-and-address)
..etc
$#end
Generally, the result of a SQL query is what is called a result-set: a set of rows (records) that
satisfy the query. Tags repeats the forEach loop once for each row in the
result-set, setting the $?forEach$$ variable to each row in the result-set,
in turn.
By itself, the $#forEach command given above does not provide enough
information to perform the query. The ODBC system requires additional
information, such as the name of the database in which the name-and-address
table resides, the name of the server computer, and the name of the ODBC
interface driver needed to interface to the specific database server.
To communicate this information, the ODBC interface provides an encapsulation
object, called a DSN, or Data Source Name, which is maintained by the system
as a Registry key and its associated entries in the Registry at
HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\dsnkey, where dsnkey is the name of
the DSN. (Use your Registry Editor to examine some DSNs, but be careful not
to make any changes to the Registry unless you know what you are doing.
Standard warning.) These entries usually identify the database name, the
server name, and the ODBC driver name. Depending on the type of database,
other information may be stored there as well.
While there are several ways to create a DSN, the easiest is by using the
ODBC Data Source Administrator tool at Start/Settings/Control
Panel/Administrative Tools/Data Sources (ODBC). This tool is available in all
32-bit Windows operating systems, as far as I know.
Many database servers require that a query be accompanied by a username and a
password, which the database administrator sets up beforehand, though not all
database interfaces require a username and a password.
Be that as it may, the Tags SQL Query implementation needs this additional
information to pass on to the ODBC interface. You provide the information to
Tags before the $#forEach SQL
command through specific Tags variables. These variables are named as
follows:
- $?dsn$$
- $?username$$
- $?password$$
The $?dsn$$ variable is always
required, but, depending on the specific ODBC interface, the $?username$$ and $?password$$ may not be required. For
example, generally they are required if you are querying a Microsoft SQL
Server or Oracle database, but are not likely to be required if you are
querying a FoxPro table.
As I mentioned earlier, the result of a successful query is a result-set, and
Tags provides each row in the result-set as a delimited string in the $?forEach$$ variable. To access the
specific columns (fields) in the row (record) contained in the $?forEach$$ variable, you can use the
subscripting feature as in the following example:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#forEach SQL(select * from cust)
$?forEach{1}$$, $?forEach{2}$$, $?forEach{3}$$, $?forEach{4}$$, (and so on)
$#end
</Tags>
In this example, the code assumes that a DSN, called Tags-customer-dsn, exists in the
Registry.
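The row-as-delimited-string idea can be tried out in Python. The sketch below
is a stand-in only: it uses the standard sqlite3 module with an in-memory
database instead of ODBC and a DSN, it assumes the comma as the row
delimiter, and the table contents are invented sample data. The names
for_each_sql and subscript are hypothetical.

```python
import sqlite3


def for_each_sql(connection, query, delimiter=","):
    """Yield each result row as a string record with a leading delimiter."""
    for row in connection.execute(query):
        yield delimiter + delimiter.join(str(column) for column in row)


def subscript(record: str, index: int) -> str:
    """Model $?forEach{index}$$: fields are numbered starting at one."""
    return record[1:].split(record[0])[index - 1]


# Stand-in for the DSN: an in-memory database with invented sample rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE cust (custno INTEGER, name TEXT, city TEXT)")
connection.executemany(
    "INSERT INTO cust VALUES (?, ?, ?)",
    [(1, "George Washington", "Mt Vernon"), (2, "John Adams", "Quincy")],
)

lines = []
for record in for_each_sql(connection, "SELECT * FROM cust ORDER BY custno"):
    lines.append(f"{subscript(record, 1)}, {subscript(record, 2)}, {subscript(record, 3)}")
print("\n".join(lines))
```

A column value that itself contains the delimiter would break this simple
split; how Tags handles that case is not covered here.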
Tags provides another way of identifying the columns of the row that does not
use the subscripting method. You can provide the column names as a string
record in a Tags variable, called columns. Tags not only places the column
values in the forEach variable, it also places the values into variables
named in the columns variable.
Here is an example where the programmer has set the columns variable:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#set columns(,CustNo,Name,Street,City,Stat