Tags
A Scripting Language for Text
- Tags
- Introduction
- How to Execute a Tags Script
- Some Basics
- More Samples
- Tags Referencers
- Tags Variables
- Properties of Variables
- Commands
- Conditional Commands
- Regular Expressions
- The $#forEach and the $#forElse Commands
- Variables Associated with the forEach Command
- Using the forEach File Interface
- Using the forEach Line/Lineb Interface
- Using the forEach SQL Query Interface
- The while Commands
- Additional Commands
- $#add name(value), $#adds name(value), $#addt name(value), $#addu name(value)
- $#console (value)
- $#debug (value)
- $#defer name(value)
- $#drop (name)
- $#exec wait(command-line)
- $#get name(prompt)
- $#in name(file-name), $#inb name(file-name), $#inp name(file-name)
- $#include (file-name)
- $#match string(regular-expression)
- $#message (message), $#msg (message)
- $#open wait(file-name)
- $#out name(file-name), $#outa name(file-name)
- $#pause (message)
- $#play (wave-file-name)
- $#pop, $#push
- $#sleep (time)
- $#set name(value), $#text name(value), $#xml name(value)
- $#stop (message)
- $#trace (value)
- $#translate (arg-char-set,fun-char-set)
- $#vars (message)
- Examples:
- XPath functions available in Tags
- The Line Output Management Commands
- Errors
- Tags Download
Introduction
Tags is a scripting language for processing text. You can write
simple Tags scripts to process plain and delimited text. You can
also easily extract information from HTML and XML documents obtained
from websites, use ODBC to support SQL queries, and use simple
commands to manipulate folders and text files.
You can write a valid Tags script in a single line of text. But you
could also write a Tags script that spans many files to implement,
for example, a complex document and software generation library. I
know because I have.
Here is the traditional "Hello World" written as a Tags script:
<hello>
Hello World.
</hello>
A Tags script is always embedded within an XML document as text (a
text node, in XML terms). A trivial Tags script is simply the
text contained in the document element of the script document: in
this case, the text within the hello element. By default, Tags
writes the text it finds to the standard output as it processes the
script. If you replaced the "Hello World." text with the
text of a book, Tags would output the entire text of the book. But a
Tags script can do much more.
Here are some simple sample scripting snippets:
* Load the Top Stories RSS document from the CNN website
into a Tags variable, and then save it in a file.
<topstories script="/topstories/text()">
$#in topstories(http://rss.cnn.com/rss/cnn_topstories.rss)
$#out topstories(topstories.xml)
</topstories>
Tags commands are identified by the leading "$#" character sequence
(but there can be leading spaces as in the sample). The Tags
language also supports variables, and in this example, topstories is a Tags variable.
The $#in command reads the
document identified by the URL in the parentheses into the topstories variable. The $#out command writes the
contents of the topstories variable
to the file named topstories.xml.
* Read text lines from a file, and write them to the standard
output.
<copy script="/copy/text()">
$#forEach line(myfile.txt)
$?forEach$$
$#end
</copy>
The line-variant of the $#forEach command (there are
several other variants as you will see later) reads each text line
from myfile.txt, placing the text line in the forEach variable where you can
reference it in the part of the script between the $#forEach and the
following $#end commands. In the sample script, the line following
the $#forEach command
references the forEach variable.
Tags variables are referenced by preceding the variable name with the "$?"
character sequence, and following the variable name with the "$$"
character sequence. (You can change the characters that Tags
expects within the script using the marks attribute in the document element, but it's
probably not worth doing.) In this example, the $?forEach$$ reference causes
the contents of the forEach
variable to replace the variable reference and to be written to the
standard Tags output file.
* Select records from a database, and write them to the standard
output.
<select script="/select/text()">
$#set dsn(customerDSN)
$#set username(myusername)
$#set password(mypassword)
$#forEach SQL(select * from customer)
$?forEach$$
$#end
</select>
Tags uses the ODBC
interface to support the SQL
query. To run this script, you need a database table, called customer,
and you need to have defined a DSN
(Data Source Name), called customerDSN, to provide the
interface information to the ODBC driver. You must also preset the dsn variable to the DSN name.
If the data source requires them, you must also set the username
and password variables. The SQL-variant
of the $#forEach command
issues the SQL select statement to the ODBC driver, and then places
each resulting record in the forEach variable, where you can
reference it. In this sample, as in the previous sample, the $?forEach$$ reference causes
the contents of the forEach variable
to be written to the standard Tags output file.
The Tags scripting language supports XPath and regular expressions,
which give it considerable scripting power. And its simple but
comprehensive command set is easy for anyone with scripting or
programming experience to learn and use. If you aren't familiar with
XPath or regular expressions, an introductory tutorial on each is a
good place to start.
How to Execute a Tags Script
You can execute a Tags script from the command line, from within a
batch file, from a program, or from a WSH script (JavaScript or
VBScript). The Tags command line takes the following parameters:
- The name of the Tags script file to execute,
- Any parameters needed by the Tags script.
There are also several pre-defined flag-parameters that you can use:
-V
Plays the ok.wav file on success, and the error.wav file on failure, if
the files are available.
-X
Displays this manual in the default
browser, if both are available.
-Z
Saves variables to files when they
are loaded with the $#in
command (for debugging).
-n
(n is a number) Adjusts the time
Tags sleeps between commands to share CPU cycles. Not usually
required for short runs; mostly useful for running a Tags script
in the "background".
Example:
> tags hello.xml -v >hello.txt
This command runs hello.xml, one of the sample scripts included in
this release, and redirects its standard output to hello.txt. On
completion, Tags plays the ok.wav file if the run succeeded, or the
error.wav file if it did not (assuming that the wav-files are present).
Several sample scripts are included with the release.
When you install the Tags files by downloading and unzipping the tags.zip file from http://paul.medlock.com/tags.zip,
you should also add an environment variable, called tagsPath, and set it to contain
the path to the folder where you installed Tags.
Some Basics
The elements, attributes, and text of a script file are wholly
determined by the application. Since you make up the element and
attribute names, along with the structure of the script file, to fit
your application, there is no DTD or schema that describes a valid
Tags script.
The text in the Tags script is free-form and can contain any
ordinary text and special characters except the five characters
for which XML predefines entity references:
- instead of "<", use &lt;
- instead of ">", use &gt;
- instead of "&", use &amp;
- instead of "'" (apostrophe), use &apos;
- instead of '"' (quote), use &quot;
If your text contains any of these characters, you may need to
convert them to the equivalent XML entity reference. In text, "<"
and "&" must always be converted; ">" needs converting only in the
sequence "]]>", and the two quote characters matter only inside
attribute values.
On the other hand, you can choose to embed your text in
CDATA-sections instead. You can use a CDATA-section anywhere you
could write text, and you can even mix them together, since Tags
treats CDATA-sections as if they were text. A CDATA-section begins
with the string "<![CDATA[" and ends with "]]>". Here is an
example:
<element><![CDATA[
put your <marked> up text & commands here
]]></element>
The text may also contain white-space: viz., spaces, tabs, and
new-lines. Since these characters are preserved in the text, you
will find that they will frequently appear in the output of your
script unless you control them.
Here's a useful idea: If you
aren't using the CDATA option and you choose to convert the
special characters to entity references with a
search-and-replace, be sure to replace the ampersands with
&amp; first. Otherwise, the ampersands at the start of the entity
references you have already inserted will be converted again.
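To make the ordering concrete, here is a minimal Python sketch (not part of Tags) of such a search-and-replace. The function name escape_xml_text is made up for this illustration. Replacing "&" first keeps the ampersands in the entity references added by the later steps from being escaped a second time.

```python
# Replacement order matters: "&" must be replaced first, or the "&" at the
# start of each entity reference added by a later step would be escaped again.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
]


def escape_xml_text(text: str) -> str:
    """Replace the five XML special characters with entity references."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


if __name__ == "__main__":
    print(escape_xml_text('if (a < b && c > "d")'))
    # The wrong order double-escapes: "<" -> "&lt;" -> "&amp;lt;"
    print("a < b".replace("<", "&lt;").replace("&", "&amp;"))
```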
The examples in this manual may not use the XML entities when they
should so that they are easier to read. But don't forget that you
will have to deal with that issue before you can use your script in
Tags. The characters that Tags uses for markup were chosen so as not
to infringe on XML's markup.
Here is another useful idea:
You can check an XML document for being well-formed using
Internet Explorer 5+, Netscape 6+, Mozilla, SeaMonkey,
Firefox, etc. To use IE, for example, just drag the name of
the file you want to check onto the IE shortcut on your
desktop. IE will recognize the XML file name extension and
display the document. If the document contains an
error, your browser will report the line and column
numbers where the error was detected. Of course, if you use
a different file name extension, e.g., myscript.tags, the
browser may not recognize the file as XML.
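If you would rather not rely on a browser, or your scripts use an extension such as .tags that a browser won't treat as XML, a short Python script can perform the same check. This is a sketch using the standard xml.etree.ElementTree parser; the function name well_formedness_error is hypothetical, and the script checks only that the document is well-formed XML, not that the Tags script in it is correct.

```python
import sys
import xml.etree.ElementTree as ET


def well_formedness_error(xml_source):
    """Return None if xml_source (str or bytes) is well-formed XML,
    otherwise a (line, column, message) tuple reported by the parser."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_xml.py myscript.tags
    with open(sys.argv[1], "rb") as f:  # bytes, so the declared encoding is honored
        problem = well_formedness_error(f.read())
    if problem is None:
        print("well-formed")
    else:
        print("line %d, column %d: %s" % problem)
```

Because the file is read as bytes, the extension does not matter: any file the parser can read is checked the same way.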
XML is case-sensitive, and, consequently, XPath expressions are
case-sensitive. Tags is partly case-sensitive. Command names are
not, but variable names are.
A Tags script file must be a well-formed XML document. Usually the
bulk of the file is the text that you want in the output. Here is
the Hello.xml example again:
<hello script="/hello/text()">
Hello world.
</hello>
and you can run it with the command line
> tags hello.xml >hello.txt
The document element of a Tags script document should contain the script attribute, which
identifies to the Tags interpreter where the script is within the
document using an XPath expression. In the example, the value of the
script attribute is
"/hello/text()". This is an absolute XPath expression. It's a good
idea to always use an absolute XPath expression to locate the
script. The script attribute
is optional, but only if the Tags script is the sole occupant of the
document element, as in this case. We need the script attribute in more
complex script documents, since the script probably will not be in
such an obvious place, so you are probably better off getting in
the habit of using it.
By default, Tags writes the text generated by the script to the
standard output file, but at least one of the sample scripts we have
already discussed demonstrates how to direct Tags output to other
files.
About those pesky whitespace characters: if you look carefully at
the contents of the output file from the Tags run above, you will
notice that there is a blank line, followed by the "Hello world."
line. The blank line results from the newline that follows the <hello> start tag; the
"Hello world." text is on the next line down. You can remove that
extraneous line from the output in two ways. You could rewrite the
script as
<hello script="/hello/text()">Hello world.
</hello>
or you could use a join-command:
<hello script="/hello/text()">$\j
Hello world.
</hello>
The join-command ($\j) joins with the next line, and is one of several
special text output control commands. Another command is the
newline-command, which breaks lines, and is written as $\n. It
causes the text of the line that follows the command to be written
as the next line. In the following line of text, the newline-command
causes the one line to be output as two lines.
this is the first line$\nthis is the second line
There are other output control commands, but I'll explain them later in
the manual.
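The two output control commands just described can be modeled with a couple of string replacements. The Python sketch below is only an illustration under stated assumptions: it treats $\j as meaningful only at the very end of a line (where it removes that line's newline) and replaces each $\n with a line break. The real interpreter may handle other cases differently, and the names JOIN, NEWLINE, and apply_output_controls are made up here.

```python
JOIN = "$\\j\n"     # assumed: a join-command at the end of a line, plus that line's newline
NEWLINE = "$\\n"    # a newline-command anywhere in a line


def apply_output_controls(text: str) -> str:
    """Model of the $\\j and $\\n output control commands (assumed semantics)."""
    text = text.replace(JOIN, "")       # join: drop the command and its line break
    return text.replace(NEWLINE, "\n")  # newline: break the line at the command


if __name__ == "__main__":
    print(apply_output_controls("$\\j\nHello world.\n"), end="")
    print(apply_output_controls("this is the first line$\\nthis is the second line"))
```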
As we saw in the first sample script, you can redirect output to
files other than the standard output using the $#out command. Let's modify the hello.xml file by using the $#out command to redirect its
output to another file:
<hello script="/hello/text()">$\j
Hello world.
$#out (hello.txt)
</hello>
After you run this example, you will find the output of the script
in hello.txt. Note that
this version of the $#out
command does not identify a variable as the output source, as did the
RSS document load sample in the first section of this manual. A
variable name is not needed because Tags emits text to a default
variable (its name is output,
if you want to reference it), and the $#out command in this example
writes the text from the default variable to the file. (Note:
the $#out command flushes the default variable as a side effect.)
In most programming languages, the text information is usually
marked off from the other elements of the language with special
marks, such as quotation marks, etc., while the language commands
are not marked. In Tags, it's the other way around: text is written
simply as text. It is the special Tags commands that are marked.
There are two kinds of Tags symbols: commands and referencers.
Commands occupy a single line of text, and are identified by a
$-sign followed by a #-sign followed by the command name. Spaces are
not allowed to separate these three parts, but commands do not have
to start in the beginning of the line: there may be leading spaces.
Lines that begin with the "$#" identifier that are followed by a
space or do not have a recognized command name are considered
comments and are ignored.
You use referencers to
modify the outputs that your Tags script generates. You can
reference the text and attributes of the Tags script document, other
XML documents that you load, and variables whose values you set.
Referencers begin with a $-sign followed by an exclamation point
("!"), a question mark ("?"), or a caret ("^"), then an
expression of some kind, and end with two $-signs. Referencers can
appear pretty much anywhere within your text as you need them, but
they must be complete on the same line on which they start. On
the other hand, their resolved value may span as many lines as
desired. The file copy and the ODBC samples both used the $?forEach$$ variable
referencer.
Commands and referencers will be discussed in more detail in
subsequent sections, but here are some examples:
Tags commands:
$#out (myfile.txt)
$#text class(myclass)
$#if (true)
$#end
$#debug (on)
$#get objectname(Enter the name of the new object:)
$# this is a comment (because of the space after the $#-prefix)
Tags referencers:
$!/model/help/text()$$
prompt="$!@prompt$$"$\j
<map name="Action" value="$?line{1}$$" info="$?line{2}$$"/>
The effect that these commands and referencers might have on the
output of a script depends on the context in which they operate.
Different data at the locations specified by the referencer
expressions will result in different outputs. And, since there is no
difference between data and program in Tags, any referencer could
obtain text that contains commands and referencers that Tags would
also process in a recursive fashion. That's how Tags provides
something akin to the subroutine paradigm that programmers are
familiar with, though not exactly, since Tags does not provide a
specifically defined facility for passing parameters to
"subroutines".
More Samples
Here are a couple of sample scripts of more complex activities you
can implement in a few Tags script lines:
* Query a database table, called customer, to obtain customer information, write
the information into a text file, and then display the results in
notepad. The script assumes that a DSN, called customerDSN,
has been created for the database table access. See the ODBC
discussion in the Introduction for more information.
<db2text script="/db2text/text()">
$#set dsn(customerDSN) assumes that the DSN customerDSN was previously declared
$#set sqlcolumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
$#forEach SQL(select * from customer) {
$?CustNo$$,$?Name$$,$?Street$$,$?City$$,$?State/Prov$$,$?ZipCode$$,$?Country$$,$?Phone$$
$#end }
$#out (cust.txt)
$#exec (notepad cust.txt)
</db2text>
If you provide the names you want to assign the columns in the
result records to the Tags interpreter using the sqlcolumns variable,
you can access the columns as variables by their name, which I do in
this example. In order to use ODBC, you must first have set up the
ODBC link for the specific database, as I described earlier. It is
beyond the scope of this manual to explain that, but you can get
more help by following this sequence of steps in Windows XP: Windows
Start -> Control Panel -> Administrative Tools -> Data
Sources (ODBC). A web search for ODBC data source tutorials will turn
up many more. Ok, now you are on your own.
* Here is another script that builds on the earlier script to read
the same RSS document from the CNN news site, extract features from
the document to create an HTML document, and then display it using
your default internet browser. Note the use of the CDATA-section to
escape all the HTML tags.
<rss2html script="/rss2html/text()"><![CDATA[
$#in contextNode(http://rss.cnn.com/rss/cnn_topstories.rss)
<html>
<head>
<title>$!/rss/channel/title/text()$$</title>
</head>
<body>
<h2>$!/rss/channel/title/text()$$</h2>
$#forEach node($!/rss/channel/item$$) { list all the items in the feed
$#set contextNode($?forEach$$)
<h3>$!title/text()$$</h3>
<p>$!description/text()$$</p>
<p><a href="$!link/text()$$">Link</a></p>
$#end }
</body>
</html>
$#out ($?currentPath$$/topstories.htm)
$#open (file://$?currentPath$$/topstories.htm)
]]></rss2html>
This example obtains the Top Stories RSS document from CNN,
as in the earlier sample, and creates an HTML document using the
<item> objects in the document, writes the result to a file
called topstories.htm, and then opens the default browser to
display the file. Note that the URL in the $#in command must begin
with "http://" so that Tags
will know to look for the object on the web.
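The sqlcolumns mechanism in the db2text sample amounts to splitting a delimited string into names and pairing them, in order, with the columns of each result record. The Python sketch below models that idea. It uses the standard sqlite3 module and an in-memory table as a stand-in for ODBC and the customerDSN data source, and the helper names split_delimited and named_rows are hypothetical.

```python
import sqlite3


def split_delimited(record: str) -> list[str]:
    """Split a Tags-style delimited string: its first character is the delimiter."""
    if not record:
        return []
    return record[1:].split(record[0])


def named_rows(conn: sqlite3.Connection, query: str, sqlcolumns: str):
    """Yield each result row as a dict keyed by the names listed in sqlcolumns.
    zip() pairs names and columns in order and stops at the shorter of the two."""
    names = split_delimited(sqlcolumns)
    for row in conn.execute(query):
        yield dict(zip(names, row))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customer (CustNo integer, Name text, City text)")
    conn.execute("insert into customer values (1, 'Ann Lee', 'Springfield')")
    for rec in named_rows(conn, "select * from customer", ",CustNo,Name,City"):
        # Comparable to the $?CustNo$$,$?Name$$,... output line in db2text
        print(f"{rec['CustNo']},{rec['Name']},{rec['City']}")
```

As in the Tags sample, the leading comma in ",CustNo,Name,City" is what identifies the comma as the delimiter.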
Tags Referencers
Tags referencers may be coded virtually anywhere within the text of
the script, and have the form
$reftype symbol { subscriptor } $$
There are three reference types, distinguished by the
single-character reftype:
!
(exclamation-mark) indicates an XPath expression. In most
circumstances, Tags replaces the referencer with the value
obtained by evaluating the XPath expression. There is a modified
version of this, which allows you to specify the context node within
the expression. That form places an "@" and a variable name before the "!". The
samples below present an example of this.
? (question-mark)
indicates a variable reference. Tags replaces the referencer with
the value of the variable specified by the symbol. Variables are
discussed later.
^ (caret) indicates a stack pop. Tags replaces the referencer with
the value of the variable at the top of the stack specified by the
symbol. Stack variables are used primarily as a way to pass
parameters to sub-routines as well as a way to return results back
to sub-routine callers. Multiple parameters can be passed and
multiple results can be returned by using the $#push command and
the $#pop command or the $^pop$$ referencer.
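The role of the context node in XPath referencers can be seen in a small Python model that uses the standard xml.etree.ElementTree module, which supports only a subset of XPath. The same relative expression, title, yields different text depending on the node it is evaluated against, much as $!title/text()$$ in the rss2html sample depends on what $#set contextNode($?forEach$$) placed in contextNode. The sample RSS text and the helper relative_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up, much-shortened RSS document for illustration.
RSS = """<rss><channel><title>Top Stories</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""


def relative_text(context: ET.Element, path: str) -> str:
    """Text of the first node matching a relative path from the context node."""
    return context.findtext(path, default="")


root = ET.fromstring(RSS)        # the document element, <rss>
channel = root.find("channel")

if __name__ == "__main__":
    print(relative_text(channel, "title"))      # context node: the channel
    for item in channel.findall("item"):        # context node: each item in turn
        print(relative_text(item, "title"), relative_text(item, "link"))
```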
Referencers may be used anywhere within the script where they make
sense. A referencer may also contain referencers, and may result in
text that contains other referencers, which are also resolved until
only unmarked text is left. As already mentioned, a referencer
cannot be split across two or more lines: it must lie wholly within
a single text line.
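Here is a minimal Python sketch of that repeated resolution, limited to variable referencers ($?name$$). It is a model under assumptions, not the Tags interpreter: each pass replaces the innermost referencers (those whose names contain no "$"), a value may introduce new referencers that the next pass resolves, an undefined variable is assumed to resolve to empty text, and subscriptors, XPath referencers, and stack referencers are not handled. The names INNERMOST_VAR_REF and resolve are hypothetical.

```python
import re

# An innermost variable referencer: "$?" + a name containing no "$" + "$$".
INNERMOST_VAR_REF = re.compile(r"\$\?([^$]+?)\$\$")


def resolve(text: str, variables: dict[str, str], max_passes: int = 100) -> str:
    """Replace variable referencers, innermost first, until none remain."""
    for _ in range(max_passes):
        resolved = INNERMOST_VAR_REF.sub(lambda m: variables.get(m.group(1), ""), text)
        if resolved == text:
            return text
        text = resolved  # a value may itself contain referencers
    raise RuntimeError("referencers did not settle; possible self-reference")


if __name__ == "__main__":
    v = {"index": "3", "3": "third parameter"}
    print(resolve("$?$?index$$$$", v))  # "$?$?index$$$$" -> "$?3$$" -> value of 3
```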
Examples:
$!//config/tag$$
An XPath expression referencer that identifies all the <tag> elements in
all <config> elements in an XML document.
$?forEach$$
A reference to the Tags variable that contains the local value within a
$#forEach statement.
$?3$$
A reference to the third parameter on the command line.
receiver->SetSource("$!@source$$");
An XPath referencer to the source attribute, embedded in some text in the
script document. (Note that the quotes are part of the output, not part of
the referencer.)
$?$?index$$$$
A reference to the variable identified by the value of the referenced index
variable (a nested reference).
$#set x($!$x+1$$)
A Tags $#set command using an XPath expression referencer to increment the
variable x by one. (Note that you can reference a Tags variable within an
XPath expression using only the $-leadin character, as documented in the
XPath specification. Writing ($!$?x$$+1$$) would also work, except that the
Tags interpreter resolves the reference instead of the XPath interpreter, so
you may have to place it in quotes if it resolves to a string constant.)
$@script!/myscript/mysubroutine$$
This special form of the XPath referencer allows you to specify a node to use
as a reference point when processing the XPath expression. The example uses
the script variable, which Tags initializes to the document element of the
script document itself. If no context node is specified, Tags uses the
contents of the contextNode variable as the reference point.
$#forEach node($!/dep/mod[match(@name, "$?forEach$$.[cC]")]/ref/@name$$)
This example demonstrates an XPath referencer that contains a variable
referencer.
$#set str($!lower-case("A STRING")$$)
This example demonstrates how to use an XPath string function. Note that the
string parameter must be in quotes.
Subscriptors
When the type of a resolved referencer is a string, a list of
strings, or a nodeset (an XPath object), you can use an optional
trailing subscriptor to obtain a portion of the resolved referencer
value. A subscriptor is annotated as an open curly-brace, followed
by one number, or two numbers separated by a comma, followed by a
close curly-brace, and is appended to the end of the referencer
before the trailing dual markers (the $$ tail); e.g., $?var{4,5}$$.
The subscriptor, itself, can also incorporate one or more
referencers, but they must resolve to one or two integers.
When subscripting a string, the first number (the index) can be
prefixed with a "C", "F", or a "W". The letter determines whether
the subscript is by character, field, or by word, respectively.
Subscriptors with a single numeric parameter {index}
The resolved value of the subscriptor has effect when it is either
positive or negative. If it is zero, the value of the resolved
subscripted referencer is left unchanged, i.e., it is not
subscripted.
If the resolved value of the subscripted referencer is a string, and
the index is preceded by a "c", as in {c5}, Tags treats the string
as a series of characters. The index references the beginning of the
character(s) to be extracted, beginning with one as the first
character in the string.
If the resolved value of the subscripted referencer is a string, and
the index is preceded by an "f", or nothing, as in {f5}, Tags treats
the string as a series of fields preceded
by a delimiting character. Any character can act as a delimiting
character, and it is (by definition) identified by virtue of being
the first character in the string. If the first character is a
comma, the field delimiting character is a comma. If the first
character is the letter "A", then the field delimiting character is
the letter "A". (Notice in the example below that the string is
prefixed with a comma to identify the comma as the delimiting
character.) In this manual, strings treated as a set of fields are
referred to as a delimited string,
or as a string record.
,Lincoln,Abraham,Springfield,Illinois
If the resolved value of the subscripted referencer is a string, and
the index is preceded by a "w", as in {w5}, Tags treats the string
as a series of words separated by a space. Processing the words in a
string follows the same rules as processing the fields in a string.
Note that punctuation is not removed, and if connected to a word
with no intervening space, it will be counted as part of the word.
If the value of the resolved referencer is a nodeset, Tags obtains
the node in the nodeset corresponding to the subscriptor value, ndx,
counting the first node as node one. I.e., the first node in a
nodeset variable is identified as $?nodeSet{1}$$, where nodeSet is
the name of the variable. If the ndx value is negative, the nodes
are counted from the last node, which is counted as -1.
If the value of the subscriptor is larger than the number of objects
(characters, fields, words, strings, or nodes), then the value of the subscripted
referencer is empty. If the type of the resolved subscripted
referencer is not a string, string list, or a nodeset, the
subscriptor is ignored and the resolved value is not subscripted.
Subscriptors with two numeric parameters {index,length}
TBD: Note that, in all uses of subscriptors, a positive index is
counted left to right, with the first sub-entity indexed as
one; a negative index is counted right to left, with the last
entity (character, word, field, node, etc.) indexed as -1. The
two-parameter form, as in {c2,7}, is used only to provide a substring
function for strings, and currently has no implementation for any
other Tags type. A negative or zero length is treated as if it were
absent, and the subscriptor then behaves like a subscriptor with a
single parameter.
Examples of using the subscriptor notation:
$#text pres(,Lincoln,Abraham,Springfield,Illinois)
$# accessing the fields of a string as a string record
$#text city($?pres{3}$$) sets city to "Springfield"
$#text first($?pres{f-3}$$) sets first to "Abraham"
$# accessing the characters (substrings) of a string as a collection of characters
$#text last($?pres{c2,7}$$) sets last to "Lincoln"
$#text state($?pres{c-8,8}$$) sets state to "Illinois"
$#text comma($?pres{c1,1}$$) sets comma to the first comma
$#set record(,$!@xyz$$) note the leading comma
$#set field($?record{5}$$) sets field to the value of the fifth field in the xyz attribute
$#text a(this tests tags substring stuff)
$#text b($?a{f4}$$) yields "ags subs"
$#text c($?a{5}$$) yields "ring s" (defaults to field)
$#text d($?a{w2}$$) yields "tests"
$#text e($?a{c12,4}$$) yields "tags"
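The string rules above can be summarized in a short Python model. This is a sketch, not the Tags implementation: the function name subscript is hypothetical, nodeset subscripting is not modeled, and two behaviors are assumptions, namely that a single-parameter character subscript such as {c5} yields one character and that words are split on single spaces. Applied to the pres string above, subscript(pres, "f-3") returns "Abraham", matching the example.

```python
def subscript(value: str, spec: str) -> str:
    """Apply a Tags-style string subscriptor such as "3", "f-3", "w2", or "c12,4"."""
    mode = "f"  # no prefix letter means "field"
    if spec[:1].lower() in ("c", "f", "w"):
        mode, spec = spec[0].lower(), spec[1:]
    numbers = [int(part) for part in spec.split(",")]
    index = numbers[0]
    length = numbers[1] if len(numbers) > 1 else 0
    if index == 0:
        return value  # a zero index leaves the value unsubscripted

    if mode == "c":
        items = list(value)
    elif mode == "w":
        items = value.split(" ")  # assumption: words are separated by single spaces
    else:
        # Delimited string: the first character is the field delimiter.
        items = value[1:].split(value[0]) if value else []

    # Positive indexes count from 1 at the left; negative ones from -1 at the right.
    start = index - 1 if index > 0 else len(items) + index
    if not 0 <= start < len(items):
        return ""  # an index beyond the available items yields an empty value
    if mode == "c":
        # {index,length} is a substring; a missing, zero, or negative length falls
        # back to the single-parameter form, assumed here to mean one character.
        count = length if length > 0 else 1
        return "".join(items[start:start + count])
    return items[start]  # the length parameter is not implemented for fields/words


if __name__ == "__main__":
    pres = ",Lincoln,Abraham,Springfield,Illinois"
    for spec in ("3", "f-3", "c2,7", "c-8,8"):
        print(spec, subscript(pres, spec))
```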
Tags Variables
Tags supports variables that can be referenced and assigned values.
Each variable has a name and a value. Unless it violates some other
Tags rule, any alphanumeric string can be a variable name. Values
may be of any Tags type (as described in the next section), or they
may be empty. Variable names are case-sensitive. You set the value
of a variable using one of several Tags commands, and you obtain the
value by using the $?varname$$
referencer form.
Tags provides several variables that contain information about the
processing environment of the script. For example, the command-line
parameters are available as variables whose names are the numbers
corresponding to the positions of the parameters that they contain.
For example, the first parameter is available in the variable
referenced as "$?1$$",
the second parameter is available in the "$?2$$" variable, and so on. In
the example command line given in How to Execute a Tags Script
(tags hello.xml -v >hello.txt), $?0$$ contains "Tags", and $?1$$
contains "hello.xml".
Tags also allows you to access the command-line flag-parameters
(annotated in the command-line using the form -letter{letter}). Examples of
command-line flag-parameters are -D, -C, -a, etc. Flag-parameters
are preserved as Tags variables having the letter as both their name
and their value. The names are always capitalized, regardless of
whether the flag-parameter is or not. Variables named "$?a$$" and "$?A$$" are
different variables, and only the second could represent a
flag-parameter. The flag-parameter variables make it easy for the
user to communicate special conditions to the script. By the way,
notice that there is no provision for referencing numeric flag
parameters as Tags variables.
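The rules for positional parameters and flag-parameters can be summarized in a Python sketch. It models the argument list as the program receives it (shell redirections such as >hello.txt never reach the program), and the function name command_line_variables is hypothetical; it illustrates the rules described in this manual and is not the Tags source.

```python
def command_line_variables(argv: list[str]) -> dict[str, str]:
    """Model of how Tags exposes command-line arguments as variables.

    Positional arguments become "0", "1", ...; each letter in a flag group
    (an argument starting with "-" or "/") becomes an upper-case variable whose
    value is its own name. Numeric flags such as -5 produce no variable.
    """
    variables: dict[str, str] = {}
    position = 0
    for arg in argv:
        if len(arg) > 1 and arg[0] in "-/":
            for ch in arg[1:]:
                if ch.isalpha():  # only letters can be flags
                    variables[ch.upper()] = ch.upper()
        else:
            variables[str(position)] = arg  # flags are not counted as positions
            position += 1
    return variables


if __name__ == "__main__":
    print(command_line_variables(["tags", "hello.xml", "-AbC"]))
```

For example, command_line_variables(["tags", "hello.xml", "-AbC"]) returns {"0": "tags", "1": "hello.xml", "A": "A", "B": "B", "C": "C"}.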
You can also reference an environment variable by appending its name
to "env.".
If you reference an environment variable, such as PATH, as a
variable (e.g., as in $?env.path$$), the value of the environment
variable is returned. Tags does not currently change the values of
environment variables, it only allows you to access their values in
your script. This might change.
Tags pre-defines a number of variables to provide a means of
communicating between the Tags interpreter and your Tags script.
Some of these variables are associated with specific Tags
commands. But there are several which have meaningful values for the
duration of the execution of a script. Following is a list of Tags
variables that have special meaning in the Tags language:
columns, getColumns, lineColumns, SQLColumns, regXColumns
Used by various variants of the
forEach command to parse the forEach input into fields.
command line flags
Command line flags are referenced
by their letter value using the notation $?x$$,
where x
is the actual upper-case letter value of the flag. Tags interprets
any command-line parameter that is immediately preceded by either
a minus sign or a slash as a command line flag group. Each letter
in the group is a flag. Only letters can be used as flags in Tags.
The value of a flag variable is the name of the variable. For
example, if you code -AbC on the command line, Tags will create
three variables called $?A$$, $?B$$, and $?C$$, with respective
values of "A", "B", and "C".
command line parameters
Command line parameters are
referenced by their position using the notation $?n$$,
where n
is the index of the parameter in question. The first parameter is
indexed as one. Parameters are always strings. Command line flags
as described above are not counted and are handled in their own
way.
contextNode
Used by XPath references to
identify the default root of an XPath search (XPath node). Set by Tags
during initialization to reference the root element of your Tags
script. You set it according to need. Tags provides an enhanced
form for an XPath referencer expression that allows you to use any
variable as the context node for the expression. The form is
$@var!xpath$$. Note that
$@contextNode!expression$$
is the same as $!expression$$.
The variable should contain an XML node.
currentPath
Contains the absolute path to the
current directory. Tags sets this to the directory from which you
are running your Tags script.
date and time
Tags provides date and time
information to your Tags script through several variables, which
are updated before the interpreter processes each script command.
$?time$$ (string - format
is hh:mm:ss), $?day$$
(number - day of the month), $?dayOfWeek$$
(string - name of the week day), $?dayOfYear$$ (number - day of the year, often called the Julian day), $?month$$ (number - month of
the year), $?monthName$$
(string - name of the month), and $?year$$ (number - all four digits).
dsn
ODBC data source name used by SQL
interface (string). You must set this before using the SQL-variant
of the $#forEach
command.
empty
Convenience variable set by Tags to
contain absolutely nothing. Use it to clear other variables to
empty as in $#set
var($?empty$$).
environment variables
Variables whose names start with "env." are interpreted as
environment variables, and Tags will attempt to return the value of
the corresponding environment variable, if defined. Otherwise, the
value of the referencer is empty. You cannot change the
value of an environment variable in a Tags script. Unlike other
Tags variables, environment variable names are not
case sensitive. For example, reference the Path environment
variable as $?env.path$$.
error
Set by Tags as the result of the
$#exec and $#open commands. It contains the value returned by the
executed program.
file variables
$?fileDrive$$
(drive:), $?fileName$$,
$?filePath$$ (path\), $?fileInfo$$. These variables
are set by the file-variant of the $#forEach command, which is described later in
this document.
HTTP variables
TBD: $?HTTPHeaders$$ and
$?HTTPResponseHeaders$$.
grep
Set this with a regular expression
before using a $#forEach
command to provide a filter in selecting objects to present in the
$?forEach$$ variable. It
is not required for proper forEach operation, but it can improve
the performance of your script in many cases. Even when you set
the variable outside the $#forEach
loop, it appears empty inside the loop. But, once set, it retains
its value outside the loop. This means that, unless you change its
value yourself, it keeps the same value for two consecutive $#forEach loops, which might
not be what you want. So you should set it or clear it as needed
before each loop. $#forEach
variants that apply the grep variable are the Field, Line, Lineb,
Str, Strb, and the default variants. Regular expressions in Tags
are compatible with the rules of Perl 5, and are implemented using
the PCRE software.
last
Set by the $#forEach command to the
index of the last object in the object set being processed by the
command (number). The value is not known in some variants of the $#forEach command, and is set
to zero in those cases.
output
Container in which Tags collects
output text that is otherwise undirected, and is automatically
dumped to the standard output if not otherwise used. You can clear
the output using the $#set
output() command.
password
ODBC password used by the SQL
interface (string). Not all database accesses require this, but
when they do, you must set the value before using the SQL-variant
of the $#forEach
command.
position
Set by the $#forEach command to the
index of the current forEach value (number). The first object is
indexed as one.
regex variables
After performing a $#match or regex version of
an $#if or $#ifn, a set of variables
contain the matching substrings. The $?regXCount$$ variable specifies the number of
matched substrings, and the $?regXi$$
variables contain the matched substrings; e.g., the third matched
substring is in the variable named $?regX3$$ while the original matched string is
in the variable named $?regX0$$.
TBD: $?regXColumns$$.
script
Set by Tags during initialization
to the root of the Tags script (XPath node). Use this to implement
subroutines by writing XPath expressions referencing other
elements within the same Tags script, as in the following example.
$@script!/myscript/mysubroutine/text()$$.
sqlcolumns
Set this before using the
SQL-version of the $#forEach
command to define the fields of the record set you expect to
obtain via your select statement. Alternatively, set it to empty
before using the $#forEach
command to obtain the column names from the database as part of
the SELECT request. See also the ODBC example given earlier.
tab
Convenience variable set by Tags to
contain the tab character 0x09 (\t) for general use.
tagsPath
Set by Tags to the tagsPath environment
variable if present. Otherwise set to the path of the Tags
executable (string).
userName
ODBC user name used by the SQL interface
(string). Not all database accesses require this, but when they
do, you must set the value before using the SQL-variant of the $#forEach command.
You should remember that, except for environment variables, your
script can set the value of any variable, and you can lose valuable
information by overwriting the values of certain variables. For
example, you will lose the value of the variable $?script$$ by
setting it to some other value. On the other hand, you may well
overwrite the value of $?contextNode$$ frequently when you are using
XPath expressions.
Properties of Variables
When a Tags variable is defined, it has a value, and it also has
three additional properties that can be ascertained using the $?#,
$?%, and $?? prefixes. Note that these are the regular $? prefix
with an additional #, % and ? appended, respectively.
Number of Fields
Use the $?# prefix to obtain the number of fields in the contents of
the specified variable. Note that this value only makes sense when
the variable contains a string.
Length
Use the $?% prefix to obtain the length of the contents of a
specified variable. For a string, it returns the number of
characters. For a string list, such as output, it returns the number of
lines (strings), and for a node list, it returns the number of
nodes.
Type
Use the $?? prefix to obtain the type of the contents of the
specified variable. There are a number of types that Tags values
might have. Here is a list of the types along with the meaning of
the length property for that type in parentheses.
- string (number of characters)
- number (number of digits)
- string_list (number of strings)
- node_list (number of nodes)
- element_node (1)
- attribute_node (1)
- text_node (1)
- cdata_section_node (1)
- entity_reference_node (1)
- entity_node (1)
- processing_instruction_node (1)
- comment_node (1)
- document_node (1)
- document_fragment_node (1)
- notation_node (1)
Value types must be compatible with the context. An XPath node or
XPath nodeset value resulting from the resolution of a Tags
referencer discovered in ordinary text is converted to text, or may
be an error. String type values are acceptable everywhere. When
XPath expressions obtain boolean or numeric values, Tags converts
them to strings.
$#set x( this is a string)
$#set t($??x$$) t is set to "string"
$#set f($?#x$$) f is set to 4
$#set i($?%x$$) i is set to 17
After the four $#set
commands are processed, x contains " this is a string", t contains
"string", f contains the number of fields in x, which is four, and i
contains the length of x, which is 17. If x was set to a node list,
its length is taken as the number of nodes in the list and the
number of fields is set to zero. If x was set to a string list, its
length is defined as the number of strings in the list. And so on.
Actions
Two additional marks can be used to tidy up the resolved value of
a string-type variable reference: the trim action, marked by the
minus sign (-), which removes leading and trailing spaces, if any,
and the trim-and-capitalize action, marked by the plus sign (+),
which trims the resolved value and capitalizes its first letter.
$#set x( this is a string )
$#set trimmed($?-x$$) trimmed is set to "this is a string"
$#set capped($?+x$$) capped is set to "This is a string"
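If you think more easily in a general-purpose language, the following Python sketch models what the length prefix ($?%) and the two actions ($?- and $?+) produce for the string values in the examples above. It is a model under assumptions, not Tags code: the function names are hypothetical, and it assumes that trimming removes only space characters, as the text describes.

```python
def tags_length(value: str) -> int:
    """Model of $?%x$$ for a string: the number of characters."""
    return len(value)


def tags_trim(value: str) -> str:
    """Model of $?-x$$: remove leading and trailing spaces (assumed: spaces only)."""
    return value.strip(" ")


def tags_trim_capitalize(value: str) -> str:
    """Model of $?+x$$: trim, then upper-case the first letter only."""
    trimmed = tags_trim(value)
    return trimmed[:1].upper() + trimmed[1:]


print(tags_length(" this is a string"))                  # 17
print(repr(tags_trim(" this is a string ")))             # 'this is a string'
print(repr(tags_trim_capitalize(" this is a string ")))  # 'This is a string'
```

The model deliberately avoids Python's str.capitalize(), because that method also lower-cases the rest of the string, and the manual does not say that Tags does so.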
Commands
Here are some general comments about Tags commands.
A Tags command may be coded virtually anywhere within the text of
the script, but must be the sole occupant of the text line. Tags
commands have the following form:
$#commandName argument1 (argument2 ) commentable area to the end of the line
Unlike variable names, the commandName
is not case-sensitive. While all commands have a commandName, not all commands
have argument1 and argument2, and no command has
argument1 without having argument2. In all commands that
have argument2, the
parentheses are required.
In most cases where it is used, argument1
is processed differently than argument2:
argument1 is usually resolved to a string, while argument2 is
resolved only as far as needed. In particular, argument2 can
resolve to a nodelist, or a SQL result set in the forEach command,
for example. This should be fairly intuitive in each case. (yeah
right - I'll try to clarify this more as I work more on the manual.)
When Tags parses a command, it must be able to isolate the two
arguments. This can occasionally conflict with the characters that
the two arguments must use. Specifically, Tags uses the following
characters to parse a command:
- quote ("""),
- apostrophe ("'"),
- open parenthesis ("("), and
- close parenthesis (")")
If these characters are paired within the command arguments, then
Tags should have no trouble. But if they are not paired, Tags will
fail to understand the command. You can help Tags out by "hiding"
unmatched characters by immediately preceding them with
the back-quote (`), found on the same key as the tilde (~). (By the way, it is
harmless, though unnecessary, to hide any character in a command
argument in this way.)
Here is an example:
$#match $?s$$(.*() will fail to parse, but
$#match $?s$$(.*`() will work fine
There are three basic categories of commands:
- Conditional commands
- The forEach command
- Additional commands
Conditional commands
perform the same function they do in any scripting or programming
language: they let the script make decisions and vary its behaviour
according to the conditions it encounters.
The forEach command
provides the ability to repeat specified functionality over a set of
objects, such as nodes in a nodeset, text lines in a file, inputs
from a user, fields in a text record, etc.
A number of commands that I don't categorize further fall into the additional commands group.
These include several debugging commands, an output director
command, several variable setters and a variable loader, an include
command, and a number of others. A bit of a hodge-podge.
Conditional Commands
Tags provides a set of commands that conditionally control the
inclusion or exclusion of text and/or other commands.
$#if (expression)
Is false if the expression
evaluates to a false value (see the list of false values below), and is true otherwise.
$#ifn (expression)
Is true if the expression evaluates
to either empty, to the value zero (0), or to the string "false"
(case ignored), and is false otherwise.
$#elif (expression)
Is false if the expression
evaluates to empty, to the value zero (0), or to the string
"false" (case ignored), or if a previous conditional command was
true, and is true otherwise.
$#elifn (expression)
Is false if the expression
evaluates to non-empty, is not the value zero (0) and is not the
string "false" (case ignored), or if a previous conditional
command was true. Is true otherwise.
$#else
Is false if a previous conditional
command was true, and is true otherwise.
$#end
Required to terminate a conditional
command sequence. Also required to terminate a forEach command,
discussed below.
Expressions must resolve to strings to be properly evaluated. Tags
automatically converts XPath boolean and numeric results into
strings, so boolean true
and false are converted to
their string equivalents. XPath and variable expression results that
are nodes or nodesets are converted into strings before they are
evaluated according to these rules.
These expression values are recognized as false:
- the value is empty
- the value is zero (0)
- the value is "false"
- the value is "off"
- the value is "no".
All other values are taken as true.
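The false-value rules above can be summarized as a small Python predicate. This is a sketch of the documented rules only: the name `tags_is_true` is hypothetical, and it assumes that the case-insensitive comparison the manual states for "false" also applies to "off" and "no", and that no trimming or numeric normalization (such as treating "0.0" as zero) takes place.

```python
# Values the manual lists as false. Case is ignored (the manual says so for
# "false"; this model assumes the same for "off" and "no").
FALSE_VALUES = {"", "0", "false", "off", "no"}


def tags_is_true(value: str) -> bool:
    """Model of how a conditional command evaluates a resolved string."""
    return value.lower() not in FALSE_VALUES


for v in ["", "0", "False", "OFF", "no", "1", "true", "A"]:
    print(repr(v), tags_is_true(v))
```

Because Tags converts XPath boolean results to the strings "true" and "false", such results fall naturally into this rule.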
Examples:
$#if ($!$?position$$ = $?last$$$$)
"$!text()$$"
$#else
"$!text()$$",
$#end
This example shows an $#if command, which might be coded within a
$#forEach loop. It tests whether the last object is being
processed, so that every line except the last one is terminated
with a comma. The $#forEach command is explained in some detail below.
$#if ($?A$$)
do something big deal here...
$#end
The second example tests to determine if the command-line flag A is
present by testing if the variable, named "A", contains a value
other than empty.
Additional commands that depend on boolean values also evaluate
expressions according to the same rules as the conditional commands.
Regular Expressions
Scripting in Tags sometimes calls for regular
expressions. Four of the conditional commands have additional forms
that support the use of regular expressions in decision making.
$#if string(regular-expression)
Is true if the string matches
the regular-expression, and is false otherwise. If true,
subsequent $#elif and $#elifn statements are
ignored.
$#ifn string(regular-expression)
Is true if the string does
not match the regular-expression, and is false otherwise.
If true, subsequent $#elif and $#elifn statements
are ignored.
$#elif string(regular-expression)
If evaluated, is true if the string
matches the regular-expression, and is false
otherwise. If evaluated and true, subsequent $#elif and $#elifn
statements are ignored.
$#elifn string(regular-expression)
If evaluated, is true if the string
does not match the regular-expression, and is false
otherwise. If evaluated and true, subsequent $#elif and $#elifn
statements are ignored.
Each conditional command matches the regular-expression against the string. (Note that it MUST be a
string. Anything else will fail.) If the regular-expression matches the string, and it contains
sub-match expressions (i.e., expressions coded within parentheses in
the regular expression), Tags sets variables to the matched portions
of the string. The names of these variables correspond to
the positions of the sub-match expressions within the
regular-expression: they have the form $?regXi$$, where i is the index of the sub-match
expression that corresponds to the variable.
Here is an example:
$#if $?date$$(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
$#set date($?regX2$$/$?regX3$$/$?regX1$$ $?regX4$$:$?regX5$$:$?regX6$$)
$#end
This fragment reformats a date from y-m-d h:m:s to m/d/y
h:m:s. (Just a reminder: Note that the parentheses are all
paired in this example, so that Tags can find the beginning of the
expression by matching the pairs. If the parentheses do not match,
you must use the back-quote character (`) to escape the unmatched
parentheses.) If the value of the date variable is "2005-12-09
14:21:15", then the match generates the following six sub-match
variables:
$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15
One additional variable, called $?regXCount$$,
contains the number of sub-matched expressions. In the example,
its value is six.
If a conditional command evaluates to false, the $?regXCount$$
variable is set to zero. If a conditional command results in fewer
sub-match variables than the last match, only the variables for the
sub-matches of the latest match survive. Sub-match variables are not
managed in any other way.
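You can model the way the regX variables are filled with Python's re module, whose syntax is close to Perl 5 for a simple pattern like this one. The sketch below is only a model: `tags_match` is a hypothetical helper that returns the variables as a dictionary of strings, and it reproduces the date example above.

```python
import re


def tags_match(string: str, pattern: str) -> dict:
    """Return the regX variables a successful match would set (a model)."""
    m = re.search(pattern, string)
    if m is None:
        return {"regXCount": "0"}          # a failed match sets the count to zero
    variables = {"regX0": m.group(0), "regXCount": str(len(m.groups()))}
    for i, sub in enumerate(m.groups(), start=1):
        variables[f"regX{i}"] = sub if sub is not None else ""
    return variables


date = "2005-12-09 14:21:15"
v = tags_match(date, r"^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*)")
new_date = f"{v['regX2']}/{v['regX3']}/{v['regX1']} {v['regX4']}:{v['regX5']}:{v['regX6']}"
print(new_date)  # 12/09/2005 14:21:15
```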
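This is not an explanation of regular expressions. Because Tags
regular expressions follow the rules of Perl 5 and are implemented
using PCRE, the PCRE documentation is a good place to find out more.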
See also the $#match command, which is described later.
The $#forEach and the $#forElse Commands
$#forEach type(argument)
Processes the commands and text that fall between the $#forEach command and its
matching $#end command
once for each object identified in the $#forEach argument. For each object, the variable,
$?forEach$$, is set to
contain the object to allow the text within the loop to reference
the object, while the $?position$$
variable is set to its index. Note that Tags handles $#forEach nesting so that the $?forEach$$ variable is
maintained according to its context.
You can use the $#forEach
command in one of two ways as shown below:
$#forEach type(argument)
...commands...
$#end
or
$#forEach type(argument)
...commands...
$#forElse
...commands...
$#end
The $#forElse command
lets you perform commands if the $#forEach argument comes up
empty-handed.
While processing the $#forEach
loop, the variable $?last$$
is set to the index of the last object to be processed in the loop.
The type can be either empty or it can be one of the types listed
below. If empty, the $#forEach
argument should resolve into zero or more text lines (each line is
taken as a $#forEach
object.) If it is not a list of strings, it is converted to a list
of strings. If, for example, it is a list of nodes, each node is
converted into a string according to the rules of XPath.
Char
If the type field specifies "char", then the $#forEach argument should
resolve to a string. The $#forEach
logic performs the loop once for each character in the argument,
setting the $?forEach$$ variable to each character in turn.
Count
If the type field specifies "count", then the $#forEach argument must
resolve into a number. The $#forEach
logic performs the loop once for each value from one to the
argument value, incrementing by one for each pass.
Field
If the type field specifies "field", then the $#forEach argument should
resolve to a string record, with its first character identifying
the field separator character. The $#forEach logic
loops for each field in the argument, setting the $?forEach$$ variable to each
field in turn.
File
If the type field specifies "file", then the $#forEach argument should
resolve to a string record having the form |directory|mask|type, where directory is the path to the
directory of interest, mask is
a filename expression, and type
may be any "sum" of dir,
tree, data, or any. Combine them using the
plus-sign (+). The mask expression can use the plus-sign (+) and
the minus-sign (-) to include or exclude ambiguous or absolute
file names. For example, *.cpp+*.h-s*
includes cpp files and header files except those that start with
the letter s. The $?forEach$$
variable contains the full pathname of the file that the forEach
command finds in each iteration.
Get
If the type field specifies "get",
then the $#forEach
argument should resolve to a prompt string that is displayed in
the console window. User input is accepted, and when the user
presses the Enter-key, the $#forEach
loop is performed. During the pass, the user response is available
in the $?forEach$$
variable. The $#forEach
loop is terminated when the user presses the Esc-key.
Line, Lineb
If the type field specifies "line" or "lineb", then the $#forEach argument should resolve to
a file name. The $?forEach$$
variable contains the text of each consecutive text line in the
specified file. Because of the relative complexity of this $#forEach option, it is
discussed in more detail under its own heading below.
Node
If the type field specifies "Node",
then the $#forEach
argument must resolve to a node list, and the $#forEach loop is performed
once for each node in the node list. The $?forEach$$ variable will contain each node in
turn.
SQL
If the type field specifies "SQL", then the $#forEach argument must
resolve to a SQL query, which is performed against the DSN named
in the $?dsn$$ variable.
The $#forEach loop is
performed once for each row in the result set of the query, with
the $?forEach$$ variable
containing each row in turn. Because of the relative complexity of
this $#forEach option,
it is discussed in more detail under its own heading below.
Str, Strb
If the type field specifies "str"
or "strb", then the
$#forEach argument should
resolve to the name of a list of strings. The $#forEach logic performs the
loop once for each string in the argument, setting the $?forEach$$
variable to each string in turn. If the type field specifies "strb", empty strings are
ignored. The str specification
differs from the line specification by
expecting a list argument instead of a file argument. Except for
that difference, the discussion below about lines applies equally
to the string specification.
Word
If the type field specifies "word", then the $#forEach argument should
resolve to a string. The $#forEach
logic performs the loop once for each word in the argument,
setting the $?forEach$$ variable to each word in turn. Note that
special characters that might trail a word are trimmed.
XML
If the type field specifies "XML",
then the $#forEach
argument must resolve to a file containing a list of one or more
well-formed XML documents. The $#forEach
loop is performed once for each XML document in the file, with the
type of the $?forEach$$
variable being "document_node". The $?forEach$$ variable can be accessed using XPath
expressions.
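Two of the simpler types, Field and Word, can be modelled in a few lines of Python. The sketch assumes that a Field record is split at every occurrence of its first character, and that the "special characters" trimmed by the Word type are ASCII punctuation. Both helper names are hypothetical. Assuming the $?# prefix uses the same splitting rule, the Field model also reproduces the four fields reported for " this is a string" in the Properties of Variables section.

```python
import string


def tags_fields(record: str) -> list[str]:
    """Model of the Field type: the first character is the separator."""
    if not record:
        return []
    separator, body = record[0], record[1:]
    return body.split(separator)


def tags_words(text: str) -> list[str]:
    """Model of the Word type: whitespace-separated words with trailing
    punctuation trimmed (assumed meaning of "special characters")."""
    words = []
    for token in text.split():
        word = token.rstrip(string.punctuation)
        if word:  # assumption: a token made only of punctuation is dropped
            words.append(word)
    return words


for position, field in enumerate(tags_fields("|alpha|beta|gamma"), start=1):
    print(position, field)
print(len(tags_fields(" this is a string")))    # 4
print(tags_words("Hello, world! Tags is fun."))  # ['Hello', 'world', 'Tags', 'is', 'fun']
```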
Variables Associated with the forEach Command
These variables have a special relationship with the $#forEach
command. As the command initializes, it saves the value of the
variables, and restores their values at the end of the loop. Note
that some variables are inputs to the $#forEach command while others
are output by the $#forEach command. And at least one (SQLColumns)
can have a value going in and/or have a value coming out.
columns, getColumns, lineColumns, SQLColumns
Set these variables to cause the $#forEach
command to parse the value of the forEach variable into a
set of variables containing its fields. If the columns variable
is not empty, the parse is applied whenever the forEach variable
is a string. This can happen for the input-type, the SQL-type,
and for the default-type of the $#forEach command. For all
types, the format of the columns variable can have one of
the following two forms:
1. ,name1,name2,...,nameN
2. ,name1{size1},name2{size2},...,nameN{sizeN}
Use the first form when the forEach value is a string
record, and use the second form if the forEach value is a
record comprised of a set of fixed-length fields. If a name is
omitted, the field is skipped and no variable is created for that
field. While the forms shown above use the comma as the field
delimiter, any special character is acceptable.
In the first form, if the forEach value is not a proper
string record, i.e., does not start with a non-alphanumeric
character, the field delimiter of the columns variable is
assumed to be appropriate for the forEach value as well.
columns is used by the
Str/Strb and the anonymous (default) types of forEach;
getColumns is used by the
Get type;
lineColumns is used by
the Line/Lineb type; and
SQLColumns is used by the
SQL type. If you do not preset it (i.e., you leave it or set it empty)
before issuing a $#forEach SQL command, it will be filled with the
names of the columns selected by the forEach command.
forEach
Variable set by the $#forEach command to contain
each object, in turn, that is contained in the forEach argument.
For example, if the forEach argument is a nodeset, then the $?forEach$$ variable will
contain a node. When Tags begins, it initializes $?forEach$$ to reference the
script text.
position
Variable set to the index of the
current object processed by the $#forEach
command. (the position of the first object is one, the second
object is two, etc.) When Tags begins, it initializes $?position$$ to zero.
last
Variable set to the index of the
last object processed by the $#forEach
command. This variable is not valid during an input-type or
SQL-type $#forEach loop. When Tags begins, it initializes $?last$$ to zero.
contextNode
Unless you use the $@var!xpathExpression$$ form,
you must set this variable before using any XPath expression to
search an XML document. When Tags begins, it initializes $?contextNode$$ to reference
the script document node.
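The two forms of the columns variables described above can be modelled as follows. This Python sketch illustrates the rules under assumptions and is not the Tags parser. `parse_columns` is a hypothetical helper that returns the variables Tags would create as a dictionary. It treats any columns value containing a {size} suffix as the fixed-length form, and it assumes that a fixed-length record has no leading delimiter.

```python
import re


def parse_columns(columns: str, record: str) -> dict[str, str]:
    """Split `record` into named values as described by a columns variable."""
    delimiter = columns[0]                 # any special character may be used
    specs = columns[1:].split(delimiter)
    values: dict[str, str] = {}

    if any("{" in spec for spec in specs):
        # Form 2: name{size} entries describe fixed-length fields.
        offset = 0
        for spec in specs:
            m = re.fullmatch(r"([^{]*)\{(\d+)\}", spec)
            if m is None:
                raise ValueError(f"bad fixed-length column spec: {spec!r}")
            name, size = m.group(1), int(m.group(2))
            if name:                       # an omitted name skips the field
                values[name] = record[offset:offset + size]
            offset += size
        return values

    # Form 1: delimited fields. A proper string record supplies its own
    # delimiter as its first character; otherwise reuse the columns delimiter.
    if record and not record[0].isalnum():
        fields = record[1:].split(record[0])
    else:
        fields = record.split(delimiter)
    for name, value in zip(specs, fields):
        if name:                           # an omitted name skips the field
            values[name] = value
    return values


print(parse_columns(",num,name", "1,CREATE TABLE"))
print(parse_columns(",code{3},{2},city{5}", "ABC--Paris"))
```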
Example:
Here is an example using some of the variables provided by the
$#forEach command:
$#forEach ($!//event$$)
$#set contextNode($?forEach$$)
$#if ($!$?position$$ =$?last$$$$)
"$!text()$$",
$#else
"$!@name$$"
$#end
$#end
$# At this point, after the above forEach command is
$# processed, the value of both the forEach and
$# the contextNode variables revert to the values held before
$# the forEach command was encountered.
Here, the XPath expressions "text()" and "@name" are applied to each of the <event>
elements in the script document. In this example, the script writer
lets Tags know where to look for the text() and the name attribute by setting the $?contextNode$$ variable to
contain the current <event>
element object. Note that the $?contextNode$$
variable is not set automatically.
The values of the $?forEach$$,
$?position$$, $?last$$, and $?contextNode$$ variables are
saved before processing a $#forEach
loop, and, at the completion of the $#forEach loop, are reset to their saved values.
Note that while you generally would not $#set the $?forEach$$,
$?position$$, and $?last$$ variables, you should
$#set the $?contextNode$$ variable to
control the context of your XPath search expressions within the
$#forEach context.
Another example:
Assuming that the following Tags script is stored in a file, called
letter.xml, it can be
processed with the following command line:
> tags letter.xml >letter.txt
Tags script in the file, letter.xml:
<letter script="/letter/body/text()">
<body>
$!/letter/data/salute/text()$$$\j
$!/letter/data/firstname/text()$$$\j
$!/letter/data/lastname/text()$$
$!/letter/data/street/text()$$
$!/letter/data/city/text()$$,$\j
$!/letter/data/state/text()$$$\j
Dear $!/letter/data/salute/text()$$ $!/letter/data/lastname/text()$$:
I am looking for fresh wood for my sawmill. I am especially
looking for Eastern hardwoods. Do you have any on hand? I will
be happy to remove it and pay you a fair price for the opportunity.
Sincerely,
Paul B.
</body>
<data>
<salute>Mr</salute>
<firstname>George</firstname>
<lastname>Washington</lastname>
<street>123 Cherry Lane</street>
<city>Mt Vernon</city>
<state>Virginia</state>
</data>
</letter>
There are several variables associated with the SQL Query interface,
which are discussed in the section on the forEach SQL Query interface.
Using the forEach File Interface
The form of the forEach argument is
|directoryName|fileMask|searchType
The directory name can be any ambiguous or non-ambiguous path given
the value of the $?currentPath$$
variable. The file mask can be a logical expression comprised of
ambiguous and non-ambiguous file names concatenated with either the
plus sign (implements union) or the minus sign (implements
difference). The valid searchTypes are one from the set { root | tree } and one from
the set { data | dir | any }; the defaults are root and data.
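The argument format and the mask arithmetic described above can be sketched in Python. This model rests on several assumptions that the manual does not spell out:

- The first character of the argument is the field separator, as for other string records.
- Masks are matched case-insensitively with shell-style wildcards (Python's fnmatch).
- Mask terms are applied left to right, so each "+" term adds matching names and each "-" term removes them.
- A type of "dir+data" is treated like "any".

Both helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def parse_file_argument(argument: str) -> dict[str, str]:
    """Split |directory|mask|searchType and apply the defaults root and data."""
    parts = argument[1:].split(argument[0]) + ["", "", ""]
    directory, mask, search_type = parts[0], parts[1], parts[2]
    words = {w.strip().lower() for w in search_type.split("+") if w.strip()}
    scope = "tree" if "tree" in words else "root"
    if "any" in words or {"dir", "data"} <= words:  # assumption: dir+data == any
        kind = "any"
    elif "dir" in words:
        kind = "dir"
    else:
        kind = "data"
    return {"directory": directory, "mask": mask, "scope": scope, "kind": kind}


def mask_selects(file_name: str, mask: str) -> bool:
    """Evaluate a mask such as *.cpp+*.h-s* for one file name, left to right."""
    selected = False
    for sign, pattern in re.findall(r"([+-]?)([^+-]+)", mask):
        if fnmatchcase(file_name.lower(), pattern.lower()):
            selected = sign != "-"      # "+" (or none) = union, "-" = difference
    return selected


arg = parse_file_argument("|c:/src|*.cpp+*.h-s*|tree")
names = ["main.cpp", "util.h", "setup.cpp", "strings.h", "notes.txt"]
print(arg["scope"], arg["kind"])                           # tree data
print([n for n in names if mask_selects(n, arg["mask"])])  # ['main.cpp', 'util.h']
```

One limitation of this model is that it cannot express a file name that itself contains "+" or "-", because those characters always act as operators here.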
The variables that the $#forEach
command sets are
$?fileInfo$$ is a string
record having the form
|fileName|createDate|createTime|createSecs|modificationDate|modificationTime|modificationSecs|size|"dir"
or
"data"
$?fileDrive$$ is the drive
letter followed by a colon,
$?filePath$$ is the path
followed by a forward-slash, and
$?fileName$$ is the file
name and extension, if any.
Note that file paths can use the forward-slash or backward-slash.
Using the forEach Line/Lineb Interface
The Tags interpreter opens the specified file, and then performs the
loop once for each text object (line) it finds in the file. The $?position$$ variable is
incremented to reflect which object is being processed. Since the
number of objects within the file is not known during the loop, the
$?last$$ variable is not
valid.
You can use the Lineb variant to ignore blank text lines.
There are two kinds of text objects that Tags recognizes: XML
elements, and simple lines of text terminated by either a newline
or a return, or both, in any combination.
If the first non-whitespace character in a line is a "<", then
the object is assumed to be a valid XML element. The Tags
interpreter locates the end-tag for the element, and then loads
the element into a DOM and stores its reference in $?forEach$$.
If Tags is unable to load the document into the DOM, then Tags
quits with prejudice.
If the first non-whitespace character is not a "<", then the
text line is read and loaded into $?forEach$$. Tags can
handle text lines as long as 4095 characters. Any longer than
that, and Tags terminates. This variant of the $#forEach
command provides the ability to convert each text line into a set
of variables through the use of the lineColumns variable, which is
described under Variables Associated with the forEach Command.
When all objects in the file have been processed, the file is
closed.
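For plain text lines only (not the XML-element objects), you can model the difference between Line and Lineb and the 4095-character limit in Python as follows. This is a sketch under assumptions. `read_text_objects` is a hypothetical helper, and "blank" is taken to mean empty or whitespace-only. Python's str.splitlines() stands in for the Tags newline/return handling; it accepts "\n", "\r" and "\r\n", plus a few Unicode line separators.

```python
MAX_LINE_LENGTH = 4095  # Tags terminates on longer lines; this model raises instead


def read_text_objects(text: str, skip_blank: bool = False) -> list[str]:
    """Model of Line (skip_blank=False) and Lineb (skip_blank=True)
    for plain text lines only; XML-element objects are not modelled."""
    objects = []
    for line in text.splitlines():
        if len(line) > MAX_LINE_LENGTH:
            raise ValueError("text line longer than 4095 characters")
        if skip_blank and line.strip() == "":  # assumption: whitespace-only is blank
            continue
        objects.append(line)
    return objects


sample = "0,UNKNOWN\r\n\r\n1,CREATE TABLE\n"
print(read_text_objects(sample))                   # ['0,UNKNOWN', '', '1,CREATE TABLE']
print(read_text_objects(sample, skip_blank=True))  # ['0,UNKNOWN', '1,CREATE TABLE']
```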
Example:
The file input.txt
contains a list of numbers followed by names that are associated
with the numbers. Here are a few lines from that file:
0,UNKNOWN
1,CREATE TABLE
2,INSERT
3,SELECT
Suppose the problem is to reformat each line so that the output
looks like this:
<map value="0" name="UNKNOWN"/>
<map value="1" name="CREATE TABLE"/>
<map value="2" name="INSERT"/>
<map value="3" name="SELECT"/>
The reform script in the file input.xml
uses the line type of
the $#forEach command to
accomplish this:
<reform><![CDATA[$\j
$#forEach line(input.txt)
$#text line(,$?forEach$$)
<map value="$?line{1}$$" name="$?line{2}$$"/>
$#end
]]></reform>
This example uses subscripting to obtain the individual fields in
each line. Notice the comma in the $#text command. The comma is combined with the
text line to form a value of, for example, ",1,CREATE TABLE", which is stored in the line variable. The leading
comma informs the Tags parser that the fields are separated by
commas.
These files are included in this release. Use the following command
line to run this example:
> Tags input.xml >map.xml
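To see what the reform script computes, here is an equivalent transformation written in Python. It is only an illustration, and `reform` is a hypothetical function. The field splitting follows the string-record rule, in which the leading comma names the separator. Tags subscripts such as {1} are one-based, whereas Python indexes from zero. This model does not escape XML special characters such as & in the names.

```python
def reform(lines: list[str]) -> list[str]:
    """Turn "number,name" lines into <map .../> elements."""
    result = []
    for line in lines:
        record = "," + line                  # the leading comma is the separator
        fields = record[1:].split(record[0])
        # $?line{1}$$ and $?line{2}$$ correspond to fields[0] and fields[1].
        result.append(f'<map value="{fields[0]}" name="{fields[1]}"/>')
    return result


for element in reform(["0,UNKNOWN", "1,CREATE TABLE", "2,INSERT", "3,SELECT"]):
    print(element)
```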
Using the forEach SQL Query Interface
Tags provides a SQL query interface through the SQL variant of the $#forEach command. For example,
assuming that there is an accessible table, called
name-and-address, on your computer, the following $#forEach command implements a
simple query to that table:
$#forEach SQL(select name, street, city, state, zipcode from name-and-address)
..etc
$#end
Note that the SQL argument is simply a SQL Select statement.
Generally, the result of a SQL query is what is called a result-set: a set of rows
(records) that satisfy the query. Tags repeats the forEach loop once
for each row in the result-set, setting the $?forEach$$ variable to
each row in the result-set, in turn.
By itself, the $#forEach command given above does not provide enough
information to perform the query. The ODBC system requires
additional information, such as the name of the database in which
the name-and-address table resides, the name of the server computer,
and the name of the ODBC interface driver needed to interface to the
specific database server.
To communicate this information, the ODBC interface provides an
encapsulation object, called a DSN,
or Data Source Name, which is maintained by the system as a
Registry key, and its associated entries in the Registry at HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\dsnkey, where dsnkey
is the name of the DSN. (Use your Registry Editor to examine some
DSNs, but be careful not to make any changes to the Registry unless
you know what you are doing - standard warning.) These entries
usually identify the database name, the server name, and the ODBC
driver name. Depending on the type of database, other information
may be stored there as well.
While there are several ways to create a DSN, the easiest is by
using the ODBC Data Source Administrator tool at Start/Settings/Control
Panel/Administrative Tools/Data Sources (ODBC). This tool
is available in all 32-bit and 64-bit Windows operating systems, as
far as I know.
Many database servers require that a query is accompanied by a
username and a password, which the database administrator sets up
beforehand, though not all database interfaces require a username
and a password.
Be that as it may, the Tags SQL Query implementation needs this
additional information to pass on to the ODBC interface. You provide
the information to Tags before the $#forEach SQL command through specific Tags
variables. These variables are named as follows:
- $?dsn$$
- $?username$$
- $?password$$
The $?dsn$$ variable is
always required, but, depending on the specific ODBC interface, the
$?username$$ and $?password$$ may not be
required. For example, generally they are required if you are
querying a Microsoft SQL Server, MySQL or Oracle database, but are
not likely to be required if you are querying a FoxPro table.
As I mentioned earlier, the result of a successful query is a
result-set, and Tags presents each row in the result-set as a
delimited string in the $?forEach$$
variable. To access the specific columns (fields) in the row
(record) contained in the $?forEach$$
variable, you can use the subscripting feature as in the following
example:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#forEach SQL(select * from cust)
$?forEach{1}$$, $?forEach{2}$$, $?forEach{3}$$, $?forEach{4}$$, (and so on)
$#end
</Tags>
In this example, the code assumes that a DSN, called Tags-customer-dsn, exists in
the Registry.
Tags provides another way of identifying the columns of the row that
does not use the subscripting method. You can provide the column
names as a string record in a Tags variable, called SQLColumns. Tags not only
places the column values in the forEach
variable, it also places the values into variables named in the SQLColumns
variable. Here is an example where the programmer has set the SQLColumns variable:
<Tags script="/Tags/text()">
$#set dsn(Tags-customer-dsn)
$#set SQLColumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
$#forEach SQL(select * from cust)
$?CustNo$$, $?Name$$, $?Street$$, $?City$$, $?State/Prov$$, $?ZipCode$$, $?Country$$, $?Phone$$
$#end
</Tags>
If you clear the SQLColumns variable beforehand, Tags obtains the
column names from the SQL interface and stores them as a string
record in the SQLColumns variable. If the SQL interface has no name
for a column, Tags substitutes a fill-in name: "SQLCol1" for the
first missing name, "SQLCol2" for the second, and so on. You can
always set the SQLColumns variable to empty ($#set SQLColumns()) and
then examine it within the $#forEach loop to find out the names
yourself. This behavior is somewhat different from that described
for the SQLColumns variable in the earlier section describing the
variables associated with the $#forEach command.
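A Python sketch of the column-naming rule follows. It is a model, not Tags source. The driver-supplied names (None or empty for an unnamed column), the sample values and the helper names are all hypothetical. The leading-delimiter record format is assumed from the $#set SQLColumns example above.

```python
from __future__ import annotations


def column_names(driver_names: list[str | None]) -> list[str]:
    """Fill in SQLCol1, SQLCol2, ... for columns the SQL interface leaves unnamed."""
    names: list[str] = []
    missing = 0
    for name in driver_names:
        if name:
            names.append(name)
        else:
            missing += 1  # numbered by missing name, not by column position
            names.append(f"SQLCol{missing}")
    return names


def bind_row(names: list[str], values: list[str]) -> dict[str, str]:
    """Model of Tags placing each column value into the variable of the same name."""
    return dict(zip(names, values))


def as_record(names: list[str], delimiter: str = ",") -> str:
    """Build the string record stored in SQLColumns (leading delimiter assumed)."""
    return delimiter + delimiter.join(names)
```

The fill-in counter advances only for unnamed columns. That is how the text above describes the numbering ("SQLCol1" for the first missing name).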
The while Commands
$#while (expression), $#while string(regular-expression)
Repeats the Tags script between the $#while command and the
terminating $#end command while the expression (or regular
expression) is true.
$#whilen (expression), $#whilen string(regular-expression)
Repeats the Tags script between the $#whilen command and the
terminating $#end command while the expression (or regular
expression) is false.
$#last (expression), $#last string(regular-expression)
Terminates a $#while or $#forEach loop if the expression (or regular
expression) is true.
$#lastn (expression), $#lastn string(regular-expression)
Terminates a $#while or $#forEach loop if the expression (or regular
expression) is false.
$#next (expression), $#next string(regular-expression)
Skips the remainder of the Tags script before the terminating $#end
command if the expression (or regular expression) is true.
$#nextn (expression), $#nextn string(regular-expression)
Skips the remainder of the Tags script before the terminating $#end
command if the expression (or regular expression) is false.
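In terms of a general-purpose language, these commands behave like while, break and continue. The Python sketch below is only an analogy, not Tags code. It assumes that $#next continues with the next iteration of the loop. It also places a $#last check before a $#next check in the loop body; in a real script, the position of each command in the body matters. The n-suffixed forms simply negate the condition.

```python
from __future__ import annotations

from typing import Callable, Iterable


def run_loop(items: Iterable[int],
             last_if: Callable[[int], bool] | None = None,
             next_if: Callable[[int], bool] | None = None) -> list[int]:
    """Analogy for a loop body containing $#last (...) followed by $#next (...)."""
    emitted: list[int] = []
    for item in items:
        if last_if is not None and last_if(item):
            break  # $#last: leave the loop entirely
        if next_if is not None and next_if(item):
            continue  # $#next: skip the rest of the body up to $#end
        emitted.append(item)  # stands in for the rest of the loop body
    return emitted
```

A $#lastn (x < 3) corresponds to last_if=lambda x: not (x < 3). The loop stops at the first item for which the expression is false.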
Additional Commands
$#add name(value), $#adds name(value), $#addt name(value), $#addu name(value)
Inserts or appends a string or string list to the named string list.
If the specified variable is not a string list, it is converted to
one before the value of the expression is added to it. Without a
suffix, the value is appended at the end of the list. The suffixes
-s, -t, and -u insert the value in its sorted position (s), at the
top of the list (t), or uniquely in its sorted position (u). In the
last case, the value is added only if it is not already in the list.
If a string list is being inserted as sorted or unique, each item in
the string list is inserted individually according to the suffix.
Otherwise, the entire list is inserted as a group. If the resolved
value is empty, no add takes place.
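The following Python sketch models the four $#add forms on a plain list of strings. It is an illustration, not Tags code. It assumes that a string list is represented as a Python list, that the target list is already sorted for the s and u forms, and that sorting uses ordinary string ordering.

```python
from __future__ import annotations

import bisect


def tags_add(target: list[str], value: str | list[str], suffix: str = "") -> list[str]:
    """Model of $#add (suffix ''), $#adds ('s'), $#addt ('t') and $#addu ('u')."""
    items = value if isinstance(value, list) else [value]
    if not items or items == [""]:
        return target  # empty value: no add takes place
    if suffix == "s":
        for item in items:  # each item goes to its own sorted position
            bisect.insort(target, item)
    elif suffix == "u":
        for item in items:  # sorted position, but only if not already present
            i = bisect.bisect_left(target, item)
            if i == len(target) or target[i] != item:
                target.insert(i, item)
    elif suffix == "t":
        target[0:0] = items  # the whole group goes to the top
    else:
        target.extend(items)  # the whole group goes to the end
    return target
```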
$#console (value)
Opens the console window if the value is true,
closes the console window if the value is false.
This may not be available in some versions of Tags.
$#debug (value)
Activates or deactivates the debug facilities of Tags according to
the specified value. If the value is false, debugging is turned off;
otherwise, it is turned on. Use this script command instead of the
-Z command-line flag to debug a short section of script.
When debugging is on, the contents of each variable loaded by the
$#in command are written to a file whose name is the name of the
variable followed by ".dbg".
Also, the XML element or text line read by each iteration of the
file or line form of the $#forEach command is written to a text
file called "nextelement.dbg" or "nextline.dbg", respectively.
Use the $#pause command to examine these files between iterations.
$#defer name(value)
Sets the variable specified by name to the unresolved form of the
specified value. For example, if the value is a string containing a
referencer, the string is stored in the variable without resolving
the referencer. When the variable is subsequently referenced, its
contents are resolved at that time.
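The difference between $#set and $#defer can be sketched in Python with a tiny resolver for simple $?name$$ referencers. This is a model under stated assumptions, not the Tags resolver. Nested referencers are not handled, an unset variable is assumed to resolve to an empty string, and the helper names are hypothetical.

```python
import re

# Matches a simple $?name$$ referencer (nested referencers are not handled here).
REFERENCER = re.compile(r"\$\?([^$]+)\$\$")


def resolve(text: str, variables: dict) -> str:
    """Replace each $?name$$ with the resolved value of that variable ('' if unset, assumed).

    A variable that refers to itself would recurse forever; this sketch does not guard against it.
    """
    return REFERENCER.sub(lambda m: resolve(variables.get(m.group(1), ""), variables), text)


def tags_set(name: str, value: str, variables: dict) -> None:
    variables[name] = resolve(value, variables)  # $#set: resolved now


def tags_defer(name: str, value: str, variables: dict) -> None:
    variables[name] = value  # $#defer: stored unresolved, resolved when referenced
```

If x is changed after both commands run, a variable set with $#set keeps the old value of x. A variable set with $#defer picks up the new value when it is referenced.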
$#drop (name)
Removes the variable specified by name from the system, releasing
any resources the variable may own, such as its contents. You don't
usually need this command, since the system manages its resources
automatically, but it can reduce memory usage when variables
containing very large files are no longer needed. This is likely to
matter only in advanced scripts.
$#exec wait(command-line)
Sends the specified command-line to the operating system for
execution. If the word "wait" is present, Tags waits for the command
to complete; otherwise, it does not.
$#get name(prompt)
Asks the user to enter a value to assign to the variable having the
specified name. If the console window is not already open (it can be
opened with the -W command-line flag or the $#console command), Tags
opens it. Tags then displays the prompt, and the user may enter a
response, followed by the Enter key. The console window is left open.
$#in name(file-name), $#inb name(file-name), $#inp name(file-name)
If the file-name is a URL (starts with "http://"), Tags loads the
document from the internet; otherwise, it loads the local document
identified by the file-name. Either way, the document is loaded into
the variable having the specified name. If the file-name does not
specify a path, Tags searches the directories listed in the TagsPath
environment variable, if that variable is present. Otherwise, it uses
the path of the Tags command you specified on the command line. If
you did not specify a path there either, Tags looks for the file in
the current directory.
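The lookup order can be sketched in Python. This is a model, not Tags code, and find_file is a hypothetical name. It rests on two assumptions. First, TagsPath holds a list of directories separated by the platform's path separator (";" on Windows). Second, "the path of the Tags command" means the directory part of the command you typed.

```python
from __future__ import annotations

import os


def find_file(file_name: str, command_path: str = "") -> str | None:
    """Sketch of the lookup order described for $#in (and $#play)."""
    if os.path.dirname(file_name):
        # A path was given: use it as-is.
        return file_name if os.path.isfile(file_name) else None
    tags_path = os.environ.get("TagsPath")
    if tags_path:
        directories = tags_path.split(os.pathsep)  # separator is an assumption
    elif os.path.dirname(command_path):
        directories = [os.path.dirname(command_path)]  # directory of the Tags command
    else:
        directories = [os.getcwd()]  # fall back to the current directory
    for directory in directories:
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```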
If the file extension is "htm" or "html", then the file is assumed
to be an HTML file. Otherwise, if the first non-blank character in
the file is a "<", then the file is assumed to be a well-formed
XML document. If the document is determined to be an HTML
document, then it is "tidied up" to make it well-formed in the XML
sense before it is loaded into a DOM, since the DOM can only
handle well-formed documents. An XML document is loaded into a
DOM straight away.
At the completion of the $#in
command, the named variable will contain the document node
representing the parsed document. If Tags is unable to load the
document into a DOM, then Tags quits with prejudice.
If the first non-blank character is not a "<", then the file is
assumed to be a simple text (or binary) file and it is loaded into
the named variable as simple text. If the file is binary, you
cannot manipulate it with Tags.
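The decision rules above can be summarized in a short Python sketch. It models only the classification step, not the tidying or the DOM loading. The function name is hypothetical, and treating the extension case-insensitively is an assumption.

```python
import os


def classify_document(file_name: str, content: str) -> str:
    """Sketch of how $#in decides how to load a file: 'html', 'xml' or 'text'."""
    extension = os.path.splitext(file_name)[1].lower()  # case-insensitive match assumed
    if extension in (".htm", ".html"):
        return "html"  # tidied into well-formed XML, then loaded into a DOM
    if content.lstrip().startswith("<"):
        return "xml"  # loaded into a DOM straight away
    return "text"  # loaded into the variable as simple text
```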
Note: if the XML file is local, and if it contains $#include commands, these are
resolved as explained in the $#include
description above before the document is parsed.
The $#inp form uses the contents of the $?HTTPHeaders$$ variable
to send an HTTP POST request to the specified URL. You must set up
the contents of the variable before using this command. (Hmmm...
Seems to me I overlooked the data part of the POST. Need to check
that out.)
$#include (file-name)
Tags processes $#include commands when a script or XML data object
is loaded. Immediately after loading the script document specified
as the first command-line parameter, Tags processes all $#include
commands embedded in the document. Because of this, the file-name
can only reference variables that contain command-line parameters.
Be sure that included files do not affect the well-formedness of
the XML document when they are inserted into the script document
at the include points. After all includes have been performed, the
document must still be well-formed XML.
Included files can also contain $#include commands. Be careful about
circular references: if some file, say file-A, includes another
file, say file-B, and file-B includes file-A, you have an infinite
loop. Tags does not detect such loops; they cause the program to run
until it fills up memory and crashes. If the full include file path
is not specified, Tags looks in the directory where the script file
was found.
$#include commands are
also detected and processed by the $#in command, but are not handled by the
$#forEach command.
$#match string(regular-expression)
Matches the regular-expression against the string. If the
regular-expression matches the string, and it contains sub-match
expressions (i.e., expressions within parentheses), Tags sets
variables to the matched portions of the string. These variables
have names that correspond to the positions of the sub-match
expressions within the regular-expression. The sub-match variable
names have the form $?regXi$$, where i is the index of the
sub-match expression that corresponds to the variable.
Here is an example:
$#set string(2005-12-09 14:21:15)
$#set expression(^([0-9]*)-([0-9]*)-([0-9]*) ([0-9]*):([0-9]*):([0-9]*))
$#match $?string$$($?expression$$)
This match generates six sub-match variables:
$?regX1$$ = 2005
$?regX2$$ = 12
$?regX3$$ = 09
$?regX4$$ = 14
$?regX5$$ = 21
$?regX6$$ = 15
One additional variable, called $?regXCount$$, contains the number
of sub-matched expressions. In the example, its value is six.
If a $#match command fails, the $?regXCount$$ variable is set to
zero. If a $#match command results in fewer sub-match variables
than the previous $#match command, only the variables for the
sub-matches of the latest $#match command are valid.
Sub-match variables are not managed in any other way.
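The $#match bookkeeping can be modeled with Python's re module. This is a sketch, not Tags code. Python's regular-expression dialect may differ from the one Tags uses, the search looks for a match anywhere in the string, and a dictionary stands in for the Tags variables. The sketch deliberately leaves stale $?regXi$$ variables from an earlier, longer match in place. That is one reading of "not managed in any other way".

```python
import re


def tags_match(string: str, pattern: str, variables: dict) -> bool:
    """Model of $#match string(regular-expression) using Python's re module."""
    m = re.search(pattern, string)
    if m is None:
        variables["regXCount"] = 0  # a failed match sets the count to zero
        return False
    for i, group in enumerate(m.groups(), start=1):
        variables[f"regX{i}"] = group if group is not None else ""
    # Variables left over from an earlier, longer match are not removed.
    variables["regXCount"] = len(m.groups())
    return True
```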
This is not an explanation of regular expressions. You can find
out more by following this link.
$#message (message), $#msg (message)
TBD
$#open wait(file-name)
Obtains the name of the program that the operating system uses to
open the type of the specified file-name, and then executes that
program, passing the file-name as the sole parameter. If the word
"wait" is present, Tags waits for the program to complete;
otherwise, it does not.
$#out name(file-name), $#outa name(file-name)
Writes ($#out) or appends ($#outa) the contents of the named variable
to the file with the specified file-name. Line management commands
are processed at this time (they are described in a later section of
this manual). If a Tags script contains no $#out command, the
accumulated results of the Tags script are written to the file
Tags.txt at the end of the run. If no variable name is specified,
the accumulated results of processing the Tags script are written
or appended to the specified file.
$#pause (message)
After displaying the pause message in the console
window, if it is open (the -W command line flag, or the $#console command), Tags
waits for you to press a key before continuing. The pause command
is ignored if the console window is not open.
$#play (wave-file-name)
Plays the wave file specified by the wave-file-name. If the
wave-file-name does not specify a path, Tags searches the
directories listed in the TagsPath environment variable, if it is
present. Otherwise, it uses the path of the Tags command you
specified on the command line. If you did not specify a path there
either, Tags looks for the wave-file-name in the current directory.
$#push, $#pop
Pushes variables onto, and pops them off, a stack variable. Any
variable can be used as a stack without affecting its contents. If
the stack variable does not already exist, it is created on first
use. The same is true for the variable that receives a popped value.
$#push mystack(myvar) pushes myvar onto mystack
$#pop avar(mystack) pops the top of mystack into avar
or
$#set avar($^mystack$$) also pops the top of mystack into avar.
Any properties specified on mystack are applied to the top of the stack (tos).
The pop command also has an equivalent referencer: "$^". You can
request the properties of the variable at the top of the stack by
requesting them of the stack variable itself. For example,
$^?mystack$$ returns the type of the variable at the top of the
stack.
$#sleep (time)
Relinquishes control of the CPU for the specified time, in
milliseconds. This is useful for scripts you want to run in the
"background", for example a script that watches for changes in a
page at some URL.
$#set name(value), $#text name(value), $#xml name(value)
Sets the variable having the
specified name to the
resolved value. The
value is resolved to its "natural" type, such as nodeset, node, or
string.
$#stop (message)
Aborts Tags processing, placing the
specified resolved message as
the last line of the output file. If the console window is
visible, Tags displays the stop message
there, and waits for the user to press a key before
terminating the run.
$#trace (value)
Activates or deactivates the trace facilities of Tags according to
the resolved value. If the value is false, tracing is turned off;
otherwise, tracing is turned on. Tracing causes Tags commands to be
written to the output as Tags executes them. Each traced Tags
command line is appended with the state and depth of the condition
stack. Use this script command instead of the -Y command-line flag
to limit the trace to a portion of your script.
If the console window is open, Tags displays the trace information
there as well.
$#translate (arg-char-set,fun-char-set)
Translates output characters: each character in the output that
matches a character in the arg-char-set (the characters before the
comma) is translated to the corresponding character in the
fun-char-set (the characters after the comma). This command works
like the XPath translate() function.
All output following the $#translate command is translated until
either there is no more output or a $#translate command with no
argument is encountered. It is an error if the arg-char-set contains
more characters than the fun-char-set. Any characters in the
fun-char-set beyond the number of characters in the arg-char-set
are, however, ignored.
For example, if the arg-char-set contains the three characters {}|
and the fun-char-set contains the three characters <>", the command
is written as
$#translate ({}|,<>") -- no embedded spaces!
After this command, all '{' characters in the output following the
$#translate command are translated to '<', all '}' characters in the
output are translated to '>', and all '|' characters in the output
are translated to '"' (quotes). Thus,
{element attribute=|value|/}
becomes, after translation,
<element attribute="value"/>.
Restrictions apply: you cannot use a comma or a close-parenthesis in
either the arg-char-set or the fun-char-set; otherwise, the program
could not parse the command. Watch out: spaces within the
parentheses are subject to the translation rules.
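A Python sketch of the translation rule, built on str.maketrans, follows. It models only two steps: splitting the argument at the comma, and mapping the characters. It does not model how Tags switches translation on and off in its output stream. The function names are hypothetical.

```python
from __future__ import annotations


def parse_translate_args(args: str) -> tuple[str, str]:
    """Split '{}|,<>"' at the comma (commas are not allowed inside either set)."""
    arg_chars, fun_chars = args.split(",", 1)
    return arg_chars, fun_chars


def tags_translate(text: str, arg_chars: str, fun_chars: str) -> str:
    """Map each arg-char-set character to the fun-char-set character at the same position."""
    if len(arg_chars) > len(fun_chars):
        raise ValueError("arg-char-set has more characters than fun-char-set")
    # Extra fun-char-set characters are ignored.
    table = str.maketrans(arg_chars, fun_chars[: len(arg_chars)])
    return text.translate(table)
```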
$#vars (message)
Lists the variables, sorted by name, at the point in Tags processing
when the command is encountered. Tags displays the resolved message
before the variable list. The variable list is also displayed in the
console window if it is open. You could use the following command
sequence to display the variables in the console window to help you
debug:
$#console (on)
$#vars (Here is a list of the variables)
$#pause (Press any key)
$#console (off)
Except for the $#defer command, all commands resolve their arguments
before applying them. Tags expects that you will use Tags
referencers prolifically, both in text and in expressions. Using the
$#defer command, you can store commands in variables for later
reference.
Any command not in the above list is treated by Tags as a comment.
Comments are not written to the output file, but they are displayed
in the console window if it is active (use the -W command-line flag
or the $#console command).
A Tags command occupies a single text line and can be indented,
since leading whitespace is ignored; only whitespace may precede the
command on its line. Tags also ignores any text on the same line
following the command.
Here's a helpful idea: if your editor can match braces, you can put
an open-brace after each $#if, $#ifn and $#forEach command, and a
close-brace after each matching $#end command. You can then use your
editor's brace-matching command to match up the begin and end parts
of Tags command sequences.
Examples:
$#set $?converter$$($?converter$$)
$#ifn ($?$?converter$$$$)
$#trace (on)
...stuff to debug
$#trace (off)
The first example sets a variable whose name is the value of the
variable converter to the value of the variable converter. I.e., the
name and the value of the variable are the same.
The second example evaluates the value of the variable named by the
converter variable. If that variable doesn't exist, or its value is
false, then the text within the $#ifn is processed.
The third example traces a section of script, then turns tracing
off.
XPath functions available in Tags
Tags does not support arithmetic and string expressions, but XPath
does. XPath also provides a number of useful functions that are
available to the Tags script writer through the $!xpath-expression$$
XPath expression syntax.
Nodeset functions
Function: number last()
Function: number position()
Function: number count(node-set)
Function: node-set id(object)
Function: string local-name(node-set?)
Function: string namespace-uri(node-set?) - not implemented.
Function: string name(node-set?)
String functions
Function: string string(object?)
Function: string concat(string, string, string*)
Function: boolean starts-with(string, string)
Function: boolean contains(string, string)
Function: string substring-before(string, string)
Function: string substring-after(string, string)
Function: string substring(string, number, number?)
Function: number string-length(string?)
Function: string normalize-space(string?)
Function: string lower-case(string) - nonstandard extension: returns
the argument with all alphabetic characters in lower case.
Function: string upper-case(string) - nonstandard extension: returns
the argument with all alphabetic characters in upper case.
Function: string translate(string, string, string)
Function: boolean match(string, string) - nonstandard extension:
returns true if a substring in the first parameter matches the
regular expression in the second parameter.
Boolean functions
Function: boolean boolean(object)
Function: boolean not(boolean)
Function: boolean true()
Function: boolean false()
Number functions
Function: number number(object?)
Function: number sum(node-set)
Function: number floor(number)
Function: number ceiling(number)
Function: number round(number)
The Line Output Management Commands
When you direct Tags to output your text to a file or a pipe using
some form of the $#out command, you can manage how it processes your
raw output using these line management commands. Simply embed them
in your output streams.
The Tab Command ($\t{i})
This output command lets you format the output text by aligning it
on specific offsets. Tags supports a variable, called $?tabs$$,
which you can set to a table of line offsets as a delimited string;
the table is indexed by the tab command subscript. If you do not
provide the tab table, then Tags uses the tab command subscript
itself as the line offset. Here is an example using a tab table:
<Tags>
Customer Report Using The Tags ODBC Facility (December 10, 2004)
$#set dsn(Tags-customer-dsn)
$#set tabs(+9+39+62+78+95+104+124)
$#set SQLColumns(,CustNo,Name,Street,City,State/Prov,ZipCode,Country,Phone)
CustNo$\t{1}Name$\t{2}Street$\t{3}City$\t{4}State/Prov$\t{5}ZipCode$\t{6}Country$\t{7}Phone
------$\t{1}----$\t{2}------$\t{3}----$\t{4}----------$\t{5}-------$\t{6}-------$\t{7}-----
$#forEach SQL(select * from cust)
$?CustNo$$$\t{1}$?Name$$$\t{2}$?Street$$$\t{3}$?City$$$\t{4}$?State/Prov$$$\t{5}$?ZipCode$$$\t{6}$?Country$$$\t{7}$?Phone$$
$#end
</Tags>
In this example, the offset associated with $\t{1} is 9, and the offset
associated with $\t{4} is
78.
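A Python sketch of the tab command follows. It assumes that offsets are zero-based column positions and that the tab table uses the leading-delimiter format seen in $#set tabs(+9+39...). It also assumes that text already past an offset is left unpadded; the manual does not say what Tags does in that case.

```python
from __future__ import annotations

import re

# Matches the tab command $\t{i} and captures the subscript i.
TAB = re.compile(r"\$\\t\{(\d+)\}")


def parse_delimited(record: str) -> list[str]:
    """Split a delimited string whose first character is the delimiter (assumed)."""
    return record[1:].split(record[0]) if record else []


def expand_tabs(line: str, tabs: str = "") -> str:
    """Pad the output before each $\\t{i} with spaces up to the tab offset."""
    table = [int(x) for x in parse_delimited(tabs)]
    parts = TAB.split(line)  # [text, i, text, i, text, ...]
    out = parts[0]
    for index, text in zip(parts[1::2], parts[2::2]):
        i = int(index)
        offset = table[i - 1] if table else i  # no table: the subscript is the offset
        out = out.ljust(offset) + text  # assumed: no padding if already past the offset
    return out
```

With the tab table from the example, the text after $\t{1} starts at column 9 and the text after $\t{4} starts at column 78.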
The Join-Line Command ($\j)
Sometimes you need to control the output of parts of a single output
line. Use the special symbol $\j at the point on a line where you
want to concatenate the next piece of the output line. Here is an
example:
<$!@name$$ $\j
$#if ($!@prompt$$)
prompt="$!@prompt$$" $\j (notice the space before "prompt")
$#end
$#if ($!@default$$)
default="$!@default$$" $\j (notice the space before "default")
$#end
$#if ($!@value$$)
> $\j
$!@value$$ $\j
</$!@name$$>
$#else
/>
$#end
Notice that the prompt="" and default="" attributes, and the value,
are optional. Supposing that only the prompt="" attribute is
present, the output would appear as below:
<name prompt="Please say hello"/>
The space at the end of the line before the "prompt" line puts the
space between "name" and "prompt". Note that, as in the example,
text beyond the $\j concatenation operator is discarded.
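The join rule can be modeled in Python on lines that have already been resolved. This sketch is an illustration built on assumptions, not Tags code. It keeps everything before $\j (including any trailing space), discards the rest of that line, and appends the next line to it.

```python
from __future__ import annotations

JOIN = "$\\j"  # the three-character marker $\j


def join_lines(lines: list[str]) -> list[str]:
    """Model of the $\\j join-line command on already-resolved output lines."""
    out: list[str] = []
    pending = ""
    for line in lines:
        if JOIN in line:
            # Keep the text before $\j and discard the rest of the line.
            pending += line.split(JOIN, 1)[0]
        else:
            out.append(pending + line)
            pending = ""
    if pending:
        out.append(pending)  # a trailing $\j with no following line (assumed behavior)
    return out
```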
The New-Line Command ($\n)
You can split a line into two output lines using the $\n command.
The text before the $\n is written to the output, then a newline is
written, and then the text after the $\n command is written to the
output.
The CData Commands ($\c and $\d)
The $\c command generates a <![CDATA[ begin marker in the output
stream, while the $\d command generates the closing ]]> end marker
in the output stream.
The Blank-Line Command ($\b{i})
The $\b{i} command replaces every subsequent group of blank lines
in the raw output with exactly i blank lines. E.g., if your raw
output contains groups of blank lines and you specify $\b{3}, then
each subsequent group of blank lines is replaced by three blank
lines in the "cooked" output.
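A Python sketch of the blank-line rule follows. It applies to the whole list it is given, rather than only to output after the command. It also assumes that whitespace-only lines count as blank and that a trailing group is replaced too.

```python
from __future__ import annotations


def collapse_blank_lines(lines: list[str], n: int) -> list[str]:
    """Model of $\\b{n}: replace every group of blank lines with exactly n blank lines."""
    out: list[str] = []
    blank_run = 0
    for line in lines:
        if line.strip() == "":  # whitespace-only lines treated as blank (assumption)
            blank_run += 1
            continue
        if blank_run:
            out.extend([""] * n)
            blank_run = 0
        out.append(line)
    if blank_run:
        out.extend([""] * n)  # a trailing group is replaced as well (assumption)
    return out
```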
Errors
All errors detected by Tags result in immediate termination of the
resolution process. An error message is generated and appended to
the output file. While native Tags errors are explained with a short
phrase or sentence, XPath errors are given as a number. You can look
up the number in these tables:
XPath parser errors
These errors are the result of a badly-formed XPath expression.
Code | Name
2850 | XPE_UNKNOWNENTITY
2851 | XPE_BADENTITY
2852 | XPE_DOUBLECOLONEXPECTED
2853 | XPE_QNAMEEXPECTED
2854 | XPE_LPARENEXPECTED
2855 | XPE_RPARENEXPECTED
2856 | XPE_RPARENNOTEXPECTED
2857 | XPE_RBRACKETEXPECTED
2858 | XPE_VARNAMEEXPECTED
2859 | XPE_LITERALEXPECTED
2860 | XPE_UNEXPECTEDEND
2861 | XPE_EQUALSIGNEXPECTED
2862 | XPE_UNKNOWNOPERATOR
2863 | XPE_TOOMANYCOLONS
XPath evaluator errors
These errors are the result of context errors; the expression
parsed successfully.
Code | Name
2800 | XPE_UNDERRUN
2801 | XPE_NODEEXPECTED
2802 | XPE_NODESETEXPECTED
2803 | XPE_STRINGEXPECTED
2804 | XPE_NUMBEREXPECTED
2805 | XPE_BOOLEANEXPECTED
2806 | XPE_OPNOTEXPECTED
2807 | XPE_AXISNAMEUNKNOWN
2808 | XPE_WRONGNRARGUMENTS
2809 | XPE_PROCINSTEXPECTED
2810 | XPE_STACKEMPTY
2811 | XPE_STACKNOTEMPTY
2812 | XPE_FUNCTIONUNKNOWN
2813 | XPE_BADOPERANDTYPE
2814 | XPE_EMPTYRESULT
2815 | XPE_CONTEXTEXPECTED
2816 | XPE_PATHEXPECTED
2817 | XPE_DIVIDEBYZERO
2818 | XPE_NOVARS
Tags Download
Here is my Electronic License Agreement cribbed from others that I
have seen:
This is a legal Agreement between you and Paul J Medlock, Jr.
(hereinafter referred to as "I" or "me"). The terms of this
Agreement govern your use of the software in the Tags package and
any other materials on this website. By downloading and installing
the software in the Tags package, or other materials on this
website, you are agreeing to be bound by this Agreement. If you do
not agree to the terms of this Agreement, please do not download
and install the software onto your computer. You are free to use
the Tags software on your machine and/or other machines on a LAN
in your home and/or at your office at no cost. You are not free to
give copies to others. If others are interested in it, direct them
to this site instead. You may not sell the software in any form,
no matter how well you hide it. Nor can you claim that you wrote
it. I did. All materials that are copyrightable are copyrighted by
me.
I make no warranty for your use of this software. Nor do I
promise that it does what I claim it does. If the documentation
makes an outlandish claim of functionality, test the software
before assuming that it actually does what's claimed. If you have
any problem, or if Tags causes you any loss: personal, financial,
hardware, emotional, or otherwise, I am in no way responsible, and
I am not liable for any damages whatsoever. If you violate
patents, trademarks, or copyrights with the use of this software,
I am not a party to that violation, and I won't help you in court.
Don't forget, it's free.
In other words, use it at your own risk.
Here is a zip-file containing the latest release of the
Windows version of Tags and its
supporting and example files. If you download the software, it
means you are willing to abide by the terms and conditions
of the License Agreement above.
If you use Tags, please be kind enough to give me some feedback:
bugs, ideas for features, comments, etc. If you are interested in
a version for Linux or Unix, let me know.