Berkeley Web Template CGI script
Short Introduction to Template Concepts
Two Programs Rolled Into One
The Berkeley Web Template program can be thought of as two separate programs
completely independent of one another. The first program reads the configuration
files and generates the HTML form. The second program reads the configuration
files, reads the data the user has input into the form, and then generates the
XML markup with those user-supplied form values "plugged in." The first program
is invoked by calling the Template URL without any additional query string parameters, e.g.,
http://your.server.edu/cgi-bin/template/ead/generic while
the second program is invoked by calling the same URL but with a ?submit=1
query string parameter tacked on, e.g.,
http://your.server.edu/cgi-bin/template/ead/generic?submit=1
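The choice between the two programs can be pictured with a short Python sketch. This is only an illustration of the rule described above, not the Template program's own code (which is written in Perl); the function name choose_mode is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def choose_mode(url):
    """Return 'xml' when the URL carries submit=1, otherwise 'form'.

    Hypothetical helper: it only mirrors the rule described in the text.
    """
    query = parse_qs(urlsplit(url).query)
    return "xml" if query.get("submit") == ["1"] else "form"


base = "http://your.server.edu/cgi-bin/template/ead/generic"
print(choose_mode(base))                # form-generation half
print(choose_mode(base + "?submit=1"))  # XML-generation half
```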
In fact it is often useful to use the second part of the program only,
i.e., plugging user-input data into XML markup, without ever bothering to
use the HTML-form generation powers of the first part. The first part of
the process is useful for institutions which provide many slightly customized
HTML forms for different repositories, markup units or finding aid types.
The form templates have been optimized to make slight changes quick and
easy--but this comes at the expense of making major changes very hard. For smaller
institutions that need to create only one or two different web templates
which are significantly different in design from those which ship with this
program, it is probably much easier to create their own static HTML form
exactly as they would like, and invoke the second part of the program by
plugging the ?submit=1 form of the URL into their form's action
attribute:
<form method="post" action="http://your.server.edu/cgi-bin/template/ead/generic?submit=1">
The file static_form.html included with this distribution demonstrates
this strategy.
Hierarchical Templates
The Berkeley Web Template program always reads two template files. The
first, or "master," template file contains data that are common to all
templates for all repositories which use the template program. The second,
or "sub," template file contains only those data which are specific or unique
to a particular repository. The repository's name and address are an example
of data which would probably live in the sub template, while most
of the form's HTML would probably live in the master template.
This hierarchical relationship is reflected in the URL used to access
a specific template. For example, this URL:
http://your.server.edu/cgi-bin/template/ead/mss
invokes the master template called ead.cfg and then the
sub template called mss.cfg.
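As a rough illustration of that mapping, the following Python sketch turns the path portion of the URL into the two configuration file names. The function name config_files is hypothetical, and the sketch assumes the path always has exactly two components, optionally followed by an '@' Form name.

```python
def config_files(path_info):
    """Map '/ead/mss' (optionally '/ead/mss@Toc') to ('ead.cfg', 'mss.cfg').

    Assumption: the path has exactly two components, master then sub.
    """
    path = path_info.split("@", 1)[0]        # drop any '@Form' suffix
    master, sub = path.strip("/").split("/")
    return master + ".cfg", sub + ".cfg"


print(config_files("/ead/mss"))
```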
In fact this hierarchical arrangement is not set in stone. Most anything
you put in one template could also conceivably go into the other template
instead. You are free to divide up the data however you see fit. I have
created templates where all of the data was contained in the first template
and the second template was completely blank, and vice versa (well, almost
vice versa: the master template does have one piece of data that must always
live there, the TemplateDir parameter, which tells the Template program
in which directory all of the sub templates are located).
The Template program first reads information in the first template and
adds it to an internal data structure. It then reads the second template
and adds that data to the same structure. So all of the data in both
templates ultimately ends up in the same place. Not only that, but information
in the second template always trumps or overwrites the same information
in the first template (since the sub template is always read last). That means
that you may want to put information that is almost always the
same into the first template for all sub templates to share, but repeat
it with slight or major variations in a sub template when it must be
different for that particular repository or application.
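The override rule can be sketched in Python as a simple dictionary merge in which the sub template is applied last. The section names and values below are made up for illustration, and merge_templates is a hypothetical name; the real program's internal data structure is not documented here.

```python
def merge_templates(master, sub):
    """Combine two templates; entries from the sub template win."""
    merged = dict(master)   # start with everything from the master template
    merged.update(sub)      # the sub template is read last, so it overwrites
    return merged


master = {"Eadheader": "<eadheader>...</eadheader>", "Publisher": "Default Library"}
sub = {"Publisher": "The Bancroft Library"}
print(merge_templates(master, sub)["Publisher"])
```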
Templates and Forms
Template files (or more accurately, configuration files) are composed
of distinct sections denoted by square brackets. The first part of the program,
the HTML form-generation part, reads its data from sections which begin with the
word "Form", e.g., [Form GenericTextBox]. The second part of the
program, the XML-generation part, always reads its data from sections which
begin with the word "Template", e.g., [Template C02Component].
Each type of section is generally invisible to the other. That is, while generating
the HTML form that the user will fill in, the Template program is completely
oblivious to [Template ...]'s. When it comes time for the program
to generate the XML markup when the user clicks submit, it knows absolutely
nothing about [Form ...] sections.
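To make the separation concrete, here is a small Python sketch that splits configuration text into Form sections and Template sections. It assumes, without confirmation from the program itself, that each section header sits alone on a line and that a section's body runs until the next header; parse_sections is a hypothetical name.

```python
import re

# Assumed header shape: "[Kind Name]" alone on a line.
HEADER = re.compile(r"^\[(\w+)\s+([^\]]+)\]\s*$")


def parse_sections(text):
    """Return (forms, templates): two dicts mapping section name to body."""
    found = {"Form": {}, "Template": {}}
    current = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            kind, name = m.groups()
            # Other kinds (e.g. [General Parameters]) are skipped in this sketch.
            current = found[kind].setdefault(name, []) if kind in found else None
        elif current is not None:
            current.append(line)
    forms = {k: "\n".join(v) for k, v in found["Form"].items()}
    templates = {k: "\n".join(v) for k, v in found["Template"].items()}
    return forms, templates
```

The form-generation half would then look only at the first dictionary and the XML-generation half only at the second.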
Templates can contain other Templates and Forms can contain other Forms.
This is another technique for factoring out common information and placing
just the parts you want to customize together in one place. So for example if
you have one Template called [Template Eadheader] and wish
to insert into it a different template called [Template RepositoryAddress]
you would include the token {@RepositoryAddress} at the place
in [Template Eadheader] where you want the address to occur.
You may in fact wish to put [Template Eadheader] into your
master configuration file for all repositories to share, but keep
[Template RepositoryAddress] in the sub templates where
it can be customized for each repository. Remember, you can divide your
data up between the master and sub templates however you wish.
[Template PublicationStmt]
<publicationstmt>
<publisher>The Bancroft Library.</publisher>
<address>
<addressline>University of California, Berkeley</addressline>
<addressline>Berkeley, California, 94720-6000</addressline>
<addressline>Phone: (510) 642-6481</addressline>
<addressline>Fax: (510) 642-7589</addressline>
<addressline>Email: bancref@library.berkeley.edu</addressline>
<addressline>URL: http://bancroft.berkeley.edu</addressline>
</address>
<date>© 2005</date>
<p>The Regents of the University of California. All rights reserved.</p>
</publicationstmt>
[Template _DEFAULT]
...
<filedesc>
<titlestmt>
<titleproper>{$COLLECTION_TITLE}</titleproper>
<titleproper type="filing">{$FILETITLE}</titleproper>
<author>{$PROCESSED_BY}.</author>
<sponsor>{$SPONSOR}.</sponsor>
</titlestmt>
{@PublicationStmt}
</filedesc>
...
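The embedding shown in this example can be approximated by a recursive expansion of {@Name} tokens. The Python sketch below is only a model of that behavior; the function name expand and the depth limit are assumptions added to keep the example safe against sections that embed each other in a loop.

```python
import re

EMBED = re.compile(r"\{@(\w+)\}")


def expand(name, sections, depth=0):
    """Return the body of section `name` with every {@Other} token expanded."""
    if depth > 20:  # arbitrary guard, not a documented limit
        raise ValueError("embedding too deep; possible cycle")
    body = sections[name]
    return EMBED.sub(lambda m: expand(m.group(1), sections, depth + 1), body)


sections = {
    "_DEFAULT": "<filedesc>{@PublicationStmt}</filedesc>",
    "PublicationStmt": "<publicationstmt>...</publicationstmt>",
}
print(expand("_DEFAULT", sections))
```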
System Variables
The Template program provides a number of special system variables which
you can plug in anywhere in any configuration file. Most are available only
within a Form section but others may be used in a Form or a Template section.
These are always uppercase and begin with an underscore. The most useful
ones involve aliases for your template script. These guarantee a degree
of portability for your templates. If you move the template script to a
new server with a different URL, or change the name of the template script
(renaming it to something like "ead_templates.pl" for example), you needn't
make any changes to the templates you have created, provided you
avoid direct explicit references to either the name of the script or the
URL.
Assuming a URL like this: http://your.server.edu/cgi-bin/template/ead/mss@Toc,
the following system variables will embed these values into your Forms:
{_SCRIPT_URL}    http://your.server.edu/cgi-bin/template
{_PATH_INFO}     /ead/mss@Toc
{_FULL_URL}      http://your.server.edu/cgi-bin/template/ead/mss@Toc
{_URL}           http://your.server.edu/cgi-bin/template/ead/mss
{_FULL_URL} embeds the complete URL into your form, including
the name of any explicit Form if present (see next section). {_URL} includes the
same complete URL except that it omits the Form name portion. If no
explicit Form is called in the URL (i.e., [Form _DEFAULT] is used), these two
system variables are equivalent. {_URL} is by far the most
useful, while {_FULL_URL} is rarely used.
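The relationship between these four values can be sketched in Python as follows. This is a model of the values listed above, assuming the script URL and the path portion are already known separately; system_variables is a hypothetical name.

```python
def system_variables(script_url, path_info):
    """Derive the four URL-related system variables from their two inputs."""
    path_without_form = path_info.split("@", 1)[0]  # strip any '@Form' part
    return {
        "_SCRIPT_URL": script_url,
        "_PATH_INFO": path_info,
        "_FULL_URL": script_url + path_info,
        "_URL": script_url + path_without_form,
    }


values = system_variables("http://your.server.edu/cgi-bin/template", "/ead/mss@Toc")
for name, value in values.items():
    print(name, value)
```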
"_DEFAULT" Templates and Forms
[Form ...] and [Template ...] sections serve
a second purpose. Not only can one Form or Template section be called or
embedded within a different Form or Template section, specific Form and
Template sections can also be requested by the user's browser. This is
illustrated in the ead.cfg configuration file. The EAD template which ships
with this program is a framed document. In the left frame is the table
of contents and in the right, the HTML form page itself. Like all framed
documents, this one is controlled by a master frameset document. All in all,
three separate distinct documents are requested by the browser. Each of
these three documents is contained in the ead.cfg file in three separate
[Form ...] sections. To call up a specific Form section with
your browser, append the name of the Form to the end of your URL, preceded
by an '@' symbol.
http://your.server.edu/cgi-bin/template/ead/generic@Toc
http://your.server.edu/cgi-bin/template/ead/generic@Content
If you don't specify a specific Form or Template in your URL, the Template
program will use one called "_DEFAULT", e.g., [Form _DEFAULT].
The frameset file of the EAD template is contained in [Form _DEFAULT].
One could conceivably place the frameset in a Form called something like "Frameset",
then invoke it with http://your.server.edu/cgi-bin/template/ead/generic@Frameset
and not supply a _DEFAULT Form at all, but this would make the URL a bit uglier and
harder to remember.
It is possible to invoke a specific markup Template by placing
the Template name token before the ?submit=1 query string parameter
in your HTML form's action attribute, but this would be a rare
and generally unnecessary thing to ever have to do.
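The fallback to "_DEFAULT" amounts to a one-line rule, sketched here in Python. The function name requested_section is hypothetical and the sketch looks only at the path portion of the URL.

```python
def requested_section(path_info):
    """Return the name after '@', or '_DEFAULT' when none is given."""
    _, separator, name = path_info.partition("@")
    return name if separator and name else "_DEFAULT"


print(requested_section("/ead/generic@Toc"))
print(requested_section("/ead/generic"))
```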
Note: It is important that any Form or Template designed to be
sent to the user's browser begin with an appropriate Content-type
header. This only applies to Forms or Templates sent to the browser and
should not be included in Forms or Templates designed to be embedded within
other Forms or Templates. Forms sent to the browser should always begin
with Content-type: text/html followed by a blank line. Templates
will usually begin with Content-type: text/xml unless you are
using the template program to generate other types of data such as SGML
or plain text, or PDF, etc.
[Form _DEFAULT]
Content-type: text/html

<html>
<head>
<title>{TEMPLATE_TITLE}</title>
</head>
<frameset rows="100%, *" cols="140, *" framespacing="0" border="0" frameborder="0">
<frame name="toc" src="{_URL}@Toc">
<frame name="eadInput" src="{_URL}@Content">
</frameset>
</html>
[Form Toc]
Content-type: text/html

<html>
<head>
<title>Table of Contents</title>
</head>
<body>
<h1>Table of Contents for {TEMPLATE_TITLE}</h1>
...
</body>
</html>
Substitution Variables
Substitution variables are the core of the Template program's functionality.
Data input by the user in the HTML form is plugged into the XML markup through
a substitution variable having the same name as the form element's name
attribute. To plug the data into the markup, place a token consisting of the
name of the form element preceded by a dollar sign and enclosed in curly braces,
e.g., {$UNITTITLE}. Form element name attributes and the
substitution variables that represent them must consist only of uppercase letters,
numbers and underscores. You'll note that some substitution variables in
the EAD templates have other punctuation, such as dashes and colons. These
have special meaning to the template program. Their use is not covered in this
guide and you should not use any punctuation marks other than an underscore.
In The HTML Form:
<input type="text" class="textbox" name="FINDAID_TITLE">
<input type="text" class="textbox" name="CALLNO">
In The XML Markup Template:
<frontmatter>
<titlepage>
<titleproper>{$FINDAID_TITLE}</titleproper>
<num>Collection number: {$CALLNO}</num>
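The substitution step itself can be modeled with a single regular expression. The Python sketch below is an approximation under the naming rule given above (uppercase letters, numbers and underscores); it handles only plain {$NAME} tokens, and the function name substitute and the sample values are hypothetical.

```python
import re

# Plain tokens only: {$NAME} with uppercase letters, digits and underscores.
VAR = re.compile(r"\{\$([A-Z0-9_]+)\}")


def substitute(template, form_data):
    """Replace each {$NAME} with the submitted value, or '' if absent."""
    return VAR.sub(lambda m: form_data.get(m.group(1), ""), template)


markup = "<titleproper>{$FINDAID_TITLE}</titleproper>\n<num>Collection number: {$CALLNO}</num>"
print(substitute(markup, {"FINDAID_TITLE": "Sample Papers", "CALLNO": "MSS 1"}))
```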
Filters and Extensions
Filters are calls to small Perl programs, written in an extension file, that
process the data in a substitution variable in some way before the
result is plugged into the XML markup when the user clicks Submit. Functionality such as splitting
prose text into paragraphs along blank lines, converting MARC format access points
into EAD <controlaccess> elements, automatically normalizing
dates, supplying language codes, encoding chronological lists, etc., is all
implemented using filters which call Perl code included in the ead.ext extension
file.
A Filter call is implemented by following the substitution variable with
an arrow (->) and the name of the filter. {$SCOPECONTENT->auto_p}
for example, takes the data entered into the SCOPECONTENT form field and automatically
splits it into paragraphs. In the ead.ext extension file you will find an extension
called 'auto_p'. Filters can also take a single parameter enclosed in parentheses,
e.g., {$SCOPECONTENT->auto_p(param)}, and any substitution variable can
call more than one filter by separating them with a comma, e.g., {$SCOPECONTENT->filter1,filter2}.
In such cases, the data is passed to the first filter where it is processed, then it is
passed to the second filter where it is processed some more before finally being
inserted into the XML markup. These two techniques, multiple filters and filter
parameters, allow for more generic and reusable code.
In The HTML Form:
<tr>
<td align="left" width="40%">
<font face="Arial" size="-1">Head:</font>
<input type="text" class="headbox" name="BIOGPROSE_HEAD" size="40">
</td>
</tr>
<tr>
<td colspan="2">
<textarea name="BIOGPROSE" rows="20"></textarea>
</td>
</tr>
In The XML Markup Template:
<bioghist>
<head>{$BIOGPROSE_HEAD}</head>
{$BIOGPROSE->auto_p,auto_head(bioghist)}
</bioghist>
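Filter chaining can be pictured with the following Python sketch. It is a model only: the real filters are Perl routines in ead.ext, the auto_p here is a simplified stand-in for the real one, and wrap is a hypothetical filter invented to show how a parameter is passed.

```python
import re

TOKEN = re.compile(r"\{\$([A-Z0-9_]+)->([^}]+)\}")
CALL = re.compile(r"(\w+)(?:\(([^)]*)\))?")


def auto_p(text, param=None):
    """Stand-in: wrap each blank-line-separated chunk in <p> tags."""
    chunks = [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]
    return "\n".join("<p>" + c + "</p>" for c in chunks)


def wrap(text, param=None):
    """Hypothetical filter: enclose the text in the element named by param."""
    return "<" + param + ">" + text + "</" + param + ">"


FILTERS = {"auto_p": auto_p, "wrap": wrap}


def apply_filters(template, form_data):
    """Run each variable's value through its comma-separated filters in order."""
    def replace(m):
        value = form_data.get(m.group(1), "")
        for call in m.group(2).split(","):
            name, param = CALL.fullmatch(call.strip()).groups()
            value = FILTERS[name](value, param)
        return value
    return TOKEN.sub(replace, template)


print(apply_filters("{$BIOGPROSE->auto_p,wrap(bioghist)}", {"BIOGPROSE": "One.\n\nTwo."}))
```

Each filter receives the output of the one before it, which is what lets small, generic filters be combined.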
Special post processing filters can be applied to the entire XML file
just before it is output by specifying them in a PostProcess
parameter in a [General Parameters] section. In ead.cfg, the
PostProcess parameter calls three separate filters to further manipulate
the XML markup before it is finally delivered to the user's browser:
PostProcess clean_attributes,delete_empties,convert_diacritics
clean_attributes deletes all attributes with empty values,
e.g., <unittitle label=""> would be converted to plain
<unittitle>. There are other syntax tricks for ensuring that
empty attribute values are never output, but a global sweep with clean_attributes
is simpler and cleaner. The clean_attributes filter is one of the
simplest filters available for the EAD templates (most of the others are
complicated by weird or rare exceptions that must be dealt with). Here is
what that filter looks like in ead.ext:
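'clean_attributes' =>
sub {
my ($cfg, $template, $post_data, $post_tree, $var_name, $text, $node_fields) = @_;
# Delete attributes whose value is empty or only whitespace, e.g. label="".
$text =~ s/\s+[-a-z]+="\s*"//g;
# Remove a space that directly follows the opening quote of an attribute value.
$text =~ s/(\s+[-a-z]+=") ([^"\n<>]+")/$1$2/g;
return $text;
},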
delete_empties is similar to clean_attributes except that
it deletes empty elements with no content, e.g., <unitid></unitid>
would be deleted entirely. Like empty attribute values, there are other syntax tricks
for dealing with this but they complicate the templates considerably. delete_empties
is an example of a filter that is more complicated than it at first seems.
It must delete not only empty elements but often the tags that surround them
as well. It is not good enough to delete <p></p>. You'll
probably also want to get rid of empty elements that surround it, e.g.,
<scopecontent>
<head>Scope and Content</head>
<p></p>
</scopecontent>
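To show why this takes more than one pass, here is a deliberately simplified Python sketch. It is not the delete_empties filter from ead.ext, which handles many more cases; the two regular expressions and the rule "an element holding only a head counts as empty" are assumptions made for illustration.

```python
import re

# An element with nothing but whitespace between its tags.
EMPTY = re.compile(r"<(\w+)\b[^>]*>\s*</\1>")
# An element whose only content is a <head> (assumed rule for this sketch).
HEAD_ONLY = re.compile(r"<(\w+)\b[^>]*>\s*<head>[^<]*</head>\s*</\1>")


def delete_empties(xml):
    """Remove empty elements repeatedly until nothing more changes."""
    previous = None
    while previous != xml:
        previous = xml
        xml = EMPTY.sub("", xml)
        xml = HEAD_ONLY.sub("", xml)
    return xml


sample = "<scopecontent>\n<head>Scope and Content</head>\n<p></p>\n</scopecontent>"
print(repr(delete_empties(sample)))
```

Removing the empty paragraph leaves a scopecontent that holds only its head, so the loop runs again and removes that too.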
Finally, convert_diacritics converts various diacritical marks
and special characters into their hexadecimal numerical entity equivalents.
E.g., 'é' is converted to '&#xE9;' throughout the XML markup.
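The general idea can be sketched in a few lines of Python. This is not the filter from ead.ext: the sketch assumes that every character outside the ASCII range is converted, whereas the real filter may cover a narrower or different set of characters.

```python
def convert_diacritics(text):
    """Replace each non-ASCII character with a hexadecimal numeric entity."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch))
        for ch in text
    )


print(convert_diacritics("café"))
```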
Writing filters and extension files is not covered in this guide.
Substitution Variable Default Values
You can provide a default value for your substitution variables in case the
user chooses not to supply one when they fill out the web form. Certain
data may be so important that you, as the template manager, may wish to
ensure that a value is always supplied. Within the curly braces of the
substitution variable token, supply the default value following a vertical
bar ('|'), e.g., {$SCOPECONTENT_HEAD|Scope and Content}. This
syntax is appropriate for very short textual strings. More complicated
alternate text can be supplied using the alternate text syntax described
later.
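A default value can be modeled by extending the substitution pattern with an optional '|' part. The Python sketch below illustrates the behavior described above and is not the program's own parser; substitute is a hypothetical name.

```python
import re

# {$NAME} or {$NAME|default text}
VAR = re.compile(r"\{\$([A-Z0-9_]+)(?:\|([^}]*))?\}")


def substitute(template, form_data):
    """Use the submitted value when present, otherwise the default (or '')."""
    def replace(m):
        name, default = m.groups()
        return form_data.get(name) or default or ""
    return VAR.sub(replace, template)


token = "<head>{$SCOPECONTENT_HEAD|Scope and Content}</head>"
print(substitute(token, {}))
print(substitute(token, {"SCOPECONTENT_HEAD": "Scope"}))
```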
It is sometimes necessary, when using the Template program's more complex
syntactical features, to explicitly specify nothing as the alternate text
of a substitution variable, e.g., {$NORMAL_DATE|}. This is only ever
necessary when such variables occur within associated text blocks (see below)
under certain circumstances. This guide will not cover these advanced syntactical
features in detail, but you will see them used extensively in the EAD templates
which ship with this program.
More Templates and Forms
Finally, to facilitate reuse and modularity, Forms can be parameterized when
embedded within other Forms. That is, you can embed a Form using the standard
'@' syntax, {@FormName}, but parameterized with custom values. This allows
you to use the exact same Form but with slightly different values. The EAD
templates which ship with this program use a very modular format, allowing
you to "plug in" new form elements to support new EAD elements not included
with the shipped templates.
To parameterize a form, follow the name of the Form with a colon, then a
comma-separated list of parameterized values, as many as you wish:
{@TextBox:label="Dates",id="date1",name="UNITDATE"}
Within the invoked [Form ...], these parameters (they can be
called whatever you like) are used by embedding a Form parameter token, enclosed
in curly braces and preceded by a percent sign, e.g., {%label}:
[Form TextBox]
{%label}: <input type="text" id="{%id}" name="{%name}">
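How the parameters reach the embedded Form can be sketched in Python as follows. This is a simplified model: it assumes parameter values are double-quoted and contain no closing curly brace, it does not expand Forms nested inside the embedded Form, and expand_forms is a hypothetical name.

```python
import re

EMBED = re.compile(r"\{@(\w+)(?::([^}]*))?\}")   # {@Form} or {@Form:params}
PARAM = re.compile(r'(\w+)="([^"]*)"')           # name="value" pairs
SLOT = re.compile(r"\{%(\w+)\}")                 # {%name} inside the Form


def expand_forms(text, forms):
    """Replace each {@Form:...} token with the Form body, parameters filled in."""
    def replace(m):
        name, raw = m.groups()
        params = dict(PARAM.findall(raw or ""))
        return SLOT.sub(lambda s: params.get(s.group(1), ""), forms[name])
    return EMBED.sub(replace, text)


forms = {"TextBox": '{%label}: <input type="text" id="{%id}" name="{%name}">'}
print(expand_forms('{@TextBox:label="Dates",id="date1",name="UNITDATE"}', forms))
```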
Associated Text Blocks
An associated text block is a feature whereby markup and text can be explicitly
associated with a specific substitution variable. For example, if the template
provides a form element to input a unittitle, the generated markup would
include not only the value which the user input, but also the surrounding markup:
$[<unittitle label="Title">{$UNITTITLE}</unittitle>]$
The $[ and ]$ tokens associate this markup with
the {$UNITTITLE} substitution variable. What does this mean?
It means that the markup will only be generated if there is actually a
value for {$UNITTITLE}. Without the associated text block, if
the user does not input a value for the UNITTITLE form element then the
empty markup would still be generated:
<unittitle label="Title"></unittitle>
This is usually undesirable. With the associated text block, no markup
would be generated for unittitle at all if the user does not input a value.
If an associated text block contains multiple substitution variables and
any one of them does not resolve to a value, then the entire block will
not be generated:
$[<unittitle label="{$UNITTITLE_LABEL}">{$UNITTITLE}</unittitle>]$
In the example above, markup would be generated only if the user
input values for both the UNITTITLE and the UNITTITLE_LABEL. If the user inputs a
UNITTITLE but omits a label, then markup will not be generated. This is often
undesirable and there are a number of ways around this depending on
how one wishes to resolve this case. If, for example, you want markup generated
when the user inputs a UNITTITLE but does not input a UNITTITLE_LABEL, then supply
the UNITTITLE_LABEL with a default value (see Substitution Variable Default Values
above):
$[<unittitle label="{$UNITTITLE_LABEL|Title}">{$UNITTITLE}</unittitle>]$
You can even specify no value at all for the default value:
$[<unittitle label="{$UNITTITLE_LABEL|}">{$UNITTITLE}</unittitle>]$
Now if the user specifies a UNITTITLE but no UNITTITLE_LABEL, normally you
would be left with markup like this:
<unittitle label="">John Carlsbad Papers</unittitle>
But since we apply a clean_attributes post processing filter to
our final markup before it is sent to the user's browser, empty attributes
such as that label="" will be deleted.
A second option is to use nested associated text blocks. Associated
text blocks can be nested up to four levels deep. Here, you would
create a special inner block around that label attribute:
$[<unittitle$[ label="{$UNITTITLE_LABEL}"]$>{$UNITTITLE}</unittitle>]$
I find this clutters up the markup templates and makes them a bit harder
to understand. Generally I prefer to leave this kind of thing to a post processing
filter such as clean_attributes.
Text within associated text blocks can span multiple lines:
$[<relatedmaterial>
<head>{$RELATEDMATERIAL_HEAD|Related Material}</head>
{$RELATEDMATERIAL->auto_p}
</relatedmaterial>]$
Since associated text blocks make markup templates a bit harder
to create and understand, I prefer to use post processing filters whenever
feasible. For the <relatedmaterial> example above, I
omit the associated text blocks and let the delete_empties
post processing filter delete empty, unused elements and any associated
<head>s.
Deletion of associated text blocks is triggered if any enclosed
substitution variable does not resolve to a value. To "turn off" that
trigger for some of those variables, supply a default value of nothing
for those variables. In the following example, associated text block
deletion has been turned off for the {$RELATEDMATERIAL_ANALOG}
and {$RELATEDMATERIAL_HEAD} substitution variables. Now only
a null value for {$RELATEDMATERIAL} will trigger deletion of
the entire block:
$[<relatedmaterial encodinganalog="{$RELATEDMATERIAL_ANALOG|}">
<head>{$RELATEDMATERIAL_HEAD|}</head>
{$RELATEDMATERIAL->auto_p}
</relatedmaterial>]$
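The deletion rule for associated text blocks, including nesting and the "empty default" trick, can be modeled with the Python sketch below. It is an approximation rather than the program's own algorithm: it resolves innermost blocks first, recognizes only plain and defaulted variables (tokens with filters such as ->auto_p are left untouched), and does not enforce the four-level nesting limit. All function names are hypothetical.

```python
import re

# Innermost block: "$[" ... "]$" containing no further block delimiters.
BLOCK = re.compile(r"\$\[((?:(?!\$\[|\]\$).)*)\]\$", re.DOTALL)
# {$NAME} or {$NAME|default}; group 2 keeps the leading "|" when present.
VAR = re.compile(r"\{\$([A-Z0-9_]+)(\|[^}]*)?\}")


def fill(text, data):
    """Substitute variables, falling back to the default text or ''."""
    def replace(m):
        name, default = m.groups()
        return data.get(name) or (default[1:] if default else "")
    return VAR.sub(replace, text)


def resolve_block(body, data):
    """Drop the block if a variable without any '|' default has no value."""
    for name, default in VAR.findall(body):
        if default == "" and not data.get(name):
            return ""
    return fill(body, data)


def render(template, data):
    """Resolve blocks from the inside out, then fill remaining variables."""
    previous = None
    while previous != template:
        previous = template
        template = BLOCK.sub(lambda m: resolve_block(m.group(1), data), template)
    return fill(template, data)


nested = '$[<unittitle$[ label="{$UNITTITLE_LABEL}"]$>{$UNITTITLE}</unittitle>]$'
print(render(nested, {"UNITTITLE": "John Carlsbad Papers"}))
```

With only a UNITTITLE supplied, the inner block around the label disappears while the outer block survives, which matches the nested example discussed above.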