Berkeley Web Template CGI script
The Berkeley EAD Toolkit
University of California, Berkeley
http://sunsite.berkeley.edu/ead/tools
 
Short Introduction to Template Concepts

Two Programs Rolled Into One

The Berkeley Web Template program can be thought of as two separate programs, completely independent of one another. The first program reads the configuration files and generates the HTML form. The second program reads the configuration files, reads the data the user has input into the form, and then generates the XML markup with those user-supplied form values "plugged in." The first program is invoked by calling the Template URL without any additional query string parameters, e.g.,
http://your.server.edu/cgi-bin/template/ead/generic
while the second program is invoked by calling the same URL but with a ?submit=1 query string parameter tacked on, e.g.,
http://your.server.edu/cgi-bin/template/ead/generic?submit=1
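
The choice between the two halves depends only on the query string. The following is a rough Python sketch of that decision (the Template program itself is a Perl CGI script; the function name and the two labels it returns are invented for illustration and do not appear in the program):

```python
from urllib.parse import urlsplit, parse_qs

def requested_mode(url):
    """Return which half of the program a URL would invoke (hypothetical labels)."""
    query = parse_qs(urlsplit(url).query)
    # ?submit=1 selects XML generation; anything else yields the HTML form.
    if query.get("submit") == ["1"]:
        return "generate_xml"
    return "generate_form"

print(requested_mode("http://your.server.edu/cgi-bin/template/ead/generic"))
print(requested_mode("http://your.server.edu/cgi-bin/template/ead/generic?submit=1"))
```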

In fact it is often useful to use the second part of the program only, i.e., plugging user-input data into XML markup, without ever bothering to use the HTML-form generation powers of the first part. The first part of the process is useful for institutions which provide many slightly customized HTML forms for different repositories, markup units or finding aid types. The form templates have been optimized to make slight changes quick and easy--but this comes at the expense of making major changes very hard. For smaller institutions that need to create only one or two web templates which differ significantly in design from those which ship with this program, it is probably much easier to create their own static HTML form exactly as they would like, and invoke the second part of the program by plugging the ?submit=1 form of the URL into their form's action attribute:

<form method="post" action="http://your.server.edu/cgi-bin/template/ead/generic?submit=1">

The file static_form.html included with this distribution demonstrates this strategy.



Hierarchical Templates

The Berkeley Web Template program always reads two template files. The first, or "master," template file contains data that are common to all templates for all repositories which use the template program. The second, or "sub," template file contains only those data which are specific or unique to a particular repository. The repository's name and address are examples of data which would probably live in the sub template, while most of the form's HTML would probably live in the master template.

This hierarchical relationship is reflected in the URL used to access a specific template. For example, this URL:
http://your.server.edu/cgi-bin/template/ead/mss
invokes the master template called ead.cfg and then the sub template called mss.cfg.

In fact this hierarchical arrangement is not set in stone. Almost anything you put in one template could conceivably go into the other template instead. You are free to divide up the data however you see fit. I have created templates where all of the data was contained in the first template and the second template was completely blank, and vice versa. (Well, almost vice versa: the master template does have one piece of data that must always live there, the TemplateDir parameter, which tells the Template program in which directory all of the sub templates are located.)

The Template program first reads information in the first template and adds it to an internal data structure. It then reads the second template and adds that data to the same structure. So all of the data in both templates ultimately ends up in the same place. Not only that, but information in the second template always trumps or overwrites the same information in the first template (since the sub template is always read last). That means that you may want to put information that is almost always the same into the first template for all sub templates to share, but repeat it with slight or major variations in a sub template when it must be different for that particular repository or application.
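
The override rule can be pictured with a short Python sketch (the Template program itself is Perl). It assumes that overriding happens a whole section at a time, which this guide does not spell out, and the section contents below are invented for illustration:

```python
# Illustrative model only: each configuration file is pictured as a
# mapping from section name to section text. The values are hypothetical.
master = {
    "Template Eadheader": "<eadheader>...{@RepositoryAddress}...</eadheader>",
    "Template RepositoryAddress": "<address>Generic address</address>",
}
sub = {
    "Template RepositoryAddress": "<address>The Bancroft Library</address>",
}

def merge_templates(master, sub):
    """Read the master first, then the sub; entries read later win."""
    merged = dict(master)   # data from the master template goes in first
    merged.update(sub)      # the sub template is read last, so it overrides
    return merged

merged = merge_templates(master, sub)
print(merged["Template RepositoryAddress"])
```

The shared Eadheader section survives untouched, while the repository address comes from the sub template.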



Templates and Forms

Template files (or more accurately, configuration files) are composed of distinct sections denoted by square brackets. The first part of the program, the HTML form-generation part, reads its data from sections which begin with the word "Form", e.g., [Form GenericTextBox]. The second part of the program, the XML-generation part, always reads its data from sections which begin with the word "Template", e.g., [Template C02Component]. Each type of section is generally invisible to the other. That is, while generating the HTML form that the user will fill in, the Template program is completely oblivious to [Template ...] sections. When it comes time for the program to generate the XML markup after the user clicks submit, it knows absolutely nothing about [Form ...] sections.

Templates can contain other Templates and Forms can contain other Forms. This is another technique for factoring out common information and placing just the parts you want to customize together in one place. So for example if you have one Template called [Template Eadheader] and wish to insert into it a different template called [Template RepositoryAddress] you would include the token {@RepositoryAddress} at the place in [Template Eadheader] where you want the address to occur. You may in fact wish to put [Template Eadheader] into your master configuration file for all repositories to share, but keep [Template RepositoryAddress] in the sub templates where it can be customized for each repository. Remember, you can divide your data up between the master and sub templates however you wish.

[Template PublicationStmt]
<publicationstmt>
<publisher>The Bancroft Library.</publisher>
<address>
<addressline>University of California, Berkeley</addressline>
<addressline>Berkeley, California, 94720-6000</addressline>
<addressline>Phone: (510) 642-6481</addressline>
<addressline>Fax: (510) 642-7589</addressline>
<addressline>Email: bancref@library.berkeley.edu</addressline>
<addressline>URL: http://bancroft.berkeley.edu</addressline>
</address>
<date>© 2005</date>
<p>The Regents of the University of California. All rights reserved.</p>
</publicationstmt>


[Template _DEFAULT]
...
<filedesc>
<titlestmt>
<titleproper>{$COLLECTION_TITLE}</titleproper>
<titleproper type="filing">{$FILETITLE}</titleproper>
<author>{$PROCESSED_BY}.</author>
<sponsor>{$SPONSOR}.</sponsor>
</titlestmt>
{@PublicationStmt}
</filedesc>
...
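
How a {@Name} token gets replaced can be modeled in a few lines of Python. This is only a sketch of the idea, not the program's Perl code: the dictionary stands in for the merged configuration files, the recursion limit is my own safeguard, and how the real program treats a reference to a missing section is not covered here.

```python
import re

# Hypothetical section store; in the real program these sections come
# from the master and sub configuration files.
templates = {
    "PublicationStmt": "<publicationstmt><publisher>The Bancroft Library.</publisher></publicationstmt>",
    "_DEFAULT": "<filedesc>\n{@PublicationStmt}\n</filedesc>",
}

INCLUDE = re.compile(r"\{@([A-Za-z0-9_]+)\}")

def expand_includes(text, templates, depth=0):
    """Replace each {@Name} token with the text of the section called Name."""
    if depth > 20:  # guard against sections that include each other forever
        raise ValueError("template includes nested too deeply")

    def repl(match):
        # A missing section raises KeyError in this sketch.
        return expand_includes(templates[match.group(1)], templates, depth + 1)

    return INCLUDE.sub(repl, text)

print(expand_includes(templates["_DEFAULT"], templates))
```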


System Variables

The Template program provides a number of special system variables which you can plug in anywhere in any configuration file. Most are available only within a Form section but others may be used in a Form or a Template section. These are always uppercase and begin with an underscore. The most useful ones involve aliases for your template script. These guarantee a degree of portability for your templates. If you move the template script to a new server with a different URL, or change the name of the template script (renaming it to something like "ead_templates.pl" for example), you needn't make any changes to any of the templates you have created, provided you avoid direct explicit references to either the name of the script or the URL.

Assuming a URL like this: http://your.server.edu/cgi-bin/template/ead/mss@Toc, the following system variables will embed these values into your Forms:

{_SCRIPT_URL}   http://your.server.edu/cgi-bin/template
{_PATH_INFO}    /ead/mss@Toc
{_FULL_URL}     http://your.server.edu/cgi-bin/template/ead/mss@Toc
{_URL}          http://your.server.edu/cgi-bin/template/ead/mss

{_FULL_URL} embeds the complete URL into your form, including the name of any explicit Form template if present (see next section). {_URL} embeds the same URL except that it does not include the Form name portion. If no explicit Form is called in the URL (e.g., if [Form _DEFAULT] is used), these two system variables are equivalent. {_URL} is by far the most useful, while {_FULL_URL} is rarely used.
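
The relationship between the four values can be sketched in Python. This only models the table above; the function name is invented and the real program (a Perl CGI script) presumably takes these values from its CGI environment rather than from arguments.

```python
def system_variables(script_url, path_info):
    """Derive the four URL-related system variables for one request."""
    # Anything from '@' onward names an explicit Form and is dropped from _URL.
    base_path = path_info.partition("@")[0]
    return {
        "_SCRIPT_URL": script_url,
        "_PATH_INFO": path_info,
        "_FULL_URL": script_url + path_info,
        "_URL": script_url + base_path,
    }

sysvars = system_variables("http://your.server.edu/cgi-bin/template", "/ead/mss@Toc")
for name, value in sysvars.items():
    print("{%s} %s" % (name, value))
```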



"_DEFAULT" Templates and Forms

[Form ...] and [Template ...] sections serve a second purpose. Not only can one Form or Template section be called or embedded within a different Form or Template section, but specific Form and Template sections can also be requested by the user's browser. This is illustrated in the ead.cfg configuration file. The EAD template which ships with this program is a framed document. In the left frame is the table of contents and in the right, the HTML form page itself. Like all framed documents, this one is controlled by a master frameset document. All in all, three distinct documents are requested by the browser. Each of these three documents is contained in the ead.cfg file in its own [Form ...] section. To call up a specific Form section with your browser, append the name of the Form to the end of your URL, preceded by an '@' symbol.

http://your.server.edu/cgi-bin/template/ead/generic@Toc
http://your.server.edu/cgi-bin/template/ead/generic@Content

If you don't name a specific Form or Template in your URL, the Template program will use one called "_DEFAULT", e.g., [Form _DEFAULT]. The frameset file of the EAD template is contained in [Form _DEFAULT]. One could conceivably place the frameset in a Form called something like "Frameset", then invoke it with http://your.server.edu/cgi-bin/template/ead/generic@Frameset and not supply a _DEFAULT Form at all, but this would make the URL a bit uglier and harder to remember.
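
Putting the URL conventions together, the path after the script name can be read as master template, sub template and an optional section name. The Python sketch below models that reading; the function name is invented, and it assumes both the master and the sub name are always present in the path.

```python
def parse_path_info(path_info):
    """Split '/ead/generic@Toc' into (master, sub, section name)."""
    path, _, section = path_info.partition("@")
    master, sub = path.strip("/").split("/", 1)
    # With no '@Name' in the URL, the section called _DEFAULT is used.
    return master, sub, section or "_DEFAULT"

print(parse_path_info("/ead/generic@Toc"))
print(parse_path_info("/ead/generic"))
```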

It is possible to invoke a specific markup Template by placing the Template name token before the ?submit=1 query string parameter in your HTML form's action attribute, but this would be a rare and generally unnecessary thing to ever have to do.

Note: It is important that any Form or Template designed to be sent to the user's browser begin with an appropriate Content-type header. This applies only to Forms or Templates sent to the browser; the header should not be included in Forms or Templates designed to be embedded within other Forms or Templates. Forms sent to the browser should always begin with Content-type: text/html followed by a blank line. Templates will usually begin with Content-type: text/xml unless you are using the template program to generate other types of data such as SGML, plain text, or PDF.

[Form _DEFAULT]
Content-type: text/html

<html>
  <head>
    <title>{TEMPLATE_TITLE}</title>
  </head>
  <frameset rows="100%, *" cols="140, *" framespacing="0" border="0" frameborder="0">
    <frame name="toc" src="{_URL}@Toc">
    <frame name="eadInput" src="{_URL}@Content">
  </frameset>
</html>


[Form Toc]
Content-type: text/html

<html>
  <head>
    <title>Table of Contents</title>
  </head>
  <body>
    <h1>Table of Contents for {TEMPLATE_TITLE}</h1>
    ...
  </body>
</html>



Substitution Variables

Substitution variables are the core of the Template program's functionality. Data input by the user in the HTML form is plugged into the XML markup by placing a token consisting of the name of the form element (its name attribute) preceded by a dollar sign and enclosed in curly braces, e.g., {$UNITTITLE}. Form element name attributes and the substitution variables that represent them must consist only of uppercase letters, numbers, and underscores. You'll note that some substitution variables in the EAD templates have other punctuation, such as dashes and colons. These have special meaning to the template program. Their use is not covered in this guide, and you should not use any punctuation marks other than an underscore.

In The HTML Form:
<input type="text" class="textbox" name="FINDAID_TITLE">
<input type="text" class="textbox" name="CALLNO">

In The XML Markup Template:
<frontmatter>
  <titlepage>
    <titleproper>{$FINDAID_TITLE}</titleproper>
    <num>Collection number: {$CALLNO}</num>
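
The basic plug-in step can be modeled in Python as a single regular-expression substitution. This is a sketch of the idea only, not the program's Perl code: filters and default values are left out, the sample call number is made up, and whether the real program escapes special XML characters in user input is not addressed.

```python
import re

# Matches plain tokens such as {$UNITTITLE}: uppercase letters, digits
# and underscores only.
TOKEN = re.compile(r"\{\$([A-Z0-9_]+)\}")

def substitute(template, form_data):
    """Plug submitted form values into the markup template."""
    # A field the user left out becomes an empty string in this sketch.
    return TOKEN.sub(lambda m: form_data.get(m.group(1), ""), template)

form_data = {"FINDAID_TITLE": "John Carlsbad Papers", "CALLNO": "BANC MSS 99/1"}
template = (
    "<titleproper>{$FINDAID_TITLE}</titleproper>\n"
    "<num>Collection number: {$CALLNO}</num>"
)
print(substitute(template, form_data))
```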


Filters and Extensions

Filters are calls to small Perl programs, written in an extension file, that process the data in a substitution variable in some way before the result is plugged into the XML markup when the user clicks Submit. Functionality such as splitting prose text into paragraphs along blank lines, converting MARC format access points into EAD <controlaccess> elements, automatically normalizing dates, supplying language codes, and encoding chronological lists is implemented using filters which call Perl code included in the ead.ext extension file.

A Filter call is implemented by following the substitution variable with an arrow (->) and the name of the filter. {$SCOPECONTENT->auto_p} for example, takes the data entered into the SCOPECONTENT form field and automatically splits it into paragraphs. In the ead.ext extension file you will find an extension called 'auto_p'. Filters can also take a single parameter enclosed in parentheses, e.g., {$SCOPECONTENT->auto_p(param)}, and any substitution variable can call more than one filter by separating them with a comma, e.g., {$SCOPECONTENT->filter1,filter2}. In such cases, the data is passed to the first filter where it is processed, then it is passed to the second filter where it is processed some more before finally being inserted into the XML markup. These two techniques, multiple filters and filter parameters, allow for more generic and reusable code.

In The HTML Form:
<tr>
  <td align="left" width="40%">
    <font face="Arial" size="-1">Head:</font>
    <input type="text" class="headbox" name="BIOGPROSE_HEAD" size="40">
  </td>
</tr>
<tr>
  <td colspan="2">
    <textarea name="BIOGPROSE" rows="20"></textarea>
  </td>
</tr>

In The XML Markup Template:
<bioghist>
<head>{$BIOGPROSE_HEAD}</head>
{$BIOGPROSE->auto_p,auto_head(bioghist)}
</bioghist>
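
The chaining rule (first filter, then second, each optionally with one parameter) can be sketched in Python as follows. The real filters are Perl subs in ead.ext and take more arguments than shown here; this auto_p is a simplified stand-in, and wrap is a hypothetical filter invented to show a parameter being passed. The sketch also assumes a parameter never contains a comma.

```python
import re

def auto_p(text, param=None):
    """Simplified stand-in: split prose into <p> elements along blank lines."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join("<p>%s</p>" % p for p in paragraphs)

def wrap(text, param=None):
    """Hypothetical filter: wrap the text in the element named by param."""
    return "<%s>\n%s\n</%s>" % (param, text, param)

FILTERS = {"auto_p": auto_p, "wrap": wrap}

# One filter call: a name, optionally followed by (param).
CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)(?:\(([^)]*)\))?")

def apply_filters(value, filter_spec):
    """Run value through each filter named in 'f1,f2(param)', left to right."""
    for call in filter_spec.split(","):
        name, param = CALL.fullmatch(call.strip()).groups()
        value = FILTERS[name](value, param)
    return value

result = apply_filters("First paragraph.\n\nSecond paragraph.", "auto_p,wrap(bioghist)")
print(result)
```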

Special post-processing filters can be applied to the entire XML file just before it is output by specifying them in a PostProcess parameter in a [General Parameters] section. In ead.cfg, the PostProcess parameter calls three separate filters to further manipulate the XML markup before it is finally delivered to the user's browser:

PostProcess   clean_attributes,delete_empties,convert_diacritics

clean_attributes deletes all attributes with empty values, e.g., <unittitle label=""> would be converted to plain <unittitle>. There are other syntax tricks for ensuring that empty attribute values are never output, but a global sweep with clean_attributes is simpler and cleaner. The clean_attributes filter is one of the simplest filters available for the EAD templates (most of the others are complicated by weird or rare exceptions that must be dealt with). Here is what that filter looks like in ead.ext:

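'clean_attributes' =>

sub {
    my ($cfg, $template, $post_data, $post_tree, $var_name, $text, $node_fields) = @_;
    # Delete attributes whose value is empty or only whitespace, e.g. label="".
    $text =~ s/\s+[-a-z]+="\s*"//g;
    # Remove a stray space immediately after an attribute's opening quote.
    $text =~ s/(\s+[-a-z]+=") ([^"\n<>]+")/$1$2/g;
    return $text;
},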
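
If you want to try the two substitutions outside the template program, here is a rough Python transliteration. It drops the extra arguments the Perl filter receives and keeps only the text, so treat it as an experiment bench rather than a replacement for the filter in ead.ext.

```python
import re

def clean_attributes(text):
    """Python transliteration of the two substitutions in the Perl filter."""
    # Drop attributes whose value is empty or only whitespace.
    text = re.sub(r'\s+[-a-z]+="\s*"', "", text)
    # Remove a single stray space right after an attribute's opening quote.
    text = re.sub(r'(\s+[-a-z]+=") ([^"\n<>]+")', r"\1\2", text)
    return text

print(clean_attributes('<unittitle label="">John Carlsbad Papers</unittitle>'))
print(clean_attributes('<unittitle label=" Title">John Carlsbad Papers</unittitle>'))
```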

delete_empties is similar to clean_attributes except that it deletes empty elements with no content, e.g., <unitid></unitid> would be deleted entirely. Like empty attribute values, there are other syntax tricks for dealing with this but they complicate the templates considerably. delete_empties is an example of a filter that is more complicated than it at first seems. It must delete not only empty elements but often the tags that surround them as well. It is not good enough to delete <p></p>. You'll probably also want to get rid of empty elements that surround it, e.g.,

<scopecontent>
  <head>Scope and Content</head>
  <p></p>
</scopecontent>
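
To give a feel for why this takes more than one pass, here is a much simplified Python sketch. It is not the delete_empties code from ead.ext, which handles more cases; it assumes lowercase element names, removes elements with no content, and then removes any element left holding nothing but a <head>, repeating until nothing changes.

```python
import re

# An element with no content at all, e.g. <p></p> or <unitid label="x"></unitid>.
EMPTY = re.compile(r"<([a-z][a-z0-9]*)(?:\s[^<>]*)?>\s*</\1>")
# An element whose only remaining child is a <head>.
HEAD_ONLY = re.compile(
    r"<([a-z][a-z0-9]*)(?:\s[^<>]*)?>\s*<head>[^<]*</head>\s*</\1>"
)

def delete_empties(text):
    """Repeat both sweeps until the markup stops changing."""
    previous = None
    while previous != text:
        previous = text
        text = EMPTY.sub("", text)
        text = HEAD_ONLY.sub("", text)
    return text

sample = "<scopecontent>\n  <head>Scope and Content</head>\n  <p></p>\n</scopecontent>"
print(repr(delete_empties(sample)))
```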

Finally, convert_diacritics converts various diacritical marks and special characters into their hexadecimal numerical entity equivalents. E.g., 'é' is converted to '&#x00E9;' throughout the XML markup.
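
The conversion itself is easy to picture. The Python sketch below is not the filter from ead.ext; it assumes that every character outside plain ASCII should become a four-digit (or longer) hexadecimal entity, which may be broader than what the real filter covers.

```python
def convert_diacritics(text):
    """Replace every non-ASCII character with a hexadecimal numeric entity."""
    return "".join(
        ch if ord(ch) < 128 else "&#x%04X;" % ord(ch)
        for ch in text
    )

print(convert_diacritics("caf\u00e9"))
```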

Writing filters and extension files is not covered in this guide.



Substitution Variable Default Values

You can provide a default value for a substitution variable, to be used if the user chooses not to supply one when filling out the web form. Certain data may be so important that you, as the template manager, may wish to ensure that a value is always supplied. Within the curly braces of the substitution variable token, supply the default value following a vertical bar ('|'), e.g., {$SCOPECONTENT_HEAD|Scope and Content}. This syntax is appropriate for very short textual strings. More complicated alternate text can be supplied using the alternate text syntax described later.

It is sometimes necessary, when using the Template program's more complex syntactical features, to explicitly specify nothing as the alternate text of a substitution variable, e.g., {$NORMAL_DATE|}. This is only ever necessary when such variables occur within associated text blocks (see below) under certain circumstances. This guide will not cover these advanced syntactical features in detail, but you will see them used extensively in the EAD templates which ship with this program.
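
The fallback rule can be sketched in Python by extending the token pattern with an optional '|default' part. Again this is a model, not the program's Perl code; it ignores filters and assumes a default never contains a closing curly brace.

```python
import re

# {$NAME} or {$NAME|default text}; filters are left out of this sketch.
TOKEN = re.compile(r"\{\$([A-Z0-9_]+)(?:\|([^}]*))?\}")

def substitute_with_defaults(template, form_data):
    """Use the submitted value if there is one, otherwise the default."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        value = form_data.get(name, "").strip()
        if value:
            return value
        return default if default is not None else ""
    return TOKEN.sub(repl, template)

markup = "<head>{$SCOPECONTENT_HEAD|Scope and Content}</head>"
print(substitute_with_defaults(markup, {}))
print(substitute_with_defaults(markup, {"SCOPECONTENT_HEAD": "Scope"}))
```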



More Templates and Forms

Finally, to facilitate reuse and modularity, Forms can be parameterized when embedded within other Forms. That is, you can embed a Form using the standard '@' syntax, {@FormName}, but parameterized with custom values. This allows you to use the exact same Form but with slightly different values. The EAD templates which ship with this program use a very modular format, allowing you to "plug in" new form elements to support new EAD elements not included with the shipped templates.

To parameterize a form, follow the name of the Form with a colon, then a comma-separated list of parameterized values, as many as you wish:

{@TextBox:label="Dates",id="date1",name="UNITDATE"}

Within the invoked [Form ...] section, these parameters (they can be called whatever you like) are used by embedding a Form parameter token, enclosed in curly braces and preceded by a percent sign, e.g., {%label}

[Form TextBox]
{%label}: <input type="text" id="{%id}" name="{%name}">



Associated Text Blocks

An associated text block is a feature whereby markup and text can be explicitly associated with a specific substitution variable. For example, if the template provides a form element to input a unittitle, the generated markup would include not only the value which the user input, but also the surrounding markup:

$[<unittitle label="Title">{$UNITTITLE}</unittitle>]$

The $[ and ]$ tokens associate this markup with the {$UNITTITLE} substitution variable. What does this mean? It means that the markup will only be generated if there is actually a value for {$UNITTITLE}. Without the associated text block, if the user does not input a value for the UNITTITLE form element then the empty markup would still be generated:

<unittitle label="Title"></unittitle>

This is usually undesirable. With the associated text block, no markup would be generated for unittitle at all if the user does not input a value.

If an associated text block contains multiple substitution variables, then if any one of them does not resolve to a value then the entire block will not be generated:

$[<unittitle label="{$UNITTITLE_LABEL}">{$UNITTITLE}</unittitle>]$

In the example above, markup would be generated only if the user input values for both UNITTITLE and UNITTITLE_LABEL. If the user inputs a UNITTITLE but omits a label, then markup will not be generated. This is often undesirable and there are a number of ways around it, depending on how one wishes to resolve this case. If, for example, you want markup generated when the user inputs a UNITTITLE but does not input a UNITTITLE_LABEL, then supply UNITTITLE_LABEL with a default value (see Substitution Variable Default Values above):

$[<unittitle label="{$UNITTITLE_LABEL|Title}">{$UNITTITLE}</unittitle>]$

You can even specify no value at all for the default value:

$[<unittitle label="{$UNITTITLE_LABEL|}">{$UNITTITLE}</unittitle>]$

Now if the user specifies a UNITTITLE but no UNITTITLE_LABEL, normally you would be left with markup like this:

<unittitle label="">John Carlsbad Papers</unittitle>

But since we apply a clean_attributes post processing filter to our final markup before it is sent to the user's browser, empty attributes such as that label="" will be deleted.

A second option is to use nested associated text blocks, which can be nested up to four levels deep. Here that means creating a special set of blocks around the label attribute:

$[<unittitle$[ label="{$UNITTITLE_LABEL}"]$>{$UNITTITLE}</unittitle>]$

I find this clutters up the markup templates and makes them a bit harder to understand. Generally I prefer to leave this kind of thing to a post processing filter such as clean_attributes.

Text within associated text blocks can span multiple lines:

$[<relatedmaterial>
<head>{$RELATEDMATERIAL_HEAD|Related Material}</head>
{$RELATEDMATERIAL->auto_p}
</relatedmaterial>]$

Since associated text blocks make markup templates a bit harder to create and understand, I prefer to use post processing filters whenever feasible. For the <relatedmaterial> example above, I omit the associated text blocks and let the delete_empties post processing filter delete empty, unused elements and any associated <head>s.

Deletion of associated text blocks is triggered if any enclosed substitution variable does not resolve to a value. To "turn off" that trigger for some of those variables, supply a default value of nothing for those variables. In the following example, associated text block deletion has been turned off for the {$RELATEDMATERIAL_ANALOG} and {$RELATEDMATERIAL_HEAD} substitution variables. Now only a null value for {$RELATEDMATERIAL} will trigger deletion of the entire block:

$[<relatedmaterial encodinganalog="{$RELATEDMATERIAL_ANALOG|}">
<head>{$RELATEDMATERIAL_HEAD|}</head>
{$RELATEDMATERIAL->auto_p}
</relatedmaterial>]$
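
The deletion rule, including the "turn off" trick and nesting, can be modeled in Python as below. This is only my sketch of the behavior described in this section, not the program's Perl code: filters are recognized but not run, innermost blocks are resolved first, and the order "filter, then default" inside a token is an assumption.

```python
import re

# {$NAME}, {$NAME|default} or {$NAME->filters}; filters are ignored here.
TOKEN = re.compile(r"\{\$([A-Z0-9_]+)(?:->[^}|]*)?(\|[^}]*)?\}")
# An innermost block: $[ ... ]$ with no other $[ or ]$ inside it.
BLOCK = re.compile(r"\$\[((?:(?!\$\[|\]\$).)*)\]\$", re.DOTALL)

class Unresolved(Exception):
    """Raised when a variable has neither a value nor a default."""

def fill(text, form_data, strict):
    def repl(match):
        name, default = match.group(1), match.group(2)
        value = form_data.get(name, "").strip()
        if value:
            return value
        if default is not None:
            return default[1:]      # text after the '|', possibly empty
        if strict:
            raise Unresolved(name)  # triggers deletion of the whole block
        return ""
    return TOKEN.sub(repl, text)

def render(template, form_data):
    def block_repl(match):
        try:
            return fill(match.group(1), form_data, strict=True)
        except Unresolved:
            return ""
    # Resolve innermost blocks first so that nested blocks work.
    while BLOCK.search(template):
        template = BLOCK.sub(block_repl, template)
    return fill(template, form_data, strict=False)

block = '$[<unittitle$[ label="{$UNITTITLE_LABEL}"]$>{$UNITTITLE}</unittitle>]$'
print(render(block, {"UNITTITLE": "John Carlsbad Papers"}))
```

With no UNITTITLE at all the whole block vanishes; with a UNITTITLE but no label only the inner block does.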