Database System Architecture:


Fig 7 Functional architecture of the database system. Querying: users interact with the WWW Server via a WWW Client Submitting data: Users submit and maintain data through the WWW Server; additionally, users at database sites can use the local Supervisor Module to enter and maintain their data without using the Internet.
The WWW Server and Clients. All interaction between end users (e.g. Neuroiologists, Fig 7) and the ZfishDB is via the WWW Server. Users may use any WWW client (MOSAIC, Lynx, Netscape, etc.) on any platform, like SUN, Silicon Graphics, IBM PC and Macintosh workstations, that can run this client software. The WWW Server extracts user input, either database queries or new data submissions, from the data stream transmitted by the WWW client and passes it to the Data Retrieval and Processing Engine. It then waits for the Data Retrieval and Processing Engine to return an appropriate response, formatted as an HTML page. Forcing all database access through the WWW Server achieves two important implementation goals. First, it supports ubiquitous and uniform access; users need only have a viable (i.e. capable of HTML form handling) WWW Client to access the database. No special purpose software is required. Second, the data cannot be entered or retrieved except through the WWW Server. This enforces the common logical data model and dictionary.

We plan to use an essentially unmodified HyperText Transfer Protocol (HTTP) server as the WWW Server for the ZfishDB. HTTP server software is publicly available from both CERN and NCSA; commercial versions (e.g. Netscape) are available as well. We will implement this module of the ZfishDB by writing scripts to extract and decode user input, interact with the underlying processing components of the ZfishDB, and return results to the client.

Challenges and Limitations. Submitting image data presents a particularly interesting challenge given the WWW front end to the ZfishDB; there is (currently) no support in HTML, the markup language interpreted by WWW clients, for passing anything but text strings from a WWW client to a WWW server. There are a number of possible solutions to this problem; to avoid necessarily extending the HTML language and creating special-purpose WWW clients for all possible computing platforms, we propose a low-tech solution. The user interface allows researchers to transfer (e.g. using ftp) image files to a special directory on the server. They then formally submit the data using an HTML form (see Fig 4) specifying the image file name, and the interface integrates the image and its corresponding descriptive information into the database. We believe this will be a non-issue soon; requests to extend the HTML standard to support a "file" input type are currently under review by the HTML standards committee.

Another challenge arises because HTTP is a stateless protocol, based on discrete data retrieval operations. Thus, as a query is constructed, information is not maintained as the user moves from one HTML page to the next, setting various constraints. In one solution, the system could embed (e.g. in a TEXTAREA field) the state information within the HTML pages it dynamically generates and returns to the user. This information would then be returned to the system when the user submits the forms contained in those pages. Alternatively, the server could create a unique "state" file for each user. The name of this temporary file would be embedded in all transactions; after some interval of inactivity, the file would be purged.

Another potential limitation is that the use of a single WWW server will affect reliability and performance if the server goes down or is overloaded by requests. The parallel processor on which the system will be developed and tested, will serve as a backup for failures and the WWW servers at the other database sites will provide additional access points to the system, thus sharing the load.

Data Retrieval and Processing Engine. Queries or data submissions passed to the system via the WWW Server are processed by the Processing Engine (Fig 7). Processing differs according to the user's action: a broad database query, a request for a specific record, or a data submission.

The bulk of the Data Retrieval and Processing Engine will be implemented in C++ by developing logical objects corresponding to the various components (e.g. SQL Generator, HTML Synthesizer, Fig 7). The Processing Engine acts as an interpreter, translating the inputs received from the WWW Server into various database access requests, integrating the results, and formatting them as HTML output. We have extensive experience [Fickas89;Douglas92] with the programming tools and techniques required to implement such interpreter functionality.

Challenges and Limitations. Implementing the Distribution Manager is a key challenge. The Distribution Manager must interact with the database components at the various sites, distributing database queries and collecting the results. Because these interactions are primarily file transfers between UNIX hosts, we propose to use a well-understood IP protocol like UUCP. Alternatively we could use a custom network file server as we have previously developed.

Supervisor Module. The Supervisor Module provides tools to assist the site supervisor in maintaining the system. For example, the site supervisor might need to remove erroneous data from the system or restructure the database in response to changes in the logical data model as determined by the zebrafish community. The Supervisor Module automates integration into the ZfishDB of data posted to public databases like GenBank and MEDLINE. It provides bookkeeping operations for collecting and reporting database access activity. We will also implement an auxiliary X-Window based interface to the ZfishDB logically equivalent to the interface provided by the WWW Server. Thus, data submission for local researchers can be streamlined, bypassing the WWW Server. This is particularly sensible because we propose to place sites in the labs producing the most data.

We plan to implement most of the Supervisor Module in C++ using OSF Motif to create the user interface. Interfacing with the Illustra database will be straightforward because Illustra provides an API for C/C++.

Databases. The database components of the ZfishDB store, organize and provide access to the submitted data. During initial system development, the ZfishDB will begin with a single database at the Eugene site. We will also use a parallel processor in the Computational Science Institute for testing the client/server WWW software and as backup for the database. Later, other sites will be added to expand the system and to streamline submission of data to the database. The sites are distributed geographically, making the ZfishDB is a "distributed database", a single database system distributed over a number of nodes. Each site in the ZfishDB implements the identical logical data model of the Illustra database system. The homogenous view of the ZfishDB presented to the user is constructed by the Distribution Manager. This architecture contributes significantly to system robustness and performance:

We expect system performance to be limited primarily by the bandwidth, traffic patterns, and transfer rates of the Internet. This limitation will affect primarily image data. We plan to keep most in-line images under 20k, allowing users to download higher resolution images on demand.
The Zebrafish Database

Continue on to Implementation of Specific Aim 2


Return to Table of Contents