
Architecture: Business Layer

Current Concepts Supported by Macaw

The following UML diagram describes the current business concepts that are supported by Macaw:

Fig. Arch-Business-1: The main domain concepts represented by Macaw.

The main concept in the model is a Variable, which describes an item of NSHD data. A RawVariable is a Variable that corresponds to a particular question on a particular form of a particular survey in the NSHD’s ongoing study. Most RawVariables are found on paper-based index cards, some of which are decades old. A DerivedVariable is one that has been created using data from one or more other Variables.

Early in the design, I thought there was more distinction between these two types of Variable. Over the months, however, it has become clear that they share many of the same properties, so for now the RawVariable and DerivedVariable classes may almost be considered marker classes. The main difference between them is that a DerivedVariable comprises one or more source variables. For example, the derived variable “Body Mass Index” is derived from the variables “Height” and “Weight”. The two classes are retained in case future developments make the distinction between them more significant.
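The relationship between the two subclasses can be sketched in Java as follows. This is a minimal illustration, not Macaw's actual source: the fields `name` and `label` and the methods `addSourceVariable` and `getSourceVariables` are hypothetical. It shows RawVariable acting as a near-marker subclass, while DerivedVariable adds only a list of source variables.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative, not taken from Macaw's source.
abstract class Variable {
    private final String name;  // short identifier, e.g. "bmi"
    private final String label; // human-readable description

    protected Variable(String name, String label) {
        this.name = name;
        this.label = label;
    }

    String getName() { return name; }

    String getLabel() { return label; }
}

// Adds nothing beyond Variable: it effectively acts as a marker class.
class RawVariable extends Variable {
    RawVariable(String name, String label) {
        super(name, label);
    }
}

// The one structural difference: a derived variable records its sources.
class DerivedVariable extends Variable {
    private final List<Variable> sourceVariables = new ArrayList<>();

    DerivedVariable(String name, String label) {
        super(name, label);
    }

    void addSourceVariable(Variable source) {
        sourceVariables.add(source);
    }

    // Read-only view so callers cannot modify the list directly.
    List<Variable> getSourceVariables() {
        return Collections.unmodifiableList(sourceVariables);
    }
}
```

With this sketch, Body Mass Index would be modelled as a DerivedVariable whose source list holds the RawVariables for height and weight. Keeping the shared properties in the abstract Variable class means any future differences can be added to the subclasses without disturbing code that works with Variables in general.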

In the Macaw editor, each variable is associated with the following fields, which appear as drop-down lists:

The values used to populate these drop-down lists come from instances of the corresponding classes, which super users can edit through Macaw's editing features.

A Category provides a general grouping for NSHD variables. There are currently almost 30 categories, and more will be added in future. Examples of categories include:

A CleaningState describes a general method used to clean a variable. Examples of cleaning states include:

An AvailabilityState describes which kinds of users can access a variable. Examples of availability states include:

Category, CleaningState and AvailabilityState are all subclasses of MacawListChoice, a convenience class that manages two properties: an identifier and a name. The identifier is the primary key value held in the relational database for a given instance. The name is the value displayed to an end-user scientist in a drop-down list.
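A minimal Java sketch of MacawListChoice and its three subclasses follows, assuming an `int` primary key (the real key type is not stated here). Overriding `toString()` to return the name is an assumption about how the drop-down lists obtain their display text; Swing's default `JComboBox` renderer, for example, displays each item's `toString()` value. Basing `equals` on the class and identifier is likewise an illustrative design choice.

```java
// Hypothetical sketch; the int key type and method names are assumptions.
abstract class MacawListChoice {
    private final int identifier; // primary key value in the relational database
    private final String name;    // text shown to the scientist in a drop-down list

    protected MacawListChoice(int identifier, String name) {
        this.identifier = identifier;
        this.name = name;
    }

    int getIdentifier() { return identifier; }

    String getName() { return name; }

    // Many list widgets (e.g. Swing's default JComboBox renderer) display toString().
    @Override
    public String toString() { return name; }

    // Two choices are equal when they are the same kind and share a primary key.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        return identifier == ((MacawListChoice) other).identifier;
    }

    @Override
    public int hashCode() { return Integer.hashCode(identifier); }
}

class Category extends MacawListChoice {
    Category(int identifier, String name) { super(identifier, name); }
}

class CleaningState extends MacawListChoice {
    CleaningState(int identifier, String name) { super(identifier, name); }
}

class AvailabilityState extends MacawListChoice {
    AvailabilityState(int identifier, String name) { super(identifier, name); }
}
```

Because the identifier, not the display name, determines equality in this sketch, a super user could rename a choice without breaking references to it held elsewhere.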

AliasFilePath associates a logical location for a variable's data with a physical location. For example, the logical location “b17” corresponds to the UNIX file path “$lb/b17/c.dat”, and the location “b27” corresponds to “$lb/b27/c.sv”. Over time, members of the Data Service Group may need to move text files from one server to another; this association lets users keep a consistent view of the logical ordering of variable data files even when the physical locations change. In DATADICT v3.0, the Macaw field “alias” corresponds to “cardNumber” and the field “filePath” corresponds to “path”. Macaw uses the term “alias” because “cardNumber” referred to the legacy concept of a paper-based index card.
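The alias-to-path association can be sketched as below. The `AliasRegistry` class and its `resolve` method are hypothetical helpers added for illustration, as are the accessor names on AliasFilePath. The point is that code refers to data files by alias, so when a file moves only the stored path changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of an alias (logical location) paired with a physical path.
final class AliasFilePath {
    private final String alias; // e.g. "b17"; DATADICT v3.0 calls this "cardNumber"
    private String filePath;    // e.g. "$lb/b17/c.dat"; DATADICT v3.0 calls this "path"

    AliasFilePath(String alias, String filePath) {
        this.alias = alias;
        this.filePath = filePath;
    }

    String getAlias() { return alias; }

    String getFilePath() { return filePath; }

    // The physical location may move between servers; the alias never changes.
    void setFilePath(String filePath) { this.filePath = filePath; }
}

// Hypothetical helper that resolves aliases to their current physical paths.
final class AliasRegistry {
    private final Map<String, AliasFilePath> byAlias = new HashMap<>();

    void register(AliasFilePath entry) {
        byAlias.put(entry.getAlias(), entry);
    }

    // Empty if the alias is unknown.
    Optional<String> resolve(String alias) {
        return Optional.ofNullable(byAlias.get(alias)).map(AliasFilePath::getFilePath);
    }
}
```

For instance, after registering the pair “b17” and “$lb/b17/c.dat”, a later call to `setFilePath` with a new location changes what `resolve("b17")` returns, while every reference to the alias “b17” stays valid.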

A ValueLabel describes an answer to a survey question. A ValueLabel comprises the following concepts:

A SupportingDocument represents a document that explains in more detail what a Variable means. For example, a SupportingDocument could be a standard document that describes how Body Mass Index is calculated. Several variables may relate to body mass index calculations, for example when the index is computed for different years. In this case, the whole collection of similar variables may reference the same SupportingDocument. In future, a Variable may also be described by multiple SupportingDocuments.
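The sharing relationship can be sketched in Java using a deliberately stripped-down Variable class. The fields and methods shown (`location`, `addSupportingDocument`, `isDescribedBy`) are hypothetical. Each Variable holds a list of references, so several variables can point at the same SupportingDocument object, and the list already accommodates the future case of a variable described by multiple documents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; field and method names are illustrative.
final class SupportingDocument {
    private final String title;
    private final String location; // where the document file is stored

    SupportingDocument(String title, String location) {
        this.title = title;
        this.location = location;
    }

    String getTitle() { return title; }

    String getLocation() { return location; }
}

// Stripped-down Variable showing only its link to supporting documents.
final class Variable {
    private final String name;
    // A list leaves room for a variable described by several documents.
    private final List<SupportingDocument> supportingDocuments = new ArrayList<>();

    Variable(String name) { this.name = name; }

    String getName() { return name; }

    // Stores a reference, so many variables can share one document object.
    void addSupportingDocument(SupportingDocument document) {
        if (!supportingDocuments.contains(document)) {
            supportingDocuments.add(document);
        }
    }

    boolean isDescribedBy(SupportingDocument document) {
        return supportingDocuments.contains(document);
    }

    List<SupportingDocument> getSupportingDocuments() {
        return Collections.unmodifiableList(supportingDocuments);
    }
}
```

Storing references rather than copies means that correcting the shared document, such as updating its location, is immediately reflected for every variable that cites it.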

Author: Kevin Garwood

(c)2010 Lifelong Health and Ageing Unit of the Medical Research Council.