MARC REVIEW

MARC Review will search a MARC file of any size for any content, either plain text or MARC coding, or both, and produce highly-customizable reports of the search results.

When setting up a search in MARC Review, it is important to think literally. The purpose of this utility is not to replace your OPAC, but to give you the power to drill down into your data at a level the OPAC is not generally aware of.

NOTE: This program is launched from MARC Report, and the source file is set to whatever MARC file you have (most recently) selected in MARC Report. The name of the current source file is always visible in the status bar at the bottom of the screen. To switch to a different source file, double-click on the picture on the left before you go to the next screen (in either MARC Review or MARC Global).

REVIEWS

In MARC Review, you first specify the data to search for, which we call a pattern, and then you specify the output options. The combined set of these options is referred to as a 'review'.

Reviews can be saved (the Save option appears at the end of the run). When you want to run a Review that you have previously saved, press the 'Load' button.

The remainder of this help page describes the components of a pattern. There are separate help pages available for output options and saved reviews.

PATTERNS

Whether using MARC Review or MARC Global, a good understanding of the implementation of patterns is a requirement for your successful use of the software.

The basic unit of the MARC Review search is called a pattern. Each pattern that you specify will be applied to each tag in each record. To run a MARC Review, at least one pattern must be specified; there is no limit on the number of patterns that can be specified.

If you have specified more than one pattern, you can use the navigational buttons ('Previous Pattern' and 'Next Pattern') to move back and forth among your patterns. The 'List Patterns' button displays all of the patterns in the Review. The 'Clear' button clears the current pattern form; the 'Delete' button removes the pattern from the Review altogether.

When you have finished entering patterns, click the 'Next' button at the bottom of the screen to go to the next step.

The components of the pattern form are as follows:

TAG

Enter any valid MARC tag (000-999). The Tag is the only required field on the form.

If you are searching for data in a fixed field, enter the field (eg '008'), then press the TAB key. It is important to press the TAB key since this action displays the Fixed Field templates which make it very easy to search the Fixed Fields.

It is possible to search a range of tags using the 'X' feature. For example, to search all tags beginning with '6' (i.e. 600-699), enter '6XX' in the tag box; to search all tags beginning with '24' (i.e. 240-249), enter '24X' in the tag box.

You can also search the whole MARC record (000-999) for a string by entering 'XXX' in the tag box. See the section on 'Search Whole Record' below.

All of the remaining fields on the form qualify the MARC tag specified here. When an indicator, subfield, etc. is specified on the same form, it is 'AND'ed together with the Tag.

OCC (TAG)

The first occurrence field refers to the MARC Tag. You can specify a Tag's occurrence by number (by entering '1', '2', '3', etc.) or by making a selection from the pulldown list. If left blank (the default), any and all matching occurrences of the tag will be found.

See the 'Note about Occurrences' at the bottom of this document for details on how to find a relative occurrence of a pattern.

INDICATOR 1 / INDICATOR 2

If you want to find only tags with/without a certain indicator value, enter that value in the appropriate indicator position. Otherwise, leave these fields blank (the default).

Hint: If you want to search for an indicator value that is not valid, like a letter, you can turn off the form validation by right-clicking on the 'Ind 1' or 'Ind 2' label (or clicking 'Retry' when the form validation error pops up).

If you want to find indicators that match a range of values, you can use a regular expression in one of the Ind boxes. For example, if Tag=245 and Ind2 is [5-9] and Regular Expression is checked, the program will find all 245 fields with a second indicator in the range 5..9.

Note: To use a regular expression in an indicator box, the DATA box for the pattern should contain a single period. (The reason for this is that the routine that validates the regular expression requires a non-empty DATA value; entering a period in a regular expression will match any non-null value, so it will not harm the pattern.)

SUBFIELD

If you want to find only tags with/without a certain subfield value, enter that value here. Otherwise, leave this field blank (the default). If the MARC Tag specified above refers to the leader, or a fixed field, the 'Subfield' box will be automatically filled in by the program when the Fixed Field form closes.

SUBFIELD PATTERNS

If more than one subfield code is entered into the SUBF box, the program assumes you want to search for tags with matching subfield patterns. This function is useful for research, troubleshooting, and finding examples of patterns of subfield code usage.

For example, if you enter 245 in the TAG box and 'ahb' (without the quotes) in the SUBF box, the program will match all records where the 245 field contains subfields $a, $h, $b (in that order). The 'ahb' subfield pattern will also match fields that contain additional subfields as long as the specified subfields are present in the specified order; 'ahb' will match $a $h $b (exact match of the pattern), $6 $a $h $b, and/or $a $h $b $c (because $a $h $b are present in the specified order). However, 'ahb' will not match $a $b $h (because the subfields are not in the specified order).

Hint: To force a more literal match, enclose the subfield pattern in quotes. For example, in X10, a search for 'an' (without the quotes) will match many headings with $b between the $a and $n. But if we enter 'an' (with the quotes) in the SUBF box, the program will not match fields with the $a $b $n pattern; it will only find fields with a $n immediately after the $a.

Also, enclosing subfields in quotes is the only way to match subfield patterns where any subfields in the pattern are repeated. For example, to find cases of $a $b $b $c in the 260, you must enter 'abbc' (with the quotes); entering 'abbc' (without the quotes) will effectively be treated as if it were 'abc'. Finally, if you also enter a pattern in the DATA box, the program will match that pattern in any of the subfields specified above.

NB. Enclosing subfields in single quotes (or double-quotes, as below) will generate a form validation error; you will need to override this warning by clicking 'Retry' on the resulting pop-up form.
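The ordering rules above can be modeled in a few lines. This is a hypothetical Python sketch, not MARC Review's own code: it assumes a field's subfield codes can be represented as a string (e.g. '6ahb' for $6 $a $h $b), and it assumes that only consecutive repeated codes collapse in an unquoted pattern, as in the 'abbc' example.

```python
from itertools import groupby


def matches_subfield_pattern(codes: str, subf: str) -> bool:
    """Hypothetical model of the SUBF-box pattern match.

    codes -- the field's subfield codes in order, e.g. "6ahb" for $6 $a $h $b
    subf  -- the SUBF entry, e.g. "ahb" (unquoted) or "'an'" (quoted)
    """
    if len(subf) >= 2 and subf[0] == subf[-1] == "'":
        # Quoted: the codes must occur consecutively, repeats included.
        return subf[1:-1] in codes
    # Unquoted (assumption): consecutive repeats collapse, so 'abbc' acts like 'abc'.
    wanted = "".join(code for code, _ in groupby(subf))
    # The codes must appear in order; other subfields may come before,
    # between, or after them. Each `in` test consumes the iterator.
    remaining = iter(codes)
    return all(code in remaining for code in wanted)


print(matches_subfield_pattern("abn", "an"))    # True: $b between $a and $n is allowed
print(matches_subfield_pattern("abn", "'an'"))  # False: $n must follow $a directly
```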

It is also possible to specify a negated subfield pattern using regular expression syntax. For example, if you enter 6XX in the TAG box, and '[^vxyz]' (without the quotes) in the SUBF box, and 'edited' in the DATA box, the program will find all subject headings without any subdivisions. As another example, if you enter TAG=245, SUBF='[^c]' (without the quotes), and DATA=edited, the program will find all 245 fields that contain the word 'edited' anywhere but in subfield $c.

Finally, a regular expression can be entered into the SUBF box if it is enclosed in double-quotes. For example, if you want to find all 245 tags that contain at least five subfield delimiters, you could enter this into the SUBF box:

"....."

where each dot will match a subfield code. Or, to find all 245 tags that contain combinations of $n and $p, enter the following into the SUBF box:

"a[np][np]"

Do not set the Regular expression checkbox for either of the last two examples. The program will automatically assume a regular expression when it sees enclosing quotes in the SUBF box.
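Since each dot matches one subfield code, a double-quoted SUBF entry can be pictured as a regular expression run against the field's sequence of subfield codes, one character per code. The Python sketch below rests on that reading; the function name is hypothetical.

```python
import re


def matches_quoted_regex(codes: str, subf: str) -> bool:
    """Hypothetical model: apply a double-quoted SUBF regex to the subfield codes.

    codes -- the field's subfield codes in order, e.g. "anp" for $a $n $p
    subf  -- the SUBF entry including its quotes, e.g. '"a[np][np]"'
    """
    regex = subf[1:-1]  # drop the enclosing double quotes
    return re.search(regex, codes) is not None


print(matches_quoted_regex("abcdef", '"....."'))   # True: six codes, at least five
print(matches_quoted_regex("apn", '"a[np][np]"'))  # True
```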

OCC (SUBF)

The second occurrence field refers to the subfield. You can specify a Subfield's occurrence by number (by entering '1', '2', '3', etc.) or by making a selection from the pulldown list. If left blank (the default), any and all matching occurrences of the subfield will be found.

DATA

If you want to find only records containing certain data in the tag/subfield/etc. specified, enter that data here. This field may be blank (the default).

Do not enter indicators in the Data box.

To embed a subfield in the Data pattern, press <Ctrl>D to indicate the MARC Subfield delimiter. However, do not use both the SUBFIELD box and <Ctrl>D in the DATA box at the same time; use only one or the other.

The Data box supports regular expressions and embedded booleans (described in detail below).

REGULAR EXPRESSIONS

If the Regular Expression box is selected, the program will treat the pattern entered in the DATA box as a regular expression. The most common metacharacters used in regular expression patterns are listed below:

.       matches any single character
*       matches 0 or more of the preceding character
^       anchors match to the beginning of the data 
$       anchors match to the end of the data
[       begin character class definition
]       end character class definition	
-       within a character class, indicates a range of characters
\       removes (escapes) special meaning from above metacharacters 

For example, if the Regular Expression box is checked, and your data pattern contains:

 201[0-3]

the program will match any data that contains '2010', '2011', '2012', or '2013'. If the Regular Expression box was not checked, the program would literally try to match the string '201[0-3]'.
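The difference between a checked and an unchecked Regular Expression box can be reproduced with Python's re module, which treats these basic metacharacters the same way PCRE does. The sample strings are invented for illustration.

```python
import re

pattern = "201[0-3]"
samples = ["c2012.", "2014", "201[0-3]"]

# Box checked: the pattern is interpreted as a regular expression.
regex_hits = [s for s in samples if re.search(pattern, s)]
# Box unchecked: the same characters are matched literally.
literal_hits = [s for s in samples if pattern in s]

print(regex_hits)    # ['c2012.']
print(literal_hits)  # ['201[0-3]']
```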

SPECIAL NOTE: Although '^' usually means to anchor the match to the beginning of the data, within square brackets, '^' negates a match. Therefore, to find all instances of invalid subfield coding, we could use the following expression:

±[^0-9a-z] 

This would match any subfield delimiter ± that is followed by a character not in the character class 0-9a-z.
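In a raw MARC record the subfield delimiter is the byte 0x1F (displayed as ± on this page), so the same check can be sketched in Python against record bytes. The sample field is invented, and the match is case-sensitive, as it would be with the Case Sensitive box checked.

```python
import re

# The delimiter byte 0x1F followed by anything other than a digit or lowercase letter.
INVALID_CODE = re.compile(rb"\x1f[^0-9a-z]")

field = b"\x1faSmith, John,\x1fDeditor.\x1f6880-01"
bad_codes = [m.group()[1:] for m in INVALID_CODE.finditer(field)]
print(bad_codes)  # [b'D']
```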

Do not use commas to separate individual values in a character class. For example, this is the correct way to pattern match the ten numeric digits and the uppercase letters 'A', 'B', and 'C':

[0-9ABC]

But the following regular expression will also match any string containing a comma in it:

[0-9,A,B,C]
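The effect of the stray commas is easy to confirm with Python's re module, which treats characters inside a class the same way:

```python
import re

right = re.compile(r"[0-9ABC]")
wrong = re.compile(r"[0-9,A,B,C]")  # the commas become members of the class

print(right.search("no digits, no capitals") is None)      # True
print(wrong.search("no digits, no capitals") is not None)  # True: matches the comma
```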

PCRE

Beginning with version 236, MARC Report uses Perl Compatible Regular Expressions (PCRE), which greatly expands the pattern matching capabilities of previous versions.

If you are planning to make full use of the PCRE support in the program, then you should not use the curly braces technique (described below under 'DIACRITICS (Deprecated)') for matching diacritics inside a regular expression. Instead, when matching a diacritic in a regular expression, use '\x' followed by the hex code of each byte of the character.

For example, to find all title fields that begin with the diacritic ayn, search:

TAG=245 SUBF=a

If leader/09 = 'a', set the DATA box to: ^\xCA\xBB
If leader/09 = ' ', set the DATA box to: ^\xB0
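The byte values above can be checked in Python. This sketch assumes the ayn is stored in Unicode records as U+02BB (MODIFIER LETTER TURNED COMMA), whose UTF-8 form is CA BB, and uses the MARC-8 value B0 given above; the titles are invented.

```python
import re

title_unicode = "\u02bbAbd al-Wahhab".encode("utf-8")  # leader/09 = 'a'
title_marc8 = b"\xb0Abd al-Wahhab"                       # leader/09 = ' '

print(title_unicode[:2])                                     # b'\xca\xbb'
print(re.search(rb"^\xCA\xBB", title_unicode) is not None)  # True
print(re.search(rb"^\xB0", title_marc8) is not None)        # True
```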

You can easily get a hexadecimal listing of all characters in a file by running the MARC Analysis utility on it; when the report is ready, scroll down to the bottom and look for the MARC-8 or UTF-8 character set usage tables.

For more examples on using PCRE regular expressions in MARC Review, please visit: http://www.marcofquality.com/w/doku.php?id=236:pcre_and_mr

For official PCRE documentation, visit pcre.org

FINDING DIACRITICS

To search for a character not on your keyboard, you will need to know the value of the character(s) in hexadecimal format (many free web pages provide this information; search for 'character codes').

To enter the code, prepend '\x' to the hex value.

For example, to search for the copyright character, you might enter

TAG=264  SUBF=c  DATA=\xC2\xA9

if your records are Unicode (UTF-8), or

TAG=264  SUBF=c  DATA=\xC3

if your records use MARC-8 encoding.

This is not a regular expression on its own, so the Regular Expression box does not need to be checked.

However, if you were searching for a copyright symbol only at the beginning of the subfield $c, you would instead enter (using the unicode example):

TAG=264  SUBF=c  DATA=^\xC2\xA9

and check the 'Regular Expression' box.
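Both copyright searches can be sketched in Python on UTF-8 bytes: the plain DATA pattern behaves like a substring test, while the anchored version needs a regular expression. The subfield values are invented.

```python
import re

COPYRIGHT = "\u00a9".encode("utf-8")  # b'\xc2\xa9'

first = "\u00a92019".encode("utf-8")        # symbol at the start of $c
later = "2019, \u00a92018".encode("utf-8")  # symbol later in $c

# Regular Expression box unchecked: found anywhere in the subfield.
print(COPYRIGHT in first, COPYRIGHT in later)  # True True
# Box checked, DATA=^\xC2\xA9: found only at the beginning.
print(bool(re.search(rb"^\xC2\xA9", first)), bool(re.search(rb"^\xC2\xA9", later)))  # True False
```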

DIACRITICS (Deprecated)

The old way to search for diacritics is deprecated as of version 248 and may not work in future versions of MARC Report and MARC Global.

This method entered the numeric value of the diacritic character enclosed in curly braces. Either decimal or hexadecimal notation could be used: decimal numbers had to be zero-filled to three digits and fall within the range 000-255; hex numbers had to begin with an 'x' and fall within the range 00-FF. For example, decimal {031} or hex {x1F} will match any MARC subfield delimiter. This technique could be used to search for diacritics, non-English script, special flags used in the leader, etc. It required that the Regular Expression box be selected, whether or not the rest of the data specified contained a regular expression.

For example, to search for the copyright character in Unicode records, enter:

TAG=264  SUBF=c  DATA={xC2}{xA9}

and select the regular expression checkbox.

Even though curly braces have another meaning in standard regular expressions, this works because the program performs a character substitution for 'curly brace' characters before it evaluates regular expressions (thus, the regular expression engine never sees the curly braces).
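The substitution step described above can be sketched as a pre-pass that turns each brace code into a raw byte before the regular expression is compiled. This is a hypothetical Python illustration of the idea (the function name and the Latin-1 handling of surrounding text are assumptions), not the program's actual routine.

```python
import re

# {ddd} = three-digit decimal, {xHH} = two-digit hex
BRACE = re.compile(r"\{(?:(\d{3})|x([0-9A-Fa-f]{2}))\}")


def expand_braces(pattern: str) -> bytes:
    """Hypothetical pre-pass: replace each brace code with its raw byte."""
    out = bytearray()
    pos = 0
    for m in BRACE.finditer(pattern):
        # Text between brace codes is copied through unchanged (assumed Latin-1).
        out += pattern[pos:m.start()].encode("latin-1")
        value = int(m.group(1)) if m.group(1) else int(m.group(2), 16)
        if value > 255:
            raise ValueError(f"{m.group()} is outside the range 000-255")
        out.append(value)
        pos = m.end()
    out += pattern[pos:].encode("latin-1")
    return bytes(out)


regex = expand_braces("{xC2}{xA9}")
print(regex)  # b'\xc2\xa9'
print(re.search(regex, "\u00a92019".encode("utf-8")) is not None)  # True
```

The regular expression engine only ever sees the expanded bytes, which is why the braces do not clash with their usual regex meaning.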

CASE SENSITIVE

This option controls whether case-sensitive matching is performed on the data to be matched (if any). For example, if this box is checked, and your data pattern contains 'The', none of the following will match: 'the', 'thE', 'tHE', 'THe', 'tHe', or 'THE'. This box is checked by default. This option also modifies regular expressions.

RULE

The Match Rule parameter defaults to 'AND', in which case the program will find (only) those records that match the elements specified in a pattern.

The Match Rule can also be set to 'OR' if more than one pattern is present. An 'OR' rule always binds to the previous pattern. For example, if the first pattern is 'AND 650', and the second pattern is 'OR 651', the result would be to match records containing either tag 650 or tag 651. If instead the second pattern is 'AND 651', the result would be to match records containing both tag 650 and tag 651.

'OR' can also be bound to a preceding pattern that used the 'NOT' rule (see below).

The Match Rule can also be set to 'NOT', 'NONE', or 'NOWHERE'.

Use 'NOT' to find records where the tag in your pattern is not present. For example, 'NOT 650' matches records without any 650 tags. Also use 'NOT' to find records where the tag in your pattern is present but one or more occurrences of the tag contain an element (indicator, subfield, data) that is not what you specified. For example, 'NOT 650 I2=0' matches records that contain 650 tags where the second indicator is not 0; it does not match records that do not contain 650 tags. As another example, 'NOT 035 $a (OCoLC)' matches records with 035 tags that contain $a without '(OCoLC)'; it does not match records that have no 035 tags, or whose 035 tags have no $a.

Use 'NONE' to find records where a repeatable tag in your pattern is present, but “none” of the occurrences of that tag match your pattern. For example, 'NONE 650 I2=0' matches records where none of the 650 tags contain a second indicator that is 0; 'NONE 035 $a (OCoLC)' matches records where none of the 035 tags contain '(OCoLC)' in the $a. NONE requires that an element other than the tag (indicator, subfield, data) be specified; it cannot be used with simply a tag in the pattern, as in 'NONE 650'. Use 'NOT 650' instead.

Use 'NOWHERE' to find records where your pattern does not match, whether the tag is present or not. For example, 'NOWHERE 650 I2=0' matches records in which no 650 tag has I2 set to '0', as well as any records that do not have any 650 tags at all. As another example, 'NOWHERE 035 $a (OCoLC)' matches records where either none of the 035 tags contain '(OCoLC)' in the $a, or there is no 035 present at all.
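The three negative rules differ only in how they treat a missing tag and partially matching occurrences. The hypothetical Python model below summarizes the behavior described above for one record; the function name is invented, and treating a bare-tag 'NOWHERE' like 'NOT' is an assumption.

```python
def apply_rule(rule, occurrences, element=None):
    """Hypothetical model of the NOT / NONE / NOWHERE rules for one record.

    occurrences -- every occurrence of the pattern's tag in the record
    element     -- test for the indicator/subfield/data part of the pattern,
                   or None when the pattern is a bare tag
    """
    present = bool(occurrences)
    if element is None:
        if rule == "NONE":
            raise ValueError("NONE needs an indicator, subfield, or data element")
        # Bare tag: 'NOT 650' (and, by assumption, NOWHERE) means the tag is absent.
        return not present
    hits = [element(occ) for occ in occurrences]
    if rule == "NOT":
        return present and not all(hits)  # some occurrence fails the element
    if rule == "NONE":
        return present and not any(hits)  # tag present, no occurrence matches
    if rule == "NOWHERE":
        return not any(hits)              # no match anywhere, tag optional
    raise ValueError(f"unknown rule {rule!r}")


def ind2_is_0(occ):
    return occ["ind2"] == "0"


mixed = [{"ind2": "0"}, {"ind2": "7"}]
for rule in ("NOT", "NONE", "NOWHERE"):
    print(rule, apply_rule(rule, mixed, ind2_is_0))  # NOT True, NONE False, NOWHERE False
```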

SPECIAL RULES

'AND/SAME OCC'

When a repeatable tag is specified in more than one pattern, and the rule in each pattern is 'AND', the program will match records at the tag level. For example:

Pattern1: TAG=651 SUBF=a DATA=United States Rule=AND
Pattern2: TAG=651 SUBF=v DATA=literature Rule=AND

This review will match records where any 651 tag contains 'United States' in subfield $a, and any 651 tag contains 'literature' in subfield $v. The two strings do not need to be present in the same occurrence of the tag. For example, a record containing the following 651 fields will match the pattern above:

651 0$aUnited States$xEmigration and immigration$xHistory.
651 0$aNew York (N.Y.)$xBuildings, structures, etc.$vJuvenile literature.

To require that both patterns be present in the same occurrence of a tag, use the 'And/Same Occ' rule. Suppose the 'And/Same Occ' rule were used in the example review:

Pattern1: TAG=651 SUBF=a DATA=United States Rule=AND/SAME OCC
Pattern2: TAG=651 SUBF=v DATA=literature Rule=AND/SAME OCC

Then the record containing the 651 fields above would not have matched; instead, only records with a single 651 that matches both patterns will match, for example:

651 0$aUnited States$xEmigration and immigration$xHistory$vJuvenile literature.
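The difference between the two rules comes down to where the "any" and the "all" are applied. In this hypothetical Python sketch each 651 occurrence is modeled as a list of (code, value) pairs; the helper names are invented.

```python
def has(code, text):
    """Pattern element: some $<code> in the occurrence contains text."""
    return lambda occ: any(c == code and text in v for c, v in occ)


def rule_and(occurrences, patterns):
    # AND: each pattern may be satisfied by a different occurrence.
    return all(any(p(occ) for occ in occurrences) for p in patterns)


def rule_and_same_occ(occurrences, patterns):
    # AND/SAME OCC: a single occurrence must satisfy every pattern.
    return any(all(p(occ) for p in patterns) for occ in occurrences)


patterns = [has("a", "United States"), has("v", "literature")]

two_fields = [
    [("a", "United States"), ("x", "Emigration and immigration"), ("x", "History.")],
    [("a", "New York (N.Y.)"), ("x", "Buildings, structures, etc."), ("v", "Juvenile literature.")],
]
one_field = [
    [("a", "United States"), ("x", "Emigration and immigration"), ("x", "History"),
     ("v", "Juvenile literature.")],
]

print(rule_and(two_fields, patterns), rule_and_same_occ(two_fields, patterns))  # True False
print(rule_and(one_field, patterns), rule_and_same_occ(one_field, patterns))    # True True
```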

'DATA'

When you want to perform an action in MARC Review or MARC Global only when the data in two different fields is the same (or is not the same), use the 'DATA' rules.

'Data' allows you to compare data in two fields without knowing the content of the fields. There are three data match rules: 'Data', 'And/Data', and 'Not/Data'.

The data match rules must be used in pairs: the first match rule must always be 'Data', and the second match rule in the pair can be either 'And/Data' or 'Not/Data'.

For example, to find all records where 049 subfield $a contains the same data as 949 subfield $a (regardless of what that data might be, as long as some data is present), use the following review:

Pattern1: TAG=049 SUBF=a DATA= RULE=Data
Pattern2: TAG=949 SUBF=a DATA= RULE=And/Data

As another example, to find all records where 041 subfield $a contains a code that is different from the one in 008/35, use the following review:

Pattern1: TAG=041 SUBF=a DATA= RULE=Data
Pattern2: TAG=008 POS=35 LEN=3 DATA= RULE=Not/Data

When using the 'Data' rules, the DATA box on the pattern form must always be empty.
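Conceptually, the pair of Data rules extracts a value with each pattern and then compares the two values. The hypothetical Python sketch below compares single values only (how repeated fields are paired is not covered here) and assumes no match when either pattern finds no data; the 008 value is invented.

```python
def data_rule(rule, first, second):
    """Hypothetical model of a 'Data' pattern paired with 'And/Data' or 'Not/Data'.

    first  -- data picked out by the 'Data' pattern
    second -- data picked out by the paired pattern
    """
    if not first or not second:
        return False  # assumed: both patterns must find some data
    if rule == "And/Data":
        return first == second
    if rule == "Not/Data":
        return first != second
    raise ValueError(f"unknown rule {rule!r}")


field_008 = "850101s1985    nyu".ljust(35) + "fre d"  # 40 characters
print(field_008[35:38])                                 # fre  (POS=35 LEN=3)
print(data_rule("Not/Data", "eng", field_008[35:38]))  # True
print(data_rule("And/Data", "ocm123", "ocm123"))        # True
```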

ABOUT RULES

When we refer to a pattern in MARC Review, we usually mention the match rule at the beginning, even though on the pattern data entry form the match rule appears near the end; some examples of this marc-review-speak are:

AND 650 $x=Fiction
NOT 1XX AND NOT 245 I1=1

The list of rules is context-sensitive; that is, the possible values in the 'Rule' list depend on the other values (if any) that have been entered on the current and (if applicable) previous pattern form(s). Therefore, it is normal that not all of the rules described above will be present in the list at a given time.

SEARCH WHOLE RECORD

It is possible to search the whole MARC record by entering 'XXX' in the TAG box, and entering the string you want to search for in the DATA box. For this type of review, DATA is a required field. The Case Sensitive and Regular Expression options can be used as described above.

One requirement of the 'XXX' pattern is that it must be the first pattern that is specified (if more than one pattern is being entered).

Also available in the Whole Record search is the 'Data Occ' box. This option defaults to 'First' or any occurrence. If you set this option to 'All', and also select the 'Matching Tags' output option (see below), then all of the tags containing the pattern (whatever they might be) will be displayed.

When setting up your output options for this type of review, the default is a 'Full Record', since the whole record (beginning with the base address) will be searched as a single data string, instead of in tag-by-tag fashion. However, there is a special output type called 'Matching Tags' that will display only the tags that contain matches for your pattern. This can make it easier to visually identify certain types of patterns.

EMBEDDED PATTERNS

It is possible, and sometimes necessary, to specify multiple patterns in a single 'DATA' pattern. This is done by stringing the patterns together, separating each one with one of the boolean symbols listed below.

The following boolean symbols are supported within the DATA box:

&& = and	
|| = or 	
!! = not	

You can use the following English equivalents for the above interchangeably, as long as they are enclosed in angle brackets (they are not case-sensitive):

<and>	= &&	
<or>	= ||	
<not>	= !!	

An example of each of these three boolean expressions follows.

'And' example: 040 $d DLC<and>OCoLC

True if both 'DLC' and 'OCoLC' are present in $d in the same 040

'Or' example: 035 $a OCoLC||TMQ

True if either 'OCoLC' or 'TMQ' is present in 035 $a

'Not' example: 040 $d OCoLC<not>DLC

True if there is a $d 'OCoLC' and not a $d 'DLC' in the same 040

These patterns can be combined with the standard Match Rules 'AND' and 'NOT' (e.g. 'NOT 035 $a OCoLC||TMQ'). The Match Rule is applied AFTER the data is evaluated.

NOTE: If you use a regular expression with an embedded boolean, it must be repeated for each argument. For example: 949 $a = '^PB<or>^PER' (The '^' is repeated before both PB and PER).
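The three boolean examples above can be modeled with a small evaluator. This is a hypothetical Python sketch: it treats each term as a plain substring (no regular expressions), tests it against all values of the subfield within one field, and applies the operators strictly left to right. The precedence rule is an assumption, not documented program behavior.

```python
import re

# Symbolic and angle-bracket operators; the English forms are case-insensitive.
SPLIT = re.compile(r"(&&|\|\||!!|<and>|<or>|<not>)", re.IGNORECASE)
NAMES = {"&&": "and", "||": "or", "!!": "not"}


def eval_embedded(pattern, values):
    """Hypothetical evaluator for an embedded-boolean DATA pattern.

    values -- every value of the searched subfield within ONE field,
              e.g. all the $d values of a single 040.
    Terms are plain substrings; operators are applied left to right.
    """
    parts = SPLIT.split(pattern)

    def present(term):
        return any(term in v for v in values)

    result = present(parts[0])
    for op, term in zip(parts[1::2], parts[2::2]):
        op = NAMES.get(op, op.lower().strip("<>"))
        if op == "and":
            result = result and present(term)
        elif op == "or":
            result = result or present(term)
        else:  # "not"
            result = result and not present(term)
    return result


print(eval_embedded("DLC<and>OCoLC", ["DLC", "OCoLC"]))  # True
print(eval_embedded("OCoLC<not>DLC", ["DLC", "OCoLC"]))  # False
print(eval_embedded("OCoLC||TMQ", ["(OCoLC)12345"]))     # True
```

Because the Match Rule is applied after the data is evaluated, 'NOT 035 $a OCoLC||TMQ' corresponds to `not eval_embedded("OCoLC||TMQ", values)` in this model.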

WHEN TO USE EMBEDDED PATTERNS

Whenever you find yourself entering two separate patterns for the same MARC data element, you should consider using embedded patterns.

The standard pattern match design works well in most cases. However, especially when using 'NOT', some situations require the use of embedded patterns. For example, how can we delete every 035 that does not contain either '(OCoLC)' or '(DLC)' in the subfield a? Or how can we find every 040 that does not contain both $a and $d?

Specifying two separate patterns will not work, since each pattern is run on each tag. Therefore, the pattern 'NOT 035 $a (OCoLC)' will match 035s with '(DLC)' in the $a, and the pattern 'NOT 035 $a (DLC)' will match 035s with '(OCoLC)' in the $a.

The way around this is to use an embedded pattern: 'NOT 035 $a (OCoLC)||(DLC)'. This pattern matches any 035 tag where $a does not contain either '(OCoLC)' or '(DLC)'. For the second example use: 'NOT 040 $a<and>$d'; this pattern matches any 040 tag that does not contain both $a and $d.
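The failure of two separate patterns can be seen by applying each one, as a simple substring test, to every 035 $a. The values below are invented, and the substring model is a simplification of the real pattern match.

```python
a035 = ["(OCoLC)12345", "(DLC)85012345", "(CaOTULAS)987"]


def lacks(value, term):
    return term not in value


# Two separate 'NOT' patterns, each tested against every 035 $a:
hit_by_pattern1 = {v for v in a035 if lacks(v, "(OCoLC)")}
hit_by_pattern2 = {v for v in a035 if lacks(v, "(DLC)")}
print(sorted(hit_by_pattern1 | hit_by_pattern2))  # all three values

# One embedded pattern, 'NOT 035 $a (OCoLC)||(DLC)':
hit_by_embedded = [v for v in a035 if not ("(OCoLC)" in v or "(DLC)" in v)]
print(hit_by_embedded)  # ['(CaOTULAS)987']
```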

FOR BEST RESULTS

Do not specify the same MARC data element in more than one pattern. Consider the example:

Pattern1: TAG=651 SUBF=a DATA=United Rule=AND
Pattern2: TAG=651 SUBF=a DATA=States Rule=AND

Both patterns reference the same MARC data element (651 $a); this review would be better formed by the following:

TAG=651 SUBF=a DATA=United<and>States 

If you are stringing together a long list of terms joined by 'or', do not use this type of pattern, as it is very inefficient for long lists. Instead, use a list pattern (see 'LIST SEARCHING' below). By 'long', we mean anything over 50 items.

LEXICAL COMPARISONS

MARC Review can also compare a user pattern against a MARC data string using the following lexical comparison operators:

  1. -gt The MARC data is greater than the user pattern
  2. -ge The MARC data is greater than or equal to the user pattern
  3. -lt The MARC data is less than the user pattern
  4. -le The MARC data is less than or equal to the user pattern

To use these operators in a pattern, enter the operator, followed by a blank space, followed by the string you wish to compare, in the DATA box. This syntax must be followed exactly; if the dash is not the first character entered, or if the blank space after the operator is missing, the review will not work as intended.

This option is most useful for comparing data in fixed fields, and for comparing numerical data in variable fields. For example, you can use a lexical comparison to quickly and easily pull out all records for items published before/after a certain date:

In the TAG box, enter 008 and press <TAB>.
Click the Format icon (Book + Question Mark) and select 'Any Format'
Enter a publication date (eg. '1980') in the Date 1 box. Click Save.
In the DATA box, type '-le ' before the '1980' ('-le 1980')

This review will find all records in the file with an 008/Date 1 that is less than or equal to 1980. (Change the '-le ' to '-gt ' to find all records in the file with an 008/Date 1 that is greater than 1980.)
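Assuming the lexical comparison is an ordinary character-by-character string comparison, the date review above behaves like Python's string operators on 008/07-10 (Date 1). The 008 values below are invented and truncated.

```python
# Date 1 occupies 008/07-10; the 008 values below are invented and truncated.
records = {
    "rec1": "850101s1975    nyu",
    "rec2": "850101s1980    nyu",
    "rec3": "950101s1994    nyu",
}

le_1980 = [r for r, f in records.items() if f[7:11] <= "1980"]  # '-le 1980'
gt_1980 = [r for r, f in records.items() if f[7:11] > "1980"]   # '-gt 1980'
print(le_1980)  # ['rec1', 'rec2']
print(gt_1980)  # ['rec3']
```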

Also, you can combine this review with another review. So, for example, you could find all records that have a 6XX beginning 'United States$xHistory' and a publication date earlier than '1970'.

Lexical comparisons can be used to compare data in variable fields, as long as no other special characters (regular expressions, embedded booleans, etc.) are used in the pattern that follows the operator. For example, you could use this technique on call numbers:

In the TAG box, enter 050; in the SUBF box, enter 'a'
In the DATA box, enter '-ge PZ7'
Click the 'Next Pattern' button and repeat the first pattern ...
Except in the DATA box, enter '-le PZ8'

This review will find all records with LC Class numbers between PZ7 and PZ8.
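Under the same assumption (plain string comparison), the two call number patterns can be sketched as follows; the call numbers are invented.

```python
call_numbers = ["PZ7.S6 1999", "PZ8", "PZ8.1.G7", "PS3545.E6", "PZ10.3.B5"]

# Pattern 1: '-ge PZ7'; Pattern 2: '-le PZ8' (both on 050 $a, combined with AND)
in_range = [c for c in call_numbers if c >= "PZ7" and c <= "PZ8"]
print(in_range)  # ['PZ7.S6 1999', 'PZ8']
```

Note that in this model 'PZ8.1.G7' is excluded: a string that begins with 'PZ8' and continues compares greater than 'PZ8' itself.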

Note that in a variable field, the data to be lexically compared should be at the beginning of a tag or subfield. Indicators and subfields in variable fields are ignored unless they have been specified as part of the pattern.

Also note that this type of comparison will not give good results when used on 2-digit year strings (such as those found in the MARC 008 Date Entered element). Dates from the 1900s will always compare greater than dates from the 2000s; for example, Jan 1, 1999 will be seen as greater than Jan 1, 2013, because:

990101 > 130101

LIST SEARCHING

If you have a long list of items that you want to search, it's possible to search them in a single step instead of creating a new pattern for each item.

For example, you could search your database for a list of control numbers, LCCNs, or ISBNs, or a list of call numbers, or a list of values from a codelist, and so on. Each time a record matches, you can use any of the usual MARC Review/MARC Global output actions.

To search a list of items you need a textfile containing these items. Each item in the list must be on a separate line. Each item should be entered exactly as it would be entered in the DATA box of a MARC Review pattern. Do not add any extra blank spaces to a line unless they are part of the item to be searched. The list must not contain any null (empty) lines, as a null line represents the end of the list to the program.
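The file rules above (one item per line, no extra spaces, a null line ends the list), together with the 5000-item maximum given under 'ABOUT LISTS', can be sketched as a small reader. The function is hypothetical and only illustrates the format; the sample items are invented.

```python
MAX_ITEMS = 5000  # the limit given under ABOUT LISTS


def load_list(text: str) -> list:
    """Hypothetical reader for a list file: one item per line, read up to
    the first empty line; lines are kept exactly as written (spaces included)."""
    items = []
    for line in text.splitlines():
        if line == "":
            break  # a null line marks the end of the list
        items.append(line)
    if len(items) > MAX_ITEMS:
        raise ValueError(f"{len(items)} items; the maximum is {MAX_ITEMS}")
    return items


print(load_list("0451524934\n0140449132\n\n9780679783268"))  # ['0451524934', '0140449132']
```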

Once you have such a file of items, start MARC Review, go to the Pattern form, and enter the TAG and SUBFIELD (if applicable) where the data from the list will be found.

Next, tab down to the DATA box and right-click on it. An explorer window will appear; navigate to the file that contains your list and select it. MARC Review checks the file and, if it is acceptable, reports the number of items loaded, puts the filename in the DATA box, and changes the color of the 'Data' label to blue.

ABOUT LISTS

The maximum list size in the current version of the program is 5000 items. If your list is larger than this, an error message will appear and the list will be rejected. Simply open the list in your text editor, make it smaller, and try to load it again.

The list cannot contain data from two different fields, such as a mix of LCCNs and ISBNs. However, you could use one pattern to match a list of LCCNs and a second pattern, in the same review, to match a list of ISBNs.

All items in the list are joined together by an 'OR' operator. However, you can easily find records that do NOT match an item in your list by setting the List pattern's rule to 'Not'.

List items are considered 'literal', so do not use special characters in list items, e.g. MARC subfield delimiters, boolean operators, etc. There is one exception: the use of curly braces to specify a diacritic (see 'DIACRITICS (Deprecated)' above).

Regular expressions per se are not supported in a list search. However, you can tell the program to match 'Whole words only'. (When a list is loaded into a pattern, the 'Regular expression' option is hidden and replaced by the 'Whole words only' option.) When this option is selected, each item in the list must completely match the MARC data being searched.

For example, if 'Whole words only' is selected, the MARC data field/subfield contains 'Cancer research', and your list contains 'Cancer', there will be no match. Conversely, if 'Whole words only' is not selected, the pattern matching is left-anchored: a list item of 'Cancer' will match 'Cancer research' in the MARC data, but not 'Breast Cancer'.
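The two settings described above can be modeled as an equality test versus a left-anchored prefix test. This hypothetical Python sketch assumes a case-sensitive comparison.

```python
def list_item_matches(item: str, data: str, whole_words_only: bool) -> bool:
    """Hypothetical model of the 'Whole words only' switch for list items."""
    if whole_words_only:
        return data == item       # the item must match the data completely
    return data.startswith(item)  # otherwise the match is left-anchored


print(list_item_matches("Cancer", "Cancer research", True))   # False
print(list_item_matches("Cancer", "Cancer research", False))  # True
print(list_item_matches("Cancer", "Breast Cancer", False))    # False
```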

If you need to use a regular expression with a list of items, or search for terms within terms, refer to the 'Embedded Patterns' topic above.

TYPES OF LIST SEARCHING

MARC Review supports two different types of list searching:

1. Simple list
2. Value list

The “simple list” search uses the default MARC Review pattern match behavior. The program will check the specified tag/subfield for the presence of any string in the list. Therefore, a search against Tag 650 $a for “libraries” (not case-sensitive) would match the following headings:

$aDigital libraries
$aLibraries and people with disabilities

The “value list” search type is implemented differently. The program will match the content of the specified tag/subfield against the value list. Therefore, a search against Tag 650 $a for “libraries” (not case-sensitive) would match only the heading:

$aLibraries

It would not match headings where “libraries” is a sub-string.
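Following the descriptions in this section, the two list types can be sketched as a substring test versus a whole-value test, both case-insensitive as in the “libraries” example. The function names are hypothetical.

```python
def simple_list_match(items, data):
    # Simple list: any item occurring anywhere in the data (case-insensitive).
    d = data.casefold()
    return any(item.casefold() in d for item in items)


def value_list_match(items, data):
    # Value list: the whole subfield content must equal an item.
    d = data.casefold()
    return any(item.casefold() == d for item in items)


headings = ["Digital libraries", "Libraries and people with disabilities", "Libraries"]
print([h for h in headings if simple_list_match(["libraries"], h)])  # all three
print([h for h in headings if value_list_match(["libraries"], h)])   # ['Libraries']
```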

For more info and examples on list-searching, please visit the wiki: http://www.marcofquality.com/wiki/mrt/doku.php?id=help:mr_list_search_236

TAG/SUBFIELD LENGTH OPTION

MARC Review supports a method that will let you filter records containing tags or subfields of a specified length. For example, some OPACs may truncate a display field at a certain number of characters, and some systems may return an error when trying to load a record with a very long field. You could use this new MARC Review option to identify these records (and perhaps modify them accordingly).

In the TAG box, enter the tag number of the field you want to check. The usual MARC Review options apply here (e.g., '5XX' will search all the Notes fields) except that you cannot use the Whole Record ('XXX') pattern match.

Leave the SUBF box blank to check the length of the whole tag; or, if you want to check the length of a specific subfield, specify a subfield here.

The DATA box is where you specify the length. The format that must be used is:

'-'	(a dash)
operator (see the list below)
'#'	(a pound sign)
number	(the length of the field or subfield)

The operators are:

'ge' 	Greater than or equal to
'gt' 	Greater than
'le' 	Less than or equal to
'lt' 	Less than
'eq' 	Equal to
'ne' 	Not equal to

Examples:

  1. -ge#512 Field/Subfield length greater than or equal to 512 bytes
  2. -gt#256 Field/Subfield length greater than 256 bytes
  3. -le#512 Field/Subfield length less than or equal to 512 bytes
  4. -lt#512 Field/Subfield length less than 512 bytes
  5. -eq#512 Field/Subfield length equal to 512 bytes
  6. -ne#512 Field/Subfield length not equal to 512 bytes

This syntax must be followed exactly (so that MARC Review doesn't confuse this special option with a standard pattern match or lexical comparison). There can be no spaces anywhere in the string, no commas in the number, etc.

When computing the length of a whole tag, indicators (if applicable), subfield delimiters (if applicable), and the field delimiter are all included. For example, this would mean that the correct length for an 008 is 41 bytes (add 1 for the field delimiter), instead of 40. When computing the length of a single subfield, the subfield delimiter (if applicable) is included.
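The syntax check and the whole-tag length rule can be sketched in Python. The function names are hypothetical; lengths are counted in bytes, and the pattern deliberately rejects spaces, commas, or a missing dash.

```python
import operator
import re

SPEC = re.compile(r"-(ge|gt|le|lt|eq|ne)#(\d+)")
OPS = {"ge": operator.ge, "gt": operator.gt, "le": operator.le,
       "lt": operator.lt, "eq": operator.eq, "ne": operator.ne}


def tag_length(indicators: bytes, data: bytes) -> int:
    """Length of a whole tag: indicators (if any), the data with its subfield
    delimiters, plus 1 for the field delimiter."""
    return len(indicators) + len(data) + 1


def length_matches(spec: str, length: int) -> bool:
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"bad length spec {spec!r}")  # e.g. missing dash or a space
    op, number = m.groups()
    return OPS[op](length, int(number))


print(tag_length(b"", b"x" * 40))        # 41, as for an 008
print(tag_length(b"10", b"\x1faTitle"))  # 10
print(length_matches("-ge#512", tag_length(b"10", b"\x1fa" + b"x" * 600)))  # True
```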

HOW TO FIND RECORDS OVER A CERTAIN LENGTH

The short and simple answer is to use the MARC Verify Utility (its option 'Remove … records with Record Length > x' will redirect all records longer than the specified length to a separate file).

But you can also do this in MARC Review, with quite a lot more flexibility. The trick is to set a pattern on the Leader Record Length element.

On the pattern form, enter '000' in the TAG box, then press TAB to bring up the fixed field editor, then press Save to dismiss it. Now delete the five blank spaces in the DATA box (which the fixed field editor put there) and replace them with '-ge 09000' (without the quotes). That's it.

This pattern will match all records where the leader's record length is greater than or equal to '09000'. Two points to note. First, we don't use the '#' after the '-ge' operator here, because '#' tells MARC Review to measure the length of a field or subfield; here we want to compare the value stored in the leader instead. Second, the number we enter must have the same number of characters as the leader bytes we are comparing it to, and the leader record length (Leader/00-04) is always five numeric characters, zero-padded on the left. If we were to enter '-ge 9000', then the pattern would fail (unless you also changed the POS box to 01 to skip the first leader byte … )
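Why the width matters can be shown with a short Python sketch. It assumes (this is not stated above) that the comparison takes as many leader bytes as the entered number has characters, starting at POS, and compares them as text; for zero-padded numbers of equal width, text order and numeric order agree. The function name leader_length_ge is hypothetical.

```python
def leader_length_ge(leader, threshold, pos=0):
    """Hypothetical: compare len(threshold) leader bytes at pos, as text."""
    window = leader[pos:pos + len(threshold)]
    return window >= threshold


leader = "12000nam a2200337 a 4500"  # record length 12000

print(leader_length_ge(leader, "09000"))  # True: '12000' >= '09000'
print(leader_length_ge(leader, "9000"))   # False: only '1200' is compared
```

With a four-character number, only the first four leader bytes are compared, so a 12000-byte record is wrongly rejected; padding the number to five digits keeps both sides the same width.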

NOTE ABOUT FIXED FIELDS

When working with Fixed Fields, the typical action is to enter the field in the Tag box, and then press the TAB key to bring up the Fixed Field Template. However, there are some cases when you do not want to do this.

If you simply want to test for the presence of a Fixed Field (e.g. NOT 008), more than one Fixed Field (e.g. AND 008 OCC=2), or search a complete Fixed Field without respect to position or length or format (e.g. AND 008=' eng '), then instead of pressing TAB, click directly on the Data, Occ, or Rule box, as applicable.

The key is to keep the Pos and Len boxes empty (and pressing TAB will always set them to something when the Templates form closes). Whenever the Pos and Len boxes are empty in a Fixed Field pattern, the program will search the whole field for your pattern, without regard to position, length, or format.
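The two matching modes can be sketched in Python as below. The function fixed_field_matches is hypothetical, and it treats the DATA value as a plain string rather than supporting MARC Review's other match types (an assumption made to keep the example short).

```python
def fixed_field_matches(field_data, pattern, pos=None, length=None):
    """Hypothetical sketch of the two ways a fixed-field pattern can be applied."""
    if pos is None and length is None:
        # Pos and Len empty: look for the pattern anywhere in the field.
        return pattern in field_data
    if pos is None or length is None:
        raise ValueError("supply both pos and length, or neither")
    # Pos and Len set: compare only that slice of the field.
    return field_data[pos:pos + length] == pattern


# A 40-character 008: place of publication at 15-17, language at 35-37.
f008 = "920101s1992    nyu".ljust(35) + "eng" + " d"

print(fixed_field_matches(f008, " eng "))                  # True
print(fixed_field_matches(f008, "eng", pos=35, length=3))  # True
print(fixed_field_matches(f008, "eng", pos=15, length=3))  # False
```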

You can also enter a position and a length for a Fixed Field manually. Because these boxes are hidden by default, you must first make the POS and LEN boxes appear on the form: enter a fixed field in the TAG box, press TAB, and then press Cancel.

You can use this manual entry, for example, to search for data in fixed fields that is greyed out in the fixed field templates (like certain leader bytes that are not intended to be changed by a cataloger).

Finally, if you are on the Pattern form and need to re-display the Templates form, simply put your cursor in the TAG box and press TAB (assuming a fixed field has already been entered in the TAG box).

Hint: You can use MARC Review to find Fixed Fields that are not the correct length. For the 008, enter '008' in the TAG box, press TAB, and enter a '.' (dot) in the last box ('39/Cat Src'). This assumes the 'Any Format' template is selected when the 008 form opens–and it should be, as it is the default. Then click 'Save', select the 'Regular Expression' option, and set the RULE to 'Not'.

Finding 007 length problems is more time-consuming because the 007 is a different length for each format; the trick is to follow the above steps for EACH type of 007. For example, enter '007' in the TAG box, press TAB, select the MAP template, enter a dot in the last box ('07/Aspect'), click Save, select the 'Regular Expression' option, and set the RULE to 'Not'. This will find map 007s (those beginning with 'a') that are not the correct length.
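The logic of this hint can be sketched in Python. The helper too_short is hypothetical; it models the position box as a one-character slice that is empty when the field ends early, so the regular expression '.' cannot match there, and 'Not' turns that failure into a hit.

```python
import re


def too_short(field_data, last_pos):
    """Hypothetical: True if '.' cannot match at last_pos, i.e. RULE = Not hits."""
    # The position box sees one character, or nothing if the field ends early.
    char = field_data[last_pos:last_pos + 1]
    matched = re.fullmatch(".", char, re.DOTALL) is not None
    return not matched  # RULE = Not inverts the regular-expression result


print(too_short("x" * 40, 39))       # False: this 008 has a position 39
print(too_short("x" * 38, 39))       # True: this 008 is too short
print(too_short("ad" + "x" * 5, 7))  # True: a map 007 needs positions 00-07
```

As sketched, the check flags only fields that are too short: an 008 with 41 or more characters still has a position 39, so this particular test would not flag it.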

In either case, be sure to save the results as MARC records so that you can fix them in MARC Report.

NOTE ABOUT TAG OCCURRENCES

An additional feature of the Tag occurrence option is the ability to specify a relative occurrence instead of an absolute occurrence. This is necessary to allow advanced searching of repeatable fields (one of the most complex aspects of MARC, at least from the program's perspective).

In MARC Review (and MARC Global), all user-entered occurrences are absolute by default. This means that whenever you enter a specific occurrence number, the program will find a match only if 1) that occurrence of the tag exists, and 2) that occurrence of the tag matches your pattern. In short, the program simply goes to the specified occurrence and tries to match the pattern.

The following example shows a record that contains two 035 tags:

035  $a(OCoLC)12345678
035  $a(FMlbTmq)2003012345

If we were to enter a pattern consisting of '035' in the Tag box, '2' in the Occ box, and 'FMlbTmq' in the Data box, then this record would match–because the second occurrence of 035 contains 'FMlbTmq'.

However, if we used the same pattern, the following record would not match, because the second occurrence does not contain 'FMlbTmq'.

035  $a(FMlbTmq)2003012345
035  $a(OCoLC)12345678
035  $a(FMlbTmq)2003012345

If we want to find all records with two (or more) 035 tags that contain the same pattern (e.g. 'FMlbTmq'), then we have to look for a 'relative' occurrence.

To set up a 'relative' occurrence pattern, click the 'Occ' label. When you do this, the text for 'Occ' changes color from Gray to Red (click it again to toggle it from Red back to Gray; a memory aid might be 'red is for relative').

When a relative occurrence is specified, the program evaluates every other aspect of the pattern before it checks the occurrence. So if we use the same pattern as above (Tag=035, Occ=2, Data=FMlbTmq) and click the Occ label to set it to relative (red), the first record will not match, because it has only one matching occurrence and we asked for two. The second record will match: the pattern matches the first and third occurrences, the program counts those two matches, and the second matching occurrence satisfies Occ=2.
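The difference between the two modes can be sketched in Python using the two example records above. The function names are hypothetical, the 035 fields are reduced to plain strings, and Data is treated as a simple substring match; this illustrates the described behavior rather than reproducing the program's code.

```python
# Hypothetical sketch of absolute vs relative occurrence matching.

def absolute_match(fields, data, occ):
    """Go straight to the occ-th occurrence (1-based) and test only that one."""
    if len(fields) < occ:
        return False
    return data in fields[occ - 1]


def relative_match(fields, data, occ):
    """Keep the occurrences that match the pattern, then count them."""
    matching = [field for field in fields if data in field]
    return len(matching) >= occ


record_1 = ["$a(OCoLC)12345678", "$a(FMlbTmq)2003012345"]
record_2 = ["$a(FMlbTmq)2003012345", "$a(OCoLC)12345678", "$a(FMlbTmq)2003012345"]

for name, fields in (("first", record_1), ("second", record_2)):
    print(name,
          "absolute:", absolute_match(fields, "FMlbTmq", 2),
          "relative:", relative_match(fields, "FMlbTmq", 2))
```

Filtering before counting is what lets the relative mode find the second 'FMlbTmq' even though it sits in the third 035.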

In the current version, the Relative Occ option applies only to tags (not to subfields).

ADDITIONAL HELP

If you come across a problem that you cannot solve using the methods on this page, please send us an email. There is usually a workaround that will do the job; or we may be able to update the program to meet your need. Don't be shy about this. We want MARC Review to do the things that need to be done by our customers.