New UTF8 category in MARC Analysis
In the past, MARC Analysis maintained a table of UTF8 sequences like this:
UTF-8 Multi-Byte Sequences (each counted as 1 character)

Hex            Visual   Count   Percent
xE3 x81 x88    え       1       0.000
xE3 x81 x8E    ぎ       1       0.000
xE3 x81 xA8    と       1       0.000
xE3 x81 xB2    ひ       1       0.000
xE3 x82 x8B    る       2       0.000
xE3 x82 x92    を       1       0.000
xE3 x83 x88    ト       1       0.000
etc.
Beginning with version 2.32, a new UTF8 category follows the one above. For each MARC tag, this new section reports the number of multibyte UTF8 sequences found in that tag and the number of records in which they were found. This category looks like this:
UTF-8 Multi-Byte Sequences by MARC Tag and Number of records:

Tag    Seq    Records Found In
020:   113    111
100:   226    129
110:   20     13
130:   21     16
210:   5      5
222:   16     14
240:   21     15
245:   1120   414
etc.
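The per-tag tally above can be approximated with a short Python sketch. This is not MARC Analysis's own code; it is an illustration under stated assumptions: each record is represented as a hypothetical list of (tag, field bytes) pairs, the data is valid UTF-8, and "Records Found In" is read as the number of records in which the tag contains at least one multi-byte sequence. The function names are invented for the example.

```python
from collections import defaultdict


def count_multibyte(data: bytes) -> int:
    """Count multi-byte UTF-8 sequences in valid UTF-8 data.

    Every multi-byte sequence starts with exactly one lead byte whose
    top two bits are 11; continuation bytes start with 10 and ASCII
    bytes start with 0, so counting lead bytes counts sequences.
    """
    return sum(1 for b in data if (b & 0xC0) == 0xC0)


def tally_by_tag(records):
    """Return {tag: (sequence_count, records_found_in)}.

    `records` is a hypothetical structure: an iterable of records,
    each a list of (tag, field_bytes) pairs.
    """
    seq_counts = defaultdict(int)
    rec_counts = defaultdict(int)
    for record in records:
        tags_seen = set()  # tags with multi-byte data in this record
        for tag, data in record:
            n = count_multibyte(data)
            if n:
                seq_counts[tag] += n
                tags_seen.add(tag)
        # A repeated tag still counts its record only once.
        for tag in tags_seen:
            rec_counts[tag] += 1
    return {tag: (seq_counts[tag], rec_counts[tag]) for tag in sorted(seq_counts)}


# Illustrative data only, not taken from a real MARC file.
sample_records = [
    [("245", "えと".encode("utf-8")), ("100", b"Smith")],
    [("245", "る".encode("utf-8")), ("245", "abc é".encode("utf-8"))],
]
result = tally_by_tag(sample_records)
for tag, (seq, recs) in result.items():
    print(f"{tag}: {seq} {recs}")  # prints "245: 4 2"
```

In the sample, tag 245 holds four multi-byte sequences spread over two records, while tag 100 contains only ASCII and therefore does not appear in the tally.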