details:unicode [2015/08/24 17:10] (current)
====== Unicode in RIMMF3 ======

Beginning with update 141206 (mid-December 2014), all non-ASCII data in RIMMF3, whether created or imported, is '\u'-encoded.

For example:
  \u00E9

where '00E9' is a hexadecimal number giving the character's UTF-16 code unit. For most characters, including 'é' (U+00E9) in this example, that value is also the Unicode code point.

This character encoding is Unicode-compatible.

Beginning with update 150801, the RIMMF application itself can display Unicode characters. However, the way these characters are stored has not changed: they are still '\u'-encoded.
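
The split between storage and display can be illustrated with a minimal Python sketch. It is not RIMMF's code: the names 'u_encode' and 'u_decode' are hypothetical, and the sketch assumes uppercase hex digits (as in the example above) and that a character outside the Basic Multilingual Plane is stored as two escapes forming a UTF-16 surrogate pair. 'u_encode' models what is written to the data files; 'u_decode' models what a Unicode-capable display shows.

```python
import re

# One \uXXXX escape: a literal backslash, 'u', then four hex digits.
_ONE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")
# A run of consecutive escapes, so surrogate pairs are decoded together.
_ESCAPE_RUN = re.compile(r"(?:\\u[0-9A-Fa-f]{4})+")


def u_encode(text: str) -> str:
    """Storage side: replace each non-ASCII character with \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is stored unchanged
            continue
        data = ch.encode("utf-16-be")  # 2 bytes, or 4 for a surrogate pair
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], "big")
            out.append(f"\\u{unit:04X}")
    return "".join(out)


def u_decode(text: str) -> str:
    """Display side: turn runs of \\uXXXX escapes back into characters."""
    def replace_run(match: re.Match) -> str:
        units = [int(h, 16) for h in _ONE_ESCAPE.findall(match.group(0))]
        data = b"".join(u.to_bytes(2, "big") for u in units)
        # surrogatepass keeps a lone surrogate instead of raising an error
        return data.decode("utf-16-be", errors="surrogatepass")
    return _ESCAPE_RUN.sub(replace_run, text)


print(u_encode("Le Carré"))        # Le Carr\u00E9
print(u_decode("Le Carr\\u00E9"))  # Le Carré
```

This sketch does not escape literal backslashes, so ASCII text that happens to contain '\u' followed by four hex digits would be decoded as a character; how RIMMF treats that case is not covered here.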


----


Here are a few screenshots to illustrate.

1. RIMMF3 display of diacritics between updates 141206 and 150801:

{{:details:lecarr1.png}}

2. RIMMF3 display of diacritics beginning with update 150801:

{{:details:lecarr2.png}}

3. RDF text (snippet) for both #1 and #2 (beginning with update 141206):

{{:details:lecarr3.png|}}


----


===== Non-Unicode RIMMF =====

Diacritics in data created in RIMMF before update 141206 are not Unicode-compatible.

When we added '\u'-encoding support, we also added a character-encoding conversion utility to RIMMF3; unfortunately, it succeeds only with the most basic diacritics.
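
Because RIMMF3 data from update 141206 onward stores every non-ASCII character as a '\u' escape, records that still contain raw non-ASCII bytes are likely to be older data. A minimal Python sketch for finding them, assuming each record is a '.txt' file directly inside the data folder (as the log paths in the next section suggest); the function names are hypothetical and not part of RIMMF:

```python
from pathlib import Path


def non_ascii_lines(raw: bytes) -> list[int]:
    """Return the 1-based numbers of lines that contain any byte >= 0x80."""
    return [n for n, line in enumerate(raw.splitlines(), start=1)
            if any(b >= 0x80 for b in line)]


def find_unescaped_records(data_folder: str) -> dict[str, list[int]]:
    """Map each *.txt record holding raw non-ASCII bytes to the affected lines."""
    report = {}
    for path in sorted(Path(data_folder).glob("*.txt")):
        # \u-encoded records are pure ASCII, so any byte >= 0x80 is suspect
        lines = non_ascii_lines(path.read_bytes())
        if lines:
            report[path.name] = lines
    return report
```

Working on raw bytes avoids guessing which legacy encoding the old data used; the sketch only reports where the problem lines are.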

===== How to handle encoding problems =====

In the current RIMMF3 application (beginning with update 150801), loading older data whose diacritics are not '\u'-encoded may raise a character-encoding exception when the program starts((Every record is parsed at start-up, when the EI is created.)).

When this happens, the default behavior is to remove the record: RIMMF moves the record that caused the error from the data folder into the subdirectory named '__history'.

RIMMF also logs the error in 'RIMMF3.log', which is in your 'RIMMF3' folder:

<code>
08/11/15 8:15:10 PM
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000036.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000099.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000182.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000183.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000015.txt
73 records indexed for EI; 5 errors during indexing.
</code>
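
Since the affected records are listed in 'RIMMF3.log', a short Python sketch can pull their file names out of the log. It relies only on the line format in the excerpt above; the function name is hypothetical:

```python
import re
from pathlib import PureWindowsPath

# Line format copied from the RIMMF3.log excerpt above.
_ERROR_LINE = re.compile(r"^EI Indexing Error: Exception trapped processing (.+?)\s*$")


def failed_records(log_text: str) -> list[str]:
    """Return the file names of records that raised an EI indexing exception."""
    names = []
    for line in log_text.splitlines():
        match = _ERROR_LINE.match(line)
        if match:
            # The log shows Windows paths; keep just the file name.
            names.append(PureWindowsPath(match.group(1)).name)
    return names


sample = r"""08/11/15 8:15:10 PM
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000036.txt
EI Indexing Error: Exception trapped processing D:\Demo data\qpq00000099.txt
73 records indexed for EI; 5 errors during indexing."""

print(failed_records(sample))  # ['qpq00000036.txt', 'qpq00000099.txt']
```

Under the remove-the-record behavior, each listed file should then be found in the '__history' subdirectory.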

Unfortunately, removing the record in this way breaks any links in the record.

To work around this problem, update 150812 adds an option with a different default behavior.

The new option is on the 'Data options' form, which is opened from the main menu:

{{:details:dataopts.png|}}

The new option is named:

  During EI creation, try to automatically fix character encoding errors

It is enabled by default. With this option on, when a character-encoding exception occurs during start-up, RIMMF tries to fix the encoding problem and keeps the record in the data folder instead of removing it.
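
RIMMF's actual repair logic is not described in this article. As a purely hypothetical model of what "fix the problem and keep the record" can mean, the Python sketch below reads a record's bytes as UTF-8, substitutes U+FFFD (the Unicode replacement character, commonly drawn as a diamond) for anything it cannot decode, and re-encodes the result with '\u' escapes. The UTF-8 assumption and the name 'repair_record' are illustrative only:

```python
def repair_record(raw: bytes) -> str:
    """Hypothetical repair: lossy UTF-8 decode, then \\u-escape non-ASCII."""
    # Assumption: old bytes are read as UTF-8; errors="replace" turns each
    # undecodable byte sequence into U+FFFD instead of raising an exception.
    text = raw.decode("utf-8", errors="replace")
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is kept as-is
            continue
        data = ch.encode("utf-16-be")  # one or two UTF-16 code units
        for i in range(0, len(data), 2):
            out.append(f"\\u{int.from_bytes(data[i:i + 2], 'big'):04X}")
    return "".join(out)


# 0xE9 is 'é' in Latin-1 but is not valid UTF-8 on its own.
print(repair_record(b"Le Carr\xe9"))      # Le Carr\uFFFD
print(repair_record(b"Le Carr\xc3\xa9"))  # Le Carr\u00E9
```

The trade-off this model shows is the one described above: the record survives and its links stay intact, but the original diacritic is lost and has to be re-entered by hand.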

In the EI, these encoding problems are displayed like this:

{{:details:encerr.png|}}

To fix the problem, open the record and replace the 'diamond' with the correct diacritic.
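
In a large data folder, locating every placeholder by hand can be tedious. A minimal Python sketch that lists where they occur, assuming the placeholder is stored like any other non-ASCII character, i.e. as the escape '\uFFFD' (U+FFFD, the Unicode replacement character); both that storage assumption and the function names are hypothetical:

```python
import re
from pathlib import Path

# Assumed storage form of the placeholder; hex digits matched case-insensitively.
_PLACEHOLDER = re.compile(r"\\uFFFD", re.IGNORECASE)


def placeholder_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that contain a \\uFFFD escape."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if _PLACEHOLDER.search(line)]


def scan_folder(data_folder: str) -> dict[str, list[int]]:
    """Map each *.txt record containing placeholders to the affected lines."""
    report = {}
    for path in sorted(Path(data_folder).glob("*.txt")):
        # \u-encoded records are ASCII; errors="replace" tolerates stray bytes
        lines = placeholder_lines(path.read_text(encoding="ascii", errors="replace"))
        if lines:
            report[path.name] = lines
    return report
```

The resulting file names and line numbers tell you which records to open and where to look for the 'diamond'.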

For complete information about diacritics in RIMMF, please see the [[howto:diacritics|Diacritics and Unicode]] article.
  
details/unicode.txt · Last modified: 2015/08/24 17:10 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported