[Dspace-general] "language" usage

Scott Yeadon scott.yeadon at anu.edu.au
Wed Apr 12 19:55:51 EDT 2006


Hi David,

>
>In the DSpace "full item record" view, a "language" field is shown on the
>right side of the screen for every DC element. The default for this is
>"en_US". Because it has become confusing for us, I am considering setting
>everything to blank ("--"). Or is there some use for this language value on
>each element of the Dublin Core?
>  
>
If you have metadata values in different languages, you can use this 
language value to record the language of each individual value.

>I consider that:
>
>1) There is already a DC element in the record for language, e.g.:
>- language = chi
>- language.iso = zh_HK
>So why must there be a specific, repeated language indication for every DC
>element in the record?
>  
>
The DC language elements describe the digital object (the bitstream(s)), 
not the metadata. The per-value language can be used where the metadata 
itself is in several languages (e.g. a French title and an English 
translation of it).
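To illustrate the distinction, here is a minimal sketch of a record in plain Python. The `MetadataValue` class, its field names, and the use of `title.alternative` for the translation are assumptions for illustration, not the DSpace API: `dc.language.iso` carries the language of the content, while each title value carries its own language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataValue:
    """One metadata value. Hypothetical structure, not the DSpace API."""
    element: str              # e.g. "title"
    qualifier: Optional[str]  # e.g. "alternative", or None
    value: str
    language: Optional[str]   # language of *this value*; None means blank


# Hypothetical item whose content is in French; the record holds the
# French title plus an English translation of it.
record = [
    MetadataValue("language", "iso", "fr", None),  # describes the bitstream(s)
    MetadataValue("title", None, "Le Petit Prince", "fr"),
    MetadataValue("title", "alternative", "The Little Prince", "en"),
]


def content_language(values):
    """Language of the digital object, read from dc.language.iso."""
    return [v.value for v in values if (v.element, v.qualifier) == ("language", "iso")]


def titles_in(values, lang):
    """Title values whose per-value language matches lang."""
    return [v.value for v in values if v.element == "title" and v.language == lang]


print(content_language(record))  # ['fr']
print(titles_in(record, "en"))   # ['The Little Prince']
```

The two kinds of language information answer different questions, so neither can replace the other.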

>2) What should go into the language value on an individual DC element when
>it uses a romanization scheme for Chinese, such as Hanyu Pinyin (the LC
>standard)? Should I put "zh"? If so, will there be confusion when one record
>holds both Chinese vernacular script and romanized Hanyu Pinyin? Both of
>these will show "zh", yes?
>  
>
I'd leave it blank. As I understand it, pinyin is a transliteration 
scheme, not a language. For a collection I have in testing, I have 
created a title.pinyin element. It's not a nice solution, but I couldn't 
find a nicer way of recording it.
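A small sketch of why the separate element helps, assuming a hypothetical record of (element, qualifier, text, language) tuples and an illustrative title: if both forms sit in plain dc.title tagged "zh", nothing but the text itself tells them apart, whereas a title.pinyin element with a blank language does.

```python
from collections import Counter


def forms_distinguishable(values):
    """True if no two values share the same (element, qualifier, language) key,
    i.e. each form can be told apart without inspecting the text itself."""
    counts = Counter((element, qualifier, lang) for element, qualifier, _text, lang in values)
    return all(n == 1 for n in counts.values())


# Hypothetical record: both title forms under plain dc.title, both tagged "zh".
both_tagged_zh = [
    ("title", None, "中国文学史", "zh"),
    ("title", None, "Zhongguo wen xue shi", "zh"),
]

# The same title with the romanization moved to a title.pinyin element
# and its language left blank (None).
separate_element = [
    ("title", None, "中国文学史", "zh"),
    ("title", "pinyin", "Zhongguo wen xue shi", None),
]

print(forms_distinguishable(both_tagged_zh))    # False
print(forms_distinguishable(separate_element))  # True
```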

>3) Does this language value on each DC element, in the case of "en_US,
>en_UK, and en_HK" or "fr_FR and fr_CA", indicate only orthographical
>differences, and not script type, encoding, or place of publication? And in
>the case of zh_TW, zh_HK, and zh_ZH (or zh_CN?), is even that not relevant?
>The zh codes do not indicate:
>	A) encoding schemes (BIG5, GB, EACC, CCCII, etc.) for vernacular script,
>	B) "simplified characters" or "traditional characters" for vernacular script, or
>	C) the type of romanization used for romanized text.
>  
>
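That's correct. The ISO language codes have a deliberately limited scope: 
they identify a language, optionally qualified by a country, and say 
nothing about script, character encoding, or romanization scheme.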

>4) To go to an extreme, what codes would I show for a Japanese title shown
>in these seven representations?
>- Kanji
>- Hiragana
>- Katakana
>- Romanized
>- Arabicized (using Arabic script to show Japanese pronunciation)
>- Cyrillicized (using Cyrillic script to show Japanese pronunciation)
>- Translated into English
>  
>
I would handle it by having a separate title element for each 
representation, with a unique qualifier wherever a satisfactory language 
code is not available. In DSpace 1.4 you could create your own extended 
DC schema that supports these new qualified title elements.
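As a sketch of that layout, the mapping below assigns each of the seven representations its own field. The "dcx" schema name, all qualifier names, and putting the English translation in dc.title.alternative are assumptions for illustration only. Script-level distinctions go in the qualifier; a language code is kept only where it is meaningful ("ja" for the native scripts, "en" for the translation), and transliterations are left blank.

```python
from typing import Optional


def field_name(schema: str, element: str, qualifier: Optional[str]) -> str:
    """Build a dotted field name such as 'dc.title' or 'dcx.title.kanji'."""
    return ".".join(p for p in (schema, element, qualifier) if p)


# Hypothetical mapping: representation -> (schema, element, qualifier, language).
# The "dcx" schema and all qualifier names are illustrative assumptions.
REPRESENTATIONS = {
    "Kanji":               ("dcx", "title", "kanji", "ja"),
    "Hiragana":            ("dcx", "title", "hiragana", "ja"),
    "Katakana":            ("dcx", "title", "katakana", "ja"),
    "Romanized":           ("dcx", "title", "romanized", None),
    "Arabicized":          ("dcx", "title", "arabic", None),
    "Cyrillicized":        ("dcx", "title", "cyrillic", None),
    "English translation": ("dc", "title", "alternative", "en"),
}

for label, (schema, element, qualifier, lang) in REPRESENTATIONS.items():
    # "--" mirrors how a blank language is displayed in the item record.
    print(f"{label:20} {field_name(schema, element, qualifier):22} {lang or '--'}")
```

Because every representation has its own field name, display and search settings can refer to each one individually.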

>Or should we make the zh codes meaningful by creating values such as
>zh_HK.utf8 or zh_HK.pinyin?
>  
>
That option could also work. Keep in mind, however, that if you want to 
configure item display pages, browse pages, and search indexes, the 
relevant dspace.cfg settings tend to be expressed in terms of metadata 
elements. Recording the information in the database language field may 
limit some of your easy customisation options.
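A rough sketch of the trade-off, assuming the ad hoc convention language[_COUNTRY][.extension]. It resembles POSIX locale names such as zh_HK.utf8, but here the suffix would also carry non-encoding information such as "pinyin". The `parse_lang_tag` helper and the sample values are hypothetical. Selecting by field name sees two indistinguishable dc.title values, so picking out the romanized form requires custom code that parses the language value.

```python
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    country: Optional[str]
    extension: Optional[str]  # the proposed ".utf8" / ".pinyin" suffix


def parse_lang_tag(tag: str) -> LangTag:
    """Split e.g. 'zh_HK.pinyin' into ('zh', 'HK', 'pinyin').

    Hypothetical helper for an ad hoc convention, not a registered standard.
    """
    base, _, extension = tag.partition(".")
    language, _, country = base.partition("_")
    return LangTag(language, country or None, extension or None)


# Hypothetical values as (field, text, language) tuples.
values = [
    ("dc.title", "中国文学史", "zh_HK.utf8"),
    ("dc.title", "Zhongguo wen xue shi", "zh_HK.pinyin"),
]

# Selecting by field name alone returns both forms together.
by_field = [text for field, text, _lang in values if field == "dc.title"]

# Isolating the romanized form means parsing the language value instead.
romanized = [text for _field, text, lang in values
             if parse_lang_tag(lang).extension == "pinyin"]

print(by_field)   # ['中国文学史', 'Zhongguo wen xue shi']
print(romanized)  # ['Zhongguo wen xue shi']
```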

>Thanks
>David Palmer
>HKU
>
>  
>
There are probably many others out there who have investigated this in 
more depth than I have, so hopefully some other opinions will be 
forthcoming to assist you.

Scott.


