[Matroska-devel] [Cellar] using ISO 639-3 language codes in Matroska
Dave Rice
dave at dericed.com
Wed Jan 13 01:15:27 CET 2016
> On Jan 12, 2016, at 8:28 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
[…]
> Problem is I don't know the best way to do this. I see three possible
> avenues each with their own sets of pros and cons, and I'd like some
> feedback in order to turn this into a proper proposal:
>
> 1. Change the specs so that all language elements use 639-3 codes
>
> 2. Introduce new elements on the same level as the existing language
> elements that determine the standard the corresponding language
> element uses defaulting to 639-2 if missing
>
> 3. Introduce new elements on the same level as the existing language
> elements that contain a 639-3 code
I’d vote for #2 or #1, with a preference for #2. If option #1 is chosen, I’d suggest that the language elements may use 639-2 OR 639-3. I’d also suggest that the Matroska specification adopt an externally managed language authority (639-3 is the most obvious choice) and not labor to extend language identification beyond the adopted standard(s). For instance, if a deficiency is noticed in ISO 639-3, it is most likely of shared interest with other projects, and it would be better to direct it to http://www-01.sil.org/iso639-3/.
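To make option #2 concrete, here is a minimal sketch of how a reader might resolve a language value when a sibling element names the standard in use. Nothing like this exists in the specification today: the names `LanguageTag`, `read_language` and `DEFAULT_STANDARD`, and the strings used to name the standards, are made up for illustration. The only behavior taken from the proposal is the fallback to 639-2 when the new element is missing.

```python
from typing import NamedTuple, Optional

# Fallback named in option #2: assume 639-2 when no standard is declared.
DEFAULT_STANDARD = "639-2"


class LanguageTag(NamedTuple):
    """A language code together with the standard it is drawn from."""
    standard: str
    code: str


def read_language(code: str, standard: Optional[str] = None) -> LanguageTag:
    """Pair a language code with its declared standard (hypothetical helper).

    `standard` stands in for the value of the proposed sibling element;
    None means the element was absent from the file.
    """
    return LanguageTag(standard or DEFAULT_STANDARD, code)


# Existing files carry no sibling element, so they keep their 639-2 meaning.
legacy = read_language("fre")
# A new file could declare 639-3 explicitly.
declared = read_language("fra", "639-3")
```

The point of the sketch is that existing files need no rewriting under option #2, since an absent element resolves to the current behavior.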
If it aids the discussion, here is a histogram of usage of the Language [22][B5][9C] Element in a large Matroska sample set from archive.org.
95254 und
8073 ara
6416 jpn
5461 eng
990 hin
869 fre
651 spa
612 per
555 kor
477 ger
456 chi
325 rus
275 ind
203 por
196 dut
187 ita
172 tur
155 tam
148 rum
110 dan
104 pol
101 swe
97 heb
91 lat
88 fin
47 nob
45 nor
45 ???
40 unknown
38 vie
34 hun
34 cat
32 tha
30 cze
21 en
20 tel
19 ben
18 deu
15 gre
14 urd
9 ukr
9 may
8 pan
7 abk
6 slv
6 Unspecified
6
5 hrv
5 fra
4 srp
4 mar
4 lit
4 lav
4 ina
4 ice
4 NE
3 nno
3 jav
3 est
3 bul
3 arn
3 alb
2 zul
2 enm
2 arp
2 arm
2 arg
2 arc
2 ar
2 aar
1 zho
1 yao
1 wak
1 vai
1 tiv
1 std
1 scr
1 scc
1 rom
1 oci
1 kru
1 ira
1 inc
1 hau
1 grc
1 fr
1 epo
1 efi
1 bos
1 bnt
1 ava
1 apa
1 amh
1 aka
1 NAR
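For anyone who wants to slice these numbers further, a rough sketch of bucketing the values by shape follows. It only looks at the form of each value (three lowercase letters, two lowercase letters, empty, anything else) and does not check membership in the ISO 639-2 or 639-3 tables, so "three-letter" here does not mean "valid code". The function names and the small `SAMPLE` list are illustrative; the rows are copied from the histogram above.

```python
import re
from collections import Counter

# A few rows copied from the histogram above, as (count, value) pairs.
SAMPLE = [
    (95254, "und"),
    (5461, "eng"),
    (45, "???"),
    (40, "unknown"),
    (21, "en"),
    (6, "Unspecified"),
    (6, ""),
    (4, "NE"),
    (1, "fr"),
]


def shape(value):
    """Classify a Language value by shape only (no lookup in the ISO tables)."""
    if value == "":
        return "empty"
    if value == "und":
        return "undetermined"
    if re.fullmatch(r"[a-z]{3}", value):
        return "three-letter"
    if re.fullmatch(r"[a-z]{2}", value):
        return "two-letter"
    return "other"


def tally(rows):
    """Sum the occurrence counts per shape bucket."""
    totals = Counter()
    for count, value in rows:
        totals[shape(value)] += count
    return totals


totals = tally(SAMPLE)
for bucket, count in totals.most_common():
    print(f"{count:>6} {bucket}")
```

Run against the full list, this would separate the values that merely need mapping (two-letter codes such as "en" and "fr") from those that carry no usable language information ("???", "unknown", the empty value).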
Dave Rice