[Matroska-devel] [Cellar] using ISO 639-3 language codes in Matroska

Dave Rice dave at dericed.com
Wed Jan 13 01:15:27 CET 2016


> On Jan 12, 2016, at 8:28 AM, Moritz Bunkus <moritz at bunkus.org> wrote:

[…]

> Problem is I don't know the best way to do this. I see three possible
> avenues each with their own sets of pros and cons, and I'd like some
> feedback in order to turn this into a proper proposal:
> 
> 1. Change the specs so that all language elements use 639-3 codes
> 
> 2. Introduce new elements on the same level as the existing language
>   elements that determine the standard the corresponding language
>   element uses defaulting to 639-2 if missing
> 
> 3. Introduce new elements on the same level as the existing language
>   elements that contain a 639-3 code

I’d vote for #2 or #1, with a preference for #2. If option #1 is chosen, I’d suggest that the language elements may use 639-2 OR 639-3. I’d also suggest that the Matroska specification adopt an externally managed language authority (639-3 is the most obvious choice) and not labor to extend language identification beyond the adopted standard(s). For instance, if a deficiency is noticed in ISO 639-3, I think that deficiency is most likely of shared interest to other projects, and that it would be better to direct it to http://www-01.sil.org/iso639-3/.
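
To make option #2 concrete, here is a minimal sketch of how a reader might resolve a track's language under that scheme. Nothing here is in any spec: the `standard` parameter stands in for the proposed sibling element (which has no name yet), and the set of accepted values is an assumption for illustration.

```python
from typing import Optional, Tuple

# Option #2: when the (hypothetical) sibling element is missing,
# the language element is interpreted as ISO 639-2.
DEFAULT_STANDARD = "639-2"
KNOWN_STANDARDS = {"639-2", "639-3"}  # assumed set of accepted values


def resolve_language(code: str, standard: Optional[str] = None) -> Tuple[str, str]:
    """Return (standard, code) for a language element and its optional sibling."""
    effective = DEFAULT_STANDARD if standard is None else standard
    if effective not in KNOWN_STANDARDS:
        raise ValueError(f"unsupported language standard: {effective!r}")
    return effective, code
```

Existing files carry no sibling element, so they keep their 639-2 meaning unchanged; only files that declare a standard explicitly are read differently.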

If it aids the discussion, here is a histogram of usage of the Language [22][B5][9C] Element across a large Matroska sample set from archive.org.

95254 und
8073 ara
6416 jpn
5461 eng
 990 hin
 869 fre
 651 spa
 612 per
 555 kor
 477 ger
 456 chi
 325 rus
 275 ind
 203 por
 196 dut
 187 ita
 172 tur
 155 tam
 148 rum
 110 dan
 104 pol
 101 swe
  97 heb
  91 lat
  88 fin
  47 nob
  45 nor
  45 ???
  40 unknown
  38 vie
  34 hun
  34 cat
  32 tha
  30 cze
  21 en
  20 tel
  19 ben
  18 deu
  15 gre
  14 urd
   9 ukr
   9 may
   8 pan
   7 abk
   6 slv
   6 Unspecified
   6 
   5 hrv
   5 fra
   4 srp
   4 mar
   4 lit
   4 lav
   4 ina
   4 ice
   4 NE
   3 nno
   3 jav
   3 est
   3 bul
   3 arn
   3 alb
   2 zul
   2 enm
   2 arp
   2 arm
   2 arg
   2 arc
   2 ar
   2 aar
   1 zho
   1 yao
   1 wak
   1 vai
   1 tiv
   1 std
   1 scr
   1 scc
   1 rom
   1 oci
   1 kru
   1 ira
   1 inc
   1 hau
   1 grc
   1 fr
   1 epo
   1 efi
   1 bos
   1 bnt
   1 ava
   1 apa
   1 amh
   1 aka
   1 NAR
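
The list contains several languages under both their ISO 639-2 bibliographic (B) and terminologic (T) codes, for example fre/fra, ger/deu and chi/zho. Since ISO 639-3 uses the T form, a tally can be folded as in the rough sketch below. The mapping covers only B codes that appear in the histogram, and the function name is made up for illustration; this is not how the counts above were produced.

```python
from collections import Counter

# ISO 639-2 B codes seen in the histogram, mapped to the T codes
# that ISO 639-3 also uses. Not a complete table.
B_TO_T = {
    "fre": "fra", "ger": "deu", "chi": "zho", "per": "fas",
    "dut": "nld", "rum": "ron", "cze": "ces", "gre": "ell",
    "may": "msa", "ice": "isl", "alb": "sqi", "arm": "hye",
}


def fold_b_codes(counts):
    """Merge counts recorded under a B code into the matching T code."""
    merged = Counter()
    for code, n in counts.items():
        merged[B_TO_T.get(code, code)] += n
    return merged


# A few rows copied from the histogram above.
sample = {"fre": 869, "fra": 5, "ger": 477, "deu": 18, "chi": 456, "zho": 1}
folded = fold_b_codes(sample)
```

Values such as "en", "fr", "???" or "Unspecified" are in neither code set and pass through untouched here; handling them would need a separate decision.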

Dave Rice

