From lrn at land.ru Sun Apr 2 16:52:45 2006
From: lrn at land.ru (LRN)
Date: Sun, 2 Apr 2006 14:52:45 +0000 (UTC)
Subject: [Matroska-devel] Re: Subtitle font and size style in MP4 TTXT not decoded correctly
References:
Message-ID:

[Message body unrecoverable: its non-ASCII text was replaced by "?" in
the archive. Terms that survive: HMS, MP4 TTXT, MPC, MS Sans Serif 18,
Gabest Splitter, Osmo4.]

From lrn at land.ru Tue Apr 4 19:20:01 2006
From: lrn at land.ru (LRN)
Date: Tue, 4 Apr 2006 17:20:01 +0000 (UTC)
Subject: [Matroska-devel] Re: Subtitle font and size style in MP4 TTXT not decoded correctly
References:
Message-ID:

[Message body unrecoverable: its non-ASCII text was replaced by "?" in
the archive. Terms that survive: K-Lite Mega Codec Pack v1.52,
"directshow filters instead of quicktime", VobSub.]

From steve.lhomme at free.fr Wed Apr 5 10:12:23 2006
From: steve.lhomme at free.fr (Steve Lhomme)
Date: Wed, 05 Apr 2006 10:12:23 +0200
Subject: [Matroska-devel] Re: Ebml and Borland C++ Builder 2006
In-Reply-To: <43D7E933.80203@aaaa.com>
References: <43CEAACA.4030001@MediaArea.net> <43D6CDDF.20103@free.fr> <43D7E933.80203@aaaa.com>
Message-ID: <44337BE7.1030102@free.fr>

This is integrated in the new libebml 0.7.7
Thanks for the bug report!
Steve

Zen wrote:
> From Ebml 0.7.6 :
> [C++ Warning] EbmlId.h(56): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlId.h(76): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlId.h(56): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlId.h(76): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlId.h(56): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlId.h(76): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlId.h(56): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlId.h(76): W8027 Functions containing for are not expanded inline
> [C++ Warning] EbmlDate.h(90): W8022 'EbmlDate::operator <(const EbmlDate &) const' hides virtual function 'EbmlElement::operator <(const EbmlElement &) const'
> [C++ Warning] EbmlElement.cpp(76): W8012 Comparing signed and unsigned values
> [C++ Warning] EbmlElement.cpp(156): W8004 'Result' is assigned a value that is never used
> [C++ Warning] EbmlElement.cpp(142): W8004 'PossibleSizeLength' is assigned a value that is never used
> [C++ Warning] EbmlElement.cpp(287): W8004 'Result' is assigned a value that is never used
> [C++ Warning] EbmlElement.cpp(281): W8004 'ReadSize' is assigned a value that is never used
> [C++ Warning] EbmlElement.cpp(354): W8004 'IdBitMask' is assigned a value that is never used
> [C++ Warning] EbmlElement.cpp(500): W8004 'Result' is assigned a value that is never used
> [C++ Warning] EbmlSInteger.cpp(71): W8072 Suspicious pointer arithmetic
> [C++ Warning] EbmlSInteger.cpp(91): W8041 Negating unsigned value
> [C++ Warning] EbmlSInteger.cpp(93): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlSInteger.cpp(94): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlSInteger.cpp(96): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlSInteger.cpp(97): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlSInteger.cpp(99): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlSInteger.cpp(100): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlString.cpp(136): W8072 Suspicious pointer arithmetic
> [C++ Warning] EbmlString.cpp(137): W8072 Suspicious pointer arithmetic
> [C++ Warning] EbmlUInteger.cpp(72): W8072 Suspicious pointer arithmetic
> [C++ Warning] EbmlUInteger.cpp(94): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlUInteger.cpp(96): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlUInteger.cpp(98): W8056 Integer arithmetic overflow
> [C++ Warning] EbmlUnicodeString.cpp(172): W8068 Constant out of range in comparison
> [C++ Warning] EbmlUnicodeString.cpp(184): W8068 Constant out of range in comparison
> [C++ Warning] EbmlUnicodeString.cpp(293): W8072 Suspicious pointer arithmetic
> [C++ Warning] EbmlUnicodeString.cpp(294): W8072 Suspicious pointer arithmetic
> [C++ Warning] xlocinfo(53): W8058 Cannot create pre-compiled header: initialized data in header
> [C++ Warning] MemIOCallback.cpp(74): W8072 Suspicious pointer arithmetic
> [C++ Warning] MemIOCallback.cpp(80): W8072 Suspicious pointer arithmetic
> [C++ Warning] MemIOCallback.cpp(103): W8072 Suspicious pointer arithmetic
> [C++ Warning] MemIOCallback.cpp(118): W8072 Suspicious pointer arithmetic
> [C++ Warning] StdIOCallback.cpp(138): W8066 Unreachable code
> [C++ Warning] WinIOCallback.cpp(99): W8012 Comparing signed and unsigned values
> [C++ Warning] WinIOCallback.cpp(182): W8012 Comparing signed and unsigned values
> [C++ Warning] WinIOCallback.cpp(254): W8055 Possible overflow in shift operation
>
>
> From Matroska 0.8.0
>
> [C++ Warning] KaxBlock.cpp(320): W8004 'cursor' is assigned a value that is never used
> [C++ Warning] KaxBlock.cpp(391): W8004 'cursor' is assigned a value that is never used
> [C++ Warning] KaxBlock.cpp(501): W8004 'cursor' is assigned a value that is never used
> [C++ Warning] KaxBlock.cpp(700): W8004 'Result' is assigned a value that is never used
> [C++ Warning] KaxCluster.cpp(199): W8012 Comparing signed and unsigned values
> [C++ Warning] KaxCluster.cpp(246): W8012 Comparing signed and unsigned values
> [C++ Warning] KaxCues.cpp(134): W8056 Integer arithmetic overflow
> [C++ Warning] KaxCues.cpp(158): W8004 'aPointNext' is assigned a value that is never used
> [C++ Warning] KaxCuesData.cpp(273): W8056 Integer arithmetic overflow
> [C++ Warning] KaxSeekHead.cpp(105): W8004 'aId' is assigned a value that is never used
> [C++ Warning] xlocinfo(53): W8058 Cannot create pre-compiled header: initialized data in header
>
> Zen
>
> Steve Lhomme wrote:
>> Hi Zen,
>>
>> I added these lines to EbmlConfig.h:
>> #if __BORLANDC__ >= 0x0581 //Borland C++ Builder 2006 preview
>> #include //malloc(), free()
>> #include //memcpy()
>> #endif //__BORLANDC__
>>
>> You can get it from SVN. Can you send us the log of warnings? Some
>> may be interesting.
>>
>> thanks
>>
>> Zen wrote:
>>> Hi,
>>>
>>> I use Ebml and Matroska libraries with Borland C++ Builder.
>>>
>>> No problem with version 5.5 (= Borland C++ Builder 6) of this
>>> compiler, but with version 5.81 (= Borland Developer Studio C++ 2006
>>> Preview technology) there is one little problem with EBML.
>>>
>>> The compiler fails to find malloc, free and memcpy.
>>>
>>> I did a quick workaround, with these lines in ebml/EbmlConfig.h
>>> (where you want) :
>>> ---
>>> #if __BORLANDC__==0x0581 //Borland C++ Builder 2006 preview
>>> #include //malloc(), free()
>>> #include //memcpy()
>>> #endif //__BORLANDC__
>>> ---
>>>
>>> It would be nice if you could add these lines in future versions...
>>> (there are a lot of other things, but only warnings)
>>>
>>> Zen
>>> http://mediainfo.sourceforge.net
>>
>
> _______________________________________________
> Matroska-devel mailing list
> Matroska-devel at lists.matroska.org
> http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel
> Read Matroska-Devel on GMane:
> http://dir.gmane.org/gmane.comp.multimedia.matroska.devel

--
robUx4 on blog

From haessije at eps.e-i.com Wed Apr 5 14:40:18 2006
From: haessije at eps.e-i.com (HAESSIG Jean-Christophe)
Date: Wed, 5 Apr 2006 14:40:18 +0200
Subject: [Matroska-devel] EBML Namespaces
Message-ID: <2684397F36DC8849A9BF842433B7843601EB7E57@GZI-VM01.cm-cic.fr>

Hello,

I would like to know if there are any plans to include some namespacing
feature into EBML. I think namespaces are an important feature to enable
compositing of EBML documents and transparent extension of existing
formats.

Just in case, if nobody ever seriously thought about it, please consider
the following proposal :

my idea is to replace the high order bits (not the ones coding for the
ID length) of the Class-IDs with a namespace ID. First of all, a few new
level 1+ Class-IDs are defined : 4288, an integer element that defines
how many bits are used for namespacing. 4289 is the namespace
declaration element. It has two sub-elements : 81 is an integer
representing the namespace ID, 82 is a string containing the namespace
key, which can be a URL, as in traditional XML namespaces, or a public
key fingerprint.

When a Class-ID has high-order set bits that would conflict with the
namespace ID, that Class-ID is simply represented as a larger class (it
would be consistent with the EBML RFC section 2.2, which states that
Class-IDs are always encoded in their shortest form, therefore no ID
clashes can happen).

The namespace ID is always 0 for EBML elements, so for files using up to
7 additional namespaces, the header elements wouldn't change at all.
Another advantage of my approach is that the lowest level of EBML
parsers (which do not interpret Class-IDs) would not be confused by the
files using namespaces.

I welcome any comments and would be pleased to answer if further detail
is required.

JC Haessig.

From steve.lhomme at free.fr Sun Apr 9 16:19:07 2006
From: steve.lhomme at free.fr (Steve Lhomme)
Date: Sun, 09 Apr 2006 16:19:07 +0200
Subject: [Matroska-devel] EBML Namespaces
In-Reply-To: <2684397F36DC8849A9BF842433B7843601EB7E57@GZI-VM01.cm-cic.fr>
References: <2684397F36DC8849A9BF842433B7843601EB7E57@GZI-VM01.cm-cic.fr>
Message-ID: <443917DB.4040003@free.fr>

HAESSIG Jean-Christophe wrote:
> Hello,

Hi,

> I would like to know if there are any plans to include some namespacing
> feature into EBML.
>
> I think namespaces are an important feature to enable compositing of
> EBML documents and transparent extension of existing formats.

Well, extending an EBML document with more tags has been discussed in
the past. The idea was to include a DTD in the header. But using DTDs
means we can use external ones too (as in HTML/XML). But there's always
the issue of ID collision.

> Just in case, if nobody ever seriously thought about it, please consider
> the following proposal :
>
> my idea is to replace the high order bits (not the ones coding for the
> ID length) of the Class-IDs with a namespace ID. First of all, a few new
> level 1+ Class-IDs are defined : 4288, an integer element that defines
> how many bits are used for namespacing. 4289 is the namespace
> declaration element. It has two sub-elements : 81 is an integer
> representing the namespace ID, 82 is a string containing the namespace
> key, which can be a URL, as in traditional XML namespaces, or a public
> key fingerprint.

This is similar to the DTD system. Except you're changing the ID parsing.
I think Class 3, 4 (and even Class 2) level offers enough IDs to avoid
collision for formats in the same field of work (multimedia, banking,
tagging, etc).

Now the idea of a namespace would mean that the same ID would be used by
2 formats but with a different meaning. But given you set the different
namespace in each ID, de facto they have a different ID. So I don't
really see how it solves the problem of collision.

> When a Class-ID has high-order set bits that would conflict with the
> namespace ID, that Class-ID is simply represented as a larger class (it
> would be coherent with the EBML RFC section 2.2, which states that
> Class-IDs are always encoded in their shortest form, therefore no ID
> clashes can happen)
>
> The namespace ID is always 0 for EBML elements, so for files using up to
> 7 additional namespaces, the header elements wouldn't change at all.
> Another advantage of my approach is that the lowest level of EBML
> parsers (which do not interpret Class-IDs) would not be confused by the
> files using namespaces.

Yes, but you still need to map, at the lowest level, the namespaces for
each upper level reader.

> I welcome any comments and would be pleased to answer if further detail
> is required

Sure. Maybe I didn't get your solution right. But I'm glad someone is
trying to extend EBML. The main missing feature for the moment is the
inability at the lower level to know if an element is EBMLMaster or not.
So it's impossible to display a map of an EBML document without knowing
the semantic.

Steve

From haessije at eps.e-i.com Mon Apr 10 11:08:46 2006
From: haessije at eps.e-i.com (HAESSIG Jean-Christophe)
Date: Mon, 10 Apr 2006 11:08:46 +0200
Subject: [Matroska-devel] EBML Namespaces
Message-ID: <2684397F36DC8849A9BF842433B7843601EB8455@GZI-VM01.cm-cic.fr>

> Hi,

Hi,

> Well, extending an EBML document with more tags has been
> discussed in the past. The idea was to include a DTD in the
> header.
> But using DTDs means we can use external ones too (as
> in HTML/XML). But there's always the issue of ID collision.

Indeed, DTDs provide semantic information about a file, but no namespace
isolation.

> This is similar to the DTD system. Except you're changing the
> ID parsing. I think Class 3, 4 (and even Class 2) level
> offers enough IDs to avoid collision for formats in the same
> field of work (multimedia, banking, tagging, etc).

Yes, this is possible, but vocabulary writers will need to cooperate to
avoid using the same IDs. Some will release a format and afterwards
notice that they have clashes. Some will take clashing IDs on purpose,
to be incompatible with their competitor.

> Now the idea of a namespace would mean that the same ID would
> be used by
> 2 formats but with a different meaning. But given you set the
> different namespace in each ID, de facto they have a
> different ID. So I don't really see how it solves the problem
> of collision.

Empirically, yes. But the namespace ID should not be seen as part of the
Class-ID, since the used namespace ID can virtually hold any value, and
will probably be different for the same vocabulary used in two distinct
files. If you had 2 files, each one using two namespaces : File A
[0 (EBML); 1 (Private NS A)], and File B [0 (EBML); 1 (Private NS B)],
a suitable program could merge them in file C
[0 (EBML); 1 (Private NS A); 2 (Private NS B)].

I think a little confusion has been introduced in Class-ID naming
because their representation in the specs is the full byte dump, so ID
[A1] is represented as [A1], while its real VINT value really is
21(hex).

After realizing that the size descriptor is not part of the class ID
value we can introduce another object that will not be counted as part
of the class ID : the namespace ID.

For example, if the namespace ID width is 3, the ID represented as [81]
would have VINT value 1, namespace 0. The same ID in namespace 1 would
read [91] and [F1] in namespace 7.
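
The bit layout in that example can be checked with a short Python sketch. This is an illustration only, not code from the thread or from libebml: the function names `strip_marker`, `encode_id` and `decode_id` are hypothetical, and it assumes the namespace bits sit directly after the length marker, which is what the [81], [91], [F1] figures suggest.

```python
def strip_marker(raw: bytes) -> tuple:
    """Return (value, value_bits) for an EBML ID given as its full byte dump."""
    value_bits = 7 * len(raw)            # each ID byte carries 7 value bits
    as_int = int.from_bytes(raw, "big")
    if as_int >> value_bits != 1:        # exactly one marker bit, at the top
        raise ValueError("length marker does not match the byte count")
    return as_int ^ (1 << value_bits), value_bits


def encode_id(value: int, namespace: int, ns_width: int, length: int = 1) -> bytes:
    """Encode `value` in `namespace` (assumed layout: marker, NS bits, value)."""
    value_bits = 7 * length
    local_bits = value_bits - ns_width   # bits left for the ID proper
    if not 0 <= namespace < (1 << ns_width):
        raise ValueError("namespace does not fit in ns_width bits")
    if not 0 <= value < (1 << local_bits):
        raise ValueError("value collides with the namespace bits; use a longer ID")
    as_int = (1 << value_bits) | (namespace << local_bits) | value
    return as_int.to_bytes(length, "big")


def decode_id(raw: bytes, ns_width: int) -> tuple:
    """Split a byte dump into (namespace, value) under the same assumed layout."""
    bits, value_bits = strip_marker(raw)
    local_bits = value_bits - ns_width
    return bits >> local_bits, bits & ((1 << local_bits) - 1)


if __name__ == "__main__":
    print(hex(strip_marker(b"\xa1")[0]))   # VINT value of [A1]
    for ns in (0, 1, 7):
        print(ns, encode_id(1, ns, ns_width=3).hex().upper())
```

With a namespace width of 3, a one-byte ID keeps only 4 bits for its own value, which is why IDs with higher bits set would have to move to a longer encoding.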
Notice that only the representation in the byte stream changes, not the
real value of the ID.

> Yes, but you still need to map, at the lowest level, the
> namespaces for each upper level reader.

Of course. This job will be done by new elements in the EBML namespace
(further noted as NSDE -- namespace declaration elements). I proposed
value 0 for the namespace for EBML elements, mainly for convenience
reasons.
We need one element to set the namespace width and one container element
to declare a namespace : in this element must be a sub-element to set
the namespace value, and a sub-element to associate a namespace key with
it (the only thing formats need to be globally unique).

The trickiest problem I can see is deciding in what scope a namespace is
active. The cleanest rule would be (just as in XML): a NSDE controls the
namespace of its parent and its parent's children (the NSDE is therefore
included) but this would be harder to implement because it requires
forward-checking to decide to which namespace the current element
belongs. Happily we are allowed to add restrictions on where NSDEs can
be used in elements, if any.

Another scoping rule can be "following-siblings" where a NSDE changes NS
rules for the next elements and their children. It is technically
correct and easy to implement, but for the moment I dislike it, I can't
tell why...

A third option is to only allow the use of NSDE near the beginning of
the file and make the rules global to the whole EBML file, but this is
rather gory.

> Sure. Maybe I didn't get your solution right. But I'm glad
> someone is trying to extend EBML. The main missing feature
> for the moment is the inability at the lower level to know if
> an element is EBMLMaster or not.
> So it's impossible to display a map of an EBML document
> without knowing the semantic.
I've had some success in that, but not full, which means my solution
cannot be used in real programs (see the attached Python script -- GTK2
libs needed). The idea is to always parse the content of elements -- be
they master or not. The data in the element is searched for
sub-elements. If the length of the found sub-elements overflows the
parent, then parsing is cancelled and the data returns to raw status.

Of course, this doesn't work if the data looks like legitimate EBML, but
in fact isn't. There I can see only one solution : escape it.

A code that says 'EBML stops here' should be inserted just before the
raw data that needs it. This job can be done by a normal EBML element
(with size 0), which is minimum 2 bytes long. Statistically I didn't
encounter many cases where bogus EBML was interpreted, so it wouldn't be
a problem for terseness. As an added benefit, this code could be used as
a marker for the end of unbounded (size unknown) container elements and
totally relieve the class ID from providing hints about the level of
that element (which is currently the case).

And last, but not least, to provide real compositing and annotation with
namespaces, all elements should be allowed to contain sub-elements
(except size unbounded ones).

JCH
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ebml.py
Type: application/octet-stream
Size: 7752 bytes
Desc: ebml.py
URL:

From steve.lhomme at free.fr Mon Apr 10 12:26:16 2006
From: steve.lhomme at free.fr (Steve Lhomme)
Date: Mon, 10 Apr 2006 12:26:16 +0200
Subject: [Matroska-devel] EBML Namespaces
In-Reply-To: <2684397F36DC8849A9BF842433B7843601EB8455@GZI-VM01.cm-cic.fr>
References: <2684397F36DC8849A9BF842433B7843601EB8455@GZI-VM01.cm-cic.fr>
Message-ID: <443A32C8.40401@free.fr>

HAESSIG Jean-Christophe wrote:
>> Now the idea of a namespace would mean that the same ID would
>> be used by
>> 2 formats but with a different meaning.
>> But given you set the
>> different namespace in each ID, de facto they have a
>> different ID. So I don't really see how it solves the problem
>> of collision.
>
> Empirically, yes. But the namespace ID should not be seen as
> part of the Class-ID, since the used namespace ID can virtually

Yes, if that works this way, it's much better, as it would otherwise
break forward compatibility of older files (Matroska at least).

> hold any value, and will probably be different for the same
> vocabulary used in two distinct files. If you had 2 files, each
> one using two namespaces : File A [0 (EBML); 1 (Private NS A)],
> and File B [0 (EBML); 1 (Private NS B)], a suitable program could
> merge them in file C [0 (EBML); 1 (Private NS A); 2 (Private NS B)].
>
> I think a little confusion has been introduced in Class-ID naming
> because their representation in the specs is the full byte dump,
> so ID [A1] is represented as [A1], while its real VINT value
> really is 21(hex).

Yes, but it's easier for coders who are looking for a value. Maybe we
should add the simplified value too.

> After realizing that the size descriptor is not part of the class
> ID value we can introduce another object that will not be counted
> as part of the class ID : the namespace ID.

Something like EbmlString, EbmlUInt, EbmlMaster, etc ? That's an option.
The problem I see is how to mix elements of different namespaces.

Or it could just mean like in C++:
using namespace xyz;
So that all IDs from xyz should be recognised. That implies that all
namespaces used in a file cannot have overlapping IDs (collision
conflict).

> For example, if the namespace ID width is 3, the ID represented
> as [81] would have VINT value 1, namespace 0. The same ID in
> namespace 1 would read [91] and [F1] in namespace 7. Notice that
> only the representation in the byte stream changes, not the real
> value of the ID.

This is a good solution. But it probably wouldn't work with Matroska
(the only known format to use EBML so far).
Because the bit(s) used to mark the namespace are probably already used
by some IDs. Also, the limit to 3 (or 2 or 5) is arbitrary and doesn't
meet the goal of EBML to be a format with no limits. (the only one we
have is Matroska legacy, but we could evolve EBML independently of
Matroska too)

>> Yes, but you still need to map, at the lowest level, the
>> namespaces for each upper level reader.
>
> Of course. This job will be done by new elements in the EBML
> namespace (further noted as NSDE -- namespace declaration elements).
> I proposed value 0 for the namespace for EBML elements, mainly for
> convenience reasons.
> We need one element to set the namespace width and one container

Setting the namespace width might be a good idea. Because older formats
(like Matroska) could set it to 0 (default value of the new element in
the EBML header). And new files could use more space for the IDs (with
namespace).

In that case each namespace could use a custom (to the file) ID and be
defined by a URL (string ID) or a (EBML) DTD.

> element to declare a namespace : in this element must be a
> sub-element to set the namespace value, and a sub-element to
> associate a namespace key with it (the only thing formats need to
> be globally unique).
>
> The trickiest problem I can see is deciding in what scope a namespace
> is active. The cleanest rule would be (just as in XML): a NSDE
> controls the namespace of its parent and its parent's children (the
> NSDE is therefore included) but this would be harder to implement
> because it requires forward-checking to decide to which namespace
> the current element belongs. Happily we are allowed to add
> restrictions on where NSDEs can be used in elements, if any.

Given we extend the ID size (to include the namespace of each ID) I
don't see a problem here. The scope applies to the ID itself.

> Another scoping rule can be "following-siblings" where a NSDE changes
> NS rules for the next elements and their children.
> It is technically
> correct and easy to implement, but for the moment I dislike it, I
> can't tell why...
>
> A third option is to only allow the use of NSDE near the beginning
> of the file and make the rules global to the whole EBML file, but this
> is rather gory.
>
>> Sure. Maybe I didn't get your solution right. But I'm glad
>> someone is trying to extend EBML. The main missing feature
>> for the moment is the inability at the lower level to know if
>> an element is EBMLMaster or not.
>> So it's impossible to display a map of an EBML document
>> without knowing the semantic.
>
> I've had some success in that, but not full, which means my solution
> cannot be used in real programs (see the attached Python
> script -- GTK2 libs needed), the idea is to always parse the content
> of elements -- be they master or not. The data in the element is
> searched for sub-elements. If the length of the found sub-elements
> overflows the parent, then parsing is cancelled and the data returns
> to raw status.

Yeah, there are too many possible false alarms. That's not a reliable
solution.

> Of course, this doesn't work if the data looks like legitimate EBML,
> but in fact isn't. There I can see only one solution : escape it.

No, the data in each EBML should not be modified because of the EBML ID
it's in. That will make parsers way too complex. There could be a rule
that all EBML Master IDs have a certain bit set, and the others don't.
That could mean that one of the ID bits would be used for EBML Master.

That will break Matroska compatibility but it could be added as an EBML
2 version (and Matroska v3 bitstream).

> A code that says 'EBML stops here' should be inserted just before
> the raw data that needs it. This job can be done by a normal EBML
> element (with size 0), which is minimum 2 bytes long. Statistically
> I didn't encounter many cases where bogus EBML was interpreted, so
> it wouldn't be a problem for terseness.
> As an added benefit, this
> code could be used as a marker for the end of unbounded (size
> unknown) container elements and totally relieve the class ID from
> providing hints about the level of that element (which is currently
> the case).
>
> And last, but not least, to provide real compositing and annotation
> with namespaces, all elements should be allowed to contain
> sub-elements (except size unbounded ones).

--
robUx4 on blog

From haessije at eps.e-i.com Mon Apr 10 18:17:49 2006
From: haessije at eps.e-i.com (HAESSIG Jean-Christophe)
Date: Mon, 10 Apr 2006 18:17:49 +0200
Subject: [Matroska-devel] EBML Namespaces
Message-ID: <2684397F36DC8849A9BF842433B7843601EB856C@GZI-VM01.cm-cic.fr>

Steve Lhomme wrote:
> This is a good solution. But it probably wouldn't work with
> Matroska (the only known format to use EBML so far). Because
> the bit(s) used to mark the namespace are probably already
> used by some IDs. Also, the limit to 3 (or 2 or 5) is
> arbitrary and doesn't meet the goal of EBML to be a format
> with no limits. (the only one we have is Matroska legacy, but
> we could evolve EBML independently of Matroska too)

Out of luck, the first EBML (lev.0) master element [1A45DFA3]
(VINT=0A45DFA3) falls in that category since it has no unset bit after
the first 1. The solution I proposed in my first post was simply to move
the conflicting bits out of the way by sliding them one byte to the
right. Since the current Class-IDs are supposed to be represented in
their shortest form, this doesn't introduce new conflicts. 0A45DFA3
would then be encoded in the byte stream as [080A45DFA3], which makes
room for 7 bits. Of course this violently breaks compatibility, but the
problem could be solved by clever rules about when namespaces are
enabled and to what scope they apply.

> Setting the namespace width might be a good idea. Because
> older formats (like Matroska) could set it to 0 (default
> value of the new element in the EBML header).
> And new files
> could use more space for the IDs (with namespace).

I was thinking the same. If no namespace declarations are found in the
file, the parser should assume there is no use of namespaces and just
talk to the EBML engine.

I also just found out that being able to set the namespace length is not
enough to meet the 'no limits' goal of EBML. It is not even technically
good in a binary format which needs terseness, because it is very likely
that in the same file, some elements from namespace A are used very
often and elements from namespaces B, C, D, and E more rarely. My
current proposal would force the EBML writer to add either :

* a constant overhead of N bits in front of each ID, which is bad
  because 1) all possible NSIDs are not used and 2) many IDs will be
  stretched to make room for the NSID
* a constant overhead of 1 bit (to distinguish between EBML and the
  private namespace) plus occasional (long) namespace declarations for
  local use of extra elements from the least used namespaces.

A way better solution, that will remove the need for an element to
declare the length of NSIDs, is to use prefix codes (as in Huffman
compression) to identify namespaces.

> In that case each namespace could use a custom (to the file)
> ID and be defined by a URL (string ID) or a (EBML) DTD.
>
> Given we extend the ID size (to include the namespace of each
> ID) I don't see a problem here. The scope applies to the ID itself.

In fact, the problem is that we need to support random seeking in a
file, since at least one format (Matroska) will do it (and many others
will, it is a performance requirement). Consider a cue head that points
to the beginning of an EBML element somewhere in the file : what context
do we have about that element ? Do we know its level, its parent, etc ?
While this information might be more or less important to a particular
vocabulary, we'll have a hard time guessing which namespaces apply to
that specific element, depending on the scoping rules we agreed on. In
the "parent & parent's children" model, there must be a way to find the
parent of an element. In the "following-siblings", one must scan all the
preceding siblings of an element to check for namespace declarations (I
said I didn't like it). In the "global" model, everything is easier, but
it limits the flexibility of the format, i.e. it will not be possible to
have local namespace declarations for rarely used namespaces...

> > Of course, this doesn't work if the data looks like
> legitimate EBML,
> > but in fact isn't. There I can see only one solution : escape it.
>
> No, the data in each EBML should not be modified because of
> the EBML ID it's in. That will make parsers way too complex.
> There could be a rule that all EBML Master IDs have a certain
> bit set, and the others don't.
> That could mean that one of the ID bits would be used for EBML Master.
>
> That will break Matroska compatibility but it could be added
> as an EBML
> 2 version (and Matroska v3 bitstream).

Adding a bit to tell whether an element is master or not is not a
solution IMHO, because adding namespaces in EBML should enable
"annotation" :

Consider an EBML snippet :

[C5]        # this is an element
  [C7]      # this is an element holding some integer value
    1234    # the value itself
  .         # C7 element ends here
.           # C5 element ends here

Now imagine you want to use an element in another vocabulary to describe
that particular C7 instance :

[C5]        # this is an element
  [C7]      # this is an element holding some integer value
    [D1]    # this is an element holding a string value
      This value has been entered by user john
    .       # D1 element ends here
    1234
  .         # C7 element ends here
.           # C5 element ends here

As you can see, with namespaces all elements are potential "masters". I
can not see another way to do it cleanly.
Annotating as sibling nodes would be a mess, since annotations should
themselves be annotatable. Because of this, there must be some
'end-of-elements' tag to tell the parser that raw data starts here, if
necessary.

I can imagine a solution for future versions of EBML parsers that use
namespaces. A lowlevel module will parse the file, identify to which
namespace the elements belong and talk to the specific vocabulary
modules. Namespace-unaware formats will extend the EBML vocabulary, as
usual. Namespace-aware formats will register their own vocabulary module
to the lowlevel parser. The EBML vocabulary module will be responsible
for notifying the lowlevel parser about namespace declarations.

JCH

From vic_mozgin at hotmail.com Tue Apr 11 04:38:35 2006
From: vic_mozgin at hotmail.com (Victor Mozgin)
Date: Tue, 11 Apr 2006 02:38:35 +0000 (UTC)
Subject: [Matroska-devel] Unknown-> English language tag conversion by Haali splitter
Message-ID:

Hi,

It seems that Haali splitter sets "English" language tags on streams
that have none. At least that's what I see for video streams and audio
streams split from AVI files. Can we make it customizable (with the
possibility to keep "Unknown")? My filter chain depends on it, i.e. if
there is a stream of unknown audio and a stream of English subtitles
(common case of .avi + external .srt), then it is assumed that audio is
non-English, and subtitles are enabled; with Haali audio stream comes as
English, so there is no point in enabling subtitles.

Thanks,
/Victor

From steve.lhomme at free.fr Tue Apr 11 10:43:17 2006
From: steve.lhomme at free.fr (Steve Lhomme)
Date: Tue, 11 Apr 2006 10:43:17 +0200
Subject: [Matroska-devel] Unknown-> English language tag conversion by Haali splitter
In-Reply-To:
References:
Message-ID: <443B6C25.8040503@free.fr>

Victor Mozgin wrote:
> Hi,
>
> It seems that Haali splitter sets "English" language tags on streams that have
> none.
At least that's what I see for video streams and audio streams split > from AVI files. Can we make it customizable (with the possibility to > keep "Unknown")? My filter chain depends on it, i.e. if there is a stream of > unknown audio and a stream of English subtitles (common case of .avi + > external .srt), then it is assumed that audio is non-English, and subtitles > are enabled; with Haali audio stream comes as English, so there is no point in > enabling subtitles. > Thanks, In Matroska tracks, the default language is always "eng" for English. That's the case even if the language attribute is not set. If you want to set a track to unknown you have to use "und" (undetermined) for the language. Steve From vic_mozgin at hotmail.com Tue Apr 11 17:19:52 2006 From: vic_mozgin at hotmail.com (Victor Mozgin) Date: Tue, 11 Apr 2006 15:19:52 +0000 (UTC) Subject: [Matroska-devel] Re: Unknown-> English language tag conversion by Haali splitter References: <443B6C25.8040503@free.fr> Message-ID: > > It seems that Haali splitter sets "English" language tags on streams that have > > none. At least that's what I see for video streams and audio streams split > > from AVI files. Can we make it customizable (with the possibility to > > keep "Unknown")? My filter chain depends on it, i.e. if there is a stream of > > unknown audio and a stream of English subtitles (common case of .avi + > > external .srt), then it is assumed that audio is non-English, and subtitles > > are enabled; with Haali audio stream comes as English, so there is no point in > > enabling subtitles. > > Thanks, > > In Matroska tracks, the default language is always "eng" for English. > That's the case even if the language attribute is not set. If you want > to set a track to unknown you have to use "und" (undetermined) for the > language. OK, may I ask why?
While I can see benefits of always having a determined language tag, it seems that it makes it impossible to differentiate between tracks with "eng" and tracks without the attribute. In any case, I wasn't so concerned with native matroska containers - they usually contain proper tags. But Haali splitter also supports AVI, and it's not possible to set this tag there (even if it's possible, obviously no one does it; at least all the .avi that I tried are reported by Haali as English, and many of them are not). I'd like to have a way to change the reported language for .avi, that's all. Does it make sense? Thanks, /Victor From steve.lhomme at free.fr Thu Apr 13 11:04:28 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Thu, 13 Apr 2006 11:04:28 +0200 Subject: [Matroska-devel] EBML Namespaces In-Reply-To: <2684397F36DC8849A9BF842433B7843601EB856C@GZI-VM01.cm-cic.fr> References: <2684397F36DC8849A9BF842433B7843601EB856C@GZI-VM01.cm-cic.fr> Message-ID: <443E141C.70803@free.fr> HAESSIG Jean-Christophe wrote: > Steve Lhomme wrote: >> This is a good solution. But it probably wouldn't work with >> Matroska (the only known format to use EBML so far). Because >> the bit(s) used to mark the namespace are probably already >> used by some IDs. Also, the limit to 3 (or 2 or 5) is >> arbitrary and doesn't meet the goal of EBML to be a format >> with no limits. (the only one we have is Matroska legacy, but >> we could evolve EBML independently of Matroska too) > > Out of luck, the first EBML (lev.0) master element [1A45DFA3] > (VINT=0A45DFA3) falls in that category since it has no unset > bit after the first 1. The solution I proposed in my first post > was simply to move the conflicting bits out the way by sliding > them one byte to the right. Since the current Class-Ids are > supposed to be represented as their shortest form, this doesn't > introduce new conflicts. 0A45DFA3 would then be encoded in the > byte stream as [080A45DFA3], which makes room for 7 bits. 
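The "7 bits" figure quoted above can be checked with a little arithmetic. What follows is only a minimal sketch, assuming the usual EBML variable-length layout in which the number of leading zero bits in the first byte gives the total length, and each extra byte therefore adds 8 raw bits but spends one of them on the length marker. The function names `vint_length` and `payload_bits` are illustrative and are not part of libebml.

```python
def vint_length(first_byte: int) -> int:
    """Total length in bytes of an EBML variable-length value,
    derived from the position of the first set bit in its first byte."""
    if not 0 < first_byte <= 0xFF:
        raise ValueError("first byte must be in 1..255")
    length = 1
    mask = 0x80
    while not (first_byte & mask):
        length += 1
        mask >>= 1
    return length


def payload_bits(length: int) -> int:
    """Bits left for the value itself: 8 per byte minus one marker bit per byte."""
    return 7 * length


# The EBML header ID [1A45DFA3] is a 4-byte ID.
old_len = vint_length(0x1A)
# The proposed re-encoding [080A45DFA3] is a 5-byte ID.
new_len = vint_length(0x08)
# Bits gained by sliding the value one byte to the right.
gained = payload_bits(new_len) - payload_bits(old_len)
print(old_len, new_len, gained)
```

Under these assumptions the extra byte contributes 8 bits, of which one goes to the longer length marker, leaving 7 for a namespace value.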
Sliding 8 bits to the right should make room for 8 bits. Depending on the EBML header we could know whether IDs are supposed to have a namespace or not. But I may have another option: why not put the bits *after* the current bits used for the ID ? All the processing of IDs would remain unchanged. And we would only need code to handle the namespace, the same way we have the length. So parsing would be split like this: [ID][namespace][size][data] it could also be [ID][size][namespace][data] What we need is to make one of the namespaces in the document be set as "default", i.e. not marked. The same way we don't have to write mandatory elements that have the default value. This way Matroska can keep its low overhead and be extended by new namespaces. Of course for formats other than Matroska, this default namespace doesn't need to be mandatory. Also, if we use an EBML element to say: all lower elements use namespace XYZ it could replace the default value. Namespace switching would only occur in very localized places. That's the difference between having the "using namespace XYZ" approach and the "XYZ::element" one. We might use both (as in C++). >> In that case each namespace could use a custom (to the file) >> ID and be defined by a URL (string ID) or a (EBML) DTD. >> >> Given we extend the ID size (to include the namespace of each >> ID) I don't see a problem here. The scope applies to the ID itself. > > In fact, the problem is that we need to support random seeking > in a file, since at least one format (Matroska) will do it (and > many others will, it is a performance requirement). Consider a > cue head that points to the beginning of an EBML element > somewhere in the file : what context do we have about that > element ? Do we know its level, its parent, etc ?
While this > information might be more or less important to a particular > vocabulary, we'll have a hard time guessing which namespaces > apply to that specific element, depending on the scoping rules > we agreed on. Yes, I was thinking about that too. That's why I prefer to keep the IDs intact and the format proposed above is good. Seeking (at least in matroska) can remain unchanged. For other formats we would need to take the namespace in account to make sure the element is the namespace we're looking. > In the "parent & parent's children" model, there must be a way > to find the parent of an element. In the "following-siblings", > one must scan all the preceding siblings of an element to check > for namespace declarations (I said I didn't like it). In the > "global" model, everything is easier, but it limits the > flexibility of the format, i.e. it will not be possible to have > local namespace declarations for rarely used namespaces... Indeed. But there is no case where you seek randomly in an EBML stream and look around for the context. It's just not possible. Seeking is only allowed for elements where you can recover the context (namely Level 0 and Level 1 in matroska). And these elements should contain the namespace (unless it's the default one). So the "using namespace" could still work just fine. >>> Of course, this doesn't work if the data looks like >> legitimate EBML, >>> but in fact isn't. There I can see only one solution : escape it. >> No, the data in each EBML should not be modified because of >> the EBML ID it's in. That will make parsers way too complex. >> There could be a rule that all EBML Master IDs have a certain >> bit set, and the others don't. >> That could mean that one of the ID bits would be used for EBML Master. >> >> That will break Matroska compatibility but it could be added >> as an EBML >> 2 version (and Matroska v3 bitstream). 
> > Adding a bit to tell whether an element is master or not is not > a solution IMHO, because adding namespaces in EBML should > enable "annotation" : Indeed, if we use this format [ID][namespace][size][data] the namespace part could also be used to say if the element is master or not. Maybe even the element type (3 or 4 bits). I think we're getting close to a solution that could solve our namespace & DTD problem ! Then you could compress any XML document to/from EBML. Well, there would still be the need to know if a sub-element is an XML attribute or an XML value: vs val Which roughly translates to EBML as: EBML_element (Master) EBML_attr val (String) From steve.lhomme at free.fr Thu Apr 13 11:09:25 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Thu, 13 Apr 2006 11:09:25 +0200 Subject: [Matroska-devel] Re: Unknown-> English language tag conversion by Haali splitter In-Reply-To: References: <443B6C25.8040503@free.fr> Message-ID: <443E1545.5020406@free.fr> Victor Mozgin wrote: >>> It seems that Haali splitter sets "English" language tags on streams that > have >>> none. At least that's what I see for video streams and audio streams split >>> from AVI files. Can we make it customizable (with the possibility to >>> keep "Unknown")? My filter chain depends on it, i.e. if there is a stream > of >>> unknown audio and a stream of English subtitles (common case of .avi + >>> external .srt), then it is assumed that audio is non-English, and > subtitles >>> are enabled; with Haali audio stream comes as English, so there is no > point in >>> enabling subtitles. >>> Thanks, >> In Matroska tracks, the default language is always "eng" for English. >> That's the case even if the language attribute is not set. If you want >> to set a track to unknown you have to use "und" (undetermined) for the >> language. > > OK, may I ask why?
While I can see benefits of always having a determined > language tag, it seems that it makes it impossible to differentiate between > tracks with "eng" and tracks without the attribute. In any case, I wasn't so In Matroska no attribute = default value. So if you don't set undetermined, it defaults to English. That's why there is a default value! Now the choice of English wasn't a good one, but it's too late now. > concerned with native matroska containers - they usually contain proper tags. > But Haali splitter also supports AVI, and it's not possible to set this tag > there (even if it's possible, obviously no one does it; at least all the .avi > that I tried are reported by Haali as English, and many of them are not). I'd > like to have a way to change the reported language for .avi, that's all. Does > it make sense? Yes. The DivX extensions support that, but I think they use metadata (tags) for that. Steve From xlazom00 at gmail.com Sat Apr 15 22:06:07 2006 From: xlazom00 at gmail.com (m][sko) Date: Sat, 15 Apr 2006 22:06:07 +0200 Subject: [Matroska-devel] Haali Media Splitter source files Message-ID: Hi, I would like to build haali media splitter by myself. Any source files of splitter ? From cedilla at gmail.com Mon Apr 17 01:19:00 2006 From: cedilla at gmail.com (Reed Wilson) Date: Sun, 16 Apr 2006 16:19:00 -0700 Subject: [Matroska-devel] Haali splitter seek to keyframe Message-ID: <4442D0E4.2060401@gmail.com> Hi all, I've been using Haali's media splitter for a while now, and although I like it, there is one major thing missing: Media Player Classic will seek to keyframes if you hold shift while pushing the left and right arrow keys. This does not work when using Haali's splitter. Is there any possibility of getting this working? Thanks, Reed
From mike at po.cs.msu.su Mon Apr 17 01:30:12 2006 From: mike at po.cs.msu.su (Mike Matsnev) Date: Mon, 17 Apr 2006 03:30:12 +0400 Subject: [Matroska-devel] Haali splitter seek to keyframe In-Reply-To: <4442D0E4.2060401@gmail.com> References: <4442D0E4.2060401@gmail.com> Message-ID: <4442D384.7050005@po.cs.msu.su> Reed Wilson wrote: > I've been using Haali's media splitter for a while now, and although I > like it, there is one major thing missing: Media Player Classic will > seek to keyframes if you hold shift while pushing the left and right > arrow keys. This does not work when using Haali's splitter. Is there any > possibility of getting this working? Maybe sometime. Matroska doesn't store keyframe positions like avi container, so some file scanning is needed for keyframe seeking. This is not implemented atm. From steve.lhomme at free.fr Mon Apr 17 09:16:45 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Mon, 17 Apr 2006 09:16:45 +0200 Subject: [Matroska-devel] Haali Media Splitter source files In-Reply-To: References: Message-ID: <444340DD.6080502@free.fr> m][sko wrote: > Hi, > I would like to build haali media splitter by myself. > Any source files of splitter ? You can find the C parser on Haali's website. The DirectShow code is not available to avoid theft. Steve From paul at msn.com Thu Apr 20 17:27:53 2006 From: paul at msn.com (Paul Bryson) Date: Thu, 20 Apr 2006 10:27:53 -0500 Subject: [Matroska-devel] Re: Haali splitter seek to keyframe References: <4442D0E4.2060401@gmail.com> <4442D384.7050005@po.cs.msu.su> Message-ID: "Mike Matsnev" wrote in message news:4442D384.7050005 at po.cs.msu.su... > Maybe sometime. Matroska doesn't store keyframe positions like avi > container, > so some file scanning is needed for keyframe seeking. This is not > implemented > atm. Aren't the keyframes usually what are stored in the Cues? 
Atamido From haessije at eps.e-i.com Thu Apr 20 17:52:14 2006 From: haessije at eps.e-i.com (HAESSIG Jean-Christophe) Date: Thu, 20 Apr 2006 17:52:14 +0200 Subject: [Matroska-devel] EBML Namespaces Message-ID: <2684397F36DC8849A9BF842433B784360201E512@GZI-VM01.cm-cic.fr> Sorry for the long break, I couldn't find time to answer properly, since the subject isn't trivial. > > would then be encoded in the byte stream as [080A45DFA3], > which makes > > room for 7 bits. > > Sliding of 8 bits to the right, should make room for 8 bits. Except that one bit out of the 8 is eaten by the size descriptor because the ID is made longer. > Depending on the EBML header we could know whether IDs are > supposed to have a namespace or not. But I may have another > option: why not put the bits > *after* the current bits used for the ID ? All the ID Putting the namespace value before or after the Class-ID would basically have the same effect, except that values are more likely to change in their low order digits, and therefore it's harder to find unused space here. > processing of IDs would remain unchanged. And we would only I'm not quite sure how you see this, but as far as I can imagine, one should seek for the namespace part of the ID, remove it, and then resume with normal ID interpretation. > need code to handle the namespace, the same way we have the > length. So parsing would be split like this: > > [ID][namespace][size][data] > it could also be > [ID][size][namespace][data] I sense that you want to encode the namespace value as a totally separate field, with equal status compared to Class-ID, Size, and Data. However there is a slight problem with this : EBML is supposed to be a byte-aligned format and it would require at least 1 extra byte for each element. This is not bad in itself, but it would waste a great amount of bits, since I do not expect files with more than 5 mixed namespaces to be frequent.
Therefore, I expect the namespace value to take up to 3 bits in most cases, this is why I try to pack it into an existing field. You seem to be prepared to make big changes to the format, but I don't know to what extent we should break compatibility... > What we need is to make one of the namespaces in the document > be set as "default", ie not marked. The same way we don't > have to write mandatory elements that have the default value. > This way Matroska can keep its low overhead and be extended > by new namespaces. This could be the best solution, if we can find a way to express the namespace descriptors in a space-efficient manner *and* without making it a pain in the a** for random-seeking applications to recover the namespace state. However if it can't be done I would rather have the namespace expressed for each element in a file using them, and have files with no namespaces at all (ns desl length=0) like plain Matroska. With proper prefix-coding of the ns descriptor, one could use only one or two *bits* per element. > Also, if we use an EBML element to say: all lower elements > use namespace XYZ it could replace the default value. > Namespace switching would only occur in very localized > places. That's the difference between having the "using > namespace XYZ" approach and the "XYZ::element" one. We might > use both (as in C++). Using such a following-sibling approach would hurt seeking as it is currently done. Of course we can add specific rules like : an element containing namespace switches MUST NOT have its sub-elements indexed by seek heads, except if these seek heads point the parser to all relevant ns switches. This raises an important issue about the effective structure of libraries (of course, people who implement the whole parsing for their own application will have less problems here) dedicated to do the parsing. I believe that namespace processing really should be unknown to the specific applications. > Yes, I was thinking about that too.
That's why I prefer to > keep the IDs intact and the format proposed above is good. > Seeking (at least in > matroska) can remain unchanged. For other formats we would Since there is no foreign-format mixing possibility due to The lack of namespaces, there is indeed no problem. > need to take the namespace in account to make sure the > element is the namespace we're looking. I was thinking a little more about seeking and I came to the conclusion that seeking (indexing and pointing to some part of the file, and the like) should go in EBML (or some seeking NS), and not in each specific application. Why ? Imagine you have some program to add comments in EBML files. You could take any element in the file and add a string comment. The app has its private elements and would use a separate namespace, so it wouldn't interfere with the existing data. The file would still be readable by the original program, the natural rule being to simply ignore unknown elements. However, adding elements changes the size of the file, and therefore the positions to which the seek heads point. Moving seeking into EBML would enable automatic relocation of the seek-heads. A more interesting thing with this is that local namespace state can be recovered while seeking, since it would be the job of the EBML library to make seek heads and it could include all the necessary information. JC From mike at po.cs.msu.su Thu Apr 20 18:02:36 2006 From: mike at po.cs.msu.su (Mike Matsnev) Date: Thu, 20 Apr 2006 20:02:36 +0400 Subject: [Matroska-devel] Re: Haali splitter seek to keyframe In-Reply-To: References: <4442D0E4.2060401@gmail.com> <4442D384.7050005@po.cs.msu.su> Message-ID: <4447B09C.4030200@po.cs.msu.su> Paul Bryson wrote: > "Mike Matsnev" wrote in message > news:4442D384.7050005 at po.cs.msu.su... >> Maybe sometime. Matroska doesn't store keyframe positions like avi >> container, >> so some file scanning is needed for keyframe seeking. This is not >> implemented >> atm. 
> > Aren't the keyframes usually what are stored in the Cues? No, cues contain cluster positions. From paul at msn.com Thu Apr 20 19:18:34 2006 From: paul at msn.com (Atamido) Date: Thu, 20 Apr 2006 12:18:34 -0500 Subject: [Matroska-devel] Re: Haali splitter seek to keyframe In-Reply-To: <4447B09C.4030200@po.cs.msu.su> References: <4442D0E4.2060401@gmail.com> <4442D384.7050005@po.cs.msu.su> <4447B09C.4030200@po.cs.msu.su> Message-ID: Mike Matsnev wrote: > Paul Bryson wrote: >> "Mike Matsnev" wrote in message >> news:4442D384.7050005 at po.cs.msu.su... >>> Maybe sometime. Matroska doesn't store keyframe positions like avi >>> container, >>> so some file scanning is needed for keyframe seeking. This is not >>> implemented >>> atm. >> >> Aren't the keyframes usually what are stored in the Cues? > No, cues contain cluster positions. But aren't cues typically only made for key frames, and doesn't it also contain the Block number within the Cluster? Atamido From alexander.noe at s2001.tu-chemnitz.de Thu Apr 20 19:23:16 2006 From: alexander.noe at s2001.tu-chemnitz.de (Alexander Noe') Date: Thu, 20 Apr 2006 19:23:16 +0200 Subject: [Matroska-devel] Re: Haali splitter seek to keyframe In-Reply-To: References: <4442D0E4.2060401@gmail.com> <4442D384.7050005@po.cs.msu.su> <4447B09C.4030200@po.cs.msu.su> Message-ID: <4447C384.1060006@hrz.tu-chemnitz.de> Atamido wrote: >> No, cues contain cluster positions. > > > But aren't cues typically only made for key frames yes > and doesn't it also contain the Block number within the Cluster? It can, but it is not required, and neither is it required to have a Cue point into each cluster. Cues *could* be used for that if they are suitable (i.e. dense enough), but it won't work properly on some files. 
Alex From steve.lhomme at free.fr Thu Apr 20 21:20:22 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Thu, 20 Apr 2006 21:20:22 +0200 Subject: [Matroska-devel] Re: Haali splitter seek to keyframe In-Reply-To: <4447C384.1060006@hrz.tu-chemnitz.de> References: <4442D0E4.2060401@gmail.com> <4442D384.7050005@po.cs.msu.su> <4447B09C.4030200@po.cs.msu.su> <4447C384.1060006@hrz.tu-chemnitz.de> Message-ID: <4447DEF6.8090400@free.fr> Alexander Noe' wrote: > Atamido wrote: > >>> No, cues contain cluster positions. >> >> >> But aren't cues typically only made for key frames > > yes > >> and doesn't it also contain the Block number within the Cluster? > > It can, but it is not required, and neither is it required to have a Cue > point into each cluster. Cues *could* be used for that if they are > suitable (i.e. dense enough), but it won't work properly on some files. Something that might be added to the specs is that Cue points are guaranteed to be seekable (ie play from the pointed location). As it's only implicit right now. Steve From alexander.noe at s2001.tu-chemnitz.de Thu Apr 20 21:26:38 2006 From: alexander.noe at s2001.tu-chemnitz.de (Alexander Noe') Date: Thu, 20 Apr 2006 21:26:38 +0200 Subject: [Matroska-devel] Re: Haali splitter seek to keyframe In-Reply-To: <4447DEF6.8090400@free.fr> References: <4442D0E4.2060401@gmail.com> <4442D384.7050005@po.cs.msu.su> <4447B09C.4030200@po.cs.msu.su> <4447C384.1060006@hrz.tu-chemnitz.de> <4447DEF6.8090400@free.fr> Message-ID: <4447E06E.7050805@hrz.tu-chemnitz.de> Steve Lhomme wrote: > Something that might be added to the specs is that Cue points are > guaranteed to be seekable (ie play from the pointed location). As it's > only implicit right now. Doesn't the lack of a CueReference already say that? 
Alex From steve.lhomme at free.fr Fri Apr 21 13:48:43 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Fri, 21 Apr 2006 13:48:43 +0200 Subject: [Matroska-devel] EBML Namespaces In-Reply-To: <2684397F36DC8849A9BF842433B784360201E512@GZI-VM01.cm-cic.fr> References: <2684397F36DC8849A9BF842433B784360201E512@GZI-VM01.cm-cic.fr> Message-ID: <4448C69B.1090409@free.fr> HAESSIG Jean-Christophe wrote: > > Sorry for the long break, I couldn't find time to answer > properly, since the subject isn't trivial. No problem, there's no rush. >>> would then be encoded in the byte stream as [080A45DFA3], >> which makes >>> room for 7 bits. >> Sliding of 8 bits to the right, should make room for 8 bits. > > Except that one bit out of the 8 is eated by the size descriptor > because the ID is made longer. It all depends on the solution adopted. But I still didn't understand how your solution can be backward compatible (EBML & Matroska) since it modifies the rules already in place. BTW, while backward compatibility is the goal we should think about all the possible options without that in mind. Then we'll see all that is possible and only then if it's worth to keep compatibility or not. >> Depending on the EBML header we could know wether IDs are >> supposed to have a namespace or not. But I may have another >> option: why not but the bits >> *after* the current bits used for the ID ? All the ID > > Putting the namespace value before or after the Class-ID would > basically have the same effect, except that values are more > likely to change in their low order digits, and therefore it's > harder to find unused space here. > >> processing of IDs would remain unchanged. And we would only > > I'm not quite sure how you see this, but AFAIC imagine, one > should seek for the namespace part of the ID, remove it, and then > resume with normal ID interpretation. Why do you want to remove the ID ? As seen below there are different options of where we could put it. 
Now instead of using another byte (or more) instead of splitting the IDs we could reuse bits in the data length. It would be no more backward compatible as using bits in the IDs but there would be more room for improvement (as the size is known to be encoded in different byte sizes). >> need code to handle the namespace, the same way we have the >> length. So parsing would be split like this: >> >> [ID][namespace][size][data] >> it could also be >> [ID][size][namespace][data] > > I sense that you want to encode the namespace value as a > totally separate field, with equal status compared to Class-ID, > Size, and Data. However there is a slight problem with this : Yes, that's the idea. > EBML is supposed to be a byte-aligned format and it would > require at least 1 extra byte for each element. This is not bad > in itself, but it would waste a great amount of bits, since I do > not expect files with more than 5 mixed namespaces to be > frequent. Therefore, I expect the namespace value to take up to > 3 bits in most cases, this is why I try to pack it into an > existing field. Well, what happens when you need 6 or 7 ? You don't have any more bits left. Adding another byte or 2 gives room for unlimited extensions (including the ability to use some other bits for other things like marking an element as EBML Master). That's what EBML is good at: no limits and still having very basic rules. I'm not too concerned about the overhead because right now if you need a lot of IDs you need to use 2 octets long ones. While with a namespace most IDs for each namespace won't need a lot of room (127 possibilities for Class A IDs). So in the end there should be a good balance. Using 3 bits in the ID header would reduce the number of possible Class A IDs of a format to 2^4-1 = 15 ! That's too small IMO. So I think adding another bit will give us more freedom and space and almost no cost. 
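The ID-space figures in the paragraph above can be reproduced with a short calculation. This is only a sketch, assuming that a one-byte (Class A) ID carries 7 value bits after its length marker and that one bit pattern is reserved, which is what the "127" and "2^4-1 = 15" figures imply. The name `class_a_id_count` and its parameter are illustrative, not an existing API.

```python
def class_a_id_count(namespace_bits: int = 0) -> int:
    """Number of usable one-byte (Class A) IDs when `namespace_bits`
    of the 7 value bits are given over to a namespace tag."""
    if not 0 <= namespace_bits < 7:
        raise ValueError("namespace_bits must be in 0..6")
    # One bit pattern is treated as reserved, hence the "- 1".
    return 2 ** (7 - namespace_bits) - 1


print(class_a_id_count(0))  # plain Class A IDs
print(class_a_id_count(3))  # with 3 bits taken for the namespace
```

With no bits taken this gives 127 IDs, and with 3 bits taken it gives 15, matching the numbers in the message.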
> You seem to be prepared to make big changes to the format, but > I don't know to what extent whe should break compatibility... Again, for current Matroska files it shouldn't be a problem as matroska would be the default namespace. In that case the namespace shouldn't be used for such IDs. That means older files will play without any problem in namespace-aware parsers. Only newer files containing some namespace will not be usable by older parsers. Which AFAIK is the same with what you propose. BTW, we still haven't discussed how to define the namespace in the EBML header, but the DocType existing today will remain. And that is the way to define the special namespace that will be used as default... The other namespaces will probably fall back in an list. It's like the DocType in XML (like html). At least we got the right name for that field ;) >> What we need is to make one of the namespace in the document >> be set as "default", ie not marked. The same way we don't >> have to write mandatory elements that have the default value. >> This way Matroska can keep its low overhead and be extended >> by new namespaces. > > This could be the best solution, if we can find a way to > express the namespace descriptors in a space-efficient manner > *and* not making it a pain in the a** for random-seeking > applications to recover the namespace state. However if it > can't be done I would rather have the namespace expressed for > each element in a file using them, and have files with no > namespaces at all (ns desl length=0) like plain Matroska. > With proper prefix-coding of the ns descriptor, one could use > only one or two *bits* per element. You want to use external files ? I don't really understand. The namespace 'tagging' of each element/ID has to be done inside the file... Well, actually not really you can use some namespaces in the file without defining them. You'd just know how to map IDs to different namespaces. 
The only thing missing for the file to be usable is the semantic. That's where the DTD (internal and/or external) comes in. Is that what you want to make external ? >> Also, if we use an EBML element to say: all lower elements >> use namespace XYZ it could replace the default value. >> Namespace switching would only occur in very localized >> places. That's the difference between having the "using >> namespace XYZ" approach and the "XYZ::element" one. We might >> use both (as in C++). > > Using such a following-sibling approach would hurt seeking as > it is currently done. Of course we can add specific rules > like : an element containing namespace switches MUST NOT have > its sub-elements indexed by seek heads, except if these seek > heads point the parser to all relevant ns switches. This Yes that would be a limit but seeking in a file format is a very special feature not used a lot for most formats. It makes sense in A/V formats where there is a timeline, but then you need to know the semantic to know what you're looking for when seeking. > raises an important issue about the effective structure of > libraries (of course, people who implement the whole parsing > for their own application will have less problems here) > dedicated to do the parsing. I believe that namespace > processing really should be unknown to the specific > applications. Yes. But again the namespace imply the semantic. All namespace-related functionalities should be done at the EBML level, but there will always be the need to map the semantic for the application. >> Yes, I was thinking about that too. That's why I prefer to >> keep the IDs intact and the format proposed above is good. >> Seeking (at least in >> matroska) can remain unchanged. For other formats we would > > Since there is no foreign-format mixing possibility due to > The lack of namespaces, there is indeed no problem. > >> need to take the namespace in account to make sure the >> element is the namespace we're looking. 
> > I was thinking a little more about seeking and I came to the > conclusion that seeking (indexing and pointing to some part > of the file, and the like) should go in EBML (or some seeking > NS), and not in each specific application. Why ? Imagine you > have some program to add comments in EBML files. You could > take any element in the file and add a string comment. The app > has its private elements and would use a separate namespace, > so it wouldn't interfere with the existing data. The file would > still be readable by the original program, the natural rule > being to simply ignore unknown elements. However, adding > elements changes the size of the file, and therefore the > positions to which the seek heads point. Moving seeking into > EBML would enable automatic relocation of the seek-heads. That's indeed a good point. Remuxing a matroska file with seek/cue elements would require to know the matroska semantic to remux it. When I read that it made me think of XPath in XML. I have no idea of how it works for XML but it seems an extension used to define pointers between elements and/or documents (one direction or bidirectional). And that's something that could make sense at the EBML level too. That wouldn't be backward compatible, but depending on the solution we come up with, it could be a good replacement. > A more interesting thing with this is that local namespace > state can be recovered while seeking, since it would be the > job of the EBML library to make seek heads and it could > include all the necessary information. Yes, that could work. But again, seeking (at the EBML or semantic level) is a special/tricky feature. In the case of Matroska it's only for Level 1 elements and therefore you always know the upper context (segment, since that's the only level 0 element). I can hardly imagine a format that would need to seek at level 2 or more. 
Especially because after seeking you probably need the context of upper elements (one of the feature of nested formats is that each element has a context to interpret it). While I'm all for seeking at the EBML level (would help for the format resistance to errors too) we shouldn't over design it for cases that won't make sense in the real world. We'll see what the discussion leads to :) Steve -- robUx4 on blog From chris at matroska.org Sun Apr 23 02:41:18 2006 From: chris at matroska.org (Christian HJ Wiesner) Date: Sun, 23 Apr 2006 02:41:18 +0200 Subject: [Matroska-devel] Historic moment : MKV/MKA playback on Windows, using Gstreamer (Ixion) !! Message-ID: <444ACD2E.7030907@matroska.org> Hi, just out of curiosity i was testing Michal Benes latest win32 mingw compile of gstreamer today, as a binary in his Ixion player on http://gstreamer.xeris.cz , and using his DirectX sinks. Here is what i could play fine (with almost 100% CPU, but working ) : MKA with Vorbis, MP3 ; not supported : AAC 5.1, FLAC, Wavpack MKV with MPEG4V2, DivX, XviD, h.264 (slow) ; not suported : Realvideo Unfortunately i had no other test files on my laptop, but this is exciting !! Imagine a world without Microsoft DirectShow !!! Think of a powerful, true x-platform media player based on Gstreamer ! Michal, congratulations. The player is still pretty unstable, as you are certainly aware, but this is definitely a very good start ! Please keep us in the loop, should you require more alpha testing. Regards Christian matroska project admin P.S. A readme in the ZIP, that all the packages have to be in c:\program files\ would be nice also ;) .... i found it on the homepage, but too late :) From chris at matroska.org Sun Apr 23 13:41:15 2006 From: chris at matroska.org (Christian HJ Wiesner) Date: Sun, 23 Apr 2006 13:41:15 +0200 Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 Message-ID: <444B67DB.7020100@matroska.org> >Thank you for a contact. 
>I'm ready to release the latest version on your server, please tell me a future procedure.
>
>Haru Ayana
>ayana at reharmonize.net

Hi Ayana,

do you know how to use an IRC client, like mIRC or chatzilla (part of
mozilla)? If so, we'd welcome you to come to our developer IRC channel
#matroska on irc.corecodec.com, so that we can discuss this personally.

We are very happy that you decided to port the existing plugin to Foobar
0.9 and will gladly host it on our servers for you. As soon as the
Foobar team decides to include the plugin in their special installer,
the server load will drop dramatically anyhow, but in the meantime we
are proud to host your plugin.

Christian
matroska project admin
http://www.matroska.org

From unonymouz at yahoo.com Sun Apr 23 13:12:34 2006
From: unonymouz at yahoo.com (Devender Parmar)
Date: Sun, 23 Apr 2006 04:12:34 -0700 (PDT)
Subject: [Matroska-devel] Please Help!!!
Message-ID: <20060423111234.73653.qmail@web38403.mail.mud.yahoo.com>

Hi!
I am currently downloading a torrent which is an mkv anime video. Its
total size is 395 MB and I have downloaded about 11 MB of it. I have
downloaded Matroska Full Pack 1.1.2 and The Core Media Player and have
used the Matroska CDL plugin while using that player. But still it
doesn't play that partially downloaded mkv file and it shows an error
that the first element in the file is not EBML and hence it is unable
to play that file. I have attached the corresponding log file along
with this E-Mail. Please inform me as soon as you find out anything
and please reply. I have tried out everything and am TOTALLY
frustrated. Please Reply!!!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CoreConsole.log
Type: application/octet-stream
Size: 3197 bytes
Desc: 2100513720-CoreConsole.log
URL:

From chris at matroska.org Sun Apr 23 14:29:39 2006
From: chris at matroska.org (Christian HJ Wiesner)
Date: Sun, 23 Apr 2006 14:29:39 +0200
Subject: [Matroska-devel] Please Help!!!
In-Reply-To: <20060423111234.73653.qmail@web38403.mail.mud.yahoo.com>
References: <20060423111234.73653.qmail@web38403.mail.mud.yahoo.com>
Message-ID: <444B7333.9070304@matroska.org>

Devender Parmar wrote:

> Hi!
> I am currently downloading a torrent which is an mkv anime video. Its
> total size is 395 MB and I have downloaded about 11 MB of it. I have
> downloaded Matroska Full Pack 1.1.2 and The Core Media Player and have
> used the Matroska CDL plugin while using that player. But still it
> doesn't play that partially downloaded mkv file and it shows an error
> that the first element in the file is not EBML and hence it is unable
> to play that file. I have attached the corresponding log file along
> with this E-Mail. Please inform me as soon as you find out anything
> and please reply. I have tried out everything and am TOTALLY
> frustrated. Please Reply!!!

Who told you that this is possible? If your downloaded file doesn't
contain the track header at the beginning of the file, it can't work in
any case. If your partial download already contains the track header,
you can try VLC player 0.8.4 http://videolan.org or mplayer win32.
CoreMediaplayer is DirectShow based, like Windows Mediaplayer, and will
not be able to do that, no way. The CDL doesn't help here either, as
this is a DirectShow limitation.

Christian
matroska project admin

From seelie at faireal.net Sun Apr 23 14:24:29 2006
From: seelie at faireal.net (Liisachan)
Date: Sun, 23 Apr 2006 12:24:29 GMT
Subject: [Matroska-devel] Re: [Matroska-users] Please Help!!!
In-Reply-To: <20060423111234.73653.qmail@web38403.mail.mud.yahoo.com> References: <20060423111234.73653.qmail@web38403.mail.mud.yahoo.com> Message-ID: <20060423122429uT#+S#@faireal.net> Hi, your question is not really related to Matroska, but related to how BT works. Anyway, if it were the first 11 MB of 395 MB MKV it would play, but that's not the case because most bt clients get the pieces in a random order. Your imagination: - I have [1][2][3]...[11] of [1][2][3]...[395] but [1][2][3]... doesn't play. Reality: - You have [24][39][65][98] or something like that. Devender Parmar wrote: > Hi! > I am currently downloading a torrent which is a mkv anime video. It's total size is 395 MB and I have downloaded about 11 MB of it. I have download Matroska Full Pack 1.1.2 and The Core Media Player and have used the Matroska CDL plugin while using that player. But still it doesn't play that partially downloaded mkv file and it shows an error that the first element in the file is not EBML and hence it is unable to play that file. I have attached the corresponding log file along with this E-Mail. Please inform me as soon as you find out anything and please reply. I have tried out everything and am TOTALLY frustrated. Please Reply!!! > > > --------------------------------- > Talk is cheap. Use Yahoo! Messenger to make PC-to-Phone calls. Great rates starting at 1¢/min. > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com From seelie at faireal.net Sun Apr 23 18:07:46 2006 From: seelie at faireal.net (Liisachan) Date: Sun, 23 Apr 2006 16:07:46 GMT Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <444B67DB.7020100@matroska.org> References: <444B67DB.7020100@matroska.org> Message-ID: <20060423160746Z2?n?x@faireal.net> I tested foo_input_matroska.dll 0.9.0.3 and (FLAC).mka plays on foobar2000 0.9 but (WavPack).mka doesn't play, not to mention (TTA).mka Christian HJ Wiesner wrote: > > > >Thank you for a contact. > >I'm ready to release the latest version on your server, please tell me > a future procedure. > > > >Haru Ayana > >ayana at reharmonize.net > > > > Hi Ayana, > > do you know how to use an IRC client, like mIRC or chatzilla (part of > mozilla). If so, we'd welcome you to come to our developer IRC channel > #matroska on irc.corecodec.com , so that we can discuss this personally. > > We are very happy that you decided to port the existing plugin to Foobar > 0.9 and will gladly host it on our servers for you. As soon as the > Foobar team decides to include the plugin in their special installer, > the server load will drop daramatically anyhow, but in the meantime we > are proud to host your plugin. 
> > Christian > matroska project admin > http://www.matroska.org > _______________________________________________ > Matroska-devel mailing list > Matroska-devel at lists.matroska.org > http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel > Read Matroska-Devel on GMane: http://dir.gmane.org/gmane.comp.multimedia.matroska.devel From chris at matroska.org Sun Apr 23 20:31:06 2006 From: chris at matroska.org (Christian HJ Wiesner) Date: Sun, 23 Apr 2006 20:31:06 +0200 Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <20060423160746Z2?n?x@faireal.net> References: <444B67DB.7020100@matroska.org> <20060423160746Z2?n?x@faireal.net> Message-ID: <444BC7EA.5080504@matroska.org> Hi Liisachan, it cant work because - Wavpack is now fully natively supported in foobar core, so there is no such thing as a foo_wavpack anylonger that we could modify to allow packet decoding. The only guy who could add packet decoding for wavpack is Peter Pawlowski himself, the main foobar author and core developer - the foo_tta plugin has not been ported to 0.9 yet, also there has never been a packet decoder coming with it, to my knowledge ? Did TTA in MKA work in foobar 0.8.3 before ? Regards Christian Liisachan schrieb: >I tested foo_input_matroska.dll 0.9.0.3 >and (FLAC).mka plays on foobar2000 0.9 >but (WavPack).mka doesn't play, >not to mention (TTA).mka > >Christian HJ Wiesner wrote: > > > >> >Thank you for a contact. >> >I'm ready to release the latest version on your server, please tell me >>a future procedure. >> > >> >Haru Ayana >> >ayana at reharmonize.net >> >> >> >>Hi Ayana, >> >>do you know how to use an IRC client, like mIRC or chatzilla (part of >>mozilla). If so, we'd welcome you to come to our developer IRC channel >>#matroska on irc.corecodec.com , so that we can discuss this personally. >> >>We are very happy that you decided to port the existing plugin to Foobar >>0.9 and will gladly host it on our servers for you. 
As soon as the >>Foobar team decides to include the plugin in their special installer, >>the server load will drop daramatically anyhow, but in the meantime we >>are proud to host your plugin. >> >>Christian >>matroska project admin >>http://www.matroska.org >>_______________________________________________ >>Matroska-devel mailing list >>Matroska-devel at lists.matroska.org >>http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel >>Read Matroska-Devel on GMane: http://dir.gmane.org/gmane.comp.multimedia.matroska.devel >> >> > > > > From christophe.paris at free.fr Sun Apr 23 21:44:00 2006 From: christophe.paris at free.fr (Christophe PARIS) Date: Sun, 23 Apr 2006 21:44:00 +0200 Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <444BC7EA.5080504@matroska.org> References: <444B67DB.7020100@matroska.org> <20060423160746Z2?n?x@faireal.net> <444BC7EA.5080504@matroska.org> Message-ID: <444BD900.8050607@free.fr> > - the foo_tta plugin has not been ported to 0.9 yet, also there has > never been a packet decoder coming with it, to my knowledge ? Did TTA in > MKA work in foobar 0.8.3 before ? Yes there was TTA support, sources are here : http://www.matroska.org/~toff/foo_tta_with_packet_decoder_src.rar From pacolugo at videotron.ca Sun Apr 23 21:15:04 2006 From: pacolugo at videotron.ca (Francisco Lugo) Date: Sun, 23 Apr 2006 14:15:04 -0500 Subject: [Matroska-devel] windows mobile 2003 enable? Message-ID: <000001c6670a$3a200a30$ae601e90$@ca> Did you have any version that run in to this os windows mobile 2003??? thanks -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From seelie at faireal.net Mon Apr 24 02:42:14 2006 From: seelie at faireal.net (Liisachan) Date: Mon, 24 Apr 2006 00:42:14 GMT Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <444BC7EA.5080504@matroska.org> References: <20060423160746Z2?n?x@faireal.net> <444BC7EA.5080504@matroska.org> Message-ID: <200604240042147ZdA0o@faireal.net> http://tmp.reharmonize.net/foobar/ they have foo_input_tta Christian HJ Wiesner wrote: > Hi Liisachan, > > it cant work because > > - Wavpack is now fully natively supported in foobar core, so there is no > such thing as a foo_wavpack anylonger that we could modify to allow > packet decoding. The only guy who could add packet decoding for wavpack > is Peter Pawlowski himself, the main foobar author and core developer That could be a problem...hmm > > - the foo_tta plugin has not been ported to 0.9 yet, also there has > never been a packet decoder coming with it, to my knowledge ? Did TTA in > MKA work in foobar 0.8.3 before ? Yes, it did, with Toff's modified foo_tta. http://www.hydrogenaudio.org/forums/index.php?showtopic=26266&st=0&p=236267&#entry236267 And I was just told, Seems Ayana (or someone) already worked on it for 0.9! foo_input_tta_2.4.1.zip 14-Apr-2006 12:00 123k Fyi TTA.mka happens to be one of the most popular usage of Matroska in Asia. Many ppl love to make (TTA+CUE+Jacket Picture JPG Attached).mka, feeling that really COOL; someone even make a special tool just for that. This page is in Japanese, but you can guess what is written by a lot of pics there. http://musicpc.fc2web.com/mkaproject04.htm > > Regards > > Christian > > Liisachan schrieb: > > >I tested foo_input_matroska.dll 0.9.0.3 > >and (FLAC).mka plays on foobar2000 0.9 > >but (WavPack).mka doesn't play, > >not to mention (TTA).mka > > > >Christian HJ Wiesner wrote: > > > > > > > >> >Thank you for a contact. > >> >I'm ready to release the latest version on your server, please tell me > >>a future procedure. 
> >> > > >> >Haru Ayana > >> >ayana at reharmonize.net > >> > >> > >> > >>Hi Ayana, > >> > >>do you know how to use an IRC client, like mIRC or chatzilla (part of > >>mozilla). If so, we'd welcome you to come to our developer IRC channel > >>#matroska on irc.corecodec.com , so that we can discuss this personally. > >> > >>We are very happy that you decided to port the existing plugin to Foobar > >>0.9 and will gladly host it on our servers for you. As soon as the > >>Foobar team decides to include the plugin in their special installer, > >>the server load will drop daramatically anyhow, but in the meantime we > >>are proud to host your plugin. > >> > >>Christian > >>matroska project admin > >>http://www.matroska.org > >>_______________________________________________ > >>Matroska-devel mailing list > >>Matroska-devel at lists.matroska.org > >>http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel > >>Read Matroska-Devel on GMane: http://dir.gmane.org/gmane.comp.multimedia.matroska.devel > >> > >> > > > > > > > > From michal.benes at itonis.tv Mon Apr 24 09:06:05 2006 From: michal.benes at itonis.tv (Michal Benes) Date: Mon, 24 Apr 2006 09:06:05 +0200 Subject: [Matroska-devel] Re: Historic moment : MKV/MKA playback on Windows, using Gstreamer (Ixion) !! In-Reply-To: <444ACD2E.7030907@matroska.org> References: <444ACD2E.7030907@matroska.org> Message-ID: <1145862365.5644.13.camel@localhost.localdomain> Hi folks, unfortunately, I do not work on the Windows port anymore. Therefore, our web page can be somewhat outdated. But there is an active community (one of the most active developers is Sebastien Moutte) among GStreamer community pushing the work further. I know they did very much since my attempts but I do not know if anybody is working on a GUI player. I think this is a pity, precompiled binaries for Windows with GUI player could attract many new GStreamer developers from Windows world. Michal Christian HJ Wiesner p??e v Ne 23. 04. 
2006 v 02:41 +0200: > > Hi, > > just out of curiosity i was testing Michal Benes latest win32 mingw > compile of gstreamer today, as a binary in his Ixion player on > http://gstreamer.xeris.cz , and using his DirectX sinks. > > Here is what i could play fine (with almost 100% CPU, but working ) : > > MKA with Vorbis, MP3 ; not supported : AAC 5.1, FLAC, Wavpack > MKV with MPEG4V2, DivX, XviD, h.264 (slow) ; not suported : Realvideo > > Unfortunately i had no other test files on my laptop, but this is > exciting !! Imagine a world without Microsoft DirectShow !!! Think of a > powerful, true x-platform media player based on Gstreamer ! > > Michal, congratulations. The player is still pretty unstable, as you are > certainly aware, but this is definitely a very good start ! Please keep > us in the loop, should you require more alpha testing. > > Regards > > Christian > matroska project admin > > P.S. A readme in the ZIP, that all the packages have to be in c:\program > files\ would be nice also ;) .... i found it on the homepage, but too > late :) From steve.lhomme at free.fr Mon Apr 24 09:47:39 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Mon, 24 Apr 2006 09:47:39 +0200 Subject: [Matroska-devel] windows mobile 2003 enable? In-Reply-To: <000001c6670a$3a200a30$ae601e90$@ca> References: <000001c6670a$3a200a30$ae601e90$@ca> Message-ID: <444C829B.4090908@free.fr> Francisco Lugo wrote: > Did you have any version that run in to this os windows mobile 2003??? You should try TCPMP (also known as BetaPlayer). Steve From chris at matroska.org Tue Apr 25 02:55:13 2006 From: chris at matroska.org (Christian HJ Wiesner) Date: Tue, 25 Apr 2006 02:55:13 +0200 Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <200604240042147ZdA0o@faireal.net> References: <20060423160746Z2?n?x@faireal.net> <444BC7EA.5080504@matroska.org> <200604240042147ZdA0o@faireal.net> Message-ID: <444D7371.8000106@matroska.org> Wow :O ! Ayana, can you pls. 
tell us briefly who reharmonize.net are ? Could you come to our IRC channel by time, so that we can teach you about our uploading system ? Regards Christian Liisachan schrieb: >http://tmp.reharmonize.net/foobar/ >they have foo_input_tta > > >Christian HJ Wiesner wrote: > > > >>Hi Liisachan, >> >>it cant work because >> >>- Wavpack is now fully natively supported in foobar core, so there is no >>such thing as a foo_wavpack anylonger that we could modify to allow >>packet decoding. The only guy who could add packet decoding for wavpack >>is Peter Pawlowski himself, the main foobar author and core developer >> >> > >That could be a problem...hmm > > > >>- the foo_tta plugin has not been ported to 0.9 yet, also there has >>never been a packet decoder coming with it, to my knowledge ? Did TTA in >>MKA work in foobar 0.8.3 before ? >> >> > >Yes, it did, with Toff's modified foo_tta. >http://www.hydrogenaudio.org/forums/index.php?showtopic=26266&st=0&p=236267&#entry236267 > >And I was just told, >Seems Ayana (or someone) already worked on it for 0.9! >foo_input_tta_2.4.1.zip 14-Apr-2006 12:00 123k > From seelie at faireal.net Tue Apr 25 03:21:03 2006 From: seelie at faireal.net (Liisachan) Date: Tue, 25 Apr 2006 01:21:03 GMT Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <444D7371.8000106@matroska.org> References: <200604240042147ZdA0o@faireal.net> <444D7371.8000106@matroska.org> Message-ID: <20060425012103qKUW7a@faireal.net> Christian HJ Wiesner wrote: > > Wow :O ! > > Ayana, can you pls. tell us briefly who reharmonize.net are ? Could you > come to our IRC channel by time, so that we can teach you about our > uploading system ? Is Ayana-san on this list? A different person than Ayaka who did the logo? Whois shows both the server and the domain owner are in Japan. If Ayana uses the mail address @ that domain, probably it's his/hers. (btw you shouldn't have quoted his/her mail address. That might be private one and might be spammed.) 
Anyway I'd really like to thank him/her personally too. Otherwise I'm useless there but I'll try to join the chan too starting tomorrow, just in case Ayana-san doesn't like to chat in english. > > Regards > > Christian > > Liisachan schrieb: > > >http://tmp.reharmonize.net/foobar/ > >they have foo_input_tta > > > > > >Christian HJ Wiesner wrote: > > > > > > > >>Hi Liisachan, > >> > >>it cant work because > >> > >>- Wavpack is now fully natively supported in foobar core, so there is no > >>such thing as a foo_wavpack anylonger that we could modify to allow > >>packet decoding. The only guy who could add packet decoding for wavpack > >>is Peter Pawlowski himself, the main foobar author and core developer > >> > >> > > > >That could be a problem...hmm > > > > > > > >>- the foo_tta plugin has not been ported to 0.9 yet, also there has > >>never been a packet decoder coming with it, to my knowledge ? Did TTA in > >>MKA work in foobar 0.8.3 before ? > >> > >> > > > >Yes, it did, with Toff's modified foo_tta. > >http://www.hydrogenaudio.org/forums/index.php?showtopic=26266&st=0&p=236267&#entry236267 > > > >And I was just told, > >Seems Ayana (or someone) already worked on it for 0.9! > >foo_input_tta_2.4.1.zip 14-Apr-2006 12:00 123k > > From chris at matroska.org Tue Apr 25 03:33:26 2006 From: chris at matroska.org (Christian HJ Wiesner) Date: Tue, 25 Apr 2006 03:33:26 +0200 Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <20060425012103qKUW7a@faireal.net> References: <200604240042147ZdA0o@faireal.net> <444D7371.8000106@matroska.org> <20060425012103qKUW7a@faireal.net> Message-ID: <444D7C66.4050405@matroska.org> Liisachan schrieb: >Christian HJ Wiesner wrote: > > >>Wow :O ! >> >>Ayana, can you pls. tell us briefly who reharmonize.net are ? Could you >>come to our IRC channel by time, so that we can teach you about our >>uploading system ? >> >> > >Is Ayana-san on this list? >Anyway I'd really like to thank him/her personally too. 
>Otherwise I'm useless there but I'll try to join the chan too
>starting tomorrow, just in case Ayana-san doesn't like to chat
>in english.
>

Liisachan,

i don't think she/he is subscribed to matroska-devel yet. I set up an
alias ayana AT matroska DOT org for her/him now, so that we can copy
her/him without using her/his direct email address on the list. I
received a PM from her/him on HA.org in English, but it's impossible for
me to judge from that if she/he is comfortable speaking/writing english.

In any case, we would truly appreciate it if you helped us establish
close communication with her/him. I am a strong believer in a future for
MKA in the audio world, and i am extremely happy to hear that this is
used already in Japan, based on the TTA lossless codec.

Regards

Christian

From aaa at aaaa.com Wed Apr 26 09:17:20 2006
From: aaa at aaaa.com (Zen)
Date: Wed, 26 Apr 2006 09:17:20 +0200
Subject: [Matroska-devel] AAC and SBR
Message-ID: <444F1E80.3060409@aaaa.com>

This question is not specific to MKV, but I think I have a good group of
video and audio experts here ;-)

On the matroska CodecID page, there are AAC entries with or without SBR.
I thought that you would know how to detect SBR ;-)

In the file aac_common.cpp, I found this :

******
if (size == 5) {
  output_sample_rate = aac_sampling_freq[(data[4] & 0x7f) >> 3];
  sbr = true;
} else if (sample_rate <= 24000) {
  output_sample_rate = 2 * sample_rate;
  sbr = true;
} else
  sbr = false;
******

I don't understand where this "size" of 5 comes from; why do you do this?
And for other sizes, is the sample_rate check (>24K or not) the only
means to detect SBR???
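A minimal sketch of what the size == 5 branch seems to rely on, assuming
the private data is an MPEG-4 AudioSpecificConfig that carries the
explicit SBR extension (2 bytes of base config followed by a sync
pattern, an extension object type, an SBR flag and a 4-bit output rate
index). The helper name detect_sbr, the SbrInfo struct and the extra
validation of the sync pattern are assumptions for illustration; they are
not taken from aac_common.cpp, which only tests the size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// MPEG-4 sampling-frequency index table; indices 13-15 are reserved or
// escape values and are mapped to 0 here.
static const unsigned kSamplingFreq[16] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050,
    16000, 12000, 11025, 8000,  7350,  0,     0,     0};

struct SbrInfo {
  bool sbr;                     // true if SBR is signalled or guessed
  unsigned output_sample_rate;  // rate after SBR, or the core rate
};

// Hypothetical helper, not part of aac_common.cpp.
SbrInfo detect_sbr(const std::uint8_t *data, std::size_t size,
                   unsigned sample_rate) {
  assert(data != nullptr || size == 0);
  if (size == 5) {
    // Bits 16..26: sync extension type, expected to be 0x2b7.
    unsigned sync = (static_cast<unsigned>(data[2]) << 3) | (data[3] >> 5);
    // Bits 27..31: extension audio object type, 5 means SBR.
    unsigned ext_type = data[3] & 0x1f;
    // Bit 32: SBR present flag.
    bool present = (data[4] & 0x80) != 0;
    if (sync == 0x2b7 && ext_type == 5 && present) {
      // Bits 33..36: output rate index, same expression as the quoted code.
      unsigned index = (data[4] & 0x7f) >> 3;
      return {true, kSamplingFreq[index]};
    }
  }
  // No usable explicit signalling: fall back to the same guess as the
  // quoted code (a low core rate is assumed to mean SBR).
  if (sample_rate <= 24000)
    return {true, 2 * sample_rate};
  // Unlike the quoted code, the core rate is returned here for convenience.
  return {false, sample_rate};
}
```

For example, the bytes 13 90 56 E5 A0 (hex) describe an AAC LC core at
22050 Hz with explicit SBR and an output rate of 44100 Hz. The fallback
on the sample rate remains a guess, since it cannot tell a plain
low-rate AAC stream from one that uses SBR.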
Zen, developer of MediaInfo

From alexander.noe at s2001.tu-chemnitz.de Wed Apr 26 09:39:31 2006
From: alexander.noe at s2001.tu-chemnitz.de (Alexander Noe')
Date: Wed, 26 Apr 2006 09:39:31 +0200
Subject: [Matroska-devel] AAC and SBR
In-Reply-To: <444F1E80.3060409@aaaa.com>
References: <444F1E80.3060409@aaaa.com>
Message-ID: <444F23B3.3060309@hrz.tu-chemnitz.de>

Zen wrote:

> I don't understand where come from this "size" of 5, why do you do this?

HE-AAC has 5 bytes of private data in WAVEFORMATEX when muxed into AVI
(and I guess also 5 bytes of private data in MP4). Thus, when using such
a source, you can tell whether or not it's SBR from looking at the
private data size.

Alex

From aaa at aaaa.com Wed Apr 26 09:52:56 2006
From: aaa at aaaa.com (Zen)
Date: Wed, 26 Apr 2006 09:52:56 +0200
Subject: [Matroska-devel] Re: AAC and SBR
In-Reply-To: <444F23B3.3060309@hrz.tu-chemnitz.de>
References: <444F1E80.3060409@aaaa.com> <444F23B3.3060309@hrz.tu-chemnitz.de>
Message-ID: <444F26D8.2090704@aaaa.com>

Alexander Noe' wrote:

> HE-AAC has 5 bytes of private data in WAVEFORMATEX when muxed into AVI
> (and I guess also 5 bytes of private data in MP4). Thus, when using such
> a source, you can tell whether or not its SBR from looking at the
> private data size.

If I understand correctly, the private data for AVI (and MP4) of HE-AAC
is different from that of other AAC (for me, the ADTS header is 8 bytes
long).

Thanks! Now I have to handle this in my software ;-)
But my major problem is raw AAC (ADTS header), and for those I do not
have the size...

From haessije at eps.e-i.com Wed Apr 26 10:00:14 2006
From: haessije at eps.e-i.com (HAESSIG Jean-Christophe)
Date: Wed, 26 Apr 2006 10:00:14 +0200
Subject: [Matroska-devel] EBML Namespaces
Message-ID: <2684397F36DC8849A9BF842433B784360201EC27@GZI-VM01.cm-cic.fr>

> BTW, while backward compatibility is the goal we should think
> about all the possible options without that in mind.
> Then we'll see all that is possible and only then if it's worth to
> keep compatibility or not.

Yes. Since we agreed that the seeking issue should preferably be handled
by EBML itself, lots of options have re-opened in my mind.

> Why do you want to remove the ID ? As seen below there are
> different options of where we could put it.

What I mean by "removing" the Namespace-ID is that the application (e.g.
Matroska) does not have to know which namespace ID it is using in the
file. To put it more simply : multiple vocabularies from different
applications can be multiplexed inside one file, without colliding thanks
to the use of namespaces. The lowlevel EBML parser's job will be to
demultiplex the elements from different vocabularies and feed them to the
correct application. Since the namespace information is only a means of
multiplexing, it is not forwarded to the application and therefore
"removed".

Below is a schematic of my idealized vision of the EBML logical
infrastructure :

                                          _____
                      Lowlevel           /     \
                      parser  --------->|       | X application
 ______                 __   /elements  |       |
|      |               /  \ / in NS X    \_____/
|      |------------->|    |----              _____
|      |              |    |---- elements    /     \
|______|               \__/     \in NS Y    |       | Y application
EBML File               ^|       ---------->|       |
                        ||elements           \_____/
            Notification||in EBML NS            ^
            about NS    ||                      |
            registration||     _____            |EBML API
            &state after||    /     \           |e.g. seeking
            seeking     |*-->|       |<---------*
                        *----|       | EBML application
                              \_____/

A plain old EBML application (e.g. Matroska) without namespaces would
extend the EBML application. A new one (with namespace support) would
implement a new application module and register to the lowlevel parser.

Now where do I see the NS IDs precisely ?

[Class-ID][Size][Data ****]
 /       \________________________________
/                                         \
[Length Descriptor][Namespace][Class Value]

The lowlevel parser reads the length descriptor and extracts the value
bits (NS+CLASS). In these bits are the namespace of the element and the
class ID value.
Before the parser tries to decode the NS, all the possible NS values that
this element may use must be known to the parser (that's somewhat
obvious, but it doesn't mean that NS definitions have to physically be
written before the element using them -- think XML ;). The possible NS
values must be prefix codes; this allows the parser to scan the value
bits from left to right and know exactly when the NS ID ends. The rest of
the value bits are the Class-ID value, which is forwarded to the
corresponding application.

Examples with 3 namespaces :

   0 : EBML
  10 : NSX
  11 : NSY

0x8F   = 0b10001111            Namespace      : EBML
           LNVVVVVV            Class-ID value : 15(dec)

0xDB   = 0b11011011            Namespace      : NSX
           LNNVVVVV            Class-ID value : 27(dec)

0x7D09 = 0b01111101 00001001   Namespace      : NSY
           LLNNVVVV VVVVVVVV   Class-ID value : 3337(dec)

> Now instead of using another byte (or more) instead of
> splitting the IDs we could reuse bits in the data length. It
> would be no more backward compatible as using bits in the IDs
> but there would be more room for improvement (as the size is
> known to be encoded in different byte sizes).

Yes, this is more or less what I've thought for class-IDs. But unlike
class-ids, the size field is vital to parse the file (a 'dumb' parser
that doesn't interpret the meaning of the elements wouldn't be affected
by a change in the IDs but would stop working if the encoding of sizes
changed).

> > EBML is supposed to be a byte-aligned format and it would
> > require at least 1 extra byte for each element. This is not
> > bad in itself, but it would waste a great amount of bits,
> > since I do not expect files with more than 5 mixed namespaces
> > to be frequent. Therefore, I expect the namespace value to take
> > up to 3 bits in most cases, this is why I try to pack it into
> > an existing field.
>
> Well, what happens when you need 6 or 7 ? You don't have any
> more bits left.
> Adding another byte or 2 gives room for

When I say that I do not expect files using more than 5 namespaces to be
frequent it doesn't mean that I want to disallow files which need to use
more... It only means that we should be able to encode the NS on less
than 1 byte (preferably only 2 or 3 bits) for the most frequent case.
Cases requiring 300+ simultaneous namespaces will still be allowed, but
they will use more bits (8,9,10,more) to express the namespace of the
various contained elements. Moreover, even if we have files using many
namespaces, I really doubt they will massively mix them -- a localized
use of some namespace is more likely. Therefore we should definitely have
some namespace switching feature, and namespace IDs should not be
globally fixed for a given file.

> I'm not too concerned about the overhead because right now if
> you need a lot of IDs you need to use 2 octets long ones.
> While with a namespace most IDs for each namespace won't need
> a lot of room (127 possibilities for Class A IDs). So in the
> end there should be a good balance.

EBML is a wonderful structured format. I don't mean to restrain format
writers from defining the format they like, but using a lot of IDs in
some flat space is not what I would call good engineering. As with XML, I
think the tree-like encapsulating structure of EBML must be used. Instead
of defining 16k 2-byte IDs, one had better categorize and group them :
1100 2-byte IDs, each containing 15 1-byte elements, would definitely do
the job. And from the overhead point of view, it's better since 1-byte
elements are used much more than 2-byte ones. Of course, if some format
demands a flat space for any technical reason, it's possible, but one
should be warned about a possible overhead issue.

> Using 3 bits in the ID header would reduce the number of
> possible Class A IDs of a format to 2^4-1 = 15 ! That's too
> small IMO. So I think adding another bit will give us more
> freedom and space and almost no cost.
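As a rough illustration of the prefix-code idea from the three-namespace
examples earlier in this message (0 = EBML, 10 = NSX, 11 = NSY), a
lowlevel parser could split an ID roughly as follows. This is only a
sketch under stated assumptions: the names NsCode, DecodedId and
decode_id are hypothetical and not part of libebml, and the length
marker bits are assumed not to be part of the value, as in those
examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One namespace prefix code, e.g. {2, 0x2, "NSX"} for the bit string "10".
struct NsCode {
  unsigned nbits;      // length of the prefix code in bits (0 = no namespaces)
  std::uint64_t code;  // the code itself, right-aligned
  std::string name;    // which application receives the element
};

struct DecodedId {
  std::string ns;             // namespace the element belongs to
  std::uint64_t class_value;  // remaining value bits
  std::size_t length;         // number of bytes consumed
};

// Hypothetical helper: splits one ID into namespace and class value.
bool decode_id(const std::uint8_t *buf, std::size_t size,
               const std::vector<NsCode> &table, DecodedId &out) {
  assert(buf != nullptr);
  if (size == 0 || buf[0] == 0)
    return false;  // empty input, or a descriptor longer than 8 bytes

  // Length descriptor: the position of the first 1 bit gives the byte count.
  std::size_t len = 1;
  for (unsigned mask = 0x80; !(buf[0] & mask); mask >>= 1)
    ++len;
  if (len > size)
    return false;

  // Value bits: everything after the length marker, 7 bits per byte.
  std::uint64_t bits = buf[0] & (0xFFu >> len);
  for (std::size_t i = 1; i < len; ++i)
    bits = (bits << 8) | buf[i];
  const unsigned nbits = static_cast<unsigned>(7 * len);

  // Prefix codes are prefix-free, so at most one table entry can match.
  for (const NsCode &c : table) {
    if (c.nbits > nbits)
      continue;
    const unsigned rest = nbits - c.nbits;
    if ((bits >> rest) == c.code) {
      out.ns = c.name;
      out.class_value = bits & ((std::uint64_t(1) << rest) - 1);
      out.length = len;
      return true;
    }
  }
  return false;  // no registered namespace matches
}
```

With the table from the examples, the bytes 0x8F, 0xDB and 0x7D 0x09
decode to (EBML, 15), (NSX, 27) and (NSY, 3337). A file without
namespaces corresponds to a table holding a single entry of length 0.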
With a variable-width NS-ID embedded in the ID header, a separation in
classes is less relevant. In cases where only 2 namespaces are used, only
1 bit is needed, thus 6 bits are left to encode the element ID value.
Format writers should be aware that element IDs ranging from 0 to 63(dec)
use at least 1 byte, maybe more, depending on the number of namespaces
actually in use. For example, if the namespace ID for one element uses 2
bits, element ID values ranging from 32 to 4095 will need 2 bytes. If the
namespace ID uses 3 bits, element ID values ranging from 16 to 2047 will
need 2 bytes, and element ID values ranging from 2048 to 262143 will need
3 bytes, etc...

The following chart summarizes how many bytes are required to encode an
element ID assuming various NS ID lengths. In fact it could be a little
smarter than that (there are still some wasted bits but for the moment I
don't know what to do with them).

NS ID                             Bytes required
length   1       2           3               4                   5
  0      0-127   128-16383   16384-2097151   2097152-268435455   268435456-34359738367
  1      0-63    64-8191     8192-1048575    1048576-134217727   134217728-17179869183
  2      0-31    32-4095     4096-524287     524288-67108863     67108864-8589934591
  3      0-15    16-2047     2048-262143     262144-33554431     33554432-4294967295
  4      0-7     8-1023      1024-131071     131072-16777215     16777216-2147483647
  5      0-3     4-511       512-65535       65536-8388607       8388608-1073741823
  6      0-1     2-255       256-32767       32768-4194303       4194304-536870911
  7      *       1-127       128-16383       16384-2097151       2097152-268435455
  8      *       1-63        64-8191         8192-1048575        1048576-134217727

Also, I don't intend to change the Reserved Values : these are still
127, 16383, 2097151, 268435455, 34359738367, etc, regardless of the
inserted NS ID.

> > You seem to be prepared to make big changes to the format, but I don't
> > know to what extent we should break compatibility...
>
> Again, for current Matroska files it shouldn't be a problem
> as matroska would be the default namespace. In that case the
> namespace shouldn't be used for such IDs.
That means older > files will play without any problem in namespace-aware > parsers. Only newer files containing some namespace will not > be usable by older parsers. Which AFAIK is the same with what > you propose. > > BTW, we still haven't discussed how to define the namespace > in the EBML header, but the DocType existing today will > remain. And that is the way to define the special namespace > that will be used as default... The other namespaces will > probably fall back in an list. It's like the DocType in XML > (like html). At least we got the right name for that field ;) I understand that you are referring to a "default" namespace as in XML. Well, that's not how I see it. If there is some kind of default namespace i.e. not expressed for elements in this namespace we still need a way to tell that the namespace value is present or absent. I intend to have two kinds of files : * Files without namespaces, i.e. NS length = 0. This mathematically forbids the presence of namespaces since the empty code is a prefix for all codes. * Files using namespaces, i.e. NS length >= 1. There can be only 2 namespaces (0 & 1, each using one bit), or more, as seen in the previous examples, but each element holds a namespace ID. > You want to use external files ? I don't really understand. > The namespace 'tagging' of each element/ID has to be done > inside the file... I expressed myself really poorly. I hope that the previous paragraph put that clear. > Yes that would be a limit but seeking in a file format is a > very special feature not used a lot for most formats. It > makes sense in A/V formats where there is a timeline, but > then you need to know the semantic to know what you're > looking for when seeking. A good seeking feature could be quite useful in applications without a timeline : large images can be tiled and zooming to a precise position can be sped up by seeking info indexed by (x,y) coordinates. Databases using EBML would clearly benefit from seeking features. 
I think I could find many other examples... > When I read that it made me think of XPath in XML. I have no > idea of how it works for XML but it seems an extension used > to define pointers between elements and/or documents (one > direction or bidirectional). And that's something that could > make sense at the EBML level too. That wouldn't be backward > compatible, but depending on the solution we come up with, it > could be a good replacement. First, XML doesn't work with offsets, and I think it is quite impossible to use raw XML files and randomly seek through them. If such functionality is needed, one first has to read the entire document into memory. XPath works on paths (heh ;) but can also number elements if they have the same name. You can get the 3rd foo child node from node bar with bar/foo[3] AFAIR. If another foo element is inserted before, the pointer won't point to the correct foo anymore, unless the program that added it also modified the XPath pointer. JC From seelie at faireal.net Wed Apr 26 11:13:39 2006 From: seelie at faireal.net (Liisachan) Date: Wed, 26 Apr 2006 09:13:39 GMT Subject: [Matroska-devel] matroska plugin for Foobar2000 0.9 In-Reply-To: <444D7C66.4050405@matroska.org> References: <20060425012103qKUW7a@faireal.net> <444D7C66.4050405@matroska.org> Message-ID: <20060426091339fjmFzY@faireal.net> Christian HJ Wiesner wrote: > In any case, we would truely appreciate if you help us establishing a > close communication with her/him. I am a strong believer in a future for > MKA in the audio world, and i am extremely happy to hear that this is > used already in Japan, based on TTA lossless codec. 
I've just posted a message in his/her blog, saying something like this in Japanese:

--------------------------------------
Msg from Christian HJ Wiesner
About fb2k plugins
(1) "Please join #matroska @ irc.corecodec.com so that I can tell you necessary information (my note--probably about svn l/p)"
my note--if you are new to IRC, get mIRC and blah blah
my note--i'm in the chan too, so you can communicate in Japanese, if you'd like to.
(2) "The link to your files is now in the news section, which may increase the bw usage of your server a bit. Matroska.org can host files for you if needed."
(3) "I set ayana at matroska.org for you" (my note--i think it's a fw address to your addy)
That's it. Thank you very much.
--------------------------------------
http://reharmonize.net/200001/87#comment-15103

I'm not sure if Ayana will read it soon, but you need to wait for a while anyway. If there's anything more you'd like to tell Ayana, I'll post it there again.

Regards

From alexander.noe at s2001.tu-chemnitz.de Wed Apr 26 11:35:50 2006
From: alexander.noe at s2001.tu-chemnitz.de (Alexander Noe')
Date: Wed, 26 Apr 2006 11:35:50 +0200
Subject: [Matroska-devel] Re: AAC and SBR
In-Reply-To: <444F26D8.2090704@aaaa.com>
References: <444F1E80.3060409@aaaa.com> <444F23B3.3060309@hrz.tu-chemnitz.de> <444F26D8.2090704@aaaa.com>
Message-ID: <444F3EF6.2070507@hrz.tu-chemnitz.de>

Zen schrieb:
> If I understand well, private data for AVI (and MP4) of a HE-AAC are
> different from other AAC (for me, the ADTS header is 8 byte long)
> But my major problem is for raw AAC (ADTS header), and for them I have
> not the size...

For raw AAC or ADTS files you can't easily detect SBR. That is why mkvmerge as well as AVI-Mux GUI require the user to select whether or not such a file is SBR.
Alex From haessije at eps.e-i.com Wed Apr 26 17:07:28 2006 From: haessije at eps.e-i.com (HAESSIG Jean-Christophe) Date: Wed, 26 Apr 2006 17:07:28 +0200 Subject: [Matroska-devel] EBML Namespaces Message-ID: <2684397F36DC8849A9BF842433B784360201ECED@GZI-VM01.cm-cic.fr> I realize that I am totally f*cking up the list threading. I don't know what I am doing wrong. Well, using Outlook may be the reason but I can't help it for the moment. My apologies JC From moritz at bunkus.org Thu Apr 27 09:54:42 2006 From: moritz at bunkus.org (Moritz Bunkus) Date: Thu, 27 Apr 2006 09:54:42 +0200 Subject: [Matroska-devel] Re: AAC and SBR In-Reply-To: <444F26D8.2090704@aaaa.com> References: <444F1E80.3060409@aaaa.com> <444F23B3.3060309@hrz.tu-chemnitz.de> <444F26D8.2090704@aaaa.com> Message-ID: <200604270954.46803.moritz@bunkus.org> Hey, On Wednesday 26 April 2006 09:52, Zen wrote: > If I understand well, private data for AVI (and MP4) of a HE-AAC are > different from other AAC (for me, the ADTS header is 8 byte long) Most "proper" container formats (MP4, AVI, Matroska, RMFF) contain the private data for AAC. This is usually 2 bytes long for normal AAC and 5 bytes for HE-AAC. Raw AAC files with ADTS headers (or even worse, ADIF headers) do NOT contain these 2 or 5 bytes at all. For such files only decoding the AAC data actually gives you the information whether or not the file uses HE-AAC. Neither AVI Mux GUI nor mkvmerge contains an AAC decoder and never will. Regards, Mosu -- If Darl McBride was in charge, he'd probably make marriage unconstitutional too, since clearly it de-emphasizes the commercial nature of normal human interaction, and probably is a major impediment to the commercial growth of prostitution. - Linus Torvalds -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 191 bytes Desc: not available URL: From paul at msn.com Thu Apr 27 17:56:27 2006 From: paul at msn.com (Atamido) Date: Thu, 27 Apr 2006 10:56:27 -0500 Subject: [Matroska-devel] Re: EBML Namespaces In-Reply-To: <2684397F36DC8849A9BF842433B7843601EB7E57@GZI-VM01.cm-cic.fr> References: <2684397F36DC8849A9BF842433B7843601EB7E57@GZI-VM01.cm-cic.fr> Message-ID: This all seems a little on the complex side. Why not just define an element that indicates a change of name space for all formats? Something without much chance of random collision like: [12][34][56][78] If you want to be able to indicate *which* name space you are changing to, then include a child element that includes the format's DocType: [13][87][65][43] Either have that as the first element, with other elements from the new name space following later, or put all of the elements from the new name space in their own element like: [14][87][12][87] So, format XYZ that wants to contain a Matroska file would contain this: [12][34][56][78] (size) [13][87][65][43] (size) {matroska} [14][87][12][87] (size) [18][53][80][67] (size) This seems pretty strait forward and backwards compatible to me. Am I missing something? Atamido HAESSIG Jean-Christophe wrote: > Hello, > > > > I would like to know if there are any plans to include some namespacing > feature into EBML. > > I think namespaces are an important feature to enable compositing of > EBML documents and > > transparent extension of existing formats. > > > > Just in case, if nobody ever seriously thought about it, please consider > the following > > proposal : > > my idea is to replace the high order bits (not the ones coding for the > ID length) of > > the Class-IDs with a namespace ID. First of all, a few new level 1+ > Class-IDs are > > defined : 4288, an integer element that defines how many bits are used > for namespacing. > > 4289 is the namespace declaration element. 
It has two sub-elements : 81 > is an integer > > representing the namespace ID, 82 is a string containing the namespace > key, which can > > be a URL, as in traditional XML namespaces, or a public key fingerprint. > > > > When a Class-ID has high-order set bits that would conflict with the > namespace ID, > > that Class-ID is simply represented as a larger class (it would be > coherent with the > > EBML RFC section 2.2, which states that Class-IDs are always encoded in > their shortest > > Form, therefore no ID clashes can happen) > > > > The namespace ID is always 0 for EBML elements, so for files using up to > 7 additional > > namespaces, the header elements wouldn't change at all. Another > advantage of my > > approach is that the lowest level of EBML parsers (which do not > interpret Class-IDs) > > would not be confused by the files using namespaces. > > > > I welcome any comments and would be pleased to answer if further detail > is required > > JC Haessig. > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Matroska-devel mailing list > Matroska-devel at lists.matroska.org > http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel > Read Matroska-Devel on GMane: http://dir.gmane.org/gmane.comp.multimedia.matroska.devel From haessije at eps.e-i.com Thu Apr 27 18:43:50 2006 From: haessije at eps.e-i.com (HAESSIG Jean-Christophe) Date: Thu, 27 Apr 2006 18:43:50 +0200 Subject: [Matroska-devel] Re: EBML Namespaces Message-ID: <2684397F36DC8849A9BF842433B784360201EE14@GZI-VM01.cm-cic.fr> > This all seems a little on the complex side. Why not just It is somewhat complicated indeed. > define an element that indicates a change of name space for > all formats? 
This is a feature I want to have, as you can see in my last post about namespaces.

> Something without much chance of random collision like:
> [12][34][56][78]
>
> If you want to be able to indicate *which* name space you are
> changing to, then include a child element that includes the
> format's DocType:
> [13][87][65][43]

As you describe it, it seems that the EBML generic elements should be valid in all namespaces. I thought it would be much cleaner for EBML to have its own namespace, which would improve the extensibility of EBML. Otherwise further additions of new class-ids used by EBML may collide with older formats.

> Either have that as the first element, with other elements
> from the new name space following later, or put all of the
> elements from the new name space in their own element like:
> [14][87][12][87]
>
> So, format XYZ that wants to contain a Matroska file would
> contain this:
> [12][34][56][78] (size)
> [13][87][65][43] (size) {matroska}
> [14][87][12][87] (size)
> [18][53][80][67] (size)
>
>
> This seems pretty straightforward and backwards compatible to
> me. Am I missing something?

I think it would lead to too much overhead. For one namespace switch you would have at least 10 to 16, if not more, extra bytes. Formats intensively mixing 3 or 4 vocabularies would yield huge files... The feature you describe (namespace switching) is definitely a must-have, but is not sufficient to suit all use cases IMO.
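The "10 to 16 bytes" estimate above can be reproduced with a little arithmetic. This is only a sketch: it assumes three wrapper elements (the switch element, the DocType child and the container element), one size byte per element (which only holds while each payload stays under 127 bytes), and a DocType string stored as plain ASCII. The function name `switch_overhead` is hypothetical.

```python
def switch_overhead(id_lengths, doctype):
    """Bytes spent on one namespace switch, excluding the embedded data.

    id_lengths: ID length in bytes for each wrapper element.
    doctype:    DocType string stored in the child element.
    Assumption: every element uses a single size byte.
    """
    per_element = sum(length + 1 for length in id_lengths)  # ID + size byte
    return per_element + len(doctype.encode("ascii"))


if __name__ == "__main__":
    # Three 4-byte IDs with a 1-character DocType
    print(switch_overhead((4, 4, 4), "x"))
    # Same IDs with the DocType "matroska"
    print(switch_overhead((4, 4, 4), "matroska"))
    # Long parent ID, two single-byte child IDs, 1-character DocType
    print(switch_overhead((4, 1, 1), "x"))
```

Under these assumptions the three cases come to 16, 23 and 10 bytes respectively, before any element from the embedded format is written.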
JC From paul at msn.com Thu Apr 27 19:24:11 2006 From: paul at msn.com (Paul Bryson) Date: Thu, 27 Apr 2006 12:24:11 -0500 Subject: [Matroska-devel] Re: EBML Namespaces In-Reply-To: <2684397F36DC8849A9BF842433B784360201EE14@GZI-VM01.cm-cic.fr> References: <2684397F36DC8849A9BF842433B784360201EE14@GZI-VM01.cm-cic.fr> Message-ID: HAESSIG Jean-Christophe wrote: >> Something without much chance of random collision like: >> [12][34][56][78] >> >> If you want to be able to indicate *which* name space you are >> changing to, then include a child element that includes the >> format's DocType: >> [13][87][65][43] > > As you describe it, it seems that the EBML generic elements > should be valid in all namespaces. I thought it would be > much cleaner for EBML to have its own namespace, which would > improve the extensibility of EBML. Otherwise further additions > of new class-ids used by EBML may collide with older formats. I'm not sure if I understand this comment. There are global EBML IDs defined that would be valid for any EBML format. There are 8 for the EBML header. There CRC-32 and VOID elements are another two which I /think/ are valid for all EBML files. That is 10 altogether. That shouldn't be to many to keep track of. I am suggesting one more, and with a 4 byte ID to boot. Between [80] and [1F][FF][FF][FF] there are more than 500 million different IDs, so it makes a random collision pretty unlikely, even if someone weren't aware of the pre-existing IDs. (Although I don't know how you could not be as you need them to make the header. >> So, format XYZ that wants to contain a Matroska file would >> contain this: >> [12][34][56][78] (size) >> [13][87][65][43] (size) {matroska} >> [14][87][12][87] (size) >> [18][53][80][67] (size) > I think it would lead to too much overhead. For one namespace > switch you would have at least 10 to 16, if not more extra bytes. > Formats intensively mixing 3 or 4 vocabularies would yield > huge files... 
The ID's I suggested make at least 16 bytes overhead for embedding another format (1 byte DocType, though in this case "matroska" is 8 bytes). You could certainly make two of those single byte IDs to drop it to 10 byte minimum. I would leave the parent as a long ID to avoid collision. I would think that if a format is low-bitrate enough for it to matter and wants to have that many instances of another format, it should just include all of the sub-format's elements into it's own specifications. For instance, if I wanted to include some Matroska Tags, excluding the overhead for a name space breakout, you will need at least 25 bytes just to do a single Tag. And that ignores the oddity of such a task. It would be more efficient to just have a part of the specs for your format say, "Matroska element 'Tags' and all of it's children are valid at this point." On the other hand, if you were going to include 20 Matroska Tags, and had them all grouped together in a single name space switch, you will be storing maybe 1000 bytes of data, of which the 23 bytes to change the name space would be pretty insignificant. Atamido From haessije at eps.e-i.com Fri Apr 28 14:30:31 2006 From: haessije at eps.e-i.com (HAESSIG Jean-Christophe) Date: Fri, 28 Apr 2006 14:30:31 +0200 Subject: [Matroska-devel] Re: EBML Namespaces Message-ID: <2684397F36DC8849A9BF842433B784360201EEC6@GZI-VM01.cm-cic.fr> > I'm not sure if I understand this comment. There are global > EBML IDs defined that would be valid for any EBML format. > There are 8 for the EBML header. There CRC-32 and VOID > elements are another two which I /think/ are valid for all > EBML files. That is 10 altogether. That shouldn't be to Well, I thought adding namespace support would be a good time to clean this up. I think EBML will evolve and new IDs will be used for new features. > many to keep track of. 
I am suggesting one more, and with a The need to keep track of things is always a hassle, especially if another solution is available. Namespaces provide isolation between different vocabularies and EBML is no exception. Within its own private namespace, EBML could evolve without worrying about which IDs the other formats have taken. > 4 byte ID to boot. Between [80] and [1F][FF][FF][FF] there > are more than 500 million different IDs, so it makes a random > collision pretty unlikely, even if someone weren't aware of > the pre-existing IDs. Yes, it's unlikely, until one runs into a collison. Not long ago, it was also highly impossible for the 4GB memory limit on 32-bit processors to be reached. Moreover I do not think format writers choose their IDs really randomly, they're humans after all... If there is a way to completely avoid such problems, I would rather take it. > The ID's I suggested make at least 16 bytes overhead for > embedding another format (1 byte DocType, though in this case > "matroska" is 8 bytes). You could certainly make two of > those single byte IDs to drop it to 10 byte minimum. I would > leave the parent as a long ID to avoid collision. The use of namespaces goes way beyond embedding other formats. For that sole purpose, your proposal is valid, but it doesn't really suit other uses such as annotation. > I would think that if a format is low-bitrate enough for it > to matter and wants to have that many instances of another > format, it should just include all of the sub-format's > elements into it's own specifications. That's not the right way to do it IMO. If you have a very good multimedia container ;) and you want to add some features to it, (e.g. for a video editing program, adding video data to play the stream backwards effectively) you don't want to rewrite it from scratch. You just have to add new elements in your own namespace. 
The very purpose of this is that all the video players for that format still will be able to play the file and will just ignore the extra elements from the namespaces unknown to them. > For instance, if I wanted to include some Matroska Tags, > excluding the overhead for a name space breakout, you will > need at least 25 bytes just to do a single Tag. And that > ignores the oddity of such a task. It would be more > efficient to just have a part of the specs for your format > say, "Matroska element 'Tags' and all of it's children are > valid at this point." This is correct if your format is designed to embed the features from the beginning. In this case you don't even need to use matroska's IDs. However namespaces become useful for extensions that the original engineering didn't include or are not relevant in the original design. For example an hypothetic document format could be defined with EBML. Editors may want to include revision data, so they can put into the different elements some info about who wrote them, when they were written and if applicable, how they were altered. This information is irrelevant to basic document readers, and should not be included in the original format. This opens the possibility to write formats that are not usable by themselves, but are designed to be embedded as extensions in other formats. JC From steve.lhomme at free.fr Fri Apr 28 14:35:41 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Fri, 28 Apr 2006 14:35:41 +0200 Subject: [Matroska-devel] Re: [mosu] r3215 - in trunk/prog/video/mkvtoolnix: . debian In-Reply-To: <20060428121936.08DBD440004@p15097576.pureserver.info> References: <20060428121936.08DBD440004@p15097576.pureserver.info> Message-ID: <44520C1D.6040306@free.fr> Why 0.8.0 ? We'll probably add namespaces in a few weeks so we will have a major release by then. 
moritz at bunkus.org wrote: > Author: mosu > Date: 2006-04-28 14:19:35 +0200 (Fri, 28 Apr 2006) > New Revision: 3215 > > Modified: > trunk/prog/video/mkvtoolnix/configure.in > trunk/prog/video/mkvtoolnix/debian/control > Log: > Bumped the required libebml and libmatroska versions to 0.7.7 and 0.8.0 respectively. > > Diffstat: > configure.in | 2 +- > debian/control | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > Modified: trunk/prog/video/mkvtoolnix/configure.in > =================================================================== > --- trunk/prog/video/mkvtoolnix/configure.in 2006-04-28 12:02:02 UTC (rev 3214) > +++ trunk/prog/video/mkvtoolnix/configure.in 2006-04-28 12:19:35 UTC (rev 3215) > @@ -534,7 +534,7 @@ > dnl > ebml_ver_req_major=0 > ebml_ver_req_minor=7 > - ebml_ver_req_micro=5 > + ebml_ver_req_micro=7 > > AC_CACHE_CHECK([for libEBML headers version >= ${ebml_ver_req_major}.${ebml_ver_req_minor}.${ebml_ver_req_micro}], > [ac_cv_ebml_found],[ > > Modified: trunk/prog/video/mkvtoolnix/debian/control > =================================================================== > --- trunk/prog/video/mkvtoolnix/debian/control 2006-04-28 12:02:02 UTC (rev 3214) > +++ trunk/prog/video/mkvtoolnix/debian/control 2006-04-28 12:19:35 UTC (rev 3215) > @@ -2,7 +2,7 @@ > Section: graphics > Priority: optional > Maintainer: Moritz Bunkus > -Build-Depends: debhelper (>> 4.0.0), libebml-dev (>= 0.7.3-1), libmatroska-dev (>= 0.7.5-1), libogg-dev, libvorbis-dev, libwxgtk2.4-dev|libwxgtk2.5-dev|libwxgtk2.6-dev, libexpat1-dev, zlib1g-dev, liblzo-dev, libbz2-dev, libflac-dev, groff, libmagic-dev > +Build-Depends: debhelper (>> 4.0.0), libebml-dev (>= 0.7.7-1), libmatroska-dev (>= 0.8.0-1), libogg-dev, libvorbis-dev, libwxgtk2.4-dev|libwxgtk2.5-dev|libwxgtk2.6-dev, libexpat1-dev, zlib1g-dev, liblzo-dev, libbz2-dev, libflac-dev, groff, libmagic-dev > Standards-Version: 3.5.8 > > Package: mkvtoolnix-mb > -- robUx4 on blog From steve.lhomme at free.fr Fri Apr 28 14:56:48 
2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Fri, 28 Apr 2006 14:56:48 +0200 Subject: [Matroska-devel] Re: [mosu] r3215 - in trunk/prog/video/mkvtoolnix: . debian In-Reply-To: <44520C1D.6040306@free.fr> References: <20060428121936.08DBD440004@p15097576.pureserver.info> <44520C1D.6040306@free.fr> Message-ID: <44521110.7070901@free.fr> Steve Lhomme wrote: > Why 0.8.0 ? > > We'll probably add namespaces in a few weeks so we will have a major > release by then. Never mind, I just realised it's the mkvmerge dependencies... And these are the current versions. Doh! Steve > moritz at bunkus.org wrote: >> Author: mosu >> Date: 2006-04-28 14:19:35 +0200 (Fri, 28 Apr 2006) >> New Revision: 3215 >> >> Modified: >> trunk/prog/video/mkvtoolnix/configure.in >> trunk/prog/video/mkvtoolnix/debian/control >> Log: >> Bumped the required libebml and libmatroska versions to 0.7.7 and >> 0.8.0 respectively. >> >> Diffstat: >> configure.in | 2 +- >> debian/control | 2 +- >> 2 files changed, 2 insertions(+), 2 deletions(-) >> >> Modified: trunk/prog/video/mkvtoolnix/configure.in >> =================================================================== >> --- trunk/prog/video/mkvtoolnix/configure.in 2006-04-28 12:02:02 >> UTC (rev 3214) >> +++ trunk/prog/video/mkvtoolnix/configure.in 2006-04-28 12:19:35 >> UTC (rev 3215) >> @@ -534,7 +534,7 @@ >> dnl >> ebml_ver_req_major=0 >> ebml_ver_req_minor=7 >> - ebml_ver_req_micro=5 >> + ebml_ver_req_micro=7 >> >> AC_CACHE_CHECK([for libEBML headers version >= >> ${ebml_ver_req_major}.${ebml_ver_req_minor}.${ebml_ver_req_micro}], >> [ac_cv_ebml_found],[ >> >> Modified: trunk/prog/video/mkvtoolnix/debian/control >> =================================================================== >> --- trunk/prog/video/mkvtoolnix/debian/control 2006-04-28 12:02:02 >> UTC (rev 3214) >> +++ trunk/prog/video/mkvtoolnix/debian/control 2006-04-28 12:19:35 >> UTC (rev 3215) >> @@ -2,7 +2,7 @@ >> Section: graphics >> Priority: optional >> Maintainer: Moritz 
Bunkus >> -Build-Depends: debhelper (>> 4.0.0), libebml-dev (>= 0.7.3-1), >> libmatroska-dev (>= 0.7.5-1), libogg-dev, libvorbis-dev, >> libwxgtk2.4-dev|libwxgtk2.5-dev|libwxgtk2.6-dev, libexpat1-dev, >> zlib1g-dev, liblzo-dev, libbz2-dev, libflac-dev, groff, libmagic-dev >> +Build-Depends: debhelper (>> 4.0.0), libebml-dev (>= 0.7.7-1), >> libmatroska-dev (>= 0.8.0-1), libogg-dev, libvorbis-dev, >> libwxgtk2.4-dev|libwxgtk2.5-dev|libwxgtk2.6-dev, libexpat1-dev, >> zlib1g-dev, liblzo-dev, libbz2-dev, libflac-dev, groff, libmagic-dev >> Standards-Version: 3.5.8 >> >> Package: mkvtoolnix-mb >> > -- robUx4 on blog From haessije at eps.e-i.com Fri Apr 28 17:38:23 2006 From: haessije at eps.e-i.com (HAESSIG Jean-Christophe) Date: Fri, 28 Apr 2006 17:38:23 +0200 Subject: [Matroska-devel] Unknown elements in Matroska Message-ID: <2684397F36DC8849A9BF842433B784360201EF0B@GZI-VM01.cm-cic.fr> Hello, I didn't have time to inspect that, but I wonder how libebml (in the current Matroska implementations) reacts when it encounters unexpected (unknown) elements. I don't remember directives about that in the spec, so if someone knows the answer it would save me some time. Thanks, JC From paul at msn.com Fri Apr 28 18:38:20 2006 From: paul at msn.com (Paul Bryson) Date: Fri, 28 Apr 2006 11:38:20 -0500 Subject: [Matroska-devel] Re: Unknown elements in Matroska In-Reply-To: <2684397F36DC8849A9BF842433B784360201EF0B@GZI-VM01.cm-cic.fr> References: <2684397F36DC8849A9BF842433B784360201EF0B@GZI-VM01.cm-cic.fr> Message-ID: AFAIK, it ignores them as per the specification. Atamido HAESSIG Jean-Christophe wrote: > Hello, > > I didn't have time to inspect that, but I wonder how > libebml (in the current Matroska implementations) reacts > when it encounters unexpected (unknown) elements. I > don't remember directives about that in the spec, so if > someone knows the answer it would save me some time. 
> > Thanks, > JC > _______________________________________________ > Matroska-devel mailing list > Matroska-devel at lists.matroska.org > http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel > Read Matroska-Devel on GMane: http://dir.gmane.org/gmane.comp.multimedia.matroska.devel > From musafir_86 at yahoo.com Sat Apr 29 16:29:55 2006 From: musafir_86 at yahoo.com (MusafirKelana_86) Date: Sat, 29 Apr 2006 14:29:55 +0000 (UTC) Subject: [Matroska-devel] Matroska without Thumbnail Support. Message-ID: Hello everyone, -Can I get the latest binary *WITHOUT* thumbnail support for Matroska? -Previously, I've used thumbnail support (& other support like WMP- playlist/burnlist, etc) for OGG/OGM,MP4,MKV using AVI-related settings in registry (copy related keys, CLSIDs). So far, it works okay, until Haali Splitter includes thumbnail support - my method no longer works. Just OGG/OGM & MP4 remains the same. -So, can somebody compile the latest code *WITHOUT* thumbnail support, and send it to me? Thank you. -Regards, MusafirKelana_86. P/S : Sorry for my poor English... From steve.lhomme at free.fr Sat Apr 29 19:19:10 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Sat, 29 Apr 2006 19:19:10 +0200 Subject: [Matroska-devel] Matroska without Thumbnail Support. In-Reply-To: References: Message-ID: <4453A00E.8090903@free.fr> MusafirKelana_86 wrote: > Hello everyone, > > -Can I get the latest binary *WITHOUT* thumbnail support for Matroska? > > -Previously, I've used thumbnail support (& other support like WMP- > playlist/burnlist, etc) for OGG/OGM,MP4,MKV using AVI-related settings in > registry (copy related keys, CLSIDs). So far, it works okay, until Haali > Splitter includes thumbnail support - my method no longer works. Just OGG/OGM > & MP4 remains the same. > > -So, can somebody compile the latest code *WITHOUT* thumbnail support, and > send it to me? I thought Haali's installer had the option to disable thumbnails. 
Are you using the installer in silent mode ? Steve -- robUx4 on blog From steve.lhomme at free.fr Sat Apr 29 19:23:59 2006 From: steve.lhomme at free.fr (Steve Lhomme) Date: Sat, 29 Apr 2006 19:23:59 +0200 Subject: [Matroska-devel] Re: [Matroska-users] Problem with mkvmerge In-Reply-To: <44538AD4.8030703@xs4all.nl> References: <44538AD4.8030703@xs4all.nl> Message-ID: <4453A12F.1050402@free.fr> Hi, (this email is probably for matroska-devel) salsaman wrote: > Hi, > I posted this before, but I have yet to receive a reply. > > > > I am having a problem running mkvmerge: > > mkvmerge: symbol lookup error: mkvmerge: undefined symbol: > _ZN11libmatroska8KaxBlock9SetParentERNS_10KaxClusterE > > I am wondering if the problem is in mkvmerge or in libmatroska. > > Package versions are: > mkvmerge from mkvtoolnix-1.5.6-1mdk (Build Date: Wed 07 Sep 2005 > 08:38:22 PM CEST) > libmatroska from libmatroska0-0.8.0-0.1.101plf (Build Date: Wed 02 Nov > 2005 01:27:31 PM CET) What OS are you using ? Mandriva ? If mkvtoolnix is provided by your OS you'd better use the related libebml & libmatroska. Although, I might be wrong but, mkvmerge might be using libmatroska and libebml with static linking. Anyway, it seems you'd better compile all 3 packages by yourself to avoid such problems. The other option is to get precompiled RPMs: http://bunkus.org/videotools/mkvtoolnix/downloads.html#mandriva > You might also be interested to know that the project which I develop, > LiVES has support for encoding some files in a matroska container. Most > notably we have had success with x264/vorbis in a matroska container. An > example produced by one of the other LiVES developers can be found here: > http://lives.reimeika.ca/cvs/m-apt_03_lo_dle.mkv > http://lives.reimeika.ca/cvs/m-apt_03_hi.mkv Sounds good ! 
I might try it on Ubuntu when I find the time :)

Steve

> Regards,
> Gabriel "salsaman",
> http://lives.sourceforge.net
>
>
> _______________________________________________
> Matroska-users mailing list
> Matroska-users at lists.matroska.org
> http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-users
> Read Matroska-Users on GMane:
> http://dir.gmane.org/gmane.comp.multimedia.matroska.user

-- 
robUx4 on blog

From steve.lhomme at free.fr Sun Apr 30 17:51:43 2006
From: steve.lhomme at free.fr (Steve Lhomme)
Date: Sun, 30 Apr 2006 17:51:43 +0200
Subject: [Matroska-devel] EBML Namespaces
In-Reply-To: <2684397F36DC8849A9BF842433B784360201EC27@GZI-VM01.cm-cic.fr>
References: <2684397F36DC8849A9BF842433B784360201EC27@GZI-VM01.cm-cic.fr>
Message-ID: <4454DD0F.7090406@free.fr>

HAESSIG Jean-Christophe wrote:
> The lowlevel EBML parser's job will be to demultiplex the
> elements from different vocabularies and feed them to the
> correct application. Since the namespace information is only
> a means to multiplexing, it is not forwarded to the application
> and therefore "removed".

Technically I think an application should "register" itself as capable of understanding namespace X. This way this application gets the elements in the order they appear, while separating the namespaces could mean a 'desynchronisation' of the level where the ID was found. Then if the "host" app wants to handle namespaces separately in the code, it's its responsibility.

> Below is a schematic of my idealized vision of the EBML logical
> infrastructure :
>                                        _____
>                 Lowlevel              /     \
>                 parser     --------->|       | X application
>    ______         __      /elements  |       |
>   |      |       /  \    / in NS X    \_____/
>   |      |----->|    |---              _____
>   |      |      |    |---- elements   /     \
>   |______|       \__/    \ in NS Y   |       | Y application
>   EBML File       ^|      ---------->|       |
>                   ||elements          \_____/
>      Notification ||in EBML NS           ^
>      about NS     ||                     |
>      registration ||      _____          |EBML API
>      &state after ||     /     \         |e.g. seeking
>      seeking      |*--->|       |<-------*
>                   *-----|       | EBML application
>                          \_____/
>
> A plain old EBML application (e.g. Matroska) without namespaces
> would extend the EBML application.
> A new one (with namespace support) would implement a new
> application module and register to the lowlevel parser.

Yes, we probably meant the same thing then ;)

> Now where do I see the NS IDs precisely ?
>
> [Class-ID][Size][Data ****]
>  /       \_______________________________
> /                                        \
> [Length Descriptor][Namespace][Class Value]
>
> The lowlevel parser reads the length descriptor and extracts
> the value bits (NS+CLASS). In these bits are the namespace of
> the element and the class ID value. Before the parser tries
> to decode the NS, all the possible NS values that this element
> may use must be known to the parser (that's somewhat obvious,
> but it doesn't mean that NS definitions have to physically be
> written before the element using them -- think XML ;).
>
> The possible NS values must be prefix codes, this allows the
> parser to scan the values bits from left to right and exactly
> know when the NS ID ends. The rest of the value bits are the
> Class-ID value which is forwarded to the corresponding
> application.
>
> Examples with 3 namespaces :
> 0 : EBML
> 10 : NSX
> 11 : NSY

Would you have a variable length size or use 00, 01, 10, 11 ? The number of namespaces in the file is written in the EBML which is mandatory to read. So it's known beforehand how many bits will be needed to read the namespace. And therefore I think it's better to have a fixed length, it will use less space. The impact on backward compatibility (matroska) is about the same, as some Class-IDs will need to be extended by 1 octet to remain valid.

Now as we're probably heading for this compatibility issue (only for files including more than 1 namespace) I'd also like to introduce the EbmlMaster bit in the ID header.
So that it's possible for an EBML parser to parse the whole tree without
knowing anything about the semantic.

> 0x8F = 0b10001111              Namespace : EBML
>          LNVVVVVV              Class-ID value : 15(dec)
>
> 0xDB = 0b11011011              Namespace : NSX
>          LNNVVVVV              Class-ID value : 27(dec)
>
> 0x7D09 = 0b01111101 00001001   Namespace : NSY
>            LLNNVVVV VVVVVVVV   Class-ID value : 3337(dec)
>
>>> EBML is supposed to be a byte-aligned format and it would
>>> require at least 1 extra byte for each element. This is not
>>> bad in itself, but it would waste a great amount of bits,
>>> since I do not expect files with more than 5 mixed namespaces
>>> to be frequent. Therefore, I expect the namespace value to take
>>> up to 3 bits in most cases, this is why I try to pack it into
>>> an existing field.
>> Well, what happens when you need 6 or 7 ? You don't have any
>> more bits left. Adding another byte or 2 gives room for
>
> When I say that I do not expect files using more than 5 namespaces
> to be frequent it doesn't mean that I want to disallow files which
> need to use more... It only means that we should be able to encode
> the NS on less than 1 byte (preferably only 2 or 3 bits) for the most
> frequent case. Cases requiring 300+ simultaneous namespaces will
> still be allowed, but they will use more bits (8,9,10,more) to
> express the namespace of the various contained elements.

OK that's fine. Unfortunately the matroska IDs were designed to make use
of as many different bits as possible (less false alarm in case of
errors). And we needed as varied elements as possible in a format where
all tags would be global. Now with namespaces that constraint will fall.

> Moreover, even if we have files using many namespaces, I really
> doubt they will massively mix them -- a localized use of some
> namespace is more likely. Therefore we should definitely have
> some namespace switching feature, and namespace IDs should not
> be globally fixed for a given file.
Yes, there is no point at a level X to use different namespaces in
random order as their semantic is orthogonal. So the elements will
probably end up being grouped by namespace.

Given that, Atamido's proposition for a new special tag might be
enough ! But it should be a Class-A tag to avoid overhead. In this case
it would be 3 octets for each added namespace: the ID, the size of the
namespace (could be more than one if that element is an EbmlMaster), the
namespace value.

Another option would be to use a new EBML type, similar to EbmlMaster
but with a namespace ID before the other elements.

Using such an element/type could be problematic though. It can keep
matroska compatibility easily only if we revert to the default/main
namespace in the absence of that container element. That means, as
proposed before, we always assume, in the absence of a namespace, that
we use the default.

That proposition also allows chaining namespaces (like
std::iterator::int). Seeking anywhere in the file (at the EBML level) is
still a problem (in all cases proposed) as we are unable to recover the
complete namespace context. It could only work for 0 or 1 namespace in
the chain.

Another problem if we don't have the notion of default namespace is to
seek in matroska (semantic level) because that means the level 0 would
need to be contained in the default namespace, and therefore would need
to be prepended with that new ID. That means it's not backward
compatible at all.

Now chaining namespaces may not be so clean. Imagine you have a
namespace for "comments" and a namespace for "signature". You can put
comments anywhere in your file (discarded in matroska) and you can put
signature anywhere in the file. But what if you add a comment in a
signature or a signature in a comment ? How to interpret
"signature::comment" or "comment::signature" IDs ? In that case we only
need to use "comment" or "signature"... So is namespace chaining a
feature we want ?
Or we always revert to the last seen namespace ? (in which case seeking
becomes easy)

>> I'm not too concerned about the overhead because right now if
>> you need a lot of IDs you need to use 2 octets long ones.
>> While with a namespace most IDs for each namespace won't need
>> a lot of room (127 possibilities for Class A IDs). So in the
>> end there should be a good balance.
>
> EBML is a wonderful structured format. I don't mean to restrain
> format writers from defining the format they like, but using a
> lot of IDs in some flat space is not what I would call good
> engineering. As for XML, I think the tree-like encapsulating

You're right. Even matroska could do with a better element mapping.
Especially to show how matroska is modular. We could easily see if a
parser supports a module and not the others... (like tags or chapters)

>> Using 3 bits in the ID header would reduce the number of
>> possible Class A IDs of a format to 2^4-1 = 15 ! That's too
>> small IMO. So I think adding another bit will give us more
>> freedom and space and almost no cost.
>
> With a variable-width NS-ID embedded in the ID header, a
> separation in classes is less relevant. In cases where only 2
> namespaces are used, only 1 bit is needed, thus 6 bits are left
> to encode the element ID value. Format writers should be aware
> that element IDs ranging from 0 to 63(dec) use at least 1 byte,
> maybe more, depending on the number of namespaces actually in
> use. For example, if the namespace ID for one element uses 2
> bits, element ID values ranging from 32 to 4095 will need 2
> bytes. If the namespace ID uses 3 bits, element ID values ranging
> from 16 to 2047 will need 2 bytes, and element ID values
> ranging from 2048 to 262143 will need 3 bytes, etc...
>
> The following chart summarizes how many bytes are required to
> encode an element ID assuming various NS ID lengths. In fact
> it could be a little smarter than that (there are still some
> wasted bits but for the moment I don't know what to do with
> them).
>
>  |NS ID Length        Bytes Required ------->
>  v   1      2          3              4                  5
>  0   0-127  128-16383  16384-2097151  2097152-268435455  268435456-34359738367
>  1   0-63   64-8191    8192-1048575   1048576-134217727  134217728-17179869183
>  2   0-31   32-4095    4096-524287    524288-67108863    67108864-8589934591
>  3   0-15   16-2047    2048-262143    262144-33554431    33554432-4294967295
>  4   0-7    8-1023     1024-131071    131072-16777215    16777216-2147483647
>  5   0-3    4-511      512-65535      65536-8388607      8388608-1073741823
>  6   0-1    2-255      256-32767      32768-4194303      4194304-536870911
>  7   *      1-127      128-16383      16384-2097151      2097152-268435455
>  8   *      1-63       64-8191        8192-1048575       1048576-134217727
>
> Also, I don't intend to change the Reserved Values : these are still
> 127,16383,2097151,268435455,34359738367, etc, regardless of the
> inserted NS ID.
>
>>> You seem to be prepared to make big changes to the format, but I don't
>>> know to what extent we should break compatibility...
>> Again, for current Matroska files it shouldn't be a problem
>> as matroska would be the default namespace. In that case the
>> namespace shouldn't be used for such IDs. That means older
>> files will play without any problem in namespace-aware
>> parsers. Only newer files containing some namespace will not
>> be usable by older parsers. Which AFAIK is the same with what
>> you propose.
>>
>> BTW, we still haven't discussed how to define the namespace
>> in the EBML header, but the DocType existing today will
>> remain. And that is the way to define the special namespace
>> that will be used as default... The other namespaces will
>> probably fall back in a list. It's like the DocType in XML
>> (like html). At least we got the right name for that field ;)
>
> I understand that you are referring to a "default" namespace as
> in XML. Well, that's not how I see it. If there is some kind of
> default namespace i.e. not expressed for elements in this
> namespace we still need a way to tell that the namespace value
> is present or absent. I intend to have two kinds of files :
> * Files without namespaces, i.e. NS length = 0. This
>   mathematically forbids the presence of namespaces since the
>   empty code is a prefix for all codes.
> * Files using namespaces, i.e. NS length >= 1. There can be
>   only 2 namespaces (0 & 1, each using one bit), or more, as
>   seen in the previous examples, but each element holds a
>   namespace ID.

This is only necessary for your proposal (and mine). As seen above
having multiple namespaces nested might not be a problem. And we need to
have a default fall back value (when we're out of the new element/type).

>> Yes that would be a limit but seeking in a file format is a
>> very special feature not used a lot for most formats. It
>> makes sense in A/V formats where there is a timeline, but
>> then you need to know the semantic to know what you're
>> looking for when seeking.
>
> A good seeking feature could be quite useful in applications
> without a timeline : large images can be tiled and zooming to
> a precise position can be sped up by seeking info indexed by
> (x,y) coordinates. Databases using EBML would clearly benefit
> from seeking features. I think I could find many other
> examples...

Yes, I've been dreaming about a DB that would be EBML based or a file
system. And given custom attributes are present in modern file systems
it could be an option. Seeking in a file system would also be very
important.

Steve (thinks we're getting somewhere)

-- robUx4 on blog
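
For readers following the thread, here is a minimal Python sketch of the ID-header layout discussed above: a length descriptor, then a prefix-coded namespace, then the class value. It only illustrates the proposal and is not libebml code. The names `decode_id`, `max_class_value` and `NAMESPACES` are hypothetical, and the three namespace codes are simply the ones from the quoted example. The second function assumes the chart follows "7 payload bits per byte, minus the namespace bits", which matches the quoted figures.

```python
# Hypothetical namespace table, copied from the example in the thread.
# The codes are prefix codes: no code is a prefix of another one.
NAMESPACES = {"0": "EBML", "10": "NSX", "11": "NSY"}


def decode_id(data, namespaces=NAMESPACES):
    """Split an ID header into (namespace name, class value, byte length)."""
    if not data:
        raise ValueError("empty input")
    first = data[0]
    # Length descriptor: leading zero bits, then a 1 marker bit.
    length, mask = 1, 0x80
    while mask and not first & mask:
        length += 1
        mask >>= 1
    if not mask or len(data) < length:
        raise ValueError("invalid or truncated ID header")
    raw = int.from_bytes(data[:length], "big")
    nbits = 7 * length  # payload bits left once the descriptor is removed
    bits = format(raw & ((1 << nbits) - 1), "0{}b".format(nbits))
    # Scan left to right until the bits read so far match a namespace code.
    for i in range(1, nbits + 1):
        name = namespaces.get(bits[:i])
        if name is not None:
            rest = bits[i:]
            return name, int(rest, 2) if rest else 0, length
    raise ValueError("no registered namespace matches")


def max_class_value(ns_bits, n_bytes):
    """Largest class value that fits, or None when no value bits are left."""
    free = 7 * n_bytes - ns_bits
    return (1 << free) - 1 if free > 0 else None
```

With this sketch, `decode_id(bytes([0x7D, 0x09]))` yields NSY with class value 3337, as in the quoted example, and `max_class_value(3, 2)` yields 2047, the upper bound of the 2-byte column for a 3-bit namespace ID in the chart. A fixed-length namespace field, as suggested in the reply, would just be the special case where every code in the table has the same length.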