Unzip 6.0 is missing option -O

chkm
June 22, 2010, 3:31pm Report to Moderator
Baby Member
Posts: 3
For us it would also be extremely useful to have such an option, or to get unzip to extract CP437-encoded file paths correctly (as CP437 is the default encoding for DOS/Windows-zipped archives).
Logged
Private Message Reply: 15 - 47
Taurus
July 3, 2010, 11:06am Report to Moderator
Baby Member
Posts: 3
Hello. I am the maintainer of the unzip package for the Russian Fedora Remix distribution. I have been studying the situation with non-ASCII encodings. It is a very pressing issue. Many people (not only in Russia) use such encodings. Look at
http://www.google.ru/search?q=unzip+encoding+problem
https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/580961
http://www.linuxfromscratch.or.....ale-assumed-encoding
http://www.linuxfromscratch.org/blfs/view/cvs/general/unzip.html

Info-ZIP supports many rare and old architectures that very few people use, but it does not support non-ASCII encodings, which are widespread. I think that is not right.
As a result, even distributions have to include a patch that solves the encoding problem.

What does the maintainer say about this?
Logged
Private Message Reply: 16 - 47
EG
July 13, 2010, 3:13am Report to Moderator
Info-ZIP Team
Posts: 463
Sorry for the delay getting back to this.

Things might have changed here since the original posting and it may now be possible to get this patch implemented.  No guarantees, but it looks possible.

If there is still interest in this patch, it might get added to UnZip 6.1a, the next public beta in the works.  It needs to be specific to UnZip 6.0 and have enough context that the changes can be made to internal beta UnZip 6.1a03, which already has significant changes.  Note that we generally make patch changes by hand, doing sanity checks as we go.

Unless someone has a better choice, we'll start with unzip60-alt-iconv-utf8.patch and go from there.

Looks like libiconv is under the LGPL, so it might be workable as long as the user is required to get libiconv themselves or somehow the library is already available.
Logged
Private Message Reply: 17 - 47
Taurus
July 14, 2010, 9:25am Report to Moderator
Baby Member
Posts: 3
Yes, we are very interested. Cool ))
Logged
Private Message Reply: 18 - 47
csa
July 14, 2010, 10:12am Report to Moderator
Baby Member
Posts: 7
The proposed patch supports only a few hardcoded encodings. I have a better solution for unzip. It is based on libraries from the RusXMMS project and is multilingual by design. New languages/encodings can be added through a configuration file without a rebuild. Moreover, you don't need to use the '-O' option: for most languages the correct encoding is autodetected. If there is interest, I can port the patch to the latest alpha.
http://dside.dyndns.org/darklin/portage/app-arch/unzip/files/unzip-ds-lazyrcc.patch
Logged
Private Message Reply: 19 - 47
EG
July 14, 2010, 5:06pm Report to Moderator
Info-ZIP Team
Posts: 463
Any patches to the actual code must be distributable under the Info-ZIP license, which is similar to other open licenses but allows commercial use.  Generally any patch that adds restrictions on distribution or use will likely be rejected.

We also generally reject any patches that require configuration tables or other similar things.  Once compiled, the code needs to run independent of other files on the system, with the exception being the environment variables.

The thought was to use a library like libiconv that someone else maintains.  Internally we discussed creating our own tables, but that is way too much work for us to maintain.  It should be as simple as installing a library on your system and linking to it.

I seem to remember some patch out there somewhere for autodetecting character set encodings.  That was years ago, though.  I know it can be done with some level of success.

I haven't looked yet at the RusXMMS patch, so don't know if that meets the need.

If you all can weigh the various patches out there against the above requirements and make some suggestions, would appreciate that.  We got too much going on right now.  This needs to be an easy patch to get done.
Logged
Private Message Reply: 20 - 47
csa
July 14, 2010, 6:59pm Report to Moderator
Baby Member
Posts: 7
RusXMMS libraries are under the LGPL, so there should be no problem with licensing. The configuration files are optional: there is a predefined configuration (autodetection for some languages and static rules for others) which can be overridden if config files are provided.  Since the job is done by the RusXMMS libraries, the patch is small yet quite functional. The 'iconv' patch makes more changes to the code and brings much less functionality.
By the way, the libraries are included at least in Debian/Ubuntu and openSUSE, and openSUSE has adopted the RusXMMS patch for its unzip package.
Logged
Private Message Reply: 21 - 47
EG
July 14, 2010, 7:43pm Report to Moderator
Info-ZIP Team
Posts: 463
So this does not use the -O option, but should automatically detect the current codepage and display the characters appropriately.  Hmm.  Sounds good.  It would have to bow out and let Unicode do its thing when that is enabled.  There should also be an option to disable it.  It also should be listed in the unzip -v list when present.

By the way, last night I think I got the ? issue with filenames fixed when Unicode is enabled.  The old check was not using wide characters for the checks.

There are quite a few changes in the UnZip 6.10a beta.  You probably should work with that.  Probably getting about that time to post it as a public beta, though we need to prepare it for that (like updating the documentation) and we got some things in the works we should finish before it goes out.

How does one get and install this library on Windows as well as Linux?  Is it available for Mac OS X (which it should be, as that might just use a Unix version)?  (Given all the other things going on, figure I'd let you do the looking.)
Logged
Private Message Reply: 22 - 47
EG
July 14, 2010, 7:47pm Report to Moderator
Info-ZIP Team
Posts: 463
Before we get too carried away, if anyone has any other suggestions or gripes, please post them.  Our time is limited, so if a chosen solution doesn't work out, our first thought may be to just pull it and move on to other things, and there's quite a few of them pending.
Logged
Private Message Reply: 23 - 47
Al Dunsmuir
July 15, 2010, 2:26am Report to Moderator
Info-ZIP Team
Posts: 94
Ed,
I'd like to see the latest unzip changes all rolled up, so I have a better basis to know how to comment for z/OS USS and MVS.   It will also help me understand the latest zip codebase.   The code for this area of support will need to be available in both zip and unzip on z/OS for both USS and MVS.

I'm going to be getting further up to speed on the various Unicode issues and runtime support capabilities on z/OS for my main work project. The intersection between various ASCII and EBCDIC code pages, plus the historic zip/unzip translation tables is messy... adding Unicode is not going to make it any more pleasant.  Tactical solutions applied over the years will require additional care to support transparently.

There should be attributes per file that describe the character set(s) used to encode each file, since archives can contain files processed on multiple platforms in separate zip invocations. I'm not sure if that is currently the case.

zip and unzip should always attempt to do the right thing, but user options should support overrides to both the source and target character sets during character translation processing.
Al
Logged
Private Message Reply: 24 - 47
EG
July 15, 2010, 6:27am Report to Moderator
Info-ZIP Team
Posts: 463
Quoted from Al Dunsmuir
I'd like to see the latest unzip changes all rolled up, so I have a better basis to know how to comment for z/OS USS and MVS.   It will also help me understand the latest zip codebase.   The code for this area of support will need to be available in both zip and unzip on z/OS for both USS and MVS.

Not quite sure what changes you're referring to.  Development always continues (at least for now), so there never really is a fully rolled up version of unzip or zip.  Just snapshots that are the betas.  (Unless we get distracted by other things and nothing happens for awhile.)  Part of what we do is integrate changes into the moving targets.  That said, it looks like UnZip 6.10a is getting closer to the door.  However, Zip 3.1d has some new stuff.

Quoted from Al Dunsmuir
I'm going to be getting further up to speed on the various Unicode issues and runtime support capabilities on z/OS for my main work project. The intersection between various ASCII and EBCDIC code pages, plus the historic zip/unzip translation tables is messy... adding Unicode is not going to make it any more pleasant.  Tactical solutions applied over the years will require additional care to support transparently.

There should be attributes per file that describe the character set(s) used to encode each file, since archives can contain files processed on multiple platforms in separate zip invocations. I'm not sure if that is currently the case.

zip and unzip should always attempt to do the right thing, but user options should support overrides to both the source and target character sets during character translation processing.

The current standard, as negotiated about two years ago, is to use the UTF-8 file name as the file name if the archive entry has one.  This overrides the standard path field.  There are two ways to specify a UTF-8 path, either using an extra field or the standard path field.  There is a new flag bit that tells an unzip that the standard path is in UTF-8.  If the extra field is used, the standard path usually contains a local code page version of the file name.
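
For readers who want to see that precedence spelled out, here is a minimal Python sketch, not UnZip's actual code. It assumes the flag is general purpose bit 11 (0x0800) and that the extra field is the Info-ZIP Unicode Path field (ID 0x7075, holding a version byte, a CRC32 of the standard name, then the UTF-8 name), as laid out in the AppNote. The function names and the `fallback` parameter are made up for illustration.

```python
import struct
import zlib

UTF8_FLAG = 0x0800          # general purpose bit 11: standard path is UTF-8
UNICODE_PATH_ID = 0x7075    # Info-ZIP Unicode Path extra field


def unicode_path_from_extra(extra: bytes, raw_name: bytes):
    """Return the UTF-8 name stored in a 0x7075 extra block, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(body) < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", body, 0)
        # The CRC ties the UTF-8 name to the standard name it was made for;
        # a mismatch means the standard name changed and the field is stale.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return body[5:].decode("utf-8")
    return None


def entry_name(raw_name: bytes, flags: int, extra: bytes,
               fallback: str = "cp437") -> str:
    """Pick the file name for one entry (hypothetical helper)."""
    if flags & UTF8_FLAG:
        return raw_name.decode("utf-8")
    from_extra = unicode_path_from_extra(extra, raw_name)
    if from_extra is not None:
        return from_extra
    # No UTF-8 anywhere: the code page has to be assumed or supplied.
    return raw_name.decode(fallback)
```

Only the last branch depends on guessing a code page, which is the case the rest of this thread is about.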

If there is no UTF-8 for that entry, then the standard path is supposed to use a standard DOS code page according to the standard.  However, in practice the standard path uses the local code page so that zipping and unzipping on the same platform works as expected.  The problem with this is moving archives to other platforms messes up the file names.  That's why we added the UTF-8 encoding that supports all encodings in one character set.

Older archives still have the issue of which code page was used to encode the file name.  That is in fact the problem this thread is trying to solve.  There have been discussions regarding including the code page encoding, but the bottom line is the UTF-8 field captures the file name without knowing the encoding.  Now that libraries like libiconv are available, conversion between code pages is not so bad, but it still seems unnecessary except for supporting older archives.  Zip in particular will automatically include a UTF-8 path with the entry if the file name is not plain ASCII.
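
To make the older-archive case concrete, here is a small Python sketch of the kind of override being asked for. It is an illustration only: `redecode` is a made-up helper, and CP866 and CP437 are just example code pages. The name bytes were written in one code page, an extractor assumed another, and the fix is to undo the wrong decoding and apply the right one.

```python
def redecode(name: str, actual: str, assumed: str = "cp437") -> str:
    """Undo a wrong code page guess for an entry name.

    `assumed` is the code page the extractor used, `actual` is the one
    the archive was really written with (both supplied by the user).
    """
    return name.encode(assumed).decode(actual)


# A name written on a system that used CP866 ...
raw = "привет.txt".encode("cp866")
# ... comes out garbled when the bytes are read as CP437 ...
garbled = raw.decode("cp437")
# ... and is recovered once the real code page is given.
fixed = redecode(garbled, actual="cp866")
print(garbled)
print(fixed)
```

The round trip only works when the assumed code page maps every byte value to some character, as CP437 does in Python's codec. A C implementation would more naturally work on the raw bytes with a library such as libiconv.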

Extra fields are used to capture file attributes specific to a platform or port.

Most users probably don't know of or prefer not to care about code pages.  Also, supporting all possible code pages probably means integrating a library like libiconv into the code, which was an issue until libraries with LGPL licenses became available for many platforms.

PKZip apparently has been thinking of adding a language encoding extra field that might include the code page, but they haven't had a need so it hasn't been done.  The AppNote has had a spot reserved for it for awhile though.

Anyway, this stuff can get complicated.  Sounds like you'll have some fun.
Logged
Private Message Reply: 25 - 47
Taurus
July 15, 2010, 7:02am Report to Moderator
Baby Member
Posts: 3
I think applying the RusXMMS patch is the best solution. Really, if it is possible to do without the -O option, then why not take that opportunity?
Logged
Private Message Reply: 26 - 47
Lazy_Kent
July 15, 2010, 9:36am Report to Moderator
Baby Member
Posts: 1
We fixed the encoding problem in openSUSE with the RusXMMS patch and the librcd/librcc libraries.
https://bugzilla.novell.com/show_bug.cgi?id=540598
Logged
Private Message Reply: 27 - 47
EG
July 15, 2010, 4:53pm Report to Moderator
Info-ZIP Team
Posts: 463
There seem to be a couple of possibilities on the table, including the iconv patch and the RusXMMS patch.  If you all can do some Google searches and post links to appropriate documentation, it might help other readers of this thread.

In the long run, it seems everyone should be migrating to zipping tools that include UTF-8 and unzipping tools that can read it.  Then all this is not needed.  Currently unzip on Windows has problems restoring file names in other character sets, but this is being worked on and could be fixed in the next UnZip 6.1 beta.  So anything done here should be seen as "temporary" and could go away later.  Ironically, the full support of UTF-8 by UnZip may happen in the same beta that this translation feature gets added to, making the translation feature almost obsolete out of the box.

I guess what we can do is add support for the approach you all select as an unsupported feature.  (So you need to agree on something.  Go look at the different patches and go to the web sites of the libraries.)  This means we add support for the code changes needed to call the library, putting those changes into an #ifdef block, so they won't be included by default.  We would not be debugging issues with using the library other than being able to call it and we would not be distributing any library.  We would not distribute executables with this code either.

If that sounds acceptable, we might be able to move forward on this, assuming no one else in the Info-ZIP development group has issues with this.
Logged
Private Message Reply: 28 - 47
Al Dunsmuir
July 15, 2010, 9:39pm Report to Moderator
Info-ZIP Team
Posts: 94
The z/OS Language Environment (C runtime) already has extensive and fully functional iconv support.
It has been the recommended method of doing character translation for many years.  Recent z/OS releases have
added native support for Unicode, and iconv is part of that solution.

Character translation is one of the core issues when adapting ASCII-centric tools such as zip and unzip to
an EBCDIC-centric platform.  It is also important for translating between code pages within ASCII or EBCDIC
where "extended" characters have different mappings.  For EBCDIC, these variant characters include such
basic characters as '@', '$', '[' and ']'.  This is why we need to be able to control translation on each zip or
unzip invocation on the mainframe.  This applies to both file names and file contents.
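
The variant-character point can be shown with a short Python sketch. It is an illustration only and assumes Python's built-in `cp037` and `cp500` codecs as two example EBCDIC code pages. `transcode` is a made-up helper, not anything in zip or unzip.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Translate text bytes from one code page to another."""
    return data.decode(source).encode(target)


text = "a[1]"
as_cp037 = text.encode("cp037")
as_cp500 = text.encode("cp500")

# The same characters get different byte values in the two EBCDIC pages,
# so bytes written under one page are misread under the other.
print(as_cp037.hex(), as_cp500.hex())
print(as_cp037.decode("cp500"))

# Naming both the source and the target page gives the intended result.
print(transcode(as_cp037, "cp037", "cp500") == as_cp500)
```

This is why an override needs both ends: knowing only the target code page does not say how to read the bytes already in the archive.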


For text data, it makes sense to have an optional per-file attribute that identifies the character set of the
file data. I suspect that a separate attribute identifying the line termination - LF, CR LF, NEL, or
the previously discussed per-record length attribute is required. 

Since the name for the file within the zip archive can be either plain ASCII or UTF-8, it appears that yet another
per-file attribute is mandated.

Al
Logged
Private Message Reply: 29 - 47