The -O option, which lets you set an encoding for filenames, is missing in the latest release.
To test, I made a small zip file in Windows XP with filenames encoded in Shift-JIS and tried to open it on Linux in a UTF-8 environment. I have attached the .zip to this post.
The following example shows what happens without and with -O.
Code
$ unzip -l Zip_Test.zip
Archive:  Zip_Test.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  07-20-09 11:21   Zip_Test/
       37  07-20-09 11:21   Zip_Test/РVЛKГeГLГXГg ГhГLГЕГБГУГg.txt
 --------                   -------
       37                   2 files
$ unzip -O shift-jis -l Zip_Test.zip
Archive:  Zip_Test.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  07-20-09 11:21   Zip_Test/
       37  07-20-09 11:21   Zip_Test/新規テキスト ドキュメント.txt
 --------                   -------
       37                   2 files
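To illustrate why the name comes out garbled, here is a minimal Python sketch. It assumes the first listing shows the Shift-JIS bytes interpreted as code page 866; that is an inference from the garbled output, not something stated in this thread, and the code page unzip actually picks depends on the build and the system.

```python
# Sketch: the same bytes read with two different character sets.
# Assumption: the garbled listing corresponds to CP866; this is inferred
# from the output above, not confirmed by the thread.
original = "新規テキスト ドキュメント.txt"

# Bytes as a Shift-JIS Windows system would store them in the archive.
raw = original.encode("shift_jis")

# Reading those bytes with the wrong character set produces mojibake.
garbled = raw.decode("cp866")

# Reading them with the right character set (what -O shift-jis asks for)
# recovers the name.
recovered = raw.decode("shift_jis")

print(garbled)
print(recovered)
```

No bytes are lost in the archive; only the interpretation differs, which is why naming the source encoding is enough to recover the file name.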
What version of unzip are you using (unzip -v should list it) and what are you using to create the archive?
The tendency in the zip community now is to store UTF-8 in the archive, so that issues of character set conversion (like knowing the names of the source and target character sets) mostly go away. Currently Zip 3.0 and later support Unicode encoding of paths, and UnZip 6.0 and later can mostly handle recreating Unicode paths.
Ah, now I get it. I was using the 5.52 Archlinux package and was about to post the package build, but now I see that -O comes from a patch. In the first release of the 6.0 package the maintainer forgot to include the patch, and the latest version has it again. I had not bothered with unzip updates and stuck with 5.52; I just updated to try it out, and -O works. So thanks, I can enjoy your 6.0 release now. And thanks, patch writers.
I have only used 5.52, and its patch unzip-5.52-alt-natspec.patch does the job. My archive with Cyrillic file names is listed and extracted correctly without any special "-O" option. With this patch, the encoding is chosen automatically.
Briefly going through the (sisyphus) patch, it appears to be a version of a similar iconv patch for UnZip that was proposed a while back but rejected by the UnZip maintainer. Myself, I don't have any problem with including this feature, but the current industry trend is to move to total UTF-8 paths. We've been a bit more sluggish, trying to maintain backward compatibility with existing archives as we move to that.
That said, it's hard to say if we should include this patch in the main release. Given the UnZip maintainer probably would not accept the patch anyway (since he rejected it before), it might be an uphill battle.
It might be worth putting a pointer on the web site to the patch though. If you all had to pick the primary place to download the patch, which would it be? Also, are there instructions anywhere?
Any updates on this? The lack of being able to unzip Windows archives is really critical for my daily use; I went as far as porting (poorly) the old altlinux patch to 6.0 here: http://bugs.archlinux.org/task/15256 but I'd much rather have someone who actually knows the code implement this functionality. If the "-O charset" patch isn't accepted, does InfoZip have a "recommended" method of working around these non-unicode zip files? Is there something like a converter from legacy zip files to the new UTF-8 zip files? An endless sequence of: inflating: ?-??+- ??+??? -?10+?s_.txt is extremely annoying.
EG: You can apply the patch like this (the altlinux sisyphus patch requires libnatspec):
$ cd unzip60
$ patch -Np1 -i unzip60-alt-iconv-utf8.patch
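On the question of a converter from legacy archives to UTF-8 names, one possible workaround is sketched below using Python's standard zipfile module. This is not an Info-ZIP tool; `convert_zip` and `fix_name` are hypothetical names, and the sketch assumes you already know the legacy encoding (Shift-JIS here). It relies on zipfile decoding unflagged names as cp437, so re-encoding to cp437 gives back the raw bytes.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: name is stored as UTF-8


def fix_name(info, legacy_encoding):
    """Return the entry name decoded with the legacy encoding (hypothetical helper)."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already UTF-8, leave it alone
    # zipfile decoded the raw bytes as cp437; undo that, then decode properly.
    return info.filename.encode("cp437").decode(legacy_encoding)


def convert_zip(src, dst, legacy_encoding="shift_jis"):
    """Copy src to dst, rewriting entry names so they are stored as UTF-8."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            new_info = zipfile.ZipInfo(fix_name(info, legacy_encoding),
                                       date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_DEFLATED
            new_info.external_attr = info.external_attr
            # zipfile sets the UTF-8 flag itself for non-ASCII names.
            zout.writestr(new_info, zin.read(info))


if __name__ == "__main__":
    # Output file name is only an example.
    convert_zip("Zip_Test.zip", "Zip_Test_utf8.zip", "shift_jis")
```

The converted archive should then list correctly with an unpatched UnZip 6.0 that has Unicode support. Picking the wrong legacy encoding will either raise a decode error or produce a different kind of garbage, so it is worth checking the listing before deleting the original.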
Thanks. However, it's the decision of the UnZip maintainer and so far he hasn't accepted adding this capability. Could try again, though.
Another possibility is to add the patch to our site, after looking it over and doing some testing. That assumes there are no issues with us distributing the patch and any required files.
I haven't looked at the license issues on either patch. To use the code it would have to be distributable under the Info-ZIP license. What are the license restrictions on your patch (which I assume inherits the restrictions of the patch you modified)?
Sorry, EG, it took a while. I've received a response from the AltLinux maintainer and he says that the license of the patch is identical to the original unzip license. Have you checked with the UnZip maintainer? Thanks.
I would like to ask you about adding a patch to support national character sets in filenames. Without it, it is impossible to read or restore the file names when unpacking such an archive.
When I asked one of the maintainers of the unzip package on the Archlinux bug tracker, he explained to me that you are not interested in adding this patch, and that without your support it would lead to conflicts with other programs.
Why do you think that the libnatspec-based patch will be rejected too? Can you tell us the original reason the previous patch was rejected? I tried searching for more information, but I have only found comments made by people other than the maintainer. Currently all I know is that a previous patch existed and that it was rejected. I could not find any specific rationale for not accepting it.
It would be very helpful to know what the UnZip maintainer thinks about this (or at least what he originally stated as the reason).