Nodame

The -O option, which lets you set the character encoding for filenames, is missing from the latest release. To test this, I made a small zip file on Windows XP with filenames encoded in Shift-JIS and tried to open it on Linux in a UTF-8 environment. I have attached the .zip to this post. The following example shows what happens with and without -O.

```
$ unzip -l Zip_Test.zip
Archive:  Zip_Test.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  07-20-09 11:21   Zip_Test/
       37  07-20-09 11:21   Zip_Test/РVЛKГeГLГXГg ГhГLГЕГБГУГg.txt
 --------                   -------
       37                   2 files
$ unzip -O shift-jis -l Zip_Test.zip
Archive:  Zip_Test.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  07-20-09 11:21   Zip_Test/
       37  07-20-09 11:21   Zip_Test/新規テキスト ドキュメント.txt
 --------                   -------
       37                   2 files
```
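The two listings differ only in how the stored name bytes are interpreted: the archive holds raw bytes, and -O tells unzip which charset to decode them with. The short Python sketch below (Python is used only for illustration; it is not part of UnZip) reproduces this with the name from the listing, assuming the archiver wrote it in Shift-JIS. cp437 stands in for the "wrong guess" because it is the historical code page the ZIP specification assumes when no UTF-8 flag is set; it is not claimed to be exactly what produced the garbled line above.

```python
# Sketch: the archive stores only raw filename bytes; the charset used to
# decode them decides which name you see. The sample name comes from the
# listing above; treating it as Shift-JIS is an assumption for illustration.
name = "Zip_Test/新規テキスト ドキュメント.txt"

# Bytes a Shift-JIS Windows tool would write into the zip header.
raw = name.encode("shift_jis")

# Decoding with the wrong charset gives mojibake. cp437 is the legacy
# default the ZIP spec assumes when the UTF-8 flag (bit 11) is not set.
wrong = raw.decode("cp437")

# Decoding with the charset the name was actually written in, which is
# what "-O shift-jis" asks the patched unzip to do.
right = raw.decode("shift_jis")

# On a UTF-8 Linux system, the extracted file is created with these bytes.
on_disk = right.encode("utf-8")
```

No bytes are lost in either case; only the interpretation changes, which is why a charset option is enough to recover the correct names.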

EG

What version of unzip are you using (unzip -v should list it) and what are you using to create the archive?
The tendency in the zip community now is to store UTF-8 names in the archive, so that character set conversion issues (like knowing the names of the source and target character sets) mostly go away. Currently, Zip 3.0 and later support Unicode encoding of paths, and UnZip 6.0 and later can mostly handle recreating Unicode paths.
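The ZIP specification (PKWARE APPNOTE) defines two ways to mark a name as Unicode: general purpose bit 11 (the "language encoding flag"), which means the header name itself is UTF-8, and the Info-ZIP Unicode Path extra field (header ID 0x7075), which carries a UTF-8 copy alongside a CRC-32 of the legacy header name. The Python sketch below shows how a reader might choose a decoding. The function names are hypothetical and the precedence order is an assumption for illustration, not UnZip's actual code; the final fallback is where a user-supplied charset such as -O's argument would apply.

```python
import struct
import zlib

# General purpose bit 11 ("language encoding flag"): header name is UTF-8.
UTF8_FLAG = 0x0800
# Header ID of the Info-ZIP Unicode Path extra field.
UNICODE_PATH_ID = 0x7075


def unicode_path_from_extra(extra: bytes, raw_name: bytes):
    """Return the UTF-8 name stored in a 0x7075 extra field, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4 : pos + 4 + size]
        # Layout: version (1 byte, must be 1), CRC-32 of the raw header
        # name (4 bytes), then the UTF-8 name.
        if header_id == UNICODE_PATH_ID and len(body) >= 5 and body[0] == 1:
            (name_crc,) = struct.unpack_from("<I", body, 1)
            # Ignore the field if the raw name changed after it was written.
            if name_crc == zlib.crc32(raw_name):
                return body[5:].decode("utf-8")
        pos += 4 + size
    return None


def decode_entry_name(raw_name: bytes, flags: int, extra: bytes = b"",
                      legacy_charset: str = "cp437") -> str:
    """Pick a decoding for one entry name (illustrative precedence)."""
    if flags & UTF8_FLAG:
        return raw_name.decode("utf-8")
    unicode_name = unicode_path_from_extra(extra, raw_name)
    if unicode_name is not None:
        return unicode_name
    # No Unicode information in the archive: fall back to a legacy code
    # page. Without a user override, the spec's default is cp437.
    return raw_name.decode(legacy_charset)
```

For archives like the one in the first post, neither Unicode marker is present, so only the last branch applies and the result depends entirely on `legacy_charset`.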

Nodame

|
The version I used above is 5.52, the last one that still had -O. The version where -O is missing is 6.0.
This archive was made with 7z on a Japanese Windows XP system with Shift-JIS filesystem encoding, and the commands were run on Linux with UTF-8 encoding.
UTF-8 to UTF-8 works, of course, but non-UTF-8 archives are plentiful.

sms

|
> The one I did the above with was the last one where -O was still in
> which is 5.52. The one where -O is missing is 6.0.
I don't see a "-O" option in the normal UnZip 5.52 code. Where did you get your UnZip program? Is the source available?
Actual "unzip -v" output might be interesting.

Nodame

|
Ah, now I get it. I was using the 5.52 Arch Linux package and was about to post the package build, but now I see that -O comes from a patch. In the first 6.0 release the package maintainer forgot to include the patch; the latest version has it again. I hadn't bothered with unzip updates and had stuck with 5.52, but I just updated to try it out, and -O works. So thanks, I can enjoy your 6.0 release now. And thanks, patch writers.

sms

|
Where did you get the patch?

Nodame

noldor
September 14, 2009, 2:17pm

EG
September 18, 2009, 6:34am

Looking briefly through the (Sisyphus) patch, it appears to be a version of a similar iconv patch for UnZip that was proposed a while back but rejected by the UnZip maintainer. Personally, I don't have any problem with including this feature, but the current industry trend is to move to all-UTF-8 paths. We've been a bit slower, trying to maintain backward compatibility with existing archives as we move in that direction.
That said, it's hard to say whether we should include this patch in the main release. Given that the UnZip maintainer probably would not accept the patch anyway (since he rejected it before), it might be an uphill battle.
It might be worth putting a pointer to the patch on the web site, though. If you all had to pick the primary place to download the patch, which would it be? Also, are there instructions anywhere?

darehanl
September 24, 2009, 3:20am

|
Any updates on this? Not being able to unzip Windows archives is a real problem for my daily use. I went as far as porting (poorly) the old AltLinux patch to 6.0 here: http://bugs.archlinux.org/task/15256 but I'd much rather have someone who actually knows the code implement this functionality.
If the "-O charset" patch isn't accepted, does Info-ZIP have a "recommended" method of working around these non-Unicode zip files? Is there something like a converter from legacy zip files to the new UTF-8 zip files? An endless sequence of:

```
inflating: ?-??+- ??+??? -?10+?s_.txt
```

is extremely annoying.

EG// You can apply the patch like this (the AltLinux Sisyphus patch requires libnatspec):

```
$ cd unzip60
$ patch -Np1 -i unzip60-alt-iconv-utf8.patch
```
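Nothing in this thread names an official converter, but when the legacy charset is known, a converter is small. The following is a hypothetical Python sketch using the standard `zipfile` module; `reencode_zip` and the `shift_jis` default are assumptions for illustration. It relies on `zipfile` decoding names that lack the UTF-8 flag as cp437, which maps every byte value, so re-encoding to cp437 recovers the original bytes exactly. When writing, `zipfile` stores non-ASCII names as UTF-8 and sets bit 11 itself. It is meant for POSIX systems and does not carry over archive comments or extra fields.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is already UTF-8


def reencode_zip(src, dst, legacy_charset="shift_jis"):
    """Copy archive src to dst, re-decoding legacy names as legacy_charset.

    Hypothetical helper: entry data is recompressed with Deflate, and the
    archive comment and per-entry extra fields are not carried over.
    """
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            name = info.orig_filename
            if not info.flag_bits & UTF8_FLAG:
                # zipfile decoded the raw bytes as cp437, which maps all 256
                # byte values, so encoding back to cp437 recovers the exact
                # bytes; then decode them with the charset they were written in.
                name = name.encode("cp437").decode(legacy_charset)
            new_info = zipfile.ZipInfo(name, date_time=info.date_time)
            new_info.external_attr = info.external_attr
            new_info.compress_type = zipfile.ZIP_DEFLATED
            # zipfile stores non-ASCII names as UTF-8 and sets bit 11 itself.
            zout.writestr(new_info, zin.read(info))
```

For example, `reencode_zip("Zip_Test.zip", "Zip_Test_utf8.zip")` would produce a copy whose names are flagged as UTF-8. On Python 3.11 and later, `zipfile.ZipFile(..., metadata_encoding="shift_jis")` can decode legacy names directly when reading, which would replace the cp437 round trip.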

EG
September 26, 2009, 4:23am

|
> Any updates on this? Not being able to unzip Windows archives is a real problem for my daily use. I went as far as porting (poorly) the old AltLinux patch to 6.0 here: http://bugs.archlinux.org/task/15256 but I'd much rather have someone who actually knows the code implement this functionality. If the "-O charset" patch isn't accepted, does Info-ZIP have a "recommended" method of working around these non-Unicode zip files? Is there something like a converter from legacy zip files to the new UTF-8 zip files? An endless sequence of:
> inflating: ?-??+- ??+??? -?10+?s_.txt
> is extremely annoying.
Agree.
> EG// You can apply the patch like this (the AltLinux Sisyphus patch requires libnatspec):
> $ cd unzip60
> $ patch -Np1 -i unzip60-alt-iconv-utf8.patch
Thanks. However, it's the UnZip maintainer's decision, and so far he hasn't accepted adding this capability. We could try again, though. Another possibility is to add the patch to our site after looking it over and doing some testing. That assumes there are no issues with us distributing the patch and any required files. I haven't looked at the license issues on either patch. For us to use the code, it would have to be distributable under the Info-ZIP license. What are the license restrictions on your patch (which I assume inherits the restrictions of the patch you modified)?

darehanl
November 23, 2009, 8:02pm

Sorry, EG, it took a while. I've received a response from the AltLinux maintainer, and he says the license of the patch is identical to the original UnZip license. Have you checked with the UnZip maintainer? Thanks.