On zip31b, when -l is omitted from the PARM card, the wrong Unix end-of-line character X'85' is used. On zip232 the correct Unix end-of-line character X'0A' is used. Is there a way to set a correct default? Currently we use -ll to get the correct result. Josef
It is still a problem with zip 3.1c. To recreate the problem I tried the following steps with the new zip 3.1c:

1.) Compiled the new zip 3.1c version with the make -f mvs.mki command.

2.) Executed the following zip batch job with the -a option:

//ZIPTEST JOB (#ACCT,IZ),'ZIPTEST',CLASS=P,MSGCLASS=T,NOTIFY=&SYSUID
//ZIP EXEC PGM=ZIP,
// PARM='/ -a dd:archive -@'
//STEPLIB DD DSN=IZN.XMITIP.LOAD,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//CEEDUMP DD SYSOUT=*
//ARCHIVE DD DSN=TEST.UNZIP.ARCHIV07.ZIP,DISP=(,CATLG,DELETE),
// SPACE=(CYL,(1,1),RLSE),
// MGMTCLAS=DEL30T
//SYSIN DD *
'TEST.UNZIP.ZIPIN.KOERNER.VAR'
//*

3.) Copied the archive "TEST.UNZIP.ARCHIV07.ZIP" to my PC.

4.) Extracted the file /TEST/UNZIP/ZIPIN/KÖRNER.VAR.

5.) Put the extracted file /TEST/UNZIP/ZIPIN/KÖRNER.VAR in binary to z/OS.

6.) Viewed the file in TSO in hex with the DISPLAY ASCII command, with the following result:

------------------------------------------------------------------------------
MarzipanNr;StatusDatum;Sp.Datum;Primõrbetreuer;Referenznummer;UnterNr;Kontoart;K
467767664735767774677635724677635766E7667767673566676676766673567674734667667734
D12A901EE2B3414534145DB30E4145DB029D4225425552B256525EAE5DD52B5E452E2BBFE4F124BB
------------------------------------------------------------------------------
chaden;Bruttoschadenà91001654;20091016;20091016;90243/Fr.Auer;1/0063/110/00;0;6;
66666634777767666666833333333333333333333333333333333247247673323333233323333333
38145EB22544F338145E591001654B20091016B20091016B90243F62E1552B1F0063F110F00B0B6B
------------------------------------------------------------------------------

7.) Looking after the word "Bruttoschaden", we can see the line delimiter is X'85'.

8.) Doing the same procedure with ZIP232, we can see the correct result of X'0A':

------------------------------------------------------------------------------
MarzipanNr;StatusDatum;Sp.Datum;Primõrbetreuer;Referenznummer;UnterNr;Kontoart;K
467767664735767774677635724677635766E7667767673566676676766673567674734667667734
D12A901EE2B3414534145DB30E4145DB029D4225425552B256525EAE5DD52B5E452E2BBFE4F124BB
------------------------------------------------------------------------------
chaden;Bruttoschaden.91001654;20091016;20091016;90243/Fr.Auer;1/0063/110/00;0;6;
66666634777767666666033333333333333333333333333333333247247673323333233323333333
38145EB22544F338145EA91001654B20091016B20091016B90243F62E1552B1F0063F110F00B0B6B

So I think the old ZIP232 version uses the X'0A' line delimiter with option -a, which is the correct value on Unix. It makes sense to use the same delimiter in the new ZIP31c version. Currently I can only circumvent this problem by using the option -all on zip31c. But this means someone who upgrades from ZIP232 to ZIP31C has to change the -a option to -all in all of his ZIP jobs. So it seems easier to me to use the old, correct line delimiter as it was in ZIP232.
4.) extracted the file /TEST/UNZIP/ZIPIN/KÖRNER.VAR 5.) Put the extracted file /TEST/UNZIP/ZIPIN/KÖRNER.VAR in binary to z/OS
Is this the same file that was added to the archive above? Seems the names are different. How was the file moved? Any chance that what moved it did any line end conversions?
6.) viewed the file in TSO in HEX with DISPLAY ASCII Command with following result:

------------------------------------------------------------------------------
chaden;Bruttoschadenà91001654;20091016;20091016;90243/Fr.Auer;1/0063/110/00;0;6;
66666634777767666666833333333333333333333333333333333247247673323333233323333333
38145EB22544F338145E591001654B20091016B20091016B90243F62E1552B1F0063F110F00B0B6B
------------------------------------------------------------------------------

7.) Looking after the word "Bruttoschaden", we can see the line delimiter is X'85'.

8.) Doing the same procedure with ZIP232, we can see the correct result of X'0A'.
So I think the old ZIP232 version uses the X'0A' line delimiter with option -a, which is the correct value on Unix. It makes sense to use the same delimiter in the new ZIP31c version. Currently I can only circumvent this problem by using the option -all on zip31c. But this means someone who upgrades from ZIP232 to ZIP31C has to change the -a option to -all in all of his ZIP jobs. So it seems easier to me to use the old, correct line delimiter as it was in ZIP232.
I'm not sure where the X'85' is coming from?
Can you put together a very small text file that recreates this problem? Just lines like:
123456789a
123456789b
123456789c
That should be enough to verify the line ends. Then attach the original file, the archive with it, and the extracted file. Note where the zipping and the unzipping was done and all parameters.
Hello, I have created a three-line test file, attached as ebcdic_117.txt. I have also attached it in ASCII as ascii_1535.txt. This input file was zipped with zip31c to the output archive attached as ebcdic_6161.zip. ebcdic_6161.zip was then downloaded in binary to the PC and extracted with WinZip to the attached file extract_9834.txt. The extracted file was then uploaded in binary to z/OS and displayed in hex, and looks like this:

------------------------------------------------------------------------------
Line1àLine2àEndà................................................................
46663846663846680000000000000000000000000000000000000000000000000000000000000000
C9E515C9E5255E450000000000000000000000000000000000000000000000000000000000000000
------------------------------------------------------------------------------

In the display we can see that the line delimiter is X'85'. The job used for zip looks like this:

//IZ007601 JOB (520000,IZ),'BERGER',CLASS=P,MSGCLASS=T,NOTIFY=&SYSUID
//DELDSN EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
 DEL IZ00760.UNZIP.INPUT.EBCDIC.ZIP
 SET MAXCC=0
/*
//* NOTE THE PARAMETER LINE IS LOWERCASE, EXCEPT FOR -B
//* -V VERBOSE MODE
//* -A COMPRESS THE FILE AS ASCII
//* -L TRANSLATE THE UNIX END-OF-LINE CHAR LF INTO MSDOS CRLF
//* -J INDICATES NOT TO SAVE THE PATH, JUST THE FILENAME
//* -K ATTEMPT TO MAKE NAMES MSDOS COMPATIBLE
//* -B COMPRESS THE FILE AS BINARY, NOT EBCDIC
//* DD:ARCHIVE INDICATES USE DD STATEMENT //ARCHIVE AS OUTPUT ZIP FILE
//* -@ INDICATES TO READ THE NAMES OF THE FILE TO ZIP FROM //SYSIN
//ZIP EXEC PGM=ZIP,
// PARM='/ -a dd:archive -@'
//STEPLIB DD DSN=IZ00760.INFOZIP.LOAD,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//CEEDUMP DD SYSOUT=*
//ARCHIVE DD DSN=IZ00760.UNZIP.INPUT.EBCDIC.ZIP,DISP=(,CATLG,DELETE),
// SPACE=(CYL,(1,1),RLSE),
// MGMTCLAS=DEL30T
//SYSIN DD *
'IZ00760.UNZIP.INPUT.EBCDIC.TXT'
//*
Can you attach the file KOERNER.VAR, assuming that's the input file and there's nothing sensitive about its contents?
Thanks for providing the extra data. That was helpful!
I see in archive ebcdic_6161.zip the following file data (from my home-grown utility):

--- file data ---
4c 69 6e 65 31 0a 4c 69 6e 65 32 0a 45 6e 64 0a
L  i  n  e  1     L  i  n  e  2     E  n  d
--- end of file data ---

The line ends here look correct. So it looks like the process of getting the file back to z/OS is suspect.
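For anyone who wants to repeat this kind of check without a custom utility, here is a minimal sketch in Python using only the standard zipfile module. The function names count_line_ends and report are made up for this illustration, and the member name test.txt is a stand-in, not one of the attached files. It simply counts raw CRLF, bare LF and X'85' bytes in each archive member; on non-text data an X'85' count would not necessarily mean a line end.

```python
import io
import zipfile


def count_line_ends(data: bytes) -> dict:
    """Count CRLF, bare LF and NEL (0x85) byte sequences in raw member data."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LF bytes not preceded by CR
        "NEL": data.count(b"\x85"),
    }


def report(archive) -> dict:
    """Return {member name: line-end counts} for every member of a ZIP archive."""
    with zipfile.ZipFile(archive) as zf:
        return {name: count_line_ends(zf.read(name)) for name in zf.namelist()}


if __name__ == "__main__":
    # Build a small in-memory archive holding the three-line test data
    # shown in the dump above (hypothetical member name).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("test.txt", b"Line1\nLine2\nEnd\n")
    buf.seek(0)
    for name, counts in report(buf).items():
        print(name, counts)
```

Running the same report on the archive straight after zipping, and on the extracted file at each later step, should show at which hop the line ends change.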
As it looks like you've found the problem, I guess we can consider this one solved.
Translations from EBCDIC to ASCII using the z/OS Language Environment (C runtime) facilities (especially iconv for translations) are defined to return the NEL (0x85) character. Strictly speaking NEL is a C1 control code in ISO 8859-1 rather than a 7-bit ASCII character.
I would expect the same thing to happen on a Linux system where an interface returns a UTF-8 character string with an End-Of-Line.
If you want specific character sequences at End-Of-Line then code has to be added to look for the NEL and translate to LF (or CR+LF) as required.
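As a rough illustration of that extra step (a sketch only, not code from zip or unzip), the Python below replaces NEL bytes with LF or CR+LF after translation. The name normalize_nel is invented for the example. The demo relies on Python's cp037 codec, which maps EBCDIC NL (X'15') to U+0085; the Language Environment iconv tables may differ in detail.

```python
NEL = b"\x85"   # "next line" C1 control as it appears in ISO 8859-1 data
LF = b"\n"
CRLF = b"\r\n"


def normalize_nel(data: bytes, eol: bytes = LF) -> bytes:
    """Replace every NEL byte in already-translated text with the wanted line end."""
    return data.replace(NEL, eol)


if __name__ == "__main__":
    # EBCDIC text whose lines end in NL (X'15'), built with the cp037 codec.
    ebcdic = "Line1".encode("cp037") + b"\x15" + "Line2".encode("cp037") + b"\x15"

    # A plain table-driven translation carries NL over as 0x85.
    translated = ebcdic.decode("cp037").encode("latin-1")
    print(translated)

    # The additional pass that gives Unix or MS-DOS line ends.
    print(normalize_nel(translated))
    print(normalize_nel(translated, CRLF))
```

This is only safe on data known to be single-byte text; in binary or multi-byte data a 0x85 byte can mean something else.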
A lot of this stuff should be platform-specific, and in the case of z/OS the default behaviours may differ between the MVS and USS environments.
By the way, on z/OS there are two different line termination conventions for bytestream data, for historical reasons:
- CR (X'0D') + LF (X'25') from Bisync terminals (TTYs)
- NL (X'15') from VM and the C/C++ compiler.
Defaults should be the standard character sets expected - ISO-8859-1 for ASCII, IBM-037 for MVS EBCDIC and IBM-1047 for USS EBCDIC. Hard-coding any other character sets is a recipe for disaster... by all means add options to support them, but ensure that other locale-specific code is also dynamically changed to match.
BTW, I remember one thing discussed that made me cringe: depending on or setting the _EBCDIC preprocessor symbol. This (like anything else with a leading '_') is an internal control symbol between the C/C++ compiler and the runtime. It is generated by the compiler, and is used to pass the default character set from the ASCII|NOASCII compiler switch. It is also used to pick LE runtime routines, which may or may not have the support that ZIP/UNZIP need, and which may or may not assume that the current character set is static. Now that the compiler has added support for Unicode literals, it gets even more complex.
z/OS is a really rough platform on which to play with character sets and runtime functions... lots of minefields and pitfalls. Al
Al, is there a possible way in a future release of zip/unzip to replace ebcdic.c with a call to iconv? If a future release of zip/unzip supported FROMCODE and TOCODE keywords in the PARM card, like iconv does, everyone could choose the translation that fits his needs, and errors from changes to ebcdic.c would no longer be possible. Sample of iconv in a z/OS batch environment:

//ICONVPRO EXEC PGM=EDCICONV,REGION=2048K,
// PARM=('FROMCODE(IBM-1047),TOCODE(ISO8859-1)')
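To make the idea concrete, here is a small sketch of what a FROMCODE/TOCODE style translation amounts to, written in Python rather than as a zip change. The function name recode is hypothetical. Python's standard library has no IBM-1047 codec as far as I know, so the demo uses cp037 as a stand-in for the EBCDIC side.

```python
import codecs


def recode(data: bytes, fromcode: str, tocode: str) -> bytes:
    """Translate bytes from one named code page to another, iconv style."""
    # Fail early with LookupError if either code page name is unknown.
    codecs.lookup(fromcode)
    codecs.lookup(tocode)
    return data.decode(fromcode).encode(tocode)


if __name__ == "__main__":
    ebcdic = "Line1".encode("cp037")             # EBCDIC input bytes
    print(recode(ebcdic, "cp037", "iso8859-1"))  # same text in ISO 8859-1
```

The point of the design is that the translation tables come from the code page names given by the user instead of being compiled into the program. Line-end handling would still need its own option, since a table lookup alone carries NL over as NEL.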
I see that as being an extremely general requirement - requesting iconv translation of data with specific "from" and "to" code pages. It's just that with ASCII<>EBCDIC translation us mainframe folks need it more frequently.
I think we can come up with a syntax that is a little closer to the UNIX standard, so that Ed and crew will approve. Al