I have created the following PHP script using Magic Parser to parse the file HR-20061228-3698944.xml:
<?php
require("MagicParser.php");
function myRecordHandler($record)
{
// This is where you write your code to process each record, such as loading a database
// You can display the record contents using PHP's internal print_r() function:
print_r($record);
// The following code will print out each field in your sample data:
print $record["HR01"];
print $record["HR01-LANG"];
print $record["HR01-NOTICE.PUB.NR"];
print $record["HR01-SHAB.NR"];
print $record["HR01-SHAB.PUB.DATE"];
print $record["HR01-SHAB.START.PAGE"];
print $record["NOTICE.REF"];
print $record["PUB.HEAD"];
print $record["PUB.HEAD/CANTON.NAME"];
print $record["PUB.HEAD/PUB.DATE"];
print $record["HR01.SPEC"];
print $record["HR01.SPEC/HRA.LOG"];
print $record["HR01.SPEC/HRA.LOG-EHRA.NOTICE.ID"];
print $record["HR01.SPEC/HRA.LOG/HRA.OFFICE"];
print $record["HR01.SPEC/HRA.LOG/HRA.OFFICE/HRA.OFFICE.ID"];
print $record["HR01.SPEC/HRA.LOG/HRA.OFFICE/HRA.OFFICE.NAME"];
print $record["HR01.SPEC/HRA.LOG/HRA.LOG.DATE"];
print $record["HR01.SPEC/HRA.LOG/HRA.LOG.NUM"];
print $record["HR01.SPEC/HR.FIRMS"];
print $record["HR01.SPEC/HR.FIRMS/FIRM.ID"];
print $record["HR01.SPEC/HR.FIRMS/FIRM"];
print $record["HR01.SPEC/HR.FIRMS/FIRM-INFO.VER"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/NAME"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.ID"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.DESCR"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE/CITY"];
print $record["HR01.SPEC/HR.FIRMS/FIRM@1"];
print $record["HR01.SPEC/HR.FIRMS/FIRM@1-INFO.VER"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/NAME@1"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM@1"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.ID@1"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.DESCR@1"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE@1"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE/BFS.NUM"];
print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE/CITY@1"];
print $record["HR01.SPEC/HR.FIRM.ACT"];
print $record["HR01.SPEC/HR.FIRM.ACT/STATUS.CHANGED"];
print $record["HR01.SPEC/HR.FIRM.ACT/STATUS.CHANGED-TYPE"];
print $record["HR01.SPEC/HR.PUB.CONTENT"];
print $record["HR01.SPEC/HR.PUB.CONTENT/FT"];
print $record["HR01.SPEC/HR.PUB.CONTENT/FT-TYPE"];
print $record["HR01.SPEC/HR.PUB.CONTENT/FT@1"];
print $record["HR01.SPEC/HR.PUB.CONTENT/FT@1-TYPE"];
print $record["HR01.SPEC/HR.PUB.CONTENT/FT@2"];
print $record["HR01.SPEC/HR.PUB.CONTENT/FT@2-TYPE"];
print $record["SUBMITION"];
print $record["SUBMITION/ZIPCODE"];
print $record["SUBMITION/CITY"];
print $record["SUBMITION/SUBMIT.DATE"];
print $record["SUBMITION/SUBMITOR"];
}
MagicParser_parse("HR-20061228-3698944.xml","myRecordHandler","xml|HR01/");
?>
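Judging from the keys in the script above, Magic Parser appears to flatten each record into one array. Child elements are joined with "/", attributes are appended with "-", and a repeated element gets an "@1", "@2" suffix. As a rough cross-check, here is a sketch that reads a few of the same fields with PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than Magic Parser. The key-to-XPath pairs are inferred from this sample, not taken from Magic Parser's documentation, and the XML is a trimmed copy of the sample file.

```php
<?php
// Sketch: read some of the same fields with PHP's DOM extension instead of Magic Parser.
// The XML is a trimmed copy of HR-20061228-3698944.xml.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<HR01 LANG="DE" NOTICE.PUB.NR="3698944">
  <HR01.SPEC>
    <HRA.LOG EHRA.NOTICE.ID="1630975">
      <HRA.LOG.NUM>34418</HRA.LOG.NUM>
    </HRA.LOG>
    <HR.FIRMS>
      <FIRM.ID>CH02010513648</FIRM.ID>
      <FIRM INFO.VER="OLD"><NAME>Autoteile Bülach Lorenzo Paolucci</NAME></FIRM>
      <FIRM INFO.VER="NEW"><NAME>Autoteile Bülach Lorenzo Paolucci</NAME></FIRM>
    </HR.FIRMS>
  </HR01.SPEC>
</HR01>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Magic Parser key (as used in the script) => XPath expression inferred from the XML.
$fields = [
    'HR01-LANG'                          => 'string(/HR01/@LANG)',
    'HR01.SPEC/HRA.LOG-EHRA.NOTICE.ID'   => 'string(/HR01/HR01.SPEC/HRA.LOG/@EHRA.NOTICE.ID)',
    'HR01.SPEC/HRA.LOG/HRA.LOG.NUM'      => 'string(/HR01/HR01.SPEC/HRA.LOG/HRA.LOG.NUM)',
    'HR01.SPEC/HR.FIRMS/FIRM.ID'         => 'string(/HR01/HR01.SPEC/HR.FIRMS/FIRM.ID)',
    'HR01.SPEC/HR.FIRMS/FIRM-INFO.VER'   => 'string(/HR01/HR01.SPEC/HR.FIRMS/FIRM[1]/@INFO.VER)',
    'HR01.SPEC/HR.FIRMS/FIRM@1-INFO.VER' => 'string(/HR01/HR01.SPEC/HR.FIRMS/FIRM[2]/@INFO.VER)',
];

$values = [];
foreach ($fields as $key => $expr) {
    // Wrapping each expression in string() makes evaluate() return a plain string.
    $values[$key] = $xpath->evaluate($expr);
    print $key . ' = ' . $values[$key] . "\n";
}
```

Note how "FIRM@1" in the Magic Parser key lines up with the second FIRM element (FIRM[2] in XPath, which counts from 1).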
The XML file HR-20061228-3698944.xml is as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<HR01 LANG="DE" NOTICE.PUB.NR="3698944" SHAB.NR="251" SHAB.PUB.DATE="28.12.2006" SHAB.START.PAGE="26">
  <NOTICE.REF>ts061221172333</NOTICE.REF>
  <PUB.HEAD>
    <CANTON.NAME>ZH</CANTON.NAME>
    <PUB.DATE>28.12.2006</PUB.DATE>
  </PUB.HEAD>
  <HR01.SPEC>
    <HRA.LOG EHRA.NOTICE.ID="1630975">
      <HRA.OFFICE>
        <HRA.OFFICE.ID>20</HRA.OFFICE.ID>
        <HRA.OFFICE.NAME>Handelsregisteramt des Kantons Zürich</HRA.OFFICE.NAME>
      </HRA.OFFICE>
      <HRA.LOG.DATE>20.12.2006</HRA.LOG.DATE>
      <HRA.LOG.NUM>34418</HRA.LOG.NUM>
    </HRA.LOG>
    <HR.FIRMS>
      <FIRM.ID>CH02010513648</FIRM.ID>
      <FIRM INFO.VER="OLD">
        <NAME>Autoteile Bülach Lorenzo Paolucci</NAME>
        <LEG.FORM>
          <LEG.FORM.ID>1</LEG.FORM.ID>
          <LEG.FORM.DESCR>Einzelfirma</LEG.FORM.DESCR>
        </LEG.FORM>
        <SH.REG.OFFICE>
          <CITY>Bülach</CITY>
        </SH.REG.OFFICE>
      </FIRM>
      <FIRM INFO.VER="NEW">
        <NAME>Autoteile Bülach Lorenzo Paolucci</NAME>
        <LEG.FORM>
          <LEG.FORM.ID>1</LEG.FORM.ID>
          <LEG.FORM.DESCR>Einzelfirma</LEG.FORM.DESCR>
        </LEG.FORM>
        <SH.REG.OFFICE>
          <BFS.NUM>53</BFS.NUM>
          <CITY>Bülach</CITY>
        </SH.REG.OFFICE>
      </FIRM>
    </HR.FIRMS>
    <HR.FIRM.ACT>
      <STATUS.CHANGED TYPE="01" />
    </HR.FIRM.ACT>
    <HR.PUB.CONTENT>
      <FT TYPE="F">Autoteile Bülach Lorenzo Paolucci</FT>
      , in
      <FT TYPE="S">Bülach,</FT>
      CH-020.1.051.364-8, Feldstrasse 60, 8180 Bülach, Einzelfirma (Neueintragung). Zweck: Autoteilehandel für alle Fahrzeugmarken. Eingetragene Personen: Paolucci, Lorenzo, italienischer Staatsangehöriger, in Bülach, Inhaber, mit Einzelunterschrift.
    </HR.PUB.CONTENT>
  </HR01.SPEC>
  <SUBMITION>
    <ZIPCODE>3003</ZIPCODE>
    <CITY>Bern</CITY>
    <SUBMIT.DATE>21.12.2006</SUBMIT.DATE>
    <SUBMITOR>EHRA</SUBMITOR>
  </SUBMITION>
</HR01>
My question:
All the fields seem to be shown properly except the text that follows the FT elements: CH-020.1.051.364-8, Feldstrasse ... Einzelunterschrift.
Can you please tell me how to handle this? The idea is to store some of the field contents, including this string, in a MySQL database.
Thank you for your help!
With kindest regards
Gabriel Schneider
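To store parts of each record in a database, one option is PDO with a prepared statement called from inside myRecordHandler(). This is a minimal sketch under several assumptions: the table and column names are hypothetical, the $record array is hand-built with keys taken from the script above, and an in-memory SQLite database stands in for MySQL so that the example runs on its own. For MySQL, only the connection line should need to change, for example to a `mysql:` DSN with `charset=utf8mb4` so that characters such as "ü" are stored correctly.

```php
<?php
// Minimal sketch: store selected fields of one record with PDO.
// ASSUMPTION: in-memory SQLite stands in for MySQL so this runs standalone.
// For MySQL, something like this (hypothetical host, database and credentials):
//   $pdo = new PDO('mysql:host=localhost;dbname=shab;charset=utf8mb4', $user, $pass);
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table; names and column sizes are illustrative only.
$pdo->exec('CREATE TABLE notices (
    notice_pub_nr VARCHAR(20) PRIMARY KEY,
    firm_id       VARCHAR(20),
    firm_name     VARCHAR(255),
    pub_content   TEXT
)');

$insert = $pdo->prepare(
    'INSERT INTO notices (notice_pub_nr, firm_id, firm_name, pub_content)
     VALUES (:nr, :firm_id, :name, :content)'
);

// Meant to be called from myRecordHandler($record). Bound placeholders keep
// quotes, commas and umlauts in the data from breaking the SQL statement.
function storeRecord(PDOStatement $insert, array $record): void
{
    $insert->execute([
        ':nr'      => $record['HR01-NOTICE.PUB.NR'],
        ':firm_id' => $record['HR01.SPEC/HR.FIRMS/FIRM.ID'],
        ':name'    => $record['HR01.SPEC/HR.FIRMS/FIRM/NAME'],
        ':content' => $record['HR01.SPEC/HR.PUB.CONTENT'],
    ]);
}

// Hand-built stand-in for one $record (content shortened), using keys from the script.
$record = [
    'HR01-NOTICE.PUB.NR'           => '3698944',
    'HR01.SPEC/HR.FIRMS/FIRM.ID'   => 'CH02010513648',
    'HR01.SPEC/HR.FIRMS/FIRM/NAME' => 'Autoteile Bülach Lorenzo Paolucci',
    'HR01.SPEC/HR.PUB.CONTENT'     => 'CH-020.1.051.364-8, Feldstrasse 60, 8180 Bülach, Einzelfirma (Neueintragung).',
];
storeRecord($insert, $record);

$row = $pdo->query('SELECT firm_id, firm_name FROM notices')->fetch(PDO::FETCH_ASSOC);
print_r($row);
```

Inside myRecordHandler() the prepared statement is not in scope by default; declaring `global $insert;` at the top of the handler is one simple way to reach it.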
Hi again,
Sorry, ignore my previous post. I noticed that the XML is not laid out the way the parser expects: notice how orphan character data appears between the FT tags (so-called mixed content). However, I have made a small modification to the script to handle this scenario for you.
I will email you the new version; with it, you will see the missing text in the HR01.SPEC/HR.PUB.CONTENT field.
Cheers,
David.
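The missing text is mixed content: text nodes that are direct children of HR.PUB.CONTENT, sitting between and after the FT child elements. The modified Magic Parser mentioned above is not shown here, so purely as an illustration, this sketch uses PHP's DOM extension instead. DOMNode::textContent returns all descendant text (the FT texts plus the text between them), while walking childNodes and keeping only XML_TEXT_NODE entries returns just the orphan text. The XML is a shortened copy of the sample and its exact whitespace is an assumption, so runs of whitespace are collapsed before use.

```php
<?php
// Sketch: collect the character data of HR.PUB.CONTENT, including the "orphan"
// text outside the FT elements, using PHP's DOM extension (not Magic Parser).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<HR01><HR01.SPEC><HR.PUB.CONTENT>
<FT TYPE="F">Autoteile Bülach Lorenzo Paolucci</FT>, in
<FT TYPE="S">Bülach,</FT>
CH-020.1.051.364-8, Feldstrasse 60, 8180 Bülach, Einzelfirma (Neueintragung).
</HR.PUB.CONTENT></HR01.SPEC></HR01>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$content = $xpath->query('/HR01/HR01.SPEC/HR.PUB.CONTENT')->item(0);

// textContent concatenates every descendant text node, FT texts included.
$fullText = preg_replace('/\s+/u', ' ', trim($content->textContent));

// Keep only the text nodes that are direct children, i.e. the text outside the FT elements.
$orphanText = '';
foreach ($content->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        $orphanText .= $child->nodeValue;
    }
}
$orphanText = preg_replace('/\s+/u', ' ', trim($orphanText));

print $fullText . "\n";
print $orphanText . "\n";
```

Which of the two strings to store depends on whether the FT texts (firm name and seat) are already saved in their own columns.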
Hi David,
Obviously my XML file does not completely follow the rules; after all the extra support you have given me via email, it now runs really well.
Thank you
Gabriel
Hello Gabriel,
It may just be that the output your script generates defaults to the ISO-8859-1 character set, whereas your XML is in UTF-8. If this is the problem, you can fix it by adding the following line at the very top of the script, before any output is sent:
header("Content-Type: text/html; charset=utf-8");
That should make the string display properly!
Cheers,
David.
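A small sketch of the same fix, assuming the parser hands values over as UTF-8 strings: send the header before any output, and pass the encoding explicitly when escaping values for HTML. The helper name h() is hypothetical. With ENT_QUOTES and 'UTF-8' but without ENT_SUBSTITUTE or ENT_IGNORE, htmlspecialchars() returns an empty string for input that is not valid UTF-8 (for example ISO-8859-1 bytes), which makes a charset mismatch easy to spot.

```php
<?php
// Must run before anything is printed; under the CLI it has no effect.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper: escape a parsed value for HTML, declaring the input as UTF-8.
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// A UTF-8 value from the parser passes through with its umlaut intact.
$html = '<p>' . h('Autoteile Bülach Lorenzo Paolucci') . '</p>';
print $html . "\n";

// "Bülach" encoded as ISO-8859-1 ("\xFC" is ü) is invalid UTF-8, so h() yields ''.
$latin1 = h("B\xFClach");
```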