Content missing?

Submitted by Gabriel Schneider on Sun, 2009-03-15 13:05

I have created the following PHP script using Magic Parser with file HR-20061228-3698944.xml:

<?php
  require("MagicParser.php");

  function myRecordHandler($record)
  {
    // This is where you write your code to process each record, such as loading a database
    // You can display the record contents using PHP's internal print_r() function:
    print_r($record);

    // The following code will print out each field in your sample data:
    print $record["HR01"];
    print $record["HR01-LANG"];
    print $record["HR01-NOTICE.PUB.NR"];
    print $record["HR01-SHAB.NR"];
    print $record["HR01-SHAB.PUB.DATE"];
    print $record["HR01-SHAB.START.PAGE"];
    print $record["NOTICE.REF"];
    print $record["PUB.HEAD"];
    print $record["PUB.HEAD/CANTON.NAME"];
    print $record["PUB.HEAD/PUB.DATE"];
    print $record["HR01.SPEC"];
    print $record["HR01.SPEC/HRA.LOG"];
    print $record["HR01.SPEC/HRA.LOG-EHRA.NOTICE.ID"];
    print $record["HR01.SPEC/HRA.LOG/HRA.OFFICE"];
    print $record["HR01.SPEC/HRA.LOG/HRA.OFFICE/HRA.OFFICE.ID"];
    print $record["HR01.SPEC/HRA.LOG/HRA.OFFICE/HRA.OFFICE.NAME"];
    print $record["HR01.SPEC/HRA.LOG/HRA.LOG.DATE"];
    print $record["HR01.SPEC/HRA.LOG/HRA.LOG.NUM"];
    print $record["HR01.SPEC/HR.FIRMS"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM.ID"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM-INFO.VER"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/NAME"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.ID"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.DESCR"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE/CITY"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM@1"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM@1-INFO.VER"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/NAME@1"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM@1"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.ID@1"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/LEG.FORM/LEG.FORM.DESCR@1"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE@1"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE/BFS.NUM"];
    print $record["HR01.SPEC/HR.FIRMS/FIRM/SH.REG.OFFICE/CITY@1"];
    print $record["HR01.SPEC/HR.FIRM.ACT"];
    print $record["HR01.SPEC/HR.FIRM.ACT/STATUS.CHANGED"];
    print $record["HR01.SPEC/HR.FIRM.ACT/STATUS.CHANGED-TYPE"];
    print $record["HR01.SPEC/HR.PUB.CONTENT"];
    print $record["HR01.SPEC/HR.PUB.CONTENT/FT"];
    print $record["HR01.SPEC/HR.PUB.CONTENT/FT-TYPE"];
    print $record["HR01.SPEC/HR.PUB.CONTENT/FT@1"];
    print $record["HR01.SPEC/HR.PUB.CONTENT/FT@1-TYPE"];
    print $record["HR01.SPEC/HR.PUB.CONTENT/FT@2"];
    print $record["HR01.SPEC/HR.PUB.CONTENT/FT@2-TYPE"];
    print $record["SUBMITION"];
    print $record["SUBMITION/ZIPCODE"];
    print $record["SUBMITION/CITY"];
    print $record["SUBMITION/SUBMIT.DATE"];
    print $record["SUBMITION/SUBMITOR"];
  }

  MagicParser_parse("HR-20061228-3698944.xml","myRecordHandler","xml|HR01/");
?>

The XML file HR-20061228-3698944.xml is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<HR01 LANG="DE" NOTICE.PUB.NR="3698944" SHAB.NR="251" SHAB.PUB.DATE="28.12.2006" SHAB.START.PAGE="26">
  <NOTICE.REF>ts061221172333</NOTICE.REF>
  <PUB.HEAD>
    <CANTON.NAME>ZH</CANTON.NAME>
    <PUB.DATE>28.12.2006</PUB.DATE>
  </PUB.HEAD>
  <HR01.SPEC>
    <HRA.LOG EHRA.NOTICE.ID="1630975">
      <HRA.OFFICE>
        <HRA.OFFICE.ID>20</HRA.OFFICE.ID>
        <HRA.OFFICE.NAME>Handelsregisteramt des Kantons Zürich</HRA.OFFICE.NAME>
      </HRA.OFFICE>
      <HRA.LOG.DATE>20.12.2006</HRA.LOG.DATE>
      <HRA.LOG.NUM>34418</HRA.LOG.NUM>
    </HRA.LOG>
    <HR.FIRMS>
      <FIRM.ID>CH02010513648</FIRM.ID>
      <FIRM INFO.VER="OLD">
        <NAME>Autoteile Bülach Lorenzo Paolucci</NAME>
        <LEG.FORM>
          <LEG.FORM.ID>1</LEG.FORM.ID>
          <LEG.FORM.DESCR>Einzelfirma</LEG.FORM.DESCR>
        </LEG.FORM>
        <SH.REG.OFFICE>
          <CITY>Bülach</CITY>
        </SH.REG.OFFICE>
      </FIRM>
      <FIRM INFO.VER="NEW">
        <NAME>Autoteile Bülach Lorenzo Paolucci</NAME>
        <LEG.FORM>
          <LEG.FORM.ID>1</LEG.FORM.ID>
          <LEG.FORM.DESCR>Einzelfirma</LEG.FORM.DESCR>
        </LEG.FORM>
        <SH.REG.OFFICE>
          <BFS.NUM>53</BFS.NUM>
          <CITY>Bülach</CITY>
        </SH.REG.OFFICE>
      </FIRM>
    </HR.FIRMS>
    <HR.FIRM.ACT>
      <STATUS.CHANGED TYPE="01" />
    </HR.FIRM.ACT>
    <HR.PUB.CONTENT>
      <FT TYPE="F">Autoteile Bülach Lorenzo Paolucci</FT>
      , in
      <FT TYPE="S">Bülach,</FT>
      CH-020.1.051.364-8, Feldstrasse 60, 8180 Bülach, Einzelfirma (Neueintragung). Zweck: Autoteilehandel für alle Fahrzeugmarken. Eingetragene Personen: Paolucci, Lorenzo, italienischer Staatsangehöriger, in Bülach, Inhaber, mit Einzelunterschrift.
    </HR.PUB.CONTENT>
  </HR01.SPEC>
  <SUBMITION>
    <ZIPCODE>3003</ZIPCODE>
    <CITY>Bern</CITY>
    <SUBMIT.DATE>21.12.2006</SUBMIT.DATE>
    <SUBMITOR>EHRA</SUBMITOR>
  </SUBMITION>
</HR01>

My question:
It seems that all the fields are shown properly except for the text that follows the FT elements: CH-020.1.051.364-8, Feldstrasse ... Einzelunterschrift.

Can you please tell me how I can handle this? The idea is to store some of the field contents, including this string, in a MySQL database.

Thank you for your help!
With kindest regards
Gabriel Schneider

Submitted by support on Sun, 2009-03-15 13:15

Hello Gabriel,

It may just be that the output you are generating is defaulting to the ISO-8859-1 character set, whereas your XML is in UTF-8. If this is the problem, you can fix it by adding the following line at the very top of the script, before any output is sent:

  header("Content-Type: text/html; charset=utf-8");

That should make the string display properly!

Cheers,
David.

Submitted by support on Sun, 2009-03-15 13:44

Hi again,

Sorry, ignore my previous post. I noticed that the XML is actually not "properly" formed as far as the parser is concerned: notice how orphan character data appears between and after the FT tags, directly inside HR.PUB.CONTENT, instead of inside an element of its own. However, I have made a small modification to the script in order to handle this scenario for you.

I will email you the new version to use, with which you will see the missing text in the HR01.SPEC/HR.PUB.CONTENT field.

Cheers,
David.
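
For readers who do not have the modified version: it was sent by email and is not shown in this thread, so the following is not that modification. It is only a minimal sketch of one alternative way to collect such "orphan" character data, using PHP's bundled DOM extension (PHP 7.1 or later) instead of Magic Parser. The function name pubContentText() and the shortened sample are made up for illustration.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension, not Magic Parser.

/**
 * Return all character data inside the first <HR.PUB.CONTENT> element,
 * including text that sits between and after the <FT> child elements.
 * Returns null if the XML cannot be parsed or the element is absent.
 * (pubContentText is a hypothetical helper name.)
 */
function pubContentText(string $xml): ?string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return null; // input could not be parsed
    }

    $nodes = $doc->getElementsByTagName('HR.PUB.CONTENT');
    if ($nodes->length === 0) {
        return null; // element not present
    }

    // textContent concatenates every descendant text node in document order.
    $text = $nodes->item(0)->textContent;

    // Collapse the line breaks and indentation left over from the markup.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

// Shortened sample modelled on the file in this thread.
$sample = <<<XML
<HR01>
  <HR01.SPEC>
    <HR.PUB.CONTENT>
      <FT TYPE="F">Autoteile Bülach Lorenzo Paolucci</FT>
      , in
      <FT TYPE="S">Bülach,</FT>
      CH-020.1.051.364-8, Feldstrasse 60, 8180 Bülach, Einzelfirma (Neueintragung).
    </HR.PUB.CONTENT>
  </HR01.SPEC>
</HR01>
XML;

echo pubContentText($sample), "\n";
```

The returned string contains the FT text and the surrounding text in document order, so it can be stored in a single database column. Because whitespace is collapsed rather than removed, a space remains wherever the markup had a line break.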

Submitted by Gabriel Schneider on Mon, 2009-03-16 19:06

Hi David

Obviously my XML file does not completely follow the rules. After a lot of extra support that you gave me via email, it runs really well now.

Thank you
Gabriel