Sorry your browser is not supported!

You are using an outdated browser that does not support modern web technologies, in order to use this site please update to a new browser.

Browsers supported include Chrome, FireFox, Safari, Opera, Internet Explorer 10+ or Microsoft Edge.

Geek Culture / XML Advice appreciated

Author
Message
Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 3rd Jan 2006 15:44 Edited at: 3rd Jan 2006 16:18
I'm not one for using techonology for the sake of it, but I have a hurdle...

On one side, I have this datasource which has been kindly exported into Excel for me. It contains about 12 columns which, at their most simple, are Page ID's, Titles and Date/Time's. The difficult column is the Page Content column which, at its "peak" is about 26,000 characters!. This is effectively a database of articles.

Now I want to get this into a MySQL database to be used with Drupal (a CMS). I have the PHP code to take the data and thump it in as needed - that was well within my range

Initially, I exported it all as a CSV and went to parse it that way... Hurdle 1 - the articles contain comma's!! Ok, lets try something else? Tabs.. Ah ha!... Ah, no... They contain tabs too (partial HTML). I have now got it exporting into neat XML (even with a nice XML schema too!), BUT... VERY kindly, Excel has decided to tidy up the HTML tags and has converted symbols such as '<', '>' and '&' into their HTML entities (eg, '& g t ;', '& l t ;', etc.. Had to put in spaces as the forum converts them otherwise!). Its also got its knickers in a twist with some other tags too, such as line breaks. I have tried using PHP to replace all the entities and this worked partially - but not entirely.

Has anyone got any idea of how to get this data out of Excel? The "worst" bit is this is just a 100 article test file. The actual file I will need to work with is going to be abuot 1000 articles, so any processes must be automatic. I'm not going to manually look through 1000 articles!!!!

Cheers guys

BatVink
Moderator
21
Years of Service
User Offline
Joined: 4th Apr 2003
Location: Gods own County, UK
Posted: 3rd Jan 2006 16:31
Why not inject it into your database using an ODBC datasource and a small VB script?

Or into MS Access as a stepping stone, and from there into MySQL?

Nicholas Thompson
20
Years of Service
User Offline
Joined: 6th Sep 2004
Location: Bognor Regis, UK
Posted: 3rd Jan 2006 16:40
Quote: "Why not inject it into your database using an ODBC datasource and a small VB script?"

There are quite a few ways of wording this, but I think this is probably the best:
'Because I have no idea how'


I have an ODBC connection on my machine directly to the MySQL server (its offsite).

David T
Retired Moderator
22
Years of Service
User Offline
Joined: 27th Aug 2002
Location: England
Posted: 3rd Jan 2006 17:35
I can't imagine that Excel would allow columns that contained commands to break the CSV output - are you sure it doesn't add some king of espace character to columns with a "," in?

That, or you could write a simple macro to replace "," with "%comma%" or something in the columns, then export to csv, then once in PHP use str_replace("%comma%",",",$string).

Login to post a reply

Server time is: 2024-11-16 08:52:25
Your offset time is: 2024-11-16 08:52:25