Intro to XML and JSON #5: XML/CSV Conversions
03 Apr 2019
Now that you can read XML and write it by hand, let’s talk about converting between XML and CSV.
(Hint: CSV to XML is always easy. XML to CSV requires you to be clever.)
Posts In This Series
- Part 1 - Intro to XML and JSON
- Part 2 - Intro to XML and JSON #2: Data's Shape
- Part 3 - Intro to XML and JSON #3: XML Items & Keys
- Part 4 - Intro to XML and JSON #4: XML Values
- Part 5 - This Article
- Part 6 - Intro to XML and JSON #6: JSON
- Part 7 - Intro to XML and JSON #7: Recap & Real World Use
- Part 8 - Intro to XML, JSON, & YAML: the book
Converting XML to CSV
Some of the examples we’ve looked have been so “flat” and “repetitive” that they’d make great CSV files.
For example, remember our fleet of cars?
<RootElement>
<Car VIN="First car's Vehicle Identification Number here"/>
<Car VIN="Second car's Vehicle Identification Number here"/>
</RootElement>
- Each item has the same keys as each other item.
- Each item has just 1 value per key.
That’d make a perfectly good 1-column CSV file, wouldn’t it?
VIN |
---|
First car’s Vehicle Identification Number here |
Second car’s Vehicle Identification Number here |
Look for repetition
Every time you look at XML, look for repetition.
Even if you never learn to program, recognizing data’s shape can help you estimate, as a business analyst, whether you think it’s worth the time to ask a programmer to write code for you.
Coding
If you do want to code an XML to CSV conversion script yourself, let me tell you a secret:
- It’s not “learning to code” that’s the hardest part about converting XML files to CSV.
- It’s knowing what to Google. It’s deciding if it’s worth attempting.
Recognizing the “shape” of data is a huge part of that process.
I used a lot of Google to write the following 5-line Python program that takes the XML file above as input and produces the CSV file above as output.
import xml.etree.ElementTree
import pandas
car_details = [x.attrib for x in xml.etree.ElementTree.parse(r'C:\example\fleet.xml').findall('Car')]
df = pandas.DataFrame.from_dict(car_details)
df.to_csv(r'C:\example\fleet.csv', index=False)
(You can run it yourself here.)
Once I Googled “XML to CSV Python,” the tough parts of finding the correct search results were:
- conceptual 💭:
- figuring out that “
Car
” items were “flat” & “repetitive” enough to make a good “loop” - figuring out that the “attributes of each
Car
” were repetitive enough to make a good “loop”
- figuring out that “
- technical 💻:
- reading the documentation and Googling to learn that Python’s “ElementTree” plugin came with a way to list all the key-value attribute pairs for an item (“
.attrib
“) in a way that was compatible with the “Pandas” package’s “.from_dict()
” command - understanding Python “lists,” “dicts,” looping, ElementTree, and Pandas well enough to play around, trial-and-error, until I got the output I was looking for
- reading the documentation and Googling to learn that Python’s “ElementTree” plugin came with a way to list all the key-value attribute pairs for an item (“
The important takeaway is that it’s the conceptual work that helped me realize that this transformation was “easy to do in my head.”
Once I knew it was “easy to do in my head,” that’s when I could formulate a game plan for “doing it in code.”
Converting CSV to XML
As I mentioned in the second article in this series, about the shape of data, converting “table-shaped” data to “list-of-list-shaped” data isn’t conceptually difficult.
Therefore, going from CSV to XML is more of a purely technical task.
There really aren’t any big conceptual questions to ask.
Business Analysts: The answer to “could my programmer do this?” is always “Yes!!!” for CSV to XML.
Coding
For those who are code-curious, here’s one way to do it in Python with a plugin called “lxml” that I like for writing XML because of its “pretty_print
” settings that facilitate line breaks and tabs for readability by humans:
import lxml.etree
import pandas
xmlroot = lxml.etree.Element('RootElement')
df = pandas.read_csv(r'C:\example\fleet.csv')
for row in df.to_dict(orient='records'):
current_car = lxml.etree.Element('Car')
xmlroot.append(current_car)
for colname, cellvalue in row.items():
current_car.set(colname, cellvalue)
with open(r'C:\example\fleet2.xml', 'wb') as filepath:
filepath.write(lxml.etree.tostring(xmlroot, pretty_print=True))
(You can run it yourself here. We’ll have to leave explaining how it works for another day.)
Takeaways
- XML to CSV can be conceptually difficult to code, for the same reason that the “birthday gift list” from the 2nd article in this series made for a very ugly spreadsheet.
- XML to CSV is typically difficult to code when it’s also difficult to do with paper and pen.
- To figure out whether such a conversion can be done, it’s important to watch what the data in an XML file looks like while you read it.
- CSV to XML, on the other hand, is conceptually easy, for the same reason that our “bookkeeper’s ledger” was easy to jot down with paper and pen as a relatively flat, repetitive bulleted list.
Once you’ve done the conceptual work, the rest of your problem can be solved by asking your developer or hacking away at the technical aspects of writing your own code.
Happy converting! 💱