Intro to XML and JSON #2: Data's Shape
25 Mar 2019
Table of Contents
I mentioned that XML & JSON are for lists of lists, while CSVs are for tables. Let’s explore that concept.
Posts In This Series
- Part 1 - Intro to XML and JSON
- Part 2 - This Article
- Part 3 - Intro to XML and JSON #3: XML Items & Keys
- Part 4 - Intro to XML and JSON #4: XML Values
- Part 5 - Intro to XML and JSON #5: XML/CSV Conversions
- Part 6 - Intro to XML and JSON #6: JSON
- Part 7 - Intro to XML and JSON #7: Recap & Real World Use
- Part 8 - Intro to XML, JSON, & YAML: the book
Viewing “Pretty” XML & JSON
We’ll have a lot of examples in this series. I recommend that you edit them and play with seeing them in a “pretty” format! XML, JSON – paste & click “Tree View”.
Warning: only put sample data into the “beautifier” links above. Never put your company’s confidential data into a stranger’s web site.
Table Data
The key properties of table-style data are that:
- 🔑 The “table” has a specific number of columns (“keys”) that are the same for every row (“item”) in the table.
- That is, every row has the exact same columns (“keys”) as every other row.
- 😶 Not only is every row-column intersection capable of storing a value … but if there isn’t a value, a blank placeholder still needs to be indicated.
- ☝️ There can only be one value for a row-column intersection.
- That is, if you only have one “First Name” column, no row can have two first names.
- ♭ Because there can only be one value for a row (“item”) + column (“key”) intersection, every row (“item”) is very “flat.”
- There isn’t a natural way to “nest” data at multiple “levels.”
Example
Here’s a nice little “bookkeeper’s ledger” that represents “naturally table-shaped” data:
Purchase ID | Date | Vendor | Tax Category | Price |
---|---|---|---|---|
000001 | 01/01/2010 | Apple | Office Supplies | $60,000.00 |
000002 | 01/01/2010 | Applebee’s | Meals | $50.87 |
000003 | 01/02/2010 | Apple | Repairs | $928.20 |
000004 | 01/10/2010 | Steelcase | Office Supplies | $20,289.98 |
Varied / Nested-List Data
Data becomes less “table”-like and more “bulleted-list”-like as either of two things happen:
- Different “items” in the list have “details” completely unrelated to the details of other items in the list.
- You “nest” lists inside each other.
Let’s say you maintain a list of your best friends to facilitate buying them birthday presents (that’s so nice of you!)
You will want to write down a few facts about each friend that help you remember what they like.
The “facts” that influence your birthday presents for each friend will be different, because every friend’s life is different.
Example
For example, you might write:
- Uwe
- Bday: Nov. 8
- Kids:
- Johnny (4)
- Matilda (2)
- Collection: cat memes
- Dan
- Bday: Jan. 27
- Food:
- wine
- pickles
- Ridhi
- Bday: Sep. 16
- Collection: frogs
- Hani
- Bday: Apr. 9
- Job: nurse
- Gift: help / babysitter
- Kids: Lee (8)
Uwe and Hani have kids, but Dan and Ridhi don’t. Uwe and Ridhi collect things, but Dan and Hani don’t. Dan’s a foodie. Hani is busy with her job and aging parents and kid and doesn’t want things; she wants free time. But … if you’re busy, you might be able to get away with a coffee mug with a cute slogan about nurses on it.
Although you have some overlap, it’s very normal for everyone on your list to have different characteristics from everyone else.
Look below at how much space it would take to store your “friends list” as a table and how many cells end up blank. It just doesn’t make sense to store your friends list as a table!
You would never write it this way on paper:
Name | Bday | Collection | Kid 1 | Kid 2 | Food 1 | Food 2 | Job | Gift |
---|---|---|---|---|---|---|---|---|
Uwe | Nov. 8 | cat memes | Johnny (4) | Matilda (2) | ||||
Dan | Jan. 27 | wine | pickles | |||||
Ridhi | Sep. 16 | frogs | ||||||
Hani | Apr. 9 | Lee (8) | nurse | Help / Babysitter |
So why should you write it that way in a computer?
The “birthday facts list” is naturally “list-of-lists” like, and not naturally “table-like.” Remember our two key properties, either of which can influence the decision to move from “table” to “list of lists” structures:
- Different “items” in the list have “details” completely unrelated to the details of other items in the list.
- You “nest” lists inside each other.
Items, Keys & Values
From here on out, we’re going to refer to data as having “items,” “keys,” & “values.”
In a “table-shaped” (CSV) file
- The “items” are the rows in their entirety
- They are separated from each other by line breaks
- e.g.
"000001", "01/01/2010", "Apple", "Office Supplies", "$60,000.00"
- The “keys” are the column headings
- e.g. “
Purchase ID
” or “Vendor
” or “Tax Category
” - They are separated from each other by commas
- e.g. “
- The “values” are the data in the individual cells at the intersections of rows and columns.
- e.g. “
01/02/2010
” or “Steelcase
” or “$50.87
” - They are separated from each other by commas
- e.g. “
In a “list-of-lists-shaped” (XML/JSON) file
Items
It’s easy to show, but hard to define, what the “items” are. For now, let’s say that they’re “clumps” representing real-world things or ideas.
- Examples are:
- The existence of Uwe
- The existence of Ridhi
- The fact that we know Uwe’s birthday
- The fact that we know what Ridhi likes to collect
- The fact that Uwe has kids
- The existence of Uwe’s kid Johnny
- The fact that we know Uwe’s kid Johnny’s age
- In XML, an “item” is called an “element” and is set apart from other items by surrounding it in a “tagset”:
<item_name> ... </item_name>
- In JSON, an “item” is called an “object” and is set apart from other items by surrounding it in curly braces:
{ ... }
Keys
The “keys” within XML/JSON items are also easier to show as we go into the details, but they might be concepts like:
- friend
- bday
- name
- collection
- food
- job
- gift
We’ll wait to talk about what punctuation we use to indicate “keys” until we take a deeper dive into XML & JSON.
Values
The “values” within XML/JSON items are also easier to show as we go into the details, but they might be concepts like:
- Uwe
- Nov. 8
- frogs
- 4
- pickles
- Help / Babysitter
We’ll wait to talk about what punctuation we use to indicate “values” until we take a deeper dive into XML & JSON.
Example: XML vs. JSON
We have to make choices about exactly how to structure our data when writing XML or JSON (just like I had to decide, in my example “gift list,” whether to put the kids’ ages next to their names in parentheses or make the ages sub-bullets).
But here’s one way I might represent this “gift list” in XML, and in JSON.
<AllFriends>
<friend name="Uwe" bday="Nov. 8">
<kid name="Johnny">
<age>4</age>
</kid>
<kid name="Matilda">
<age>2</age>
</kid>
<collection>
cat memes
</collection>
</friend>
<friend name="Dan" bday="Jan. 27">
<food>wine</food>
<food>pickles</food>
</friend>
<friend name="Ridhi" bday="Sep. 16">
<collection>
frogs
</collection>
</friend>
<friend name="Hani" bday="Apr. 9">
<Job>nurse</Job>
<Gift>help / babysitter</Gift>
<kid name="Lee">
<age>8</age>
</kid>
</friend>
</AllFriends>
[
{
"Name" : "Uwe",
"Bday" : "Nov. 8",
"Kids" :
[
{
"Name" : "Johnny",
"Age" : 4
},
{
"Name" : "Matilda",
"Age" : 2
}
],
"Collection" : "cat memes"
},
{
"Name" : "Dan",
"Bday" : "Jan. 27",
"Food" : ["wine","pickles"]
},
{
"Name" : "Ridhi",
"Bday" : "Sep. 16",
"Collection" : "frogs"
},
{
"Name" : "Hani",
"Bday" : "Apr. 9",
"Job" : "nurse",
"Gift" : "help / babysitter",
"Kids" :
[
{
"Name" : "Lee",
"Age" : 8
}
]
}
]
Note that the line breaks and indentations are for human convenience only.
A computer would be fine with everything all smooshed together in one line, in both XML and JSON.
Want proof?
- Paste both examples of the XML into this beautifier and click “Tree View”.
- Do the same with both examples of the JSON in this beautifier.
They should produce the same “tree” as their “human-friendly” counterparts.
<AllFriends><friend name="Uwe" bday="Nov. 8"><kid name="Johnny"><age>4</age></kid><kid name="Matilda"><age>2</age></kid><collection>cat memes</collection></friend><friend name="Dan" bday="Jan. 27"><food>wine</food><food>pickles</food></friend><friend name="Ridhi" bday="Sep. 16"><collection>frogs</collection></friend><friend name="Hani" bday="Apr. 9"><Job>nurse</Job><Gift>help / babysitter</Gift><kid name="Lee"><age>8</age></kid></friend></AllFriends>
[{"Name":"Uwe","Bday":"Nov. 8","Kids":[{"Name":"Johnny","Age":4},{"Name":"Matilda","Age":2}],"Collection":"cat memes"},{"Name":"Dan","Bday":"Jan. 27","Food":["wine","pickles"]},{"Name":"Ridhi","Bday":"Sep. 16","Collection":"frogs"},{"Name":"Hani","Bday":"Apr. 9","Job":"nurse","Gift":"help / babysitter","Kids":[{"Name":"Lee","Age":8}]}]
Example: Table data in CSV, XML, & JSON
Remember, although “list-of-lists” data doesn’t translate well to tables, table data translates perfectly respectably to XML or JSON.
Remember our bookeeper’s ledger? Let’s see it in all 3 formats:
CSV:
"Purchase ID","Date","Vendor","Tax Category","Price"
"000001","01/01/2010","Apple","Office Supplies","$60,000.00"
"000002","01/01/2010","Applebee's","Meals","$50.87"
"000003","01/02/2010","Apple","Repairs","$928.20"
"000004","01/10/2010","Steelcase","Office Supplies","$20,289.98"
XML:
<Ledger>
<Entry><Purchase_ID>000001</Purchase_ID><Date>01/01/2010</Date><Vendor>Apple</Vendor><Tax_Category>Office Supplies</Tax_Category><Price>$60,000.00</Price></Entry>
<Entry><Purchase_ID>000002</Purchase_ID><Date>01/01/2010</Date><Vendor>Applebee's</Vendor><Tax_Category>Meals</Tax_Category><Price>$50.87</Price></Entry>
<Entry><Purchase_ID>000003</Purchase_ID><Date>01/02/2010</Date><Vendor>Apple</Vendor><Tax_Category>Repairs</Tax_Category><Price>$928.20</Price></Entry>
<Entry><Purchase_ID>000004</Purchase_ID><Date>01/10/2010</Date><Vendor>Steelcase</Vendor><Tax_Category>Office Supplies</Tax_Category><Price>$20,289.98</Price></Entry>
</Ledger>
JSON:
[
{"Purchase ID":"000001", "Date":"01/01/2010", "Vendor":"Apple", "Tax Category":"Office Supplies", "Price":"$60,000.00"},
{"Purchase ID":"000002", "Date":"01/01/2010", "Vendor":"Applebee's", "Tax Category":"Meals", "Price":"$50.87"},
{"Purchase ID":"000003", "Date":"01/02/2010", "Vendor":"Apple", "Tax Category":"Repairs", "Price":"$928.20"},
{"Purchase ID":"000004", "Date":"01/10/2010", "Vendor":"Steelcase", "Tax Category":"Office Supplies", "Price":"$20,289.98"}
]
- What’s your favorite?
- Why?
- Can you see how repetitive the XML and JSON are?
- That’s always a strong hint that data could be easily translated back into a CSV or Excel table.
Takeaways
- It’s just text!
- XML, JSON, and CSV are all standardized ways of typing data in plain text to give it structure and meaning.
- XML ≈ JSON
- XML and JSON are meant to represent the same kind of data as each other.
- Their punctuation differences result in subtle strengths and weaknesses, but they’re more alike than different.
- CSV = Tables
- Ideally, you would use CSV to represent “extremely similar” “table-shaped” data, like the “bookkeeper’s ledger” example above.
- XML/JSON = Lists
- Ideally, you would use XML or JSON to represent a “list of lists” and/or “extremely varied” data, like the “birthday tracker” example above.
- In a pinch, such as when you’re transmitting data in a way that doesn’t like “line breaks,” you can also use XML or JSON to represent data like the “bookkeeper’s ledger.”
- 👀 = ↔
- If you learn to distinguish the two shapes of data, you are well on your way to using tools that translate between CSV/Excel and XML/JSON.