Blog: The easy way to work with CSV, JSON, and XML in Python
Python’s superior flexibility and ease of use are what make it one of the most popular programming language, especially for Data Scientists. A big part of that is how simple it is to work with large datasets.
Every technology company today is building up a data strategy. They’ve all realised that having the right data: insightful, clean, and as much of it as possible, gives them a key competitive advantage. Data, if used effectively, can offer deep, beneath the surface insights that can’t be discovered anywhere else.
Over the years, the list of possible formats that you can store your data in has grown significantly. But, there are 3 that dominate in their everyday usage: CSV, JSON, and XML. In this article, I’m going to share with you the easiest ways to work with these 3 popular data formats in Python!
A CSV file is the most common way to store your data. You’ll find that most of the data coming from Kaggle competitions is stored in this way. We can do both read and write of a CSV using the built-in Python csv library. Usually, we’ll read the data into a list of lists.
Check out the code below. When we run
csv.reader() all of our CSV data becomes accessible. The
csvreader.next() function reads a single line from the CSV; every time you call it, it moves to the next line. We can also loop through every row of the csv using a for-loop as with
for row in csvreader . Make sure that you have the same number of columns in each row, otherwise, you’ll likely end up running into some errors when working with your list of lists.
Writing to CSV in Python is just as easy. Set up your field names in a single list, and your data in a list of lists. This time we’ll create a
writer() object and use it to write our data to file very similarly to how we did the reading.
Of course, installing the wonderful Pandas library will make working with your data far easier once you’ve read it into a variable. Reading from CSV is a single line as is writing it back to file!
We can even use Pandas to convert from CSV to a list of dictionaries with a quick one-liner. Once you have the data formatted as a list of dictionaries, we’ll use the
dicttoxml library to convert it to XML format. We’ll also save it to file as a JSON!
JSON provides a clean and easily readable format because it maintains a dictionary-style structure. Just like CSV, Python has a built-in module for JSON that makes reading and writing super easy! When we read in the CSV, it will become a dictionary. We then write that dictionary to file.
And as we saw before, once we have our data you can easily convert to CSV via pandas or use the built-in Python CSV module. When converting to XML, the
dicttoxml library is always our friend.
XML is a bit of a different beast from CSV and JSON. Generally, CSV and JSON are widely used due to their simplicity. They’re both easy and fast to read, write, and interpret as a human. There’s no extra work involved and parsing a JSON or CSV is very lightweight.
XML on the other hand tends to be a bit heavier. You’re sending more data, which means you need more bandwidth, more storage space, and more run time. But XML does come with a few extra features over JSON and CSV: you can use namespaces to build and share standard structures, better representation for inheritance, and an industry standardised way of representing your data with XML schema, DTD, etc.
To read in the XML data, we’ll use Python’s built-in XML module with sub-module ElementTree. From there, we can convert the ElementTree object to a dictionary using the
xmltodictlibrary. Once we have a dictionary, we can convert to CSV, JSON, or Pandas Dataframe like we saw above!
Want to learn more about coding in Python? The Python Crash Course book is the best resource out there for learning how to code in Python!
And just a heads up, I support this blog with Amazon affiliate links to great books, because sharing great books helps everyone! As an Amazon Associate I earn from qualifying purchases.