In this post, we will take a look into how we can generate Extensible Markup Language (XML) files from an Excel file with Python. We will be using the Yattag package to generate our XML file and the OpenPyXL package for reading our Excel data.
Yattag is described in its documentation as following:
Yattag is a Python library for generating HTML or XML in a pythonic way.
That pretty much sums Yatttag up, I find it as a simple, easy to use library that just works. I had been searching for this kind of library in order to more easily generate different XML files.
To install Yattag with pip:
pip install yattag
Adding a tag with Yattag is as easy as using the
with tag('h1'): text('Hello world!')
Tags are automatically closed. To start using Yattag we need to import Doc from Yattag and create our Doc, tag and text with Doc().tagtext().
from yattag import Doc doc, tag, text = Doc().tagtext() with tag('h1'): text('Hello world!') doc.getvalue() Output: '<h1>Hello world!</h1>'
OpenPyXL is a library for interacting with Excel 2010 files. OpenPyXL can read and write to .xlsx and .xlsm files.
To install OpenPyXL with pip:
pip install openpyxl
To load an existing workbook in OpenPyXl we need to use the
load_workbook method. We also need to select the sheet we are reading the data from. In our example, we are using popular baby names in New York City. You can access the dataset from the link at the bottom of this post.
I have created a workbook named NY_baby_names.xlsx with one sheet of data, Sheet1. The worksheet has the following headers: Year of Birth, Gender, Child's First Name, Count, Rank. You can download the Excel file from my website here.
To access the data with OpenPyXL, do the following:
from openpyxl import load_workbook wb = load_workbook("NY_baby_names.xlsx") ws = wb.worksheets for row in ws.iter_rows(min_row=1, max_row=2, min_col=1, max_col=4): print([cell.value for cell in row]) Output: ['Year of Birth', 'Gender', "Child's First Name", 'Count'] [2011, 'FEMALE', 'GERALDINE', 13]
First, we load the workbook with
load_workbook, and then select the first worksheet. We then iterate through the first two rows with the
After the imports, we load the workbook and the worksheet. We then create our Yattag document. We fill the headers with Yattags
asis() method. The asis method enables us to input any string as the next line.
We then create our main tag, Babies. We start looping through our sheet with the
iter_rows method. The
iter_rows method returns a generator with all the cells. We use a list comprehension to get all the values from the cells.
Next, we are adding the babies. Notice the use of the With tag and text. When we are finished we indent our result with Yattags indent method.
Finally, we save our file. The output should look like below. Notice that I only included two babies in the output.
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema> <Babies> <Baby> <Name> GERALDINE </Name> <Gender> FEMALE </Gender> <year> 2011 </year> <count> 13 </count> <rank> 75 </rank> </Baby> <Baby> <Name> GIA </Name> <Gender> FEMALE </Gender> <year> 2011 </year> <count> 21 </count> <rank> 67 </rank> </Baby> </Babies>
from openpyxl import load_workbook from yattag import Doc, indent wb = load_workbook("NY_baby_names.xlsx") ws = wb.worksheets # Create Yattag doc, tag and text objects doc, tag, text = Doc().tagtext() xml_header = '<?xml version="1.0" encoding="UTF-8"?>' xml_schema = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>' doc.asis(xml_header) doc.asis(xml_schema) with tag('Babies'): # Use ws.max_row for all rows for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=5): row = [cell.value for cell in row] with tag("Baby"): with tag("Name"): text(row) with tag("Gender"): text(row) with tag("year"): text(row) with tag("count"): text(row) with tag("rank"): text(row) result = indent( doc.getvalue(), indentation = ' ', indent_text = True ) with open("baby_names.xml", "w") as f: f.write(result)
Popular Baby Names dataset: https://catalog.data.gov/dataset/most-popular-baby-names-by-sex-and-mothers-ethnic-group-new-york-city-8c742