Processing Excel files using Python
At work, I had to deal with the task of processing xls files using python. A little googling, I came across several libraries with which you can work with excel files.
- xlrd - makes it possible to read Excel files
- xlwt - creates and populates Excel files
- xlutils - a set of utilities for expanding the capabilities of the previous two libraries
- pyExcelerator - also makes it possible to work with Excel files, but has not been updated for a long time.
For my task, I used the first three libraries.
The task was divided into several parts: reading a file with the xls extension; creating a new one and filling it out; creating a copy of the file based on the input file; deleting the necessary lines in the output file.
Reading input file
This task is not very difficult. The documentation and examples that came with xlrd helped solve it quickly.
rb = xlrd.open_workbook('d:/final.xls',formatting_info=True)
sheet = rb.sheet_by_index(0)
for rownum in range(sheet.nrows):
row = sheet.row_values(rownum)
for c_el in row:
Create a new file and fill it
This task was no more difficult than the previous one. Documentation and examples helped.
from datetime import datetime
font0 = xlwt.Font()
font0.name = 'Times New Roman'
font0.colour_index = 2
font0.bold = True
style0 = xlwt.XFStyle()
style0.font = font0
style1 = xlwt.XFStyle()
style1.num_format_str = 'D-MMM-YY'
wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')
ws.write(0, 0, 'Test', style0)
ws.write(1, 0, datetime.now(), style1)
ws.write(2, 0, 1)
ws.write(2, 1, 1)
ws.write(2, 2, xlwt.Formula("A3+B3"))
Creating a copy of the file based on the input file
This problem can be solved in two ways. Option one: open the input file for reading, create a new file and overwrite all the data from one file to another in a loop. Such a solution is not difficult to implement, so it makes no sense to lay out sample code. Option two: use the xlutils library. This library has a lot of interesting and useful things, but xlutils.copy will be interesting for our task .
And so, an example of code to create a file based on the input using xlutils.copy:
from xlutils.copy import copy
rb = open_workbook('final.xls',on_demand=True,formatting_info=True)
wb = copy(rb)
Here is such a small code. In order for it to work, the on_demand = True flag must be on . By using the formatting_info flag, the output file is obtained with the same design styles as the input. For my task, this turned out to be the right option.
Deleting rows by a specified condition
To solve this problem, it was decided to use a filter. One option is to rewrite from one file to another, excluding those options that do not fulfill a given condition. But there is one catch, if you need to preserve the design style of the document, then this approach will not work (Unless of course you know the design style in advance and can set it programmatically). The solution to this problem was achieved through the use of xlutils.filter. Task: to leave in the output Excel file only those entries that are contained in the transferred list.
Code that solves this problem:
from xlutils.filter import GlobReader,BaseFilter,DirectoryWriter,process
goodlist = None
self.goodlist = goodlist
self.wtw = 0
self.wtc = 0
def workbook(self, rdbook, wtbook_name):
def row(self, rdrowx, wtrowx):
def cell(self, rdrowx, rdcolx, wtrowx, wtcolx):
value = self.rdsheet.cell(rdrowx,rdcolx).value
if value in self.goodlist:
data = """somedata1
goodlist = data.split("\n")
Using a set of three libraries, the tasks were solved. The following was noticed: if there are graphic elements in the input Excel file (pictures, etc.), they are not transferred to the output file. Perhaps by studying these libraries it will be possible to solve this part of the problem.
www.python-excel.org - the first three libraries.
groups.google.com/group/python-excel is a group that discusses the use of the xlrd, xlwt, and xlutils libraries.