Processing Excel files using Python

At work, I had to deal with the task of processing xls files using python. A little googling, I came across several libraries with which you can work with excel files.
Libraries:
- xlrd - makes it possible to read Excel files
- xlwt - creates and populates Excel files
- xlutils - a set of utilities for expanding the capabilities of the previous two libraries
- pyExcelerator - also makes it possible to work with Excel files, but has not been updated for a long time.
For my task, I used the first three libraries.
The task was divided into several parts: reading a file with the xls extension; creating a new one and filling it out; creating a copy of the file based on the input file; deleting the necessary lines in the output file.
Reading input file
This task is not very difficult. The documentation and examples that came with xlrd helped solve it quickly.
Code example:
import xlrd
rb = xlrd.open_workbook('d:/final.xls',formatting_info=True)
sheet = rb.sheet_by_index(0)
for rownum in range(sheet.nrows):
row = sheet.row_values(rownum)
for c_el in row:
print c_el
Create a new file and fill it
This task was no more difficult than the previous one. Documentation and examples helped.
Code example:
import xlwt
from datetime import datetime
font0 = xlwt.Font()
font0.name = 'Times New Roman'
font0.colour_index = 2
font0.bold = True
style0 = xlwt.XFStyle()
style0.font = font0
style1 = xlwt.XFStyle()
style1.num_format_str = 'D-MMM-YY'
wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')
ws.write(0, 0, 'Test', style0)
ws.write(1, 0, datetime.now(), style1)
ws.write(2, 0, 1)
ws.write(2, 1, 1)
ws.write(2, 2, xlwt.Formula("A3+B3"))
wb.save('example.xls')
Creating a copy of the file based on the input file
This problem can be solved in two ways. Option one: open the input file for reading, create a new file and overwrite all the data from one file to another in a loop. Such a solution is not difficult to implement, so it makes no sense to lay out sample code. Option two: use the xlutils library. This library has a lot of interesting and useful things, but xlutils.copy will be interesting for our task .
And so, an example of code to create a file based on the input using xlutils.copy:
import xlrd
import xlwt
from xlutils.copy import copy
rb = open_workbook('final.xls',on_demand=True,formatting_info=True)
wb = copy(rb)
wb.save("final_complete.xls")
Here is such a small code. In order for it to work, the on_demand = True flag must be on . By using the formatting_info flag, the output file is obtained with the same design styles as the input. For my task, this turned out to be the right option.
Deleting rows by a specified condition
To solve this problem, it was decided to use a filter. One option is to rewrite from one file to another, excluding those options that do not fulfill a given condition. But there is one catch, if you need to preserve the design style of the document, then this approach will not work (Unless of course you know the design style in advance and can set it programmatically). The solution to this problem was achieved through the use of xlutils.filter. Task: to leave in the output Excel file only those entries that are contained in the transferred list.
Code that solves this problem:
from xlutils.filter import GlobReader,BaseFilter,DirectoryWriter,process
myfile='final2.xls'
mydir='d:/'
class MyFilter(BaseFilter):
goodlist = None
def __init__(self,elist):
self.goodlist = goodlist
self.wtw = 0
self.wtc = 0
def workbook(self, rdbook, wtbook_name):
self.next.workbook(rdbook, 'filtered_'+wtbook_name)
def row(self, rdrowx, wtrowx):
pass
def cell(self, rdrowx, rdcolx, wtrowx, wtcolx):
value = self.rdsheet.cell(rdrowx,rdcolx).value
if value in self.goodlist:
self.wtc=self.wtc+1
self.next.row(rdrowx,wtrowx)
else:
return
self.next.cell(rdrowx,rdcolx,self.wtc,wtcolx)
data = """somedata1
somedata2
somedata3
somedata4
somedata5
"""
goodlist = data.split("\n")
process(GlobReader(os.path.join(mydir,myfile)),MyFilter(goodlist),DirectoryWriter(mydir))
Conclusion
Using a set of three libraries, the tasks were solved. The following was noticed: if there are graphic elements in the input Excel file (pictures, etc.), they are not transferred to the output file. Perhaps by studying these libraries it will be possible to solve this part of the problem.
References
sourceforge.net/projects/pyexcelerator
www.python-excel.org - the first three libraries.
groups.google.com/group/python-excel is a group that discusses the use of the xlrd, xlwt, and xlutils libraries.