Processing Excel files using Python

    image
    At work, I had to deal with the task of processing xls files using python. A little googling, I came across several libraries with which you can work with excel files.

    Libraries:
    - xlrd - makes it possible to read Excel files
    - xlwt - creates and populates Excel files
    - xlutils - a set of utilities for expanding the capabilities of the previous two libraries
    - pyExcelerator - also makes it possible to work with Excel files, but has not been updated for a long time.

    For my task, I used the first three libraries.
    The task was divided into several parts: reading a file with the xls extension; creating a new one and filling it out; creating a copy of the file based on the input file; deleting the necessary lines in the output file.

    Reading input file


    This task is not very difficult. The documentation and examples that came with xlrd helped solve it quickly.
    Code example:
    import xlrd
    rb = xlrd.open_workbook('d:/final.xls',formatting_info=True)
    sheet = rb.sheet_by_index(0)
    for rownum in range(sheet.nrows):
    row = sheet.row_values(rownum)
    for c_el in row:
    print c_el


    Create a new file and fill it


    This task was no more difficult than the previous one. Documentation and examples helped.
    Code example:
    import xlwt
    from datetime import datetime

    font0 = xlwt.Font()
    font0.name = 'Times New Roman'
    font0.colour_index = 2
    font0.bold = True

    style0 = xlwt.XFStyle()
    style0.font = font0

    style1 = xlwt.XFStyle()
    style1.num_format_str = 'D-MMM-YY'

    wb = xlwt.Workbook()
    ws = wb.add_sheet('A Test Sheet')

    ws.write(0, 0, 'Test', style0)
    ws.write(1, 0, datetime.now(), style1)
    ws.write(2, 0, 1)
    ws.write(2, 1, 1)
    ws.write(2, 2, xlwt.Formula("A3+B3"))

    wb.save('example.xls')


    Creating a copy of the file based on the input file


    This problem can be solved in two ways. Option one: open the input file for reading, create a new file and overwrite all the data from one file to another in a loop. Such a solution is not difficult to implement, so it makes no sense to lay out sample code. Option two: use the xlutils library. This library has a lot of interesting and useful things, but xlutils.copy will be interesting for our task .
    And so, an example of code to create a file based on the input using xlutils.copy:
    import xlrd
    import xlwt
    from xlutils.copy import copy

    rb = open_workbook('final.xls',on_demand=True,formatting_info=True)
    wb = copy(rb)
    wb.save("final_complete.xls")


    Here is such a small code. In order for it to work, the on_demand = True flag must be on . By using the formatting_info flag, the output file is obtained with the same design styles as the input. For my task, this turned out to be the right option.

    Deleting rows by a specified condition


    To solve this problem, it was decided to use a filter. One option is to rewrite from one file to another, excluding those options that do not fulfill a given condition. But there is one catch, if you need to preserve the design style of the document, then this approach will not work (Unless of course you know the design style in advance and can set it programmatically). The solution to this problem was achieved through the use of xlutils.filter. Task: to leave in the output Excel file only those entries that are contained in the transferred list.
    Code that solves this problem:
    from xlutils.filter import GlobReader,BaseFilter,DirectoryWriter,process

    myfile='final2.xls'
    mydir='d:/' 

    class MyFilter(BaseFilter): 

        goodlist = None
        
        def __init__(self,elist): 
            self.goodlist = goodlist
            self.wtw = 0
            self.wtc = 0
             

        def workbook(self, rdbook, wtbook_name): 
            self.next.workbook(rdbook, 'filtered_'+wtbook_name) 

        def row(self, rdrowx, wtrowx):
            pass

        def cell(self, rdrowx, rdcolx, wtrowx, wtcolx): 
            value = self.rdsheet.cell(rdrowx,rdcolx).value
            if value in self.goodlist:
                self.wtc=self.wtc+1 
                self.next.row(rdrowx,wtrowx)
            else:
                return
            self.next.cell(rdrowx,rdcolx,self.wtc,wtcolx)
            
            
    data = """somedata1
    somedata2
    somedata3
    somedata4
    somedata5
    """

    goodlist = data.split("\n")

    process(GlobReader(os.path.join(mydir,myfile)),MyFilter(goodlist),DirectoryWriter(mydir))


    Conclusion


    Using a set of three libraries, the tasks were solved. The following was noticed: if there are graphic elements in the input Excel file (pictures, etc.), they are not transferred to the output file. Perhaps by studying these libraries it will be possible to solve this part of the problem.

    References


    sourceforge.net/projects/pyexcelerator
    www.python-excel.org - the first three libraries.
    groups.google.com/group/python-excel is a group that discusses the use of the xlrd, xlwt, and xlutils libraries.

    PS I think it would be nice to transfer this post to a thematic blog.

    Also popular now: