Integration of MS Excel and Python

    Good afternoon, dear readers.

    In today's article, I would like, as much as possible, to consider the integration of Python and MS Excel applications . A given question may arise, for example, when creating some kind of online reporting system, which should upload the results to a generally accepted format, well, or some other tasks. Also in the article I will show the reverse integration, i.e. how to use a function written in python in Excel, which can also be useful for automating reports.

    Working with MS Excel files in Python

    To work with Excel files from Python, I know 2 options:

    1. Using libraries like xlrd, xlwt, xlutils or openpyxl
    2. Work with com object

    Consider working with these methods in more detail. As an example, we will use the finished excel file from which we first read the data from the first cell, and then write it to the second. Such simple examples will be enough for the first acquaintance.

    Using libraries

    So, the first method is quite simple and well described. For example, there is an excellent article to describe how to work with xlrd, xlwt, xlutils . Therefore, in this article I will give a small piece of code with their use.

    To get started, download the necessary libraries and open the xls file for reading and select the
    desired data sheet:

    import xlrd, xlwt
    #открываем файл
    rb = xlrd.open_workbook('../ArticleScripts/ExcelPython/xl.xls',formatting_info=True)
    #выбираем активный лист
    sheet = rb.sheet_by_index(0)

    Now let's see how to read the values ​​from the desired cells:

    #получаем значение первой ячейки A1
    val = sheet.row_values(0)[0]
    #получаем список значений из всех записей
    vals = [sheet.row_values(rownum) for rownum in range(sheet.nrows)]

    As you can see, reading data is not difficult. Now write them to another file. To do this, create a new excel file with a new workbook:

    wb = xlwt.Workbook()
    ws = wb.add_sheet('Test')

    We will write the previously obtained data to a new file and save the changes:

    #в A1 записываем значение из ячейки A1 прошлого файла
    ws.write(0, 0, val[0])
    #в столбец B запишем нашу последовательность из столбца A исходного файла
    i = 0
    for rec in vals:
        i =+ i
    #сохраняем рабочую книгу'../ArticleScripts/ExcelPython/xl_rec.xls')

    The example above shows that the xlrd library is responsible for reading data, and xlwt is responsible for writing, so there is no way to make changes to an already created book without copying it to a new one. In addition, these libraries work only with xls format files (Excel 2003) and they do not have support for the new xlsx format (Excel 2007 and higher).

    To successfully work with the xlsx format, you will need the openpyxl library . To demonstrate its work, we will perform the actions that were shown for previous libraries.

    To get started, load the library and select the desired book and worksheet:

    import openpyxl
    wb = openpyxl.load_workbook(filename = '../ArticleScripts/ExcelPython/openpyxl.xlsx')
    sheet = wb['test']

    As can be seen from the above listing, this is not difficult to do. Now let's see how the data can be read:

    #считываем значение определенной ячейки
    val = sheet['A1'].value
    #считываем заданный диапазон
    vals = [v[0].value for v in sheet.range('A1:A2')]

    The difference from previous libraries is that openpyxl makes it possible to display to cells and sequences through their names, which is quite convenient and understandable when reading a program.

    Now let's see how we can record and save the data:

    #записываем значение в определенную ячейку
    sheet['B1'] = val
    #записываем последовательность
    i = 0
    for rec in vals:
        sheet.cell(row=i, column=2).value = rec
        i =+ 1
    # сохраняем данные'../ArticleScripts/ExcelPython/openpyxl.xlsx')

    The example shows that recording is also quite easy. In addition, in the code above, you can see that, in addition to cell names , openpyxl can also work with their indices.

    The disadvantages of this library include, as in the previous example, there is no way to save changes without creating a new book.

    As shown above, for more or less complete work with excel files, in this case, you need 4 libraries, and this is not always convenient. In addition, you will probably need access to VBA (let's say for any subsequent processing) and you can’t get it using these libraries.

    However, working with these libraries is quite simple and convenient for quickly creating Excel files for formatting them, but if you need more features, the next sub-item is for you.

    Work with com object

    In my reports, I prefer to use the second method, namely using the Excel file through a com object using the win32com library . Its advantage is that you can perform all the operations with a file that normal Excel can do using VBA.

    We illustrate this with the same problem as the previous examples.

    First, load the desired library and create a COM object.

    import win32com.client
    Excel = win32com.client.Dispatch("Excel.Application")

    Now we can work using the Excel object, we can access all the features of VBA. Let's first open any book and select the active sheet. This can be done like this:

    wb = Excel.Workbooks.Open(u'D:\\Scripts\\DataScience\\ArticleScripts\\ExcelPython\\xl.xls')
    sheet = wb.ActiveSheet

    Let's get the value of the first cell and sequence:

    #получаем значение первой ячейки
    val = sheet.Cells(1,1).value
    #получаем значения цепочки A1:A2
    vals = [r[0].value for r in sheet.Range("A1:A2")]

    As you can see, we operate here with pure VBA functions. This is very convenient if you have written macros and you want to use them when working with Python at a minimal cost to rework the code.

    Let's see how we can record the obtained values:

    #записываем значение в определенную ячейку
    sheet.Cells(1,2).value = val
    #записываем последовательность
    i = 1
    for rec in vals:
        sheet.Cells(i,3).value = rec
        i = i + 1
    #сохраняем рабочую книгу
    #закрываем ее
    #закрываем COM объект

    The example shows that these operations are also quite simple to implement. In addition, it can be noted that we saved the changes in the same book that we opened for reading, which is quite convenient.

    However, an attentive reader will pay attention to the variable i , which is not initialized to 0, as accepted by python, but 1. This is due to the fact that we work with cell indices as from VBA, and there the numbering starts not from 0, but from 1.

    This completes the analysis of how to work with excel files in python and proceed to the inverse problem.

    Call Python functions from MS Excel

    Such a situation may arise that you already have some function that processes data in python, and you need to transfer its functionality to Excel. Of course, you can rewrite it to VBA, but why?

    There is a great ExcelPython add-in for using python functions in Excel . With it, you can call functions written in python directly from Excel, though you still have to write a small wrapper in VBA, and all this will be shown below.

    So, suppose we have a function written in python that we want to use:

    def get_unique(lists):
        sm = 0
        for i in lists:
            sm = sm + int(i.pop()) 
        return sm

    At the entrance, she is given a list consisting of lists, this is one of the conditions that must be met for this function to work in Excel.

    We will save the function in the file and put it in the same directory where our excel file will be located, with which we will work.

    Now install ExcelPython . Installation occurs through the launch of the exe-file and does not cause difficulties.

    When all preparations are completed, open the excel test file and call the VBA editor (Alt + F11). To work with the above add-on, you need to connect it through Tools-> References , as shown in the figure:

    Well, now you can start writing a wrapper function for our Python module . It will look like this:

    Function sr(lists As Range)
        On Error GoTo do_error
            Set plugin = PyModule("plugin", AddPath:=ThisWorkbook.Path)
            Set result = PyCall(plugin, "get_unique", PyTuple(lists.Value2))
            sr = WorksheetFunction.Transpose(PyVar(result))
            Exit Function
            sr = Err.Description
    End Function

    So what happens in this function?

    To begin with, with the help PyModule, we connect the desired module. To do this, as parameters, it is passed the name of the module without an extension, and the path to the folder in which it is located. At the output of the work, PyModulewe get an object for working with the module.

    Then, with the help PyCall, we call the function we need from the specified module. As parameters it PyCallreceives the following:

    1. The module object obtained in the previous step
    2. Function Name
    3. Parameters passed to the function (passed as a list)

    The function PyTuplereceives any values ​​at the input and converts them into a tuple object of the Python language.
    Well, and accordingly, it PyVarperforms the operation of converting the result of the python function to a type understood by Excel.

    Now, to make sure that our bundle is working, we call our freshly baked function on a worksheet in Excel:

    As you can see from the figure, everything worked correctly.

    It should be noted that this material uses the old version of ExcelPython , and a new version is available on GitHub'e of the author.


    In conclusion, it should be noted that the examples in this article are the simplest and for a more in-depth study of these methods, I recommend that you refer to the
    documentation for the packages you need.

    I also want to note that these packages are not the only ones and the article omits consideration of such packages as xlsxwriter for generating excel files or xlwings , which can work with Excel files on the fly, as well as PyXLL , which performs similar functions of ExcelPython .

    In addition to this, in the article I tried to generalize the material scattered over the network, as such questions often appear on forums and I think it will be useful for some to have such a "cheat sheet" on hand.

    Also popular now: