VBA and Python to automate Excel and MS Office
The reason for the note was an article on Habré in which the author described how he solved the problem of collecting and analyzing metadata from Excel files in Python .
This note reveals in more detail the well-known thesis: For a specific task, you should choose the most suitable tool applicable to office automation.
VBA and Python
VBA (Visual Basic for Applications) is , de facto, the most popular language for automating Microsoft Office. Available out of the box, in addition to Excel, works in PowerPoint, Outlook, Access, Project and other applications.
If you ask the question: “Which programming language to choose first?”, Then somewhere in 90% of all cases, Python will be offered . In practice, there can be any other language, but, based on the popularity of the language and my experience, I will compare it with it.
In general, you can describe the situation through a similar schedule:
There will be no detailed comparison, consider the main killer features, in a situation where a junior programmer / office employee wants to automate something related to MS Office, and he has a choice between languages.
If, for various reasons, there is no choice, then there is nothing to compare.
In favor of VBA
- Excellent work with the object model of Excel and other MS Office applications . Writing VBA code for most internal operations is trivial. Python, in comparison with VBA, has very little support for the Office object model.
- Support for multiple formats MS Office. The biggest problem for external languages is working with different formats of MS Office files. For example, xls, xlsx, xlsm files may require different libraries, since each works well only with its own file format. For VBA, this is all an "Excel file," which is generally equally good at work.
- Work with MS Exchange. If it is necessary to ensure work with corporate mail / calendar on Exchange, then not every language has a normal library for working with Exchange protocol. In VBA, this is solved relatively simply through the use of the MS Outlook object model in a macro.
- Easy installation and distribution. There is no need to install anything to an already established office. In order for a colleague to use the program, it is enough to send him a file with a macro. It is easy to make an add-in that allows you to "install" the macro model in the background of the office.
- Interactivity within MS Office applications. Inside the office programs, you can simply put the macro launch buttons, and (a little harder) make a whole separate UI. This also includes writing your own formulas in Excel and the fact that macros can affect objects inside Office documents in real time.
- Record macros. A handy tool that allows you to record the actions of a person in the finished code, for later editing use.
In favor of Python (and other external programming languages)
- Nice syntax and syntax sugar. In short, the VBA does not differ expressiveness and convenience. This is a matter of personal taste, but for me Python is much more convenient.
- Rich library ecosystem. Huge selection of ready-made libraries for working with the outside world. Trying to make a program on VBA that interacts with some external API is still a pain. Interestingly, just for working with Office files, libraries of the same Python - frankly "at C grade".
- Good development tools. You can choose from a huge selection of programs that facilitate the development process. The standard VBA editor from Office offers very poor functionality and, in comparison with alternatives from the world of Python, is frankly inconvenient. Writing VBA code in an external editor, and then copying it inside the office for debugging is also inconvenient.
- Work speed I did not check the speed of single-threaded work, but I suppose that in the case of single-threaded work, the advantage will be in Python. In any case, multithreaded data / file processing is rather trivially organized, which allows speaking at a higher achievable speed.
Cases
The following are specific tasks that I myself solved or automated, and when I had to choose a stack: VBA or Python. For each task, the selected stack is indicated and an explanation is given why:
- Task: A program to check all Excel files in the directory for hidden sheets
- My choice: VBA . Reasons: the simplicity of working with different formats of Excel files and the lack of external interactions.
- Task: Service that should allow a user to convert PowerPoint files to PDF from a mobile device for viewing.
- The service was implemented as a mail bot, to the address of which the user can send Office files, and in response, the user receives an email with PDF files.
- My choice: VBA + Python logic for monitoring
- First, the internal functions are guaranteed to save PDF that is authentic to the PowerPoint file (external libraries do not do well with the PowerPoint renderer).
- Secondly, the implementation of the mail bot, as a macro in MS Outlook, solved the problems of working with the corporate Exchange mail. So, in Python there is no normal library for working with MS Exchange.
- Python was used to organize monitoring of the service and notification of possible problems.
- Task: A program for merging Powerpoint files with a “subscript” (text for a speaker) into a file for printing
- My choice: VBA . The task was solved through the conversion of two files into PDF and their combination with Riffle Shuffle. Since the quality of converting to PDF is important, the built-in office functions were used to export to PDF.
- Task: A database of industry enterprises is given in the form of cards-individual html files that need to be filtered and merged into 1 Excel file for calculating a number of indicators.
- My choice: Python .
- To retrieve data from html cards, a library for parsing html BeautifulSoup was useful.
- The Excel file is created by the program, so we can decide for ourselves which analytics to calculate already in it, and which even at the data preparation stage in Python.
- My choice: Python .
- Task: Translate all the text in a PowerPoint presentation into another language by a machine translator
- My choice: VBA . It was important to work carefully with the text in the internal objects of the PowerPoint file. The API from Yandex was used for translation, as it is free for small volumes and easy to connect. For example, I couldn’t get the Bing translator API to work in VBA, since I need OAuth with my problems to work. If I had to work with Bing, then, probably, I would do a service intermediary in Python.
- Task: For given exchange tickers, take data from the API with financial indicators (the API provides the CSV file generated by the request) and read a number of benchmarks for analysis based on them
- My choice: Python . Although the API is simple (does not require any signature of requests or authorization) and gives the data to CSV, Python is chosen, since there is no reason to choose VBA, but it is more convenient to write to Python.
- Task: Maintaining the base of orders, sending notifications to performers, generating a report for printing
- Here I chose a very long time, as there are many alternatives:
- Third Party Ready Order System
- Database with some handler
- Access
- Excel
- My choice: VBA
- First, Excel itself is a ready UI for work.
- Secondly, VBA solves the problem of working with corporate Exchange through MS Outlook connection.
- Thirdly, this decision was made with an eye to colleagues for whom Excel was more understandable than anything completely new.
- Here I chose a very long time, as there are many alternatives:
Conclusion
I hope for someone the note will be useful and will allow you to save time on choosing the stack for solving your problems.