ophermit June 14, 2016 at 14:35

Automating code checks, or a little more about pre-commit hooks

Tutorial

I don't think there is any need to explain what Git/GitHub and pre-commit hooks are, or how to install a hook. Let's get straight to the point.

There are many examples of hooks online, most of them shell scripts, but none of their authors paid attention to one important point: the hook has to be copied from project to project. At first glance, that's fine. But then you need to change a hook that already lives in 20 projects... Or you need to move development from Windows to Linux, and the hook is written in PowerShell... What then? ~~??????? PROFIT~~ ...

“Better this way: 8 pies and one candle!”

The examples are, of course, greatly exaggerated, but they helped identify inconveniences we would like to avoid. I want a hook that doesn't have to be copied into every project and rarely needs tweaking, yet can still:

check the code being committed for validity (for example, PEP8 compliance, presence of documentation, etc.);
perform a comprehensive check of the project (unit tests, etc.);
abort the commit if errors are found and display a detailed log for analysis.

Ideally, it would look something like this:

python pre-commit.py --check pep8.py --test tests.py

Clearly, the hook itself is just a launcher, and all the ~~street~~ magic is done by the script it runs. Let's try to write such a script. If you're interested, read on.

pre-commit.py

But before we ~~download a finished example from the internet and~~ start development, let's look at the parameters the script takes. Along the way, I'll use them to explain how everything works.

These parameters will define the main behavior of the script:

-c or --check [script1 ... scriptN]: run validation scripts. Each script must be located in the same directory as pre-commit.py; otherwise, you must specify the full path. Each script is "fed" the files from the current commit.
-t or --test [test1 ... testN]: run unit tests and other scripts that do not need the current commit's files. Each test must be located in the current project directory; otherwise, you must specify the full path.

Both parameters are optional (so you can keep only one kind of check), but if neither is specified, pre-commit.py exits with code 1 (error).

Let's also add some auxiliary parameters (all optional):

-e or --exec path_to_interpreter: the full path (including the file name) to the interpreter that runs the scripts from --check and --test. If omitted, the interpreter running pre-commit.py is used.
-v or --verbose: enable verbose logging. If omitted, only the console output of scripts that exited with an error code is written to the log.
-o or --openlog path_to_viewer: the full path (including the file name) to the program used to view the log.
-f or --forcelog: always open the log. If omitted, the log is opened only when errors are detected. Applies only if --openlog is specified.

The logic is clear; now we can write the script itself.

Command line options

First, configure the command-line parser. We'll use the argparse module (it is explained well, in plain terms, here and here), since it is part of the standard Python library.

# -*- coding: utf-8 -*-
import sys
import argparse
# Create the parser object
parser = argparse.ArgumentParser()
# Add an optional parameter. If it is given,
#    it requires a value: a list of 1-N items
parser.add_argument('-c', '--check', nargs='+')
# Same as --check
parser.add_argument('-t', '--test', nargs='+')
# Add a flag parameter. If given, its value is
#    True; if not given, False
parser.add_argument('-v', '--verbose', action='store_true')
# Optional parameter with a required value.
#    If not given, value = default
parser.add_argument('-e', '--exec', default=sys.executable)
# Optional parameter with a required value.
#    If not given, value = None
parser.add_argument('-o', '--openlog')
# Same as --verbose
parser.add_argument('-f', '--forcelog', action='store_true')
# Skip the 1st argument (the script name), parse the
#    remaining arguments and put the result into a dict
params = vars(parser.parse_args(sys.argv[1:]))

Run the script with the following parameters:

c:\python34\python c:\dev\projects\pre-commit-tool\pre-commit.py --check c:\dev\projects\pre-commit-tool\pep8.py --test tests.py

And print the contents of params:

{'exec': 'c:\\python34\\python.exe', 'forcelog': False, 'test': ['tests.py'], 'check': ['c:\\dev\\projects\\pre-commit-tool\\pep8.py'], 'openlog': None, 'verbose': False}
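Since parse_args accepts an explicit list, the parser configuration can be checked without launching the script from a shell. A minimal sketch, assuming the same options as above; build_parser and the argument list are hypothetical:

```python
import argparse
import sys


def build_parser():
    # Hypothetical helper: the same options as in the configuration above
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--check', nargs='+')
    parser.add_argument('-t', '--test', nargs='+')
    parser.add_argument('-v', '--verbose', action='store_true')
    parser.add_argument('-e', '--exec', default=sys.executable)
    parser.add_argument('-o', '--openlog')
    parser.add_argument('-f', '--forcelog', action='store_true')
    return parser


# Parse an explicit list instead of sys.argv[1:]
params = vars(build_parser().parse_args(
    ['--check', 'pep8.py', '--test', 'tests.py', '-v']))
print(params['check'], params['test'], params['verbose'], params['openlog'])
# prints: ['pep8.py'] ['tests.py'] True None
```

Note that '-v' ends the list of values for --test: argparse stops collecting nargs='+' values at the next string that looks like an option.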

Now all the parameter values are in the params dictionary and can easily be retrieved by the key of the same name.
Add a check that at least one of the main parameters is present:

# Exit if neither of the check-script parameters is given
if params.get('check') is None and params.get('test') is None:
    print('No check scripts specified')
    sys.exit(1)

Everything works, but we can make life a little easier without sacrificing flexibility. We know that in 99% of cases there is only one validation script, called, say, 'pep8.py', and our unit test script has the same name every time (and there is often only one of them, too). The same goes for viewing the log: we always use the same program (let it be Notepad). Let's change the parser configuration:

# Now the parameters take a list of 0-N items
parser.add_argument('-c', '--check', nargs='*')
parser.add_argument('-t', '--test', nargs='*')
# If the parameter is given without a value, the value from const is used
parser.add_argument('-o', '--openlog', nargs='?', const='notepad')

And add the defaults:

from os.path import abspath, dirname, join
if params.get('check') is not None and len(params.get('check')) == 0:
    # Prepend the directory containing pre-commit.py to the script name
    params['check'] = [join(dirname(abspath(__file__)), 'pep8.py')]
if params.get('test') is not None and len(params.get('test')) == 0:
    params['test'] = ['tests.py']
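The difference between "flag omitted" (None) and "flag given without values" (an empty list) is exactly what the defaults above rely on. A small sketch that makes both cases visible; apply_defaults is a hypothetical helper, and 'tools' stands in for the directory of pre-commit.py:

```python
import argparse
import os

parser = argparse.ArgumentParser()
# nargs='*': the flag may be followed by zero or more values
parser.add_argument('-c', '--check', nargs='*')
parser.add_argument('-t', '--test', nargs='*')
# nargs='?' with const: a bare flag takes the const value
parser.add_argument('-o', '--openlog', nargs='?', const='notepad')


def apply_defaults(params, script_dir):
    # Hypothetical helper: an empty list means "flag given without values"
    if params.get('check') == []:
        params['check'] = [os.path.join(script_dir, 'pep8.py')]
    if params.get('test') == []:
        params['test'] = ['tests.py']
    return params


# Bare flags: empty lists for -c/-t, the const value for -o
bare = vars(parser.parse_args(['--check', '--test', '--openlog']))
# Flags omitted entirely: every value stays None
omitted = vars(parser.parse_args([]))
# Explicit values are used as given
explicit = vars(parser.parse_args(['-c', 'a.py', 'b.py', '-o', 'gedit']))
resolved = apply_defaults(dict(bare), 'tools')
print(resolved['check'], resolved['test'], resolved['openlog'])
```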

After making the changes, the parser configuration code should look like this:

# -*- coding: utf-8 -*-
import sys
import argparse
from os.path import abspath, dirname, join
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--check', nargs='*')
parser.add_argument('-t', '--test', nargs='*')
parser.add_argument('-v', '--verbose', action='store_true')
parser.add_argument('-e', '--exec', default=sys.executable)
parser.add_argument('-o', '--openlog', nargs='?', const='notepad')
parser.add_argument('-f', '--forcelog', action='store_true')
params = vars(parser.parse_args(sys.argv[1:]))
if params.get('check') is None and params.get('test') is None:
    print('No check scripts specified')
    sys.exit(1)
if params.get('check') is not None and len(params.get('check')) == 0:
    params['check'] = [join(dirname(abspath(__file__)), 'pep8.py')]
if params.get('test') is not None and len(params.get('test')) == 0:
    params['test'] = ['tests.py']

Now the launch command is shorter:

c:\python34\python c:\dev\projects\pre-commit-tool\pre-commit.py --check --test --openlog

Contents of params:

{'check': ['c:\\dev\\projects\\pre-commit-tool\\pep8.py'], 'openlog': 'notepad', 'test': ['tests.py'], 'verbose': False, 'exec': 'c:\\python34\\python.exe', 'forcelog': False}

The parameters are done; let's move on.

Log

Set up the logger. The log file 'pre-commit.log' will be created in the root of the current project. Git runs hooks with the project root as the working directory, so we don't specify a path to the file. We also open the file in 'w' mode so that a new file is created on every run (we don't need to keep previous logs), and set the log format to the message only:

import logging
log_filename = 'pre-commit.log'
logging.basicConfig(
    filename=log_filename, filemode='w', format='%(message)s',
    level=logging.INFO)
to_log = logging.info

The last line makes life a bit easier: it creates an alias, to_log, which we'll use in the code instead of logging.info.
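One caveat: logging.basicConfig configures the root logger only once per process, and later calls are silently ignored (unless force=True is passed, available since Python 3.8). If the "fresh file, message-only" setup needs to be reusable, a sketch with an explicit FileHandler could look like this; make_log and the temporary path are hypothetical:

```python
import logging
import os
import tempfile


def make_log(path):
    # Hypothetical helper: a named logger that recreates `path` on every
    #    call (mode='w') and writes bare messages, like basicConfig above
    handler = logging.FileHandler(path, mode='w')
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger = logging.getLogger('pre-commit')
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]  # drop handlers from a previous call
    logger.propagate = False     # keep messages out of the root logger
    return logger, handler


log_path = os.path.join(tempfile.mkdtemp(), 'pre-commit.log')
for run in ('first run', 'second run'):
    logger, handler = make_log(log_path)
    logger.info(run)
    handler.close()

with open(log_path) as f:
    content = f.read()
print(content)  # only "second run": mode='w' truncated the file
```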

Shell

We will need to launch child processes repeatedly and read their console output. For this, we write the shell_command function. Its responsibilities are:

starting a subprocess (using Popen);
reading the subprocess's console output and converting it;
writing that output to the log if the subprocess exited with an error code.

The function takes these arguments:

command: the argument for Popen, i.e., the command that will actually be run. Instead of a single string ("python main.py"), it is recommended to pass a list (['python', 'main.py']);
force_report: controls output to the log. It can be True (always write to the log), False (write only if an error code is returned), or None (never write to the log).

from subprocess import Popen, PIPE
def shell_command(command, force_report=None):
    # Start the subprocess, capturing stdout and stderr
    proc = Popen(command, stdout=PIPE, stderr=PIPE)
    # Wait for it to finish; communicate() drains both pipes while waiting,
    #    which avoids the deadlock wait() can cause when a pipe buffer fills
    stdout, stderr = proc.communicate()
    # Conversion helper: bytes to string, whitespace (including "\r\n")
    #    collapsed
    transform = lambda x: ' '.join(x.decode('utf-8', 'replace').split())
    # Convert the stdout lines
    report = [transform(x) for x in stdout.splitlines()]
    # Append the stderr lines
    report.extend([transform(x) for x in stderr.splitlines()])
    # Write to the log depending on the force_report argument
    if force_report is True or (force_report is not None and proc.returncode != 0):
        to_log('[ SHELL ] %s (code: %d):\n%s\n'
               % (' '.join(command), proc.returncode, '\n'.join(report)))
    # Return the subprocess exit code and its console output as a list
    return proc.returncode, report
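To see what such a function returns, here is a self-contained sketch of the same capture logic without the logging part, run against a throwaway child process that writes to both streams and exits with code 3. It reads the pipes with communicate(), since the Python documentation warns that calling wait() with stdout=PIPE/stderr=PIPE can deadlock once a child fills the pipe buffer. run_captured and the child command are hypothetical:

```python
import sys
from subprocess import Popen, PIPE


def run_captured(command):
    # Hypothetical helper: the capture part of shell_command, no logging
    proc = Popen(command, stdout=PIPE, stderr=PIPE)
    # communicate() drains both pipes while waiting for the process to exit
    out, err = proc.communicate()
    transform = lambda x: ' '.join(x.decode('utf-8', 'replace').split())
    report = [transform(x) for x in out.splitlines() + err.splitlines()]
    return proc.returncode, report


# A child that prints one line to each stream and exits with code 3
child = ('import sys; print("Test 1 - passed"); '
         'sys.stderr.write("[!] Test 2 FAILED\\n"); sys.exit(3)')
code, report = run_captured([sys.executable, '-c', child])
print(code, report)  # 3 ['Test 1 - passed', '[!] Test 2 FAILED']
```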

Head revision

The list of files in the current commit is easily obtained with Git's diff command. In our case, we need added, copied, or modified files (--diff-filter=ACM); deleted files are skipped:

# Global result code
result_code = 0
# Get the list of files in the current commit
code, report = shell_command(
    ['git', 'diff', '--cached', '--name-only', '--diff-filter=ACM'],
    params.get('verbose'))
if code != 0:
    result_code = code
# Keep only files with the "py" extension and make each path absolute
#    (Git prints paths relative to the project root, the current directory)
targets = [abspath(x) for x in report if x.endswith('.py')]
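Because Git prints paths relative to the project root (with forward slashes), the filtering and path resolution can be pulled into a small function and tested in isolation. A sketch with a hypothetical python_targets helper; passing the root explicitly is an assumption made here, whereas the script relies on the current directory. Note that endswith('.py') also rejects a file literally named py, which a check like x.split('.')[-1] == "py" would accept:

```python
import os


def python_targets(diff_lines, root):
    # Hypothetical helper: keep *.py entries from `git diff --name-only`
    #    output and turn the repo-relative paths into absolute ones
    #    (abspath also normalizes "/" to "\" on Windows)
    return [os.path.abspath(os.path.join(root, line))
            for line in diff_lines if line.endswith('.py')]


lines = ['.gitattributes', 'demo/daemon_example.py', 'main.py', 'py']
targets = python_targets(lines, os.getcwd())
print(targets)
```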

As a result, targets will contain something like this:

['C:\\dev\\projects\\example\\demo\\daemon_example.py', 'C:\\dev\\projects\\example\\main.py', 'C:\\dev\\projects\\example\\test.py', 'C:\\dev\\projects\\example\\test2.py']

The most painful part is done; the rest is easier.

Validation check

Everything is simple here: we go through all the scripts specified in --check and run each one with the list of targets:

if params.get('check') is not None:
    for script in params.get('check'):
        code, report = shell_command(
            [params.get('exec'), script] + targets, params.get('verbose'))
        if code != 0:
            result_code = code

Example log contents for code that failed the validation check:

[ SHELL ] C:\python34\python.exe c:\dev\projects\pre-commit-tool\pep8.py C:\dev\projects\example\demo\daemon_example.py (code: 1):

C:\dev\projects\example\demo\daemon_example.py:8:80: E501 line too long (80 > 79 characters)
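The article doesn't show the contents of pep8.py. To illustrate the contract a --check script has to satisfy (file paths arrive as command-line arguments, problems go to stdout, a non-zero exit code signals failure), here is a deliberately tiny, hypothetical checker that reports only E501-style long lines in the same format as the log above:

```python
import os
import tempfile

MAX_LENGTH = 79  # the PEP 8 line-length limit seen in the log above


def check_file(path, max_length=MAX_LENGTH):
    # Return E501-style messages for every over-long line in `path`
    problems = []
    with open(path, encoding='utf-8') as f:
        for number, line in enumerate(f, start=1):
            length = len(line.rstrip('\r\n'))
            if length > max_length:
                problems.append(
                    '%s:%d:%d: E501 line too long (%d > %d characters)'
                    % (path, number, max_length + 1, length, max_length))
    return problems


def main(paths):
    # Check every file passed on the command line, print the findings,
    #    and return the exit code that pre-commit.py looks at
    problems = [p for path in paths for p in check_file(path)]
    for problem in problems:
        print(problem)
    return 1 if problems else 0


# Demo on a throwaway file whose second line is 80 characters long
sample = os.path.join(tempfile.mkdtemp(), 'sample.py')
with open(sample, 'w', encoding='utf-8') as f:
    f.write('x = 1\n' + 'y' * 80 + '\n')
exit_code = main([sample])
```

As a standalone --check script, the demo part would be replaced by sys.exit(main(sys.argv[1:])) under the usual if __name__ == '__main__': guard, so that pre-commit.py receives the exit code.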

Running tests

We do the same with the unit tests, only without targets:

if params.get('test') is not None:
    for script in params.get('test'):
        code, report = shell_command(
            [params.get('exec'), script], params.get('verbose'))
        if code != 0:
            result_code = code
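A --test script only has to follow one convention: exit with a non-zero code when something fails, because that is all shell_command and result_code react to. A minimal sketch of such a tests.py, assuming the standard unittest module; ExampleTests and run_tests are hypothetical names:

```python
import unittest


class ExampleTests(unittest.TestCase):
    # Hypothetical tests standing in for the project's tests.py
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_upper(self):
        self.assertEqual('pep'.upper(), 'PEP')


def run_tests():
    # Run the suite and translate the outcome into a process exit code
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return 0 if result.wasSuccessful() else 1


exit_code = run_tests()
```

Saved as tests.py, the file would end with sys.exit(run_tests()); alternatively, unittest.main() already exits with a non-zero status when a test fails.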

[UPD] Displaying the log

Depending on the global result code and the --openlog and --forcelog parameters, we decide whether to display the log:

if params.get('openlog') and (result_code != 0 or params.get('forcelog')):
    # Start an independent process
    Popen([params.get('openlog'), log_filename], close_fds=True)
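The same decision, written as a small testable function, makes the four cases explicit. should_open_log is a hypothetical name; it treats any non-zero code (including the negative codes POSIX uses for signal-terminated children) as a failure:

```python
def should_open_log(openlog, result_code, forcelog):
    # Hypothetical helper: a viewer must be configured, and either
    #    a check failed or --forcelog was given
    return bool(openlog) and (result_code != 0 or bool(forcelog))


cases = [(None, 1, True), ('notepad', 0, False),
         ('notepad', 1, False), ('notepad', 0, True)]
for openlog, code, force in cases:
    print(openlog, code, force, '->', should_open_log(openlog, code, force))
```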

Note: this works in Python 2.6 and later, as well as in 3.x; versions below 2.6 were not tested.

And don't forget to return the result code to Git at the end of the script:

sys.exit(result_code)

That's it. The script is ready to use.

Root of evil

A hook is a file named "pre-commit" (without an extension) that must be created in the <project_dir>/.git/hooks/ directory.

There are a couple of important points for running correctly on Windows:

1. The first line of the file must be: #!/bin/sh
Otherwise, you will see the following error:

GitHub.IO.ProcessException: error: cannot spawn .git/hooks/pre-commit: No such file or directory

2. Using the standard Windows separator (a single backslash) in paths leads to a similar error:

GitHub.IO.ProcessException: C:\python34\python.exe: can't open file 'c:devprojectspre-commit-toolpre-commit.py': [Errno 2] No such file or directory

There are three ways to fix this: use double backslashes, enclose the whole path in double quotes, or use "/". For example, Windows accepts the following without complaint:

#!/bin/sh
c:/python34/python "c:\dev\projects\pre-commit-tool\pre-commit.py" -c -t c:\\dev\\projects\\example\\test.py

Of course, mixing the methods like this is not recommended :) Use whichever one you like, but stick to one.
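Both rules can be baked into a small installer so the hook file never has to be written by hand. A sketch under stated assumptions: install_hook is a hypothetical helper, it assumes backslashes in the command appear only as path separators, and it also sets the executable bit, since on Linux/macOS Git ignores hooks that are not executable. Because the python call is the last command in the file, its exit status becomes the hook's exit status, which is what lets Git abort the commit:

```python
import os
import stat
import tempfile


def install_hook(project_dir, command):
    # Hypothetical helper: write <project_dir>/.git/hooks/pre-commit
    hooks_dir = os.path.join(project_dir, '.git', 'hooks')
    if not os.path.isdir(hooks_dir):
        os.makedirs(hooks_dir)
    hook_path = os.path.join(hooks_dir, 'pre-commit')
    # Assumption: backslashes are only path separators; use "/" instead
    command = command.replace('\\', '/')
    # newline='\n': sh expects LF line endings, even on Windows
    with open(hook_path, 'w', newline='\n') as f:
        f.write('#!/bin/sh\n' + command + '\n')
    # Make the hook executable (required on Linux/macOS)
    mode = os.stat(hook_path).st_mode
    os.chmod(hook_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path


demo_project = tempfile.mkdtemp()
hook = install_hook(
    demo_project,
    'c:\\python34\\python c:\\dev\\projects\\pre-commit-tool\\pre-commit.py -c -t')
with open(hook) as f:
    content = f.read()
print(content)
```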

Acceptance tests

Let's try it out on a test project:

The test commit contains new, renamed/modified, and deleted files. It also includes files that contain no code; the code itself has style errors and fails one of the unit tests. Let's create a hook with validation, tests, and a detailed log that opens automatically:

c:/python34/python c:/dev/projects/pre-commit-tool/pre-commit.py -c -t test.py test2.py -vfo

Now try to commit. After thinking for a couple of seconds, GitHub Desktop reports an error, and Notepad opens in the next window with the following:

[ SHELL ] git diff --cached --name-only --diff-filter=ACM (code: 0):
.gitattributes1
demo/daemon_example.py
main.py
test.py
test2.py

[ SHELL ] C:\python34\python.exe c:\dev\projects\pre-commit-tool\pep8.py C:\dev\projects\example\demo\daemon_example.py C:\dev\projects\example\main.py C:\dev\projects\example\test.py C:\dev\projects\example\test2.py (code: 1):
C:\dev\projects\example\demo\daemon_example.py:8:80: E501 line too long (80 > 79 characters)
C:\dev\projects\example\demo\daemon_example.py:16:5: E303 too many blank lines (2)
C:\dev\projects\example\demo\daemon_example.py:37:5: E303 too many blank lines (2)
C:\dev\projects\example\demo\daemon_example.py:47:5: E303 too many blank lines (2)
C:\dev\projects\example\main.py:46:80: E501 line too long (90 > 79 characters)
C:\dev\projects\example\main.py:59:80: E501 line too long (100 > 79 characters)
C:\dev\projects\example\main.py:63:80: E501 line too long (115 > 79 characters)
C:\dev\projects\example\main.py:69:80: E501 line too long (105 > 79 characters)
C:\dev\projects\example\main.py:98:80: E501 line too long (99 > 79 characters)
C:\dev\projects\example\main.py:115:80: E501 line too long (109 > 79 characters)
C:\dev\projects\example\main.py:120:80: E501 line too long (102 > 79 characters)
C:\dev\projects\example\main.py:123:80: E501 line too long (100 > 79 characters)

[ SHELL ] C:\python34\python.exe test.py (code: 1):
Test 1 - passed
Test 2 - passed
[!] Test 3 FAILED

[ SHELL ] C:\python34\python.exe test2.py (code: 0):
Test 1 - passed
Test 2 - passed

Now repeat the same commit, but without the detailed log:

c:/python34/python c:/dev/projects/pre-commit-tool/pre-commit.py -c -t test.py test2.py -fo

Result:

[ SHELL ] C:\python34\python.exe c:\dev\projects\pre-commit-tool\pep8.py C:\dev\projects\example\demo\daemon_example.py C:\dev\projects\example\main.py C:\dev\projects\example\test.py C:\dev\projects\example\test2.py (code: 1):
C:\dev\projects\example\demo\daemon_example.py:8:80: E501 line too long (80 > 79 characters)
C:\dev\projects\example\demo\daemon_example.py:16:5: E303 too many blank lines (2)
C:\dev\projects\example\demo\daemon_example.py:37:5: E303 too many blank lines (2)
C:\dev\projects\example\demo\daemon_example.py:47:5: E303 too many blank lines (2)
C:\dev\projects\example\main.py:46:80: E501 line too long (90 > 79 characters)
C:\dev\projects\example\main.py:59:80: E501 line too long (100 > 79 characters)
C:\dev\projects\example\main.py:63:80: E501 line too long (115 > 79 characters)
C:\dev\projects\example\main.py:69:80: E501 line too long (105 > 79 characters)
C:\dev\projects\example\main.py:98:80: E501 line too long (99 > 79 characters)
C:\dev\projects\example\main.py:115:80: E501 line too long (109 > 79 characters)
C:\dev\projects\example\main.py:120:80: E501 line too long (102 > 79 characters)
C:\dev\projects\example\main.py:123:80: E501 line too long (100 > 79 characters)

[ SHELL ] C:\python34\python.exe test.py (code: 1):
Test 1 - passed
Test 2 - passed
[!] Test 3 FAILED

We fix the errors, repeat the commit, and here it is, the long-awaited result: GitHub Desktop doesn't complain, and Notepad shows an empty pre-commit.log. PROFIT. You can see the finished example here.

[UPD] Instead of a conclusion

Of course, this script is not a panacea. It is useful when all the necessary checks come down to running test scripts locally. Complex projects usually adopt Continuous Integration (CI), and here Travis CI (for Linux and OS X) and its counterpart AppVeyor (for Windows) come to the rescue.

[UPD] Another alternative is overcommit, a fairly capable tool for managing Git hooks. There is a catch, though: overcommit requires a local Ruby interpreter.

Happy coding, and may your commits be clean!
