Vim croquet

Translation

I am by no means a translator, but I simply could not pass this article by: it radiates coolness, and its concentration of Zen is off the charts. So, welcome.

Introduction

I recently discovered an interesting game called VimGolf. The goal of the game is to transform a piece of text from one form to another in as few keystrokes as possible. While playing through various puzzles on the site, I became curious about my own text-editing habits. I wanted to better understand how I manipulate text in Vim and to see whether I could find inefficiencies in my workflow. I spend a huge amount of time in my text editor, so eliminating even minor inefficiencies can lead to a significant increase in productivity. In this post I describe my analysis and how I reduced the number of keystrokes I use in Vim. I called this game Vim Croquet.

Data collection

I began my analysis by collecting data. All text editing on my computer happens in Vim, so for 45 days I logged every keystroke using Vim's -w {scriptout} flag, which records every character typed into a file. For convenience, I made an alias that writes the keystrokes to a log:

alias vim='vim -w ~/.vimlog'
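Because -w records keys exactly as typed, control keys such as Esc (0x1B), Ctrl-C (0x03) and Enter (0x0D) end up in the log as unprintable bytes. The helper below is a minimal sketch for eyeballing the raw log; the name caret is hypothetical and not part of the original tooling. It renders bytes in caret notation, roughly the way cat -v does:

```python
def caret(data: bytes) -> str:
    """Render bytes in caret notation, roughly like `cat -v`.

    Control bytes become ^X (Esc -> ^[, Ctrl-C -> ^C, Enter -> ^M),
    DEL becomes ^?, and bytes >= 128 get an "M-" prefix. A plain tab
    or newline is left untouched, as cat -v does.
    """
    out = []
    for b in data:
        high = b >= 128
        if high:
            out.append("M-")
            b -= 128  # describe the remaining 7-bit value
        if b == 127:
            out.append("^?")
        elif b < 32 and (high or b not in (9, 10)):
            out.append("^" + chr(b + 64))
        else:
            out.append(chr(b))
    return "".join(out)
```

For example, caret(open("/path/to/.vimlog", "rb").read()) returns the whole log as printable text, with the path adjusted to wherever the log lives.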

Next, the collected data had to be parsed, which turned out to be harder than expected. Vim is a modal editor, in which a single command can mean different things in different modes. Commands are also context-sensitive: their behavior may depend on where in the buffer they are executed. For example, the cib command in normal mode puts the user in insert mode if it is executed inside parentheses, but leaves the user in normal mode if it is executed outside them. And if cib is typed in insert mode, its behavior is completely different: it simply writes the characters "cib" into the current buffer.

I considered several candidates for parsing Vim commands, including industrial-strength parser libraries such as ANTLR and Parsec, as well as vimprint, a project dedicated specifically to Vim. After some thought, I decided to write my own tool, since spending a lot of time studying fairly complex parsers seemed unjustified for this task.

I wrote a crude lexer in Haskell to split the collected keystrokes into individual Vim commands. The lexer uses monoids to extract normal mode commands from the log for further analysis. Here is the lexer source:

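import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.List as DL
import qualified Data.List.Split as LS
import Data.Monoid
import System.IO

-- Read the raw keystroke log from stdin and print one group of
-- normal-mode commands per line
main = hSetEncoding stdout utf8 >> 
       LC.getContents >>= mapM_ putStrLn . process

-- The pipeline runs bottom to top
process =   affixStrip 
          . startsWith 
          . splitOnMode
          . modeSub
          . capStrings 
          . split mark 
          . preprocess

-- Apply a list of (search, replacement) pairs; composing Endo values
-- means the last pair in the list is applied first
subs = appEndo . mconcat . map (Endo . sub)

-- Replace every occurrence of s with r
sub (s,r) lst@(x:xs)
    | s `DL.isPrefixOf` lst = sub'
    | otherwise = x:sub (s,r) xs
    where
        sub' = r ++ sub (s,r) (drop (length s) lst)
sub (_,_) [] = []

-- Collapse whitespace, then apply the meta substitutions
-- (terminal noise is deleted, Esc and Ctrl-C become the mark)
preprocess =   subs meta 
             . DL.intercalate " "
             . DL.words
             . DL.unwords
             . DL.lines 
             . LC.unpack

-- Cut each chunk wherever a mode-switching key was replaced by the mode marker
splitOnMode = DL.concat . map (\el -> split mode el)

-- Keep only the pieces that begin right after a return to normal mode
startsWith = filter (\el -> mark `DL.isPrefixOf` el && el /= mark)

modeSub = map (subs mtsl)

-- Split r on the separator s, dropping empty pieces
split s r = filter (/= "") $ s `LS.splitOn` r

affixStrip =   clean 
             . concat 
             . map (\el -> split mark el)

capStrings = map (\el -> mark ++ el ++ mark)

-- Drop leftovers of terminal mouse-event escape sequences
clean = filter (not . DL.isInfixOf "[M")

-- mark: back in normal mode; mode: a key that leaves normal mode
(mark, mode, n) = ("-(*)-","-(!)-", "")

-- Terminal noise to delete, and the keys that return to normal mode
meta = [("\"",n),("\\",n),("\195\130\194\128\195\131\194\189`",n),
        ("\194\128\195\189`",n),("\194\128kb\ESC",n), 
        ("\194\128kb",n),("[>0;95;c",n), ("[>0;95;0c",n),
        ("\ESC",mark),("\ETX",mark),("\r",mark)]

-- Keys that leave normal mode become the mode marker;
-- control characters become readable ⌃x symbols
mtsl = [(":",mode),("A",mode), ("a",mode), ("I",mode), ("i",mode),
        ("O",mode),("o",mode),("v", mode),("/",mode),("\ENQ","⌃e"),
        ("\DLE","⌃p"),("\NAK","⌃u"),("\EOT","⌃d"),("\ACK","⌃f"),
        ("\STX","⌃b"),("\EM","⌃y"),("\SI","⌃o"),("\SYN","⌃v"),
        ("\DC2","⌃r")]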

And here is an example of data before and after processing:

cut -c 1-42 ~/.vimlog | tee >(cat -v;echo) | ./lexer
`Mihere's some text^Cyyp$bimore ^C0~A.^C:w^M:q
`M
yyp$b
0~

The lexer reads from standard input and writes the processed commands to standard output. In the example above, the raw data is on the second line and the processing result is on the lines that follow. Each line represents a group of normal mode commands executed in sequence. The lexer correctly determined that I started in normal mode by jumping to file mark M with `M, then typed "here's some text" in insert mode, then duplicated the line and moved to the beginning of its last word with yyp$b. Next I typed some more text and finally moved to the beginning of the line and switched its first character to uppercase with 0~.
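The core idea of the lexer, tracking whether each keystroke arrives in normal mode, can be sketched in a few lines of Python. This is a simplified re-implementation, not the author's code: it treats the same keys as the lexer's mtsl table (:, A, a, I, i, O, o, v, /) as leaving normal mode and, as an assumption of this sketch, Esc, Ctrl-C and Enter as returning to it. That naive rule mis-splits commands that take those keys as arguments, such as cib, which is exactly the context sensitivity described earlier.

```python
# Keys that take you out of normal mode in this simplified model:
# command line, append/insert/open-line, visual mode and search.
MODE_KEYS = set(":AaIiOov/")
# Keys assumed to return to normal mode: Esc, Ctrl-C and Enter.
NORMAL_KEYS = {"\x1b", "\x03", "\r"}


def normal_mode_chunks(keys: str) -> list[str]:
    """Split a raw keystroke string into runs of normal-mode commands."""
    chunks: list[str] = []
    current: list[str] = []
    in_normal = True  # assume the log starts in normal mode

    def flush() -> None:
        # Save the run typed so far in normal mode, if any.
        if current:
            chunks.append("".join(current))
            current.clear()

    for ch in keys:
        if ch in NORMAL_KEYS:
            flush()
            in_normal = True
        elif in_normal:
            if ch in MODE_KEYS:
                flush()  # leaving normal mode ends the current run
                in_normal = False
            else:
                current.append(ch)
        # Keys typed outside normal mode (inserted text, ex commands) are dropped.
    flush()
    return chunks
```

On the sample log line above, normal_mode_chunks("`Mihere's some text\x03yyp$bimore \x030~A.\x03:w\r:q") returns ['`M', 'yyp$b', '0~'], the same three groups the lexer printed.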

Key usage map

After processing the logged data, I forked Patrick Wied's wonderful heatmap-keyboard project and added my own custom layer for reading the lexer output. The project did not handle most meta-characters, such as Esc, Ctrl and Cmd, so I had to write a data loader in JavaScript and make some other modifications. I translated the meta-characters used in Vim into Unicode symbols and mapped them onto the keyboard. Here is what I got for close to 500,000 commands (the color intensity indicates how often each key was used).

The resulting map shows that Ctrl is the most frequently used key: I use it for numerous navigation commands in Vim. For example, ^p for CtrlP, or cycling through open buffers with ^j and ^k.
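The JavaScript loader itself is not shown in the article. As a hypothetical illustration of the counting step behind such a heatmap, the Python sketch below tallies key presses from lexer output. It assumes, as in the lexer's substitution table, that Ctrl chords appear as ⌃ followed by a letter, and it counts each chord as one press of Ctrl plus one press of that letter:

```python
from collections import Counter
from collections.abc import Iterable


def key_counts(commands: Iterable[str]) -> Counter:
    """Count key presses in lexer output, splitting ⌃x chords into Ctrl + x."""
    counts: Counter = Counter()
    for cmd in commands:
        for ch in cmd:
            if ch == "⌃":
                counts["Ctrl"] += 1  # the letter that follows is counted separately
            else:
                counts[ch] += 1
    return counts
```

Feeding it the lines of normal_cmds.txt and calling .most_common() gives per-key totals. Uppercase letters and symbols such as $ also need Shift, which this sketch ignores.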

Another thing that struck me while analyzing the map was how often I use ^E and ^Y. Every day I use these commands to scroll up and down through code, yet they are an inefficient way to move vertically: each press scrolls the window by only a single line. It would be more efficient to use ^U and ^D, which scroll half a screen at a time.
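The difference is easy to quantify. Without a count, ^E and ^Y scroll one line per press, while ^D and ^U scroll by the 'scroll' option, which defaults to half the window height. Assuming those defaults, this small sketch (the function name is illustrative) computes the presses needed to cover a given distance:

```python
import math


def presses(lines: int, window_height: int) -> dict[str, int]:
    """Key presses needed to scroll `lines` lines, assuming Vim defaults."""
    half_page = max(1, window_height // 2)  # default 'scroll' value
    return {
        "^E/^Y": lines,                         # one line per press
        "^D/^U": math.ceil(lines / half_page),  # half a window per press
    }
```

In a 50-line window, scrolling 100 lines takes 100 presses of ^E but only 4 of ^D.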

Command frequency

The key usage map gives a good picture of how individual keys are used, but I wanted to know more about how I use different key sequences. Using a one-liner, I sorted the lines of the lexer output by frequency to see my most frequently used normal mode commands:

$ sort normal_cmds.txt | uniq -c | sort -nr | head -10 | \
    awk '{print NR,$0}' | column -t
1   2542    j
2   2188    k
3   1927    jj
4   1610    p
5   1602    ⌃j
6   1118    Y
7   987     ⌃e
8   977     zR
9   812     P
10  799     ⌃y

I was surprised to see zR in eighth place. On reflection, it revealed a serious inefficiency in how I edit text: my .vimrc told Vim to fold blocks of text automatically, but I almost always unfolded everything right away, so the setting made no sense. I simply removed it from my config, which eliminated the need to press zR so often.
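The same ranking can also be produced in Python with collections.Counter, which may be handier for further processing. A minimal sketch, assuming one command per line as in normal_cmds.txt; note that commands with equal counts may be ordered differently than with sort -nr:

```python
from collections import Counter
from collections.abc import Iterable


def top_commands(lines: Iterable[str], k: int = 10) -> list[tuple[int, str, int]]:
    """Return (rank, command, count) for the k most frequent commands."""
    # Strip only the trailing newline: a command may itself be a space.
    counts = Counter(cmd for cmd in (line.rstrip("\n") for line in lines) if cmd)
    return [(rank, cmd, n) for rank, (cmd, n) in enumerate(counts.most_common(k), 1)]
```

For example, with open("normal_cmds.txt", encoding="utf-8") as f, iterating over top_commands(f) yields rows like the table above.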

Command complexity

Another optimization I wanted to look at was the complexity of normal mode commands. I was curious whether I could find commands that I use daily but that take too many keystrokes; such commands could be replaced with shortcuts that make them faster to execute. As a measure of command complexity I used Shannon entropy, computed with the following short Python script:

#!/usr/bin/env python3
"""Print the normal-mode commands with the highest size-corrected entropy."""
import sys
from collections import Counter
from math import log, log1p
from operator import itemgetter

# The lexer output contains non-ASCII symbols such as ⌃e, so force UTF-8.
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')


def H(vec, correct=True):
    """Calculate the Shannon entropy (in bits) of a vector."""
    n = float(len(vec))
    c = Counter(vec)
    h = sum((-freq / n) * log(freq / n, 2) for freq in c.values())
    # impose a penalty to correct for size
    if correct and n > 0:
        h = h / log1p(n)
    return h


def main():
    k = 1  # how many top-scoring commands to print
    lines = (line.strip() for line in sys.stdin)
    hs = ((st, H(list(st))) for st in lines)
    srt_hs = sorted(hs, key=itemgetter(1), reverse=True)
    for rank, (st, h) in enumerate(srt_hs[:k], 1):
        print('{r}\t{s}\t{h:.4f}'.format(r=rank, s=st, h=h))


if __name__ == '__main__':
    main()

The script reads commands from standard input and prints the k commands with the highest entropy (k = 1 here). I fed it the lexer output:

$ sort normal_cmds.txt | uniq -c | sort -nr | sed "s/^[ \t]*//" | \
    awk 'BEGIN{OFS="\t";}{if ($1>100) print $1,$2}' | \
    cut -f2 | ./entropy.py
1 ggvG$"zy 1.2516

I keep only the commands executed more than 100 times and then find the one with the highest entropy among them. The analysis singled out ggvG$"zy, which I executed 246 times in 45 days. Typing it takes 11 rather clumsy keystrokes (counting Shift), and it copies the entire current buffer into register z. I usually use this command to move the entire contents of one buffer into another. Naturally, I added a new shortcut to my config:

nnoremap <leader>ya ggvG$"zy
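The score of 1.2516 reported above can be checked by hand. With p_c the relative frequency of character c in a command string of length n, the script computes the Shannon entropy in bits and divides it by ln(1 + n), which is the log1p size penalty. For ggvG$"zy, n = 8, g occurs twice and the other six characters once each:

```latex
\tilde{H}
  = \frac{-\sum_{c} p_c \log_2 p_c}{\ln(1+n)}
  = \frac{\tfrac{2}{8}\log_2 4 \;+\; 6 \cdot \tfrac{1}{8}\log_2 8}{\ln 9}
  = \frac{0.5 + 2.25}{2.1972}
  \approx 1.2516
```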

Findings

My Vim Croquet match identified three optimizations that reduce the number of keystrokes I use in Vim:

Using the ^U and ^D navigation commands instead of ^E and ^Y
Disabling automatic folding in buffers to avoid zR
Creating a shortcut for the verbose ggvG$"zy command

These three simple changes save me thousands of unnecessary keystrokes every month.
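For the shortcut alone, a rough estimate follows from the numbers reported earlier, assuming the default mapleader (backslash), so that <leader>ya costs 3 keystrokes instead of 11:

```latex
246 \times (11 - 3) = 1968 \text{ keystrokes in 45 days}
  \quad\Rightarrow\quad
  1968 \cdot \tfrac{30}{45} = 1312 \text{ keystrokes per 30 days}
```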

The code fragments above are somewhat disconnected and may be hard to use on their own. To make the steps of my analysis clearer, here is a Makefile that shows how the code in this article fits together.

SHELL           := /bin/bash
LOG             := ~/.vimlog
CMDS            := normal_cmds.txt
FRQS            := frequencies.txt
ENTS            := entropy.txt
LEXER_SRC       := lexer.hs
LEXER_OBJS      := lexer.{o,hi}
LEXER_BIN       := lexer
H               := entropy.py
UTF             := iconv -f iso-8859-1 -t utf-8
.PRECIOUS: $(LOG)
.PHONY: all entropy clean distclean
all: $(LEXER_BIN) $(CMDS) $(FRQS) entropy
$(LEXER_BIN): $(LEXER_SRC)
	ghc --make $^
$(CMDS): $(LEXER_BIN)
	cat $(LOG) | $(UTF) | ./$^ > $@
$(FRQS): $(H) $(LOG) $(CMDS)
	sort $(CMDS) | uniq -c | sort -nr | sed "s/^[ \t]*//" | \
	  awk 'BEGIN{OFS="\t";}{if ($$1>100) print NR,$$1,$$2}' > $@
entropy: $(H) $(FRQS)
	cut -f3 $(FRQS) | ./$(H)
clean:
	@- $(RM) $(LEXER_OBJS) $(LEXER_BIN) $(CMDS) $(FRQS) $(ENTS)
distclean: clean
