Retrieving a list of files in a remote repository

    At some point I needed to see the list of files in a remote repository, and I really did not want to clone it. A search on the Internet, as expected, turned up a lot of answers along the lines of "it's impossible, make a clone." All I needed was to make sure that a link pointed to the repository corresponding to a particular source archive. Since the link is on the page describing the contents of this archive (more precisely, of additions to it), comparing just the file lists seemed sufficient. What to do?
    Admittedly, Mercurial offers almost nothing for working with remote repositories: you can push and pull (clone being a special case of the latter). But is it possible to pull without affecting the file system? Yes: hg incoming helps here. The algorithm is as follows:
    1. Create a new empty repository somewhere. An empty repository can pull from any repository.
    2. Use hg incoming to get the list of changes. Since hg incoming uses the same functions as hg log, we are free to customize its output. In particular, you can get the list of all files changed in each revision, or even the changes themselves in unified diff format (with git extensions for binary files). We do not need the diffs, but the list of changed files is useful.
    3. Since we receive all the revisions, we can record for each revision a list of children in addition to its list of parents. Missing children that are not ancestors of the revision whose file list we want do not bother us.
    4. Mercurial has one revision that is always present in any repository and is the only one that really has no parents: -1:0000000000000000000000000000000000000000. This is a good starting point.
      Starting from this revision, we find the list of files in every other revision (the file list of the initial revision is known: it is empty). To do this:
      1. For each revision except the initial one, take the list of files from its first parent. Revisions are visited from parents to children.
      2. Add the files added in this revision to the list (you get them from hg incoming --style xml --verbose, in the paths tag).
      3. Remove the files deleted in this revision from the list (they come from the same place).

    5. Now find the revision that has no descendants. This is the revision requested with hg incoming --rev revspec. Having found it, print its list of files.
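
    As a rough illustration of steps 4 and 5, here is a minimal sketch that propagates file lists through a hand-made linear history instead of real hg incoming output. The revision names, the tuple layout and the list_files helper are all made up for this example; only the first parent is considered, as in the description above.

```python
NULL = 'null'  # stands in for Mercurial's null revision (-1:000...0)


def list_files(revisions):
    """Compute the file list of every revision.

    `revisions` is a list of (name, first_parent, added, removed) tuples,
    ordered so that every parent comes before its children.
    """
    files = {NULL: frozenset()}  # the null revision contains no files
    has_children = set()
    for name, parent, added, removed in revisions:
        # Inherit from the first parent, then apply this revision's delta.
        files[name] = (files[parent] | set(added)) - set(removed)
        has_children.add(parent)
    # Revisions that nobody names as a parent have no descendants.
    leaves = [name for name, _, _, _ in revisions if name not in has_children]
    return files, leaves


# Hypothetical three-revision history.
revisions = [
    ('r0', NULL, ['README', 'setup.py'], []),
    ('r1', 'r0', ['src/main.py'], ['setup.py']),
    ('r2', 'r1', [], []),  # only modifies existing files
]
files, leaves = list_files(revisions)
for name in sorted(files[leaves[0]]):
    print(name)
```

    Modified files do not change the list, which is why only additions and removals appear in the tuples.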


    Note that the default output format of hg incoming cannot be used for this. You must either write your own style using {file_adds}, {file_mods} and {file_dels}, or take a ready-made one: --style xml. The --template option will not help here. Writing your own style would greatly reduce the code compared to using a SAX parser for XML, but I preferred --style xml.
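
    To make the shape of the --style xml output more concrete, here is a small sketch that reads the added, modified and removed paths from a hand-written sample. The sample is an assumption modelled on the elements the script below handles (log, logentry, paths, path with an action attribute), trimmed of everything else; xml.etree.ElementTree is used here instead of SAX purely for brevity.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not captured from a real repository.
SAMPLE_LOG = """<log>
<logentry revision="0" node="1111111111111111111111111111111111111111">
<paths>
<path action="A">README</path>
<path action="A">setup.py</path>
</paths>
</logentry>
<logentry revision="1" node="2222222222222222222222222222222222222222">
<paths>
<path action="M">README</path>
<path action="R">setup.py</path>
</paths>
</logentry>
</log>
"""


def changes_per_revision(xml_text):
    """Map each revision number to its added/modified/removed paths."""
    result = {}
    for entry in ET.fromstring(xml_text).iter('logentry'):
        changes = {'A': set(), 'M': set(), 'R': set()}
        for path in entry.iter('path'):
            changes[path.get('action')].add(path.text)
        result[int(entry.get('revision'))] = changes
    return result


changes = changes_per_revision(SAMPLE_LOG)
print(sorted(changes[0]['A']))
```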

    The code itself:
    #!/usr/bin/env python
    # vim: fileencoding=utf-8
    from __future__ import unicode_literals, division
    from xml import sax
    from subprocess import check_call, Popen, PIPE
    from shutil import rmtree
    from tempfile import mkdtemp


    class MercurialRevision(object):
        __slots__ = ('rev', 'hex',
                     'tags', 'bookmarks', 'branch',
                     'parents', 'children',
                     'added', 'removed', 'modified',
                     'copies',
                     'files',)

        def __init__(self, rev, hex):
            self.rev = rev
            self.hex = hex
            self.parents = []
            self.children = []
            self.added = set()
            self.removed = set()
            self.modified = set()
            self.copies = {}
            self.tags = set()
            self.bookmarks = set()
            self.branch = None
            self.files = set()

        def __str__(self):
            return '<revision {rev}:{hex}>'.format(hex=self.hex, rev=self.rev)

        def __repr__(self):
            return '{0}({rev!r}, {hex!r})'.format(self.__class__.__name__, hex=self.hex, rev=self.rev)

        def __hash__(self):
            return int(self.hex, 16)


    class MercurialHandler(sax.handler.ContentHandler):
        def startDocument(self):
            self.curpath = []
            self.currev = None
            nullrev = MercurialRevision(-1, '0' * 40)
            self.revisions_rev = {nullrev.rev: nullrev}
            self.revisions_hex = {nullrev.hex: nullrev}
            self.tags = {}
            self.bookmarks = {}
            self.characters_fun = None
            self.last_data = None

        def add_tag(self, tag):
            self.currev.tags.add(tag)
            self.tags[tag] = self.currev

        def add_bookmark(self, bookmark):
            self.currev.bookmarks.add(bookmark)
            self.bookmarks[bookmark] = self.currev

        def characters(self, data):
            # Text may arrive in several chunks: accumulate it until endElement
            if self.characters_fun:
                if not self.last_data:
                    self.last_data = data
                else:
                    self.last_data += data

        def startElement(self, name, attributes):
            if name == 'log':
                assert not self.curpath
                assert not self.currev
            elif name == 'logentry':
                assert self.curpath == ['log']
                assert not self.currev
                self.currev = MercurialRevision(int(attributes['revision']), attributes['node'])
            else:
                assert self.currev
                if name == 'tag':
                    assert self.curpath[-1] == 'logentry'
                    self.characters_fun = self.add_tag
                elif name == 'bookmark':
                    assert self.curpath[-1] == 'logentry'
                    self.characters_fun = self.add_bookmark
                elif name == 'parent':
                    assert self.curpath[-1] == 'logentry'
                    self.currev.parents.append(self.revisions_hex[attributes['node']])
                elif name == 'branch':
                    assert self.curpath[-1] == 'logentry'
                    self.characters_fun = lambda branch: self.currev.__setattr__('branch', branch)
                elif name == 'path':
                    assert self.curpath[-1] == 'paths'
                    if attributes['action'] == 'M':
                        self.characters_fun = self.currev.modified.add
                    elif attributes['action'] == 'A':
                        self.characters_fun = self.currev.added.add
                    elif attributes['action'] == 'R':
                        self.characters_fun = self.currev.removed.add
                elif name == 'copy':
                    assert self.curpath[-1] == 'copies'
                    self.characters_fun = (lambda destination, source=attributes['source']:
                            self.currev.copies.__setitem__(source, destination))
            self.curpath.append(name)

        def endElement(self, name):
            assert self.curpath and self.curpath[-1] == name
            if name == 'logentry':
                if not self.currev.parents:
                    # No explicit <parent>: the parent is the previous revision
                    self.currev.parents.append(self.revisions_rev[self.currev.rev - 1])
                for parent in self.currev.parents:
                    parent.children.append(self.currev)
                self.revisions_hex[self.currev.hex] = self.currev
                self.revisions_rev[self.currev.rev] = self.currev
                self.currev = None
            if self.last_data is None:
                if self.characters_fun:
                    self.characters_fun('')
            else:
                assert self.characters_fun
                self.characters_fun(self.last_data)
            # Always reset, so an empty element does not leak its callback
            self.characters_fun = None
            self.last_data = None
            self.curpath.pop()

        def export_result(self):
            heads = {revision for revision in self.revisions_hex.values()
                     if not revision.children
                        or all(child.branch != revision.branch for child in revision.children)}
            # heads contains the same revisions as `hg heads --closed`
            tips = {head for head in heads if not head.children}
            return {
                'heads': heads,
                'tips': tips,
                'tags': self.tags,
                'bookmarks': self.bookmarks,
                'revisions_hex': self.revisions_hex,
                'revisions_rev': self.revisions_rev,
                'root': self.revisions_rev[-1],
            }


    class MercurialRemoteParser(object):
        __slots__ = ('parser', 'handler', 'tmpdir')

        def __init__(self, tmpdir=None):
            self.parser = sax.make_parser()
            self.handler = MercurialHandler()
            self.parser.setContentHandler(self.handler)
            self.tmpdir = tmpdir or mkdtemp(suffix='.hg')
            self.init_tmpdir()

        def init_tmpdir(self):
            check_call(['hg', 'init', self.tmpdir])

        def delete_tmpdir(self):
            if self.tmpdir and rmtree:
                rmtree(self.tmpdir)
                # Called from both __exit__ and __del__: remove only once
                self.tmpdir = None

        __del__ = delete_tmpdir

        def __enter__(self):
            return self

        def __exit__(self, *args, **kwargs):
            self.delete_tmpdir()

        @staticmethod
        def generate_files(parsing_result):
            toprocess = [parsing_result['root']]
            processed = set()
            while toprocess:
                revision = toprocess.pop(0)
                if revision.parents:
                    # Inherit files from the first parent
                    assert not revision.files
                    if revision.parents[0] not in processed:
                        # Parent is not ready yet: retry this revision later
                        assert toprocess
                        toprocess.append(revision)
                        continue
                    revision.files.update(revision.parents[0].files)
                    # Then apply delta found in log
                    assert not (revision.files & revision.added)
                    revision.files.update(revision.added)
                    assert revision.files >= revision.removed
                    revision.files -= revision.removed
                    assert revision.files >= revision.modified, (
                            'Expected to find the following files: ' + ','.join(
                                file for file in revision.modified if file not in revision.files))
                processed.add(revision)
                toprocess.extend(child for child in revision.children
                                 if child not in processed and child not in toprocess)
            assert set(parsing_result['revisions_rev'].values()) == processed
            return parsing_result

        def parse_url(self, url, rev_name=None):
            p = Popen(['hg', '--repository', self.tmpdir,
                             'incoming', '--style', 'xml', '--verbose', url,
                      ] + (['--rev', rev_name] if rev_name else []),
                      stdout=PIPE)
            p.stdout.readline()  # Skip “comparing with {url}” header
            self.parser.parse(p.stdout)
            parsing_result = self.handler.export_result()
            self.generate_files(parsing_result)
            return parsing_result


    if __name__ == '__main__':
        import sys

        def print_files(revision):
            for file in revision.files:
                print(file)

        remote_url = sys.argv[1]
        rev_name = sys.argv[2]
        with MercurialRemoteParser() as remote_parser:
            parsing_result = remote_parser.parse_url(remote_url, rev_name=rev_name)
            assert len(parsing_result['tips']) == 1, 'Found more than one head'
            print_files(next(iter(parsing_result['tips'])))
    # vim: tw=100 ft=python ts=4 sts=4 sw=4


    Usage: python -O list_hg_files.py bitbucket.org/ZyX_I/aurum tip. Both arguments (the remote repository URL and the revision specifier) are required.
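
    For reference, the two hg invocations behind such a run can be sketched as plain argument lists. The hg_commands helper and the /tmp/scratch.hg path are hypothetical; the flags themselves are the ones used in the script above.

```python
def hg_commands(tmpdir, url, rev_name=None):
    """Return the argument lists for creating the scratch repository
    and for asking it what it would pull from `url`."""
    init = ['hg', 'init', tmpdir]
    incoming = ['hg', '--repository', tmpdir,
                'incoming', '--style', 'xml', '--verbose', url]
    if rev_name:
        # Restrict the listing to the ancestors of one revision
        incoming += ['--rev', rev_name]
    return init, incoming


# Hypothetical scratch directory; the script itself uses mkdtemp().
init_cmd, incoming_cmd = hg_commands('/tmp/scratch.hg',
                                     'bitbucket.org/ZyX_I/aurum', 'tip')
print(' '.join(init_cmd))
print(' '.join(incoming_cmd))
```

    Nothing is executed here; the lists could be handed to subprocess.check_call or Popen as the script does.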

