Retrieving a list of files in a remote repository
Once I needed to look at the list of files in a remote repository, and I did not want to clone it to do so. As expected, an Internet search produced plenty of answers along the lines of "it's impossible, make a clone." All I needed was to make sure that a certain link pointed to a repository matching an archive of source code. That link was on the page describing the contents of the archive (more precisely, of additions to it), so comparing just the list of files seemed sufficient. What can be done?
Of course, Mercurial provides practically no way of working with a remote repository. More precisely, you can push and pull (and clone, as a special case of the latter). But is it possible to pull without touching the file system? Answer: hg incoming can help us here. The algorithm is as follows:
- Create a new empty repository somewhere. An empty repository can pull from any repository.
- Get the list of changes with hg incoming. Since hg incoming uses the same functions as hg log, we are not limited in how we shape its output. In particular, we can get the list of all files changed in each revision, or even the changes themselves in unified diff format (with git extensions for binary files). We don't need the diff, but the list of changed files is useful.
- Since we receive all the revisions, we can attach a list of children to each changeset along the way, in addition to its list of parents. It does not bother us that children which are not ancestors of the revision whose file list we want will be missing.
- Mercurial has one revision that is present in every repository and is the only one that truly has no parents: -1:0000000000000000000000000000000000000000. This is a good starting point.
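The bookkeeping in the last two steps can be illustrated in isolation. The following is a minimal Python sketch, not code from the script: revisions are plain integers, -1 stands for the null revision, and the hypothetical parents mapping plays the role of the parent lists that hg incoming reports. It inverts parent links into child links and checks that the null revision is the only root.

```python
NULL_REV = -1  # the null revision, -1:0000000000000000000000000000000000000000


def build_children(parents):
    """Invert a {rev: [parent revs]} mapping into {rev: [child revs]}.

    Every revision gets an entry, so childless revisions map to [].
    """
    children = {rev: [] for rev in parents}
    for rev in sorted(parents):
        for parent in parents[rev]:
            children.setdefault(parent, []).append(rev)
    return children


def roots(parents):
    """Revisions without parents; in a complete history only the null revision."""
    return [rev for rev, revision_parents in parents.items() if not revision_parents]


# Hypothetical history: 1 and 2 both branch off 0, and 3 merges them.
parents = {NULL_REV: [], 0: [NULL_REV], 1: [0], 2: [0], 3: [1, 2]}
print(build_children(parents))  # {-1: [0], 0: [1, 2], 1: [3], 2: [3], 3: []}
print(roots(parents))           # [-1]
```

Keeping an entry for every revision, even a childless one, turns the later search for the revision without descendants into a simple scan.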
Starting from this revision, we compute the list of files in every other revision (the list of files in the initial revision is known: it is empty). To do this:
- For each revision except the initial one, take the list of files of its first parent. Revisions are processed from parents to children.
- Add the added files to this list (you get them with hg incoming --style xml --verbose, in the paths tag).
- Remove the deleted files from the list (they are reported in the same place).
- Finally, find the revision that has no descendants. This is the revision requested with hg incoming --rev revspec. Having found it, print the list of files in it.
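The file-list computation can likewise be sketched on plain dictionaries. This is a simplified model rather than the script's classes: each revision (a hypothetical dict entry) carries its parents and its sets of added and removed files, and, like the script, it assumes these changes are listed relative to the first parent. Revisions are processed from parents to children starting at the null revision; a revision whose first parent is not finished yet is put back at the end of the queue.

```python
from collections import deque

NULL_REV = -1  # the null revision: present everywhere, has no files


def reconstruct_files(revisions):
    """Return {rev: frozenset of files} for every revision.

    revisions maps rev -> {"parents": [...], "added": set(), "removed": set()};
    changes are assumed to be relative to the first parent.
    """
    children = {rev: [] for rev in revisions}
    for rev, info in revisions.items():
        for parent in info["parents"]:
            children[parent].append(rev)

    files = {NULL_REV: frozenset()}  # the initial revision is empty
    queue = deque(children[NULL_REV])
    while queue:
        rev = queue.popleft()
        if rev in files:
            continue  # a merge reached a second time through its other parent
        first_parent = revisions[rev]["parents"][0]
        if first_parent not in files:
            queue.append(rev)  # retry once the first parent is done
            continue
        current = set(files[first_parent])  # inherit from the first parent
        current |= revisions[rev]["added"]
        current -= revisions[rev]["removed"]
        files[rev] = frozenset(current)
        queue.extend(children[rev])
    return files


def childless(revisions):
    """Revisions that are nobody's parent, i.e. have no descendants."""
    with_children = {p for info in revisions.values() for p in info["parents"]}
    return [rev for rev in revisions if rev not in with_children]


# Hypothetical history: 0 -> 1 -> 2 -> 3 on one line, 4 branches off 0,
# and 5 merges 3 (first parent) with 4.
history = {
    NULL_REV: {"parents": [], "added": set(), "removed": set()},
    0: {"parents": [NULL_REV], "added": {"README", "setup.py"}, "removed": set()},
    1: {"parents": [0], "added": {"main.py"}, "removed": set()},
    2: {"parents": [1], "added": {"util.py"}, "removed": set()},
    3: {"parents": [2], "added": set(), "removed": {"setup.py"}},
    4: {"parents": [0], "added": {"LICENSE"}, "removed": set()},
    5: {"parents": [3, 4], "added": {"LICENSE"}, "removed": set()},
}
files = reconstruct_files(history)
(tip,) = childless(history)  # raises ValueError unless exactly one candidate
print(tip, sorted(files[tip]))  # 5 ['LICENSE', 'README', 'main.py', 'util.py']
```

In this history, revision 5 is first dequeued before its first parent 3 is done, so the retry path is exercised. The script follows the same scheme with a plain list as the queue; deque here only makes popping from the front cheap.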
Note that the output of hg incoming in the default format cannot be used for this. We have to either write our own style file using {file_adds}, {file_mods} and {file_dels}, or take a ready-made one: --style xml. The --template option alone will not help here. A custom format would have made the code much shorter than using a SAX parser for XML, but I preferred --style xml.
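To show what the --style xml output looks like to a parser, here is a minimal sketch that uses xml.etree.ElementTree instead of the SAX handler the script uses. The sample document is hand-written and hypothetical: its element and attribute names (logentry with revision and node, paths/path with an action of A, M or R) are the ones the script's handler reads, but it is not captured hg output and leaves out other elements. ElementTree builds the whole tree in memory, whereas SAX processes the stream event by event.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape the script expects (not real hg output).
SAMPLE = """<?xml version="1.0"?>
<log>
<logentry revision="0" node="0123456789abcdef0123456789abcdef01234567">
<paths>
<path action="A">README</path>
<path action="A">setup.py</path>
</paths>
</logentry>
<logentry revision="1" node="89abcdef0123456789abcdef0123456789abcdef">
<paths>
<path action="A">main.py</path>
<path action="R">setup.py</path>
</paths>
</logentry>
</log>
"""

ACTIONS = {"A": "added", "M": "modified", "R": "removed"}


def changes_by_revision(xml_text):
    """Map revision number -> {"added": set, "modified": set, "removed": set}."""
    result = {}
    for entry in ET.fromstring(xml_text).iter("logentry"):
        changes = {kind: set() for kind in ACTIONS.values()}
        for path in entry.iter("path"):
            kind = ACTIONS.get(path.get("action"))
            if kind is not None:  # ignore actions this sketch does not know
                changes[kind].add(path.text)
        result[int(entry.get("revision"))] = changes
    return result


changes = changes_by_revision(SAMPLE)
print(sorted(changes[1]["added"]), sorted(changes[1]["removed"]))  # ['main.py'] ['setup.py']
```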
Here is the code itself:
#!/usr/bin/env python
# vim: fileencoding=utf-8
from __future__ import unicode_literals, division
from xml import sax
from subprocess import check_call, Popen, PIPE
from shutil import rmtree
from tempfile import mkdtemp

class MercurialRevision(object):
    __slots__ = ('rev', 'hex',
                 'tags', 'bookmarks', 'branch',
                 'parents', 'children',
                 'added', 'removed', 'modified',
                 'copies',
                 'files',)

    def __init__(self, rev, hex):
        self.rev = rev
        self.hex = hex
        self.parents = []
        self.children = []
        self.added = set()
        self.removed = set()
        self.modified = set()
        self.copies = {}
        self.tags = set()
        self.bookmarks = set()
        self.branch = None
        self.files = set()

    def __str__(self):
        return '<revision {rev}:{hex}>'.format(hex=self.hex, rev=self.rev)

    def __repr__(self):
        return '{0}({rev!r}, {hex!r})'.format(self.__class__.__name__, hex=self.hex, rev=self.rev)

    def __hash__(self):
        # Revisions are hashed by their node ID
        return int(self.hex, 16)

class MercurialHandler(sax.handler.ContentHandler):
    def startDocument(self):
        self.curpath = []
        self.currev = None
        nullrev = MercurialRevision(-1, '0' * 40)
        self.revisions_rev = {nullrev.rev: nullrev}
        self.revisions_hex = {nullrev.hex: nullrev}
        self.tags = {}
        self.bookmarks = {}
        self.characters_fun = None
        self.last_data = None

    def add_tag(self, tag):
        self.currev.tags.add(tag)
        self.tags[tag] = self.currev

    def add_bookmark(self, bookmark):
        self.currev.bookmarks.add(bookmark)
        self.bookmarks[bookmark] = self.currev

    def characters(self, data):
        # Character data may arrive in several chunks: accumulate it until endElement
        if self.characters_fun:
            if not self.last_data:
                self.last_data = data
            else:
                self.last_data += data

    def startElement(self, name, attributes):
        if name == 'log':
            assert not self.curpath
            assert not self.currev
        elif name == 'logentry':
            assert self.curpath == ['log']
            assert not self.currev
            self.currev = MercurialRevision(int(attributes['revision']), attributes['node'])
        else:
            assert self.currev
            # Leaf elements set characters_fun; endElement passes it the element's text
            if name == 'tag':
                assert self.curpath[-1] == 'logentry'
                self.characters_fun = self.add_tag
            elif name == 'bookmark':
                assert self.curpath[-1] == 'logentry'
                self.characters_fun = self.add_bookmark
            elif name == 'parent':
                assert self.curpath[-1] == 'logentry'
                # The parent must have appeared earlier in the output
                self.currev.parents.append(self.revisions_hex[attributes['node']])
            elif name == 'branch':
                assert self.curpath[-1] == 'logentry'
                self.characters_fun = lambda branch: setattr(self.currev, 'branch', branch)
            elif name == 'path':
                assert self.curpath[-1] == 'paths'
                if attributes['action'] == 'M':
                    self.characters_fun = self.currev.modified.add
                elif attributes['action'] == 'A':
                    self.characters_fun = self.currev.added.add
                elif attributes['action'] == 'R':
                    self.characters_fun = self.currev.removed.add
            elif name == 'copy':
                assert self.curpath[-1] == 'copies'
                self.characters_fun = (lambda destination, source=attributes['source']:
                                       self.currev.copies.__setitem__(source, destination))
        self.curpath.append(name)

    def endElement(self, name):
        assert self.curpath
        assert self.curpath[-1] == name
        if name == 'logentry':
            if not self.currev.parents:
                # hg omits <parent> when the only parent is the preceding revision
                self.currev.parents.append(self.revisions_rev[self.currev.rev - 1])
            for parent in self.currev.parents:
                parent.children.append(self.currev)
            self.revisions_hex[self.currev.hex] = self.currev
            self.revisions_rev[self.currev.rev] = self.currev
            self.currev = None
        # Pass the accumulated text (possibly empty) to the element's handler
        if self.last_data is None:
            if self.characters_fun:
                self.characters_fun('')
        else:
            assert self.characters_fun
            self.characters_fun(self.last_data)
        self.characters_fun = None
        self.last_data = None
        self.curpath.pop()

    def export_result(self):
        heads = {revision for revision in self.revisions_hex.values()
                 if not revision.children
                 or all(child.branch != revision.branch for child in revision.children)}
        # heads contains the same revisions as `hg heads --closed`
        tips = {head for head in heads if not head.children}
        return {
            'heads': heads,
            'tips': tips,
            'tags': self.tags,
            'bookmarks': self.bookmarks,
            'revisions_hex': self.revisions_hex,
            'revisions_rev': self.revisions_rev,
            'root': self.revisions_rev[-1],
        }

class MercurialRemoteParser(object):
    __slots__ = ('parser', 'handler', 'tmpdir')

    def __init__(self, tmpdir=None):
        self.parser = sax.make_parser()
        self.handler = MercurialHandler()
        self.parser.setContentHandler(self.handler)
        self.tmpdir = tmpdir or mkdtemp(suffix='.hg')
        self.init_tmpdir()

    def init_tmpdir(self):
        check_call(['hg', 'init', self.tmpdir])

    def delete_tmpdir(self):
        # The rmtree check guards against module globals already being cleared
        # when __del__ runs at interpreter shutdown
        if self.tmpdir and rmtree:
            rmtree(self.tmpdir)
            # Forget the directory so __exit__ followed by __del__ does not delete it twice
            self.tmpdir = None

    __del__ = delete_tmpdir

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        self.delete_tmpdir()

    @staticmethod
    def generate_files(parsing_result):
        toprocess = [parsing_result['root']]
        processed = set()
        while toprocess:
            revision = toprocess.pop(0)
            if revision.parents:
                # Inherit files from the first parent
                assert not revision.files
                if revision.parents[0] not in processed:
                    # The first parent is not ready yet: retry this revision later
                    assert toprocess
                    toprocess.append(revision)
                    continue
                revision.files.update(revision.parents[0].files)
            # Then apply delta found in log
            assert not (revision.files & revision.added)
            revision.files.update(revision.added)
            assert revision.files >= revision.removed
            revision.files -= revision.removed
            assert revision.files >= revision.modified, (
                'Expected to find the following files: ' + ','.join(
                    file for file in revision.modified if file not in revision.files))
            processed.add(revision)
            toprocess.extend(child for child in revision.children
                             if child not in processed and child not in toprocess)
        assert set(parsing_result['revisions_rev'].values()) == processed
        return parsing_result

    def parse_url(self, url, rev_name=None):
        p = Popen(['hg', '--repository', self.tmpdir,
                   'incoming', '--style', 'xml', '--verbose', url,
                   ] + (['--rev', rev_name] if rev_name else []),
                  stdout=PIPE)
        p.stdout.readline()  # Skip “comparing with {url}” header
        self.parser.parse(p.stdout)
        p.wait()  # Reap the hg process; its exit status is not checked
        parsing_result = self.handler.export_result()
        self.generate_files(parsing_result)
        return parsing_result

if __name__ == '__main__':
    import sys

    def print_files(revision):
        for file in revision.files:
            print(file)

    remote_url = sys.argv[1]
    rev_name = sys.argv[2]
    with MercurialRemoteParser() as remote_parser:
        parsing_result = remote_parser.parse_url(remote_url, rev_name=rev_name)
        assert len(parsing_result['tips']) == 1, 'Found more than one head'
        print_files(next(iter(parsing_result['tips'])))
# vim: tw=100 ft=python ts=4 sts=4 sw=4
Usage:
python -O list_hg_files.py bitbucket.org/ZyX_I/aurum tip
Both arguments (the remote repository URL and the revision specifier) are required.