Mass recording from cameras in elections - 2
Habr is not for politics. This article deals exclusively with the technical aspects of implementing a specific software solution. For the common good, please refrain from any political debate, speeches, agitation, and the like in the comments. Also, please do not use the acquired knowledge for destructive purposes, do not start backing up the entire video archive without real need, and so on. Thanks.
September 8 is the single voting day. This year the general public is invited to observe the election of the capital's mayor over the Internet. Some citizens find it interesting to record the camera feeds: a few have a politically motivated interest, while most are simply curious to look at themselves and their acquaintances through the eyes of the Internet. This article demonstrates the principles of the current system and proposes working approaches.
Since the last election the system has changed a little (otherwise there would be no article), so first let us recall how everything worked before and how it works now. Each camera has a unique uid and a pool of servers from which its video is streamed. Having formed a special request from these data, you can get a link to a piece of video recorded by the selected camera.
First, we need data on all existing cameras. The simplest method I found is to search by polling station number, from 1 to 3800. To do this, send GET vybory.mos.ru/json/id_search/aaa/bbb.json, where bbb is the station number and aaa is len(bbb). For example, vybory.mos.ru/json/id_search/1/3.json
In response we get JSON with information about this station, something like this:
[{"id":7933,"name":"Участок избирательной комиссии №3","num":"3","location_id":1162,"address":"Новый Арбат, 36/9","raw_address":"г.Москва, Новый Арбат ул., дом 36/9","is_standalone":false,"size":null,"location":{"id":1162,"address":"Россия, Москва, улица Новый Арбат, 36/9","raw_address":"г.Москва, Новый Арбат ул., дом 36/9","district_id":1,"area_id":null,"sub_area_id":null,"locality_id":1,"street_id":1590,"lat":55.753266,"lon":37.577301,"max_zoom":17}}]
Of particular interest here is id. We then send a GET of the form vybory.mos.ru/account/channels?station_id=id, in this case vybory.mos.ru/account/channels?station_id=7933
The response is a string full of mojibake that my editor complains about, but inside it are the camera hashes and server addresses. We extract the hashes with the regex \$([0-9a-h]{8}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{12}) and the IP addresses with the regex .*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
As a result, we obtain the required information about the cameras of the current polling station:
2e9dd8dc-edd4-11e2-9a6b-f0def1c0f84c 188.254.112.2 188.254.112.3 188.254.112.4
2ea32990-edd4-11e2-9a6b-f0def1c0f84c 188.254.112.2 188.254.112.3 188.254.112.4
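A minimal sketch of this extraction step (the endpoint sits behind the rack.session cookie, so in practice you would pass the same headers the full parser below does):

# -*- coding: utf-8 -*-
# Minimal sketch: fetch the channel list for station id 7933 and pull out
# camera hashes and server IPs with the regexes above.
import re
import httplib

conn = httplib.HTTPConnection('vybory.mos.ru')
conn.request('GET', '/account/channels?station_id=7933')
raw = conn.getresponse().read()
conn.close()

# The response is a NUL-separated blob, one record per camera.
for record in raw.split('\x00')[1:]:
    uids = re.findall(r'\$([0-9a-h]{8}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{12})', record)
    ips = re.findall(r'.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', record)
    if uids:
        print uids[0], ' '.join(ips)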
Next, the nuances begin. There are three types of cameras: old, new, and absent. How they differ I will explain a little later; first let us figure out how to tell them apart. It is very easy: send a GET of the form http://SERVER/master.m3u8?cid=UID
A new camera will return something like:
#EXTM3U
#EXT-X-VERSION:2
#EXT-X-STREAM-INF:PROGRAM-ID=777,BANDWIDTH=3145728
/variant.m3u8?cid=e1164950-0c19-11e3-803b-00163ebf8df9&var=orig
An old camera will return something of this kind:
#EXTM3U
#EXT-X-MEDIA-SEQUENCE:136
#EXT-X-TARGETDURATION:15
#EXT-X-ALLOW-CACHE:NO
#EXT-X-PROGRAM-DATE-TIME:2013-09-04T12:05:40Z
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296340.93-1378296355.93
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296355.93-1378296370.93
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296370.93-1378296385.93
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296385.93-1378296400.93
An absent camera will return nothing but a 404 CID Was Not Found :)
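Putting the three cases together, a camera can be classified with one request; a sketch mirroring the check used in the full parser below:

# -*- coding: utf-8 -*-
# Classify a camera by its master playlist: segment.ts links mean 'old',
# variant.m3u8 means 'new', anything else (404 CID Was Not Found) means absent.
import httplib

def camera_type(server, uid):
    conn = httplib.HTTPConnection(server)
    conn.request('GET', '/master.m3u8?cid=%s' % uid)
    body = conn.getresponse().read()
    conn.close()
    if '/segment.ts' in body:
        return 'old'
    elif '/variant.m3u8' in body:
        return 'new'
    return 'nil'

print camera_type('188.254.112.2', '2ea32990-edd4-11e2-9a6b-f0def1c0f84c')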
Now that we can get information about the cameras of a given station, let us write a multi-threaded parser that will collect all the necessary information for us. I prefer to store the data in a free MongoLab instance, but you can get by with an ordinary shelve. Knowing that there are 3500+ polling stations in Moscow, we loop from 1 to 3800. Below is quick-and-dirty but nonetheless working code. You will, of course, need to fill in your own cookie and the Mongo server credentials.
# -*- coding: utf-8 -*-
import json, re
import httplib
import threading
from time import sleep
import Queue
from pymongo import MongoClient

client = MongoClient('mongodb://admin:кусь@кусь.mongolab.com:43368/elections')
db = client['elections']
data = db['data']
data.drop()

def get_data(uid):
    print uid
    headers = {'Origin': 'vybory.mos.ru',
               'X-Requested-With': 'XMLHttpRequest',
               'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0);',
               'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
               'Accept': '*/*',
               'Referer': 'http://vybory.mos.ru/',
               'Accept-Encoding': 'deflate,sdch',
               'Accept-Language': 'ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4',
               'Accept-Charset': 'windows-1251,utf-8;q=0.7,*;q=0.3',
               'Cookie': 'rack.session=кусь'
               }
    try:
        conn = httplib.HTTPConnection('vybory.mos.ru')
        # Look up the polling station by its number
        conn.request('GET', '/json/id_search/%d/%d.json' % (len(str(uid)), uid), None, headers)
        resp = conn.getresponse()
        try:
            content = json.loads(resp.read())[0]
            # Fetch the camera records for this station
            conn.request('GET', '/account/channels?station_id=%s' % content['id'], None, headers)
            resp = conn.getresponse()
            cont = resp.read()
            cnt = 0
            # The response is a NUL-separated blob, one record per camera
            for i in cont.split('\x00')[1:]:
                cnt += 1
                uid = re.findall(r'\$([0-9a-h]{8}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{12})', i)[0]
                ip = re.findall(r'.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', i)
                # Classify the camera by its master playlist
                conn2 = httplib.HTTPConnection('%s' % ip[0])
                conn2.request('GET', '/master.m3u8?cid=%s' % (uid), None, headers)
                info = conn2.getresponse().read()
                conn2.close()
                if '/segment.ts' in info:
                    camtype = 'old'
                elif '/variant.m3u8' in info:
                    camtype = 'new'
                else:
                    camtype = 'nil'
                data.insert({
                    'name': content['name'],
                    'num': content['num'],
                    'addr': content['address'],
                    'uid': uid,
                    'ip': ip,
                    'cnt': str(cnt),
                    'type': camtype
                })
        except Exception, e:
            pass
    except Exception, e:
        print e
    conn.close()

queue = Queue.Queue()

def repeat():
    # Worker: take station numbers off the queue until it is empty
    while True:
        try:
            item = queue.get_nowait()
        except Queue.Empty:
            break
        get_data(item)
        sleep(0.01)
        queue.task_done()

for i in xrange(1, 3800):
    queue.put(i)
for i in xrange(10):
    t = threading.Thread(target=repeat)
    t.start()
queue.join()

print data.find().count(), 'all cams'
print data.find({'type': 'nil'}).count(), 'offline cams'
print data.find({'type': 'old'}).count(), 'old cams'
print data.find({'type': 'new'}).count(), 'new cams'
Now we have a fully assembled database of cameras. At the time of this writing there were 544 old cameras; alas, you can only work with them the old way.
But we also have 5778 new cameras, and they have one notable feature. Chunks from old cameras go stale after a very short time: you have to constantly download a fresh playlist, pull the chunk links out of it, and download them before they rot. New cameras lack this flaw. You can download chunks of arbitrary size for an arbitrary period by sending a GET of the form http://SERVER/segment.ts?cid=UID&var=orig&ts=BEGIN-END. The span between BEGIN and END can be not just 15 seconds but much more. I settled on chunks of 5 minutes. You can actually request an hour or more, but in some cases, as far as I can tell, if the broadcast was interrupted within the bounds of a chunk, the entire chunk will not download. Roughly speaking, if you try to download 8 hours from the archive in one-hour chunks and the broadcast was absent for even a few minutes inside one of them, that entire hour-long chunk is lost. Therefore it is wise to choose a smaller chunk. Algorithm gurus (of whom, as we recall, there are 10%) can write a binary search so that not a second of video is lost =)
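As a minimal sketch, downloading the last five minutes from a new camera looks something like this (server and uid are taken from the earlier example and are purely illustrative; any camera of type 'new' from the database will do):

# -*- coding: utf-8 -*-
# Sketch: pull one 5-minute chunk from a new camera's archive.
import httplib
from time import time

server = '188.254.112.2'
uid = '2ea32990-edd4-11e2-9a6b-f0def1c0f84c'  # illustrative; use a 'new' camera
delta = 300                                   # chunk length in seconds
end = int(time())
begin = end - delta

conn = httplib.HTTPConnection(server)
conn.request('GET', '/segment.ts?cid=%s&var=orig&ts=%d.00-%d.00' % (uid, begin, end))
chunk = conn.getresponse().read()
conn.close()

if chunk:
    open('%d-%d.ts' % (begin, end), 'wb').write(chunk)
else:
    # An empty body usually means the broadcast was interrupted inside the
    # requested range; retry with a smaller delta (or bisect the range).
    print 'chunk %d-%d not available' % (begin, end)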
By the way, to close the question: an absent camera is one that is registered on the portal but does not actually work.
Let us automate the download process. Here you could reinvent the wheel with a multi-threaded Python downloader of your own, but I decided to use third-party software. We will generate a metafile with chunk links for aria2c and metafiles for tsMuxeR, then run them in sequence.
For example, something like this:
# -*- coding: utf-8 -*-
from time import sleep, time
from pymongo import MongoClient
import os
import subprocess
import shutil

# Root folder where the chunks will be stored
directory = 'e:/dumps'
# Chunk size in seconds
delta = 300
# Polling station number
num = '666'

client = MongoClient('mongodb://кусь:кусь@кусь.mongolab.com:43368/elections')
db = client['elections']
data = db['data']

# Download the video for the last 8 hours
start = int(time()) - 3600 * 8

# Create a folder for the dumps from this polling station
try:
    os.mkdir('%s/%s' % (directory, num))
except:
    pass

# Pull the camera records for this station from the database
for i in data.find({'num': num}):
    if i['type'] == 'nil':
        print 'Offline camera', i['uid']
    elif i['type'] == 'old':
        print 'Old camera', i['uid']
    else:
        print 'New camera', i['uid']
        f = open('links-%s-%s.txt' % (num, i['cnt']), 'w')
        # Create a subdirectory for each camera
        try:
            os.mkdir('%s/%s/%s' % (directory, num, i['cnt']))
        except:
            pass
        cur = start
        files = ''
        # Generate links to chunks of the chosen length
        while True:
            if cur + delta > time():
                # Last, partial chunk: from cur up to the present moment
                for ip in i['ip']:
                    url = 'http://{0}/segment.ts?cid={1}&var=orig&ts={2}.00-{3}'.format(ip, i['uid'], cur, time())
                    f.write('%s\t' % url)
                f.write('\n dir={0}/{1}/{2}\n out={3}.ts\n'.format(directory, num, i['cnt'], url[-27:]))
                files += '"{0}/{1}/{2}/{3}.ts"+'.format(directory, num, i['cnt'], url[-27:])
                break
            else:
                for ip in i['ip']:
                    url = 'http://{0}/segment.ts?cid={1}&var=orig&ts={2}.00-{3}.00'.format(ip, i['uid'], cur, cur + delta)
                    f.write('%s\t' % url)
                f.write('\n dir={0}/{1}/{2}\n out={3}.ts\n'.format(directory, num, i['cnt'], url[-27:]))
                files += '"{0}/{1}/{2}/{3}.ts"+'.format(directory, num, i['cnt'], url[-27:])
                cur += delta
        # Generate the metafile for gluing the chunks into one big file
        m = open('%s-%s.meta' % (num, i['cnt']), 'w')
        m.write('MUXOPT --no-pcr-on-video-pid --new-audio-pes --vbr --vbv-len=500\n')
        m.write('V_MPEG4/ISO/AVC, %s, fps=23.976, insertSEI, contSPS, track=3300\n' % files[:-1])
        m.write('A_AAC, %s, timeshift=-20ms, track=3301\n' % files[:-1])
        m.close()
        f.close()
        # Download all chunks, mux them into one .ts, then clean up
        subprocess.Popen('aria2c.exe -i links-%s-%s.txt -d %s -x 16' % (num, i['cnt'], directory), shell=True).communicate()
        subprocess.Popen('tsMuxeR.exe %s-%s.meta %s/%s-%s.ts\n' % (num, i['cnt'], directory, num, i['cnt']), shell=True).communicate()
        shutil.rmtree('%s/%s' % (directory, num))
        os.remove('%s-%s.meta' % (num, i['cnt']))
        os.remove('links-%s-%s.txt' % (num, i['cnt']))
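For reference, here is roughly what the generated files look like, as derived from the format strings above (station 666, camera 1; timestamps are illustrative). The aria2c input lists the mirror URLs for each chunk on one tab-separated line, followed by indented dir= and out= options:

http://188.254.112.2/segment.ts?cid=UID&var=orig&ts=1378296000.00-1378296300.00	http://188.254.112.3/segment.ts?cid=UID&var=orig&ts=1378296000.00-1378296300.00
 dir=e:/dumps/666/1
 out=1378296000.00-1378296300.00.ts

And the tsMuxeR metafile glues the downloaded chunks into a single .ts:

MUXOPT --no-pcr-on-video-pid --new-audio-pes --vbr --vbv-len=500
V_MPEG4/ISO/AVC, "e:/dumps/666/1/1378296000.00-1378296300.00.ts"+"e:/dumps/666/1/1378296300.00-1378296600.00.ts", fps=23.976, insertSEI, contSPS, track=3300
A_AAC, "e:/dumps/666/1/1378296000.00-1378296300.00.ts"+"e:/dumps/666/1/1378296300.00-1378296600.00.ts", timeshift=-20ms, track=3301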
Again, this code was written solely as a proof of concept and is no model of PEP8 compliance, but it works. Download speed, for obvious reasons, depends on many factors.
UPD There is an opinion that the old cameras are being systematically replaced with new ones. Last night there were 337 old and 5776 new; this morning, 273 old and 5811 new.
UPD It turns out there is also webvybory2013.ru, which carries the feeds from other elections. Everything written in this article applies to it as well; only the domain needs to change.
UPD Cameras are constantly changing their status; pay attention to this. Cameras with the old system are being replaced with new ones.