Capturing video from network cameras, part 2

  • Tutorial

In my first article, "Measuring the distance to an object and its speed," I examined capturing images from webcams via Video4Linux2 and via DirectX. In the next article, "Capturing Video from Network Cameras, Part 1," I covered working with network Motion-JPEG cameras. Now I'll talk about capturing images from RTSP network cameras, specifically Motion-JPEG over RTSP.

This task is more complicated than Motion-JPEG over HTTP, since it requires more actions and more connections, but in return we get more flexibility, speed, functionality, and even a certain universality. Frankly, RTSP is overkill for simple tasks, but I have no doubt there are situations where it is needed.

What is RTSP?


RTSP stands for Real Time Streaming Protocol. It is essentially a broadcast-control protocol: it lets you execute several commands, such as "start", "stop", and "seek to a specific time". The protocol is similar to HTTP in its implementation: there are headers, and everything is transmitted as text. Here are the main commands from the specification:
  • OPTIONS - returns a list of supported methods (OPTIONS, DESCRIBE, etc.);
  • DESCRIBE - request for a description of the content, describes each track in SDP format ;
  • SETUP - a request to establish connections and transport for streams;
  • PLAY - start broadcasting;
  • TEARDOWN - stop broadcasting.
A peculiarity of RTSP is that it does not itself transmit the video data we need! The whole protocol exists only for communication. You can see an analogy with MVC here: the data is separated from its description.

The workhorse is another protocol: RTP, the Real-time Transport Protocol. It is what carries the data we need. This protocol is quite pleasant to work with: it makes it easy for client software to reassemble data after fragmentation at the link layer. It also carries a few more useful fields: the format of the transmitted data, a timestamp, and a synchronization field (used when, for example, audio and video are transmitted simultaneously). Although RTP can run over TCP, it is usually used over UDP for speed. In other words, an RTP packet is a UDP datagram containing an RTP header and media payload.

It would seem we need nothing else: we negotiate via RTSP and receive via RTP. But no, clever people came up with a third protocol: RTCP, the Real-time Transport Control Protocol. It is used to monitor the quality of service; with its help, the client and server know how well or poorly the content is being delivered. Based on this data the server can, for example, lower the bitrate or even switch to another codec.

By convention, RTP uses an even port number and RTCP uses the next odd port number.
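This convention is easy to encode; here is a small sketch (the helper name is mine, not from any library):

```python
def rtp_port_pair(base):
    """Pick an (RTP, RTCP) port pair following the convention:
    RTP on an even port, RTCP on the next odd port."""
    rtp = base if base % 2 == 0 else base + 1
    return rtp, rtp + 1
```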

RTSP Communication Example

I have only one source of RTSP streams, the eVidence APIX Box M1 camera, so all the examples relate to it.

Below is a log of the communication between the VLC player (it helps me a lot in my research) and this camera. The first request goes from VLC to port 554 of the camera. Responses are separated from requests by an empty line and start with "RTSP/1.0".

01: OPTIONS rtsp://192.168.0.254/jpeg RTSP/1.0
02: CSeq: 1
03: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
04: 
05: RTSP/1.0 200 OK
06: CSeq: 1
07: Date: Fri, Apr 23 2010 19:54:20 GMT
08: Public: OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE
09: 
10: DESCRIBE rtsp://192.168.0.254/jpeg RTSP/1.0
11: CSeq: 2
12: Accept: application/sdp
13: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
14: 
15: RTSP/1.0 200 OK
16: CSeq: 2
17: Date: Fri, Apr 23 2010 19:54:20 GMT
18: Content-Base: rtsp://192.168.0.254/jpeg/
19: Content-Type: application/sdp
20: Content-Length: 442
21: x-Accept-Dynamic-Rate: 1
22: 
23: v=0
24: o=- 1272052389382023 1 IN IP4 0.0.0.0
25: s=Session streamed by "nessyMediaServer"
26: i=jpeg
27: t=0 0
28: a=tool:LIVE555 Streaming Media v2008.04.09
29: a=type:broadcast
30: a=control:*
31: a=range:npt=0-
32: a=x-qt-text-nam:Session streamed by "nessyMediaServer"
33: a=x-qt-text-inf:jpeg
34: m=video 0 RTP/AVP 26
35: c=IN IP4 0.0.0.0
36: a=control:track1
37: a=cliprect:0,0,720,1280
38: a=framerate:25.000000
39: m=audio 7878 RTP/AVP 0
40: a=rtpmap:0 PCMU/8000/1
41: a=control:track2
42: 
43: 
44: SETUP rtsp://192.168.0.254/jpeg/track1 RTSP/1.0
45: CSeq: 3
46: Transport: RTP/AVP;unicast;client_port=41760-41761
47: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
48: 
49: RTSP/1.0 200 OK
50: CSeq: 3
51: Cache-Control: must-revalidate
52: Date: Fri, Apr 23 2010 19:54:20 GMT
53: Transport: RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;client_port=41760-41761;
            server_port=6970-6971
54: Session: 1
55: x-Transport-Options: late-tolerance=1.400000
56: x-Dynamic-Rate: 1
57: 
58: SETUP rtsp://192.168.0.254/jpeg/track2 RTSP/1.0
59: CSeq: 4
60: Transport: RTP/AVP;unicast;client_port=7878-7879
61: Session: 1
62: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
63: 
64: RTSP/1.0 200 OK
65: CSeq: 4
66: Cache-Control: must-revalidate
67: Date: Fri, Apr 23 2010 19:54:20 GMT
68: Transport: RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;client_port=7878-7879;
            server_port=6972-6973
69: Session: 1
70: x-Transport-Options: late-tolerance=1.400000
71: x-Dynamic-Rate: 1
72: 
73: PLAY rtsp://192.168.0.254/jpeg/ RTSP/1.0
74: CSeq: 5
75: Session: 1
76: Range: npt=0.000-
77: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
78: 
79: RTSP/1.0 200 OK
80: CSeq: 5
81: Date: Fri, Apr 23 2010 19:54:20 GMT
82: Range: npt=0.000-
83: Session: 1
84: RTP-Info: url=rtsp://192.168.0.254/jpeg/track1;seq=20730;
            rtptime=3869319494,url=rtsp://192.168.0.254/jpeg/track2;seq=33509;rtptime=3066362516
85: 
86: # At this point the content transfer begins; the next command is issued to stop the broadcast
87: 
88: TEARDOWN rtsp://192.168.0.254/jpeg/ RTSP/1.0
89: CSeq: 6
90: Session: 1
91: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
92: 
93: RTSP/1.0 200 OK
94: CSeq: 6
95: Date: Fri, Apr 23 2010 19:54:25 GMT

First of all, VLC asks the camera:
- So, what can I do with you, anyway? (OPTIONS)
- Hello to you too. You may ask me for any of OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY and PAUSE.
- Okay, then tell me what you have at "/jpeg"? (DESCRIBE)
- I have M-JPEG video in the first track, and plain audio in the second.
- The video looks interesting; pour it into port 41760 for me, please, and you can drop the service chatter into port 41761. (SETUP track1)
- OK, at your command...
- And I'd like to hear the sound too; send it to ports 7878 and 7879. (SETUP track2)
- No problem.
- Then let it flow. (PLAY)
After a while:
- Okay, that's enough, I've seen plenty. (TEARDOWN)
- As you say.

Here the small digression ends. The first request, "OPTIONS rtsp://192.168.0.254/jpeg RTSP/1.0", resembles "GET /jpeg HTTP/1.1" in that the conversation begins with it, and the HTTP protocol also has an OPTIONS method. Here 192.168.0.254 is the IP address of my camera. CSeq holds the sequence number of the request; the server's response must contain the same CSeq.

The server's response starts with "RTSP/1.0 200 OK", just like "HTTP/1.1 200 OK": a sign that everything is fine, the request was accepted, understood, and executed without problems. Then, in plain text, follows the list of all available methods.
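The framing is simple enough to assemble by hand. A minimal sketch of building such requests (the function and its parameters are my own, not from the client shown later in the article):

```python
def build_request(method, url, cseq, headers=()):
    """Assemble an RTSP/1.0 request: a request line, a CSeq header,
    any extra headers, and a terminating empty line."""
    lines = ['%s %s RTSP/1.0' % (method, url), 'CSeq: %d' % cseq]
    lines.extend(headers)
    return '\r\n'.join(lines) + '\r\n\r\n'

request = build_request('OPTIONS', 'rtsp://192.168.0.254/jpeg', 1,
                        ['User-Agent: Python MJPEG Client'])
```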

Next, we request a description of what awaits us at /jpeg, since that is the path in our link "rtsp://192.168.0.254/jpeg". We also indicate that we want the answer in SDP format (line 12).

In response, we get an RTSP header with Content-Type and Content-Length, and after the header, separated by an empty line, the content itself in SDP format:

v=0
o=- 1272052389382023 1 IN IP4 0.0.0.0
s=Session streamed by "nessyMediaServer"
i=jpeg
t=0 0
a=tool:LIVE555 Streaming Media v2008.04.09
a=type:broadcast
a=control:*
a=range:npt=0-
a=x-qt-text-nam:Session streamed by "nessyMediaServer"
a=x-qt-text-inf:jpeg
m=video 0 RTP/AVP 26
c=IN IP4 0.0.0.0
a=control:track1
a=cliprect:0,0,720,1280
a=framerate:25.000000
m=audio 7878 RTP/AVP 0
a=rtpmap:0 PCMU/8000/1
a=control:track2

Everything is pretty obvious here. We need the following lines:

# For video
m=video 0 RTP/AVP 26 # RTP/AVP stream transport, any port, video format 26, which corresponds to Motion-JPEG
a=control:track1 # Track name
a=cliprect:0,0,720,1280 # The resolution is extracted from here
a=framerate:25.000000 # And the frame rate, should we need it
# For audio
m=audio 7878 RTP/AVP 0 # Port 7878, transport and audio format, 0 - PCM
a=control:track2 # Track name
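Extracting these values can be sketched in a few lines of Python (a simplified parser, sufficient for SDP answers shaped like the camera's; not a complete SDP implementation):

```python
def parse_sdp(sdp):
    """Pull per-track info out of an SDP answer: media kind, port,
    payload format, control name, resolution and framerate."""
    tracks, cur = [], None
    for line in sdp.strip().splitlines():
        line = line.strip()
        if line.startswith('m='):
            kind, port, transport, fmt = line[2:].split()[:4]
            cur = {'kind': kind, 'port': int(port),
                   'transport': transport, 'format': int(fmt)}
            tracks.append(cur)
        elif cur is None:
            continue                      # session-level lines before any m=
        elif line.startswith('a=control:'):
            cur['control'] = line.split(':', 1)[1]
        elif line.startswith('a=cliprect:'):
            # LIVE555 emits a=cliprect:top,left,height,width
            _, _, h, w = line.split(':', 1)[1].split(',')
            cur['width'], cur['height'] = int(w), int(h)
        elif line.startswith('a=framerate:'):
            cur['framerate'] = float(line.split(':', 1)[1])
    return tracks
```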

If we only want to receive video, then we ignore everything about the audio except the track name. We need it to set up the stream, but nobody forces us to actually accept that stream. However, the camera refuses to work if you ignore the audio completely (that is, if you send SETUP only for the video track).

Honestly, I don't know how other cameras will react if we ignore the port number announced for the audio stream (7878), since we specify our own with the SETUP command.

Next come two SETUP requests indicating the ports on which we would like to receive the video and audio streams. The first port is for RTP, the second for RTCP. The camera's response contains information about the ports; you can check it to make sure everything is configured correctly. We also need to remember the Session identifier; we will have to include it in all subsequent requests.

After the PLAY command, video is transferred to port 41760 and audio to port 7878. The TEARDOWN command stops the broadcast and the connection is closed.

MJPEG over RTP

RTP packets arrive, and we need to decode them. Here is the layout of such a packet with a description of all its fields.
Bit offset                      | 0-1 | 2 | 3 | 4-7 | 8 | 9-15 | 16-31
0                               | V   | P | X | CC  | M | PT   | Sequence Number
32                              | Timestamp
64                              | SSRC Identifier
96                              | ... CSRC Identifiers ...
96 + (CC × 32)                  | Extension Header ID | Extension Header Length (EHL)
96 + (CC × 32) + (X × 32)       | ... Extension Header ...
96 + (CC × 32) + (X × 32) + (X × EHL) | Payload

  1. V (Version): (2) protocol version. Now version number 2.
  2. P (Padding): (1) set when the RTP packet is padded with extra bytes at the end, for example for encryption algorithms.
  3. X (Extension): (1) indicates the presence of an extended header, determined by the application. In our case, this is not used.
  4. CC (CSRC Count): (4) the number of CSRC identifiers. Also unused in our case.
  5. M (Marker): (1) is used at the application level, in our case this bit is set to one if the RTP packet contains the end of the JPEG frame.
  6. PT (Payload Type): (7) indicates the format of the payload - the transmitted data. For MJPEG it's 26.
  7. Sequence Number : (16) RTP packet number, used to detect lost packets.
  8. Timestamp : (32) timestamp; for this payload a 90 kHz clock is used (90,000 ticks = 1 second).
  9. SSRC (Synchronization Source): (32) the identifier of the synchronization source, i.e. the source of the stream.
  10. CSRC (Contributing Source): (32) identifiers of additional sources, used when we have a stream coming from several places.
  11. Extension Header ID : (16) the identifier of the extension; if one is present, you need to know what it is. Not used in our case.
  12. Extension Header Length : (16) the length of the extension, counted in 32-bit words (per RFC 3550).
  13. Extension Header : the extension data itself. Its content varies depending on the context.
  14. Payload : the payload, which is our JPEG frames. Fragmented, of course.
Fields starting with CSRC are optional. They are not used to transmit MJPEG from cameras, as far as I know.

We have moved up one level of encapsulation. Now the task is to convert the received video data into a full-fledged JPEG image. In the case of MJPEG over HTTP everything is simple: we cut out a piece of the stream and work with it as a JPEG image right away. In the case of RTP, the image is not transmitted in full; the JPEG header is omitted to save traffic. It must be reconstructed from the attached data.

The RTP payload format for MJPEG is described in RFC 2435. Here is a table with all its fields:
Bit offset           | 0-7           | 8-15      | 16-23  | 24-31
0                    | Type-specific | Fragment Offset
32                   | Type          | Q         | Width  | Height
if Type in 64..127   | Restart Marker header
if Q in 128..255     | MBZ           | Precision | Length
                     | Quantization Table Data

  1. Type-specific (Depends on type): (8) the meaning of the field depends on the implementation, in our case it does not apply.
  2. Fragment Offset : (24) indicates the position of the current frame fragment in the entire frame.
  3. Type : (8) how the image is restored depends on the type.
  4. Q (Quality): (8) image quality.
  5. Width : (8) frame width, in units of 8 pixels.
  6. Height : (8) frame height, also in units of 8 pixels.
  7. Restart Marker header : (32) used when decoding JPEG with RST markers. I don't know whether any cameras use them, but I ignore this header. The field is present only for Type 64 through 127.
  8. Quantization Table Data : if these tables are present, we do not need to compute them ourselves. They are needed to correctly reconstruct the picture from the JPEG data; with wrong tables the image comes out with wrong colors and contrast. There should be two tables, luma and chroma, for brightness and color respectively.
  9. MBZ, Precision, Length : (32) parameters of the quantization tables. I ignore them and assume Length is 128, i.e. two tables of 64 bytes each; otherwise I would not know how to work with them.
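When the tables are not sent in-band, RFC 2435 (Appendix A) describes how to derive them from Q by scaling the default tables. A sketch of that scaling step (the 64-entry base tables themselves are omitted here):

```python
def scale_q_table(base, q):
    """Scale a default quantization table by the Q parameter,
    following RFC 2435 Appendix A. `base` is a default table
    (luma or chroma); each result entry is clamped to 1..255."""
    if q < 1:
        q = 1
    if q > 99:
        q = 99
    factor = 5000 // q if q < 50 else 200 - q * 2
    return [min(255, max(1, (b * factor + 50) // 100)) for b in base]
```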
The Restart Marker header and the quantization tables may be absent. If the former is missing, so much the better, since I do not handle it anyway. If the latter are missing, the necessary tables are computed from the Q parameter.

An RTCP packet can be of several types: 200 is a sender report (SR), 201 a receiver report (RR), 202 a source description (SDES), 203 a BYE, and 204 is application-defined (APP). First of all we must accept type 200, then send type 201. The other types are optional, but I take them into account as well. A single UDP packet may contain several RTCP packets.

All types have a similar structure. Any RTCP packet begins with the following data:
Bit offset | 0-1     | 2       | 3-7                 | 8-15        | 16-31
0          | Version | Padding | SC or RC or Subtype | Packet Type | Length

  1. Version : (2) RTP version.
  2. Padding : (1) the same as for RTP.
  3. SC or RC or Subtype : (5) depending on the type, it can be the number of sources (Sources Count) or the number of recipients (Receivers Count) included in the report of the recipient and source, respectively. If it is an APP packet, then this field defines the subtype of such a packet.
  4. Packet Type : (8) the packet type: 200 is a Sender's Report (SR), 201 a Receiver's Report (RR), 202 a Source Description (SDES), and 204 is application-defined (APP).
  5. Length : (16) The size of the data following the header, measured in 32 bit units.
I will not list the fields of every subtype here; they can be found in RFC 3550. Let me just say that the SR and RR types carry information about sent/received packets and timing delays, while SDES carries various text fields describing the source, such as its name, email, phone, location, etc.

This concludes the introduction.

Python MJPEG over RTSP client


So we finally get to Python. The client consists of several files. main.py contains the callback function that processes received images; it also starts the Twisted networking framework and holds the parameters for connecting to the camera. All the listings are shortened; the full version can be downloaded from the link at the end of the article.
main.py
20:	def processImage(img):
21:	    'This function is invoked by the MJPEG Client protocol'
22:	    # Process image
23:	    # Just save it as a file in this example
24:	    f = open('frame.jpg', 'wb')
25:	    f.write(img)
26:	    f.close()
27:	
28:	def main():
29:	    print 'Python M-JPEG Over RTSP Client 0.1'
30:	    config = {'request': '/jpeg',
31:	          'login': '',
32:	          'password': 'admin',
33:	          'ip': '192.168.0.252',
34:	          'port': 554,
35:	          'udp_port': 41760,
36:	          'callback': processImage}
37:	    # Prepare RTP MJPEG client (technically it's a server)
38:	    reactor.listenUDP(config['udp_port'], rtp_mjpeg_client.RTP_MJPEG_Client(config))
39:	    reactor.listenUDP(config['udp_port'] + 1, rtcp_client.RTCP_Client()) # RTCP
40:	    # And RTSP client
41:	    reactor.connectTCP(config['ip'], config['port'], rtsp_client.RTSPFactory(config))
42:	    # Run both of them
43:	    reactor.run()
44:	    # On exit:
45:	    print 'Python M-JPEG Client stopped.'

In principle, you can work without implementing the RTCP protocol or receiving the audio data. In that case the camera drops the connection after about a minute, and we have to reconnect all the time; this is done automatically, so it causes no problems. Still, for this article I added the RTCP part and made a stub for receiving audio data.

The next important file is rtsp_client.py. It is the most convoluted of them, but its goal is clear: to correctly establish the connection described above.
rtsp_client.py
012:	class RTSPClient(Protocol):
013:	    def __init__(self):
014:	        self.config = {}
015:	        self.wait_description = False
016:	
017:	    def connectionMade(self):
018:	        self.session = 1
019:	        # Authorization part
020:	        if self.config['login']:
021:	            authstring = 'Authorization: Basic ' + b64encode(self.config['login']+':'+self.config['password']) + '\r\n'
022:	        else:
023:	            authstring = ''
024:	        # send OPTIONS request
025:	        to_send = """\
026:	OPTIONS rtsp://""" + self.config['ip'] + self.config['request'] + """ RTSP/1.0\r
027:	""" + authstring + """CSeq: 1\r
028:	User-Agent: Python MJPEG Client\r
029:	\r
030:	"""
031:	        self.transport.write(to_send)
032:	        if debug:
033:	            print 'We say:\n', to_send
034:	    
035:	    def dataReceived(self, data):
036:	        if debug:
037:	            print 'Server said:\n', data
038:	        # Unify input data
039:	        data_ln = data.lower().strip().split('\r\n', 5)
040:	        # Next behaviour is relevant to CSeq
041:	        # which defines current conversation state
042:	        if data_ln[0] == 'rtsp/1.0 200 ok' or self.wait_description:
043:	            # There might be an audio stream
044:	            if 'audio_track' in self.config:
045:	                cseq_audio = 1
046:	            else:
047:	                cseq_audio = 0
048:	            to_send = ''
049:	            if 'cseq: 1' in data_ln:
050:	                # CSeq 1 -> DESCRIBE
051:	                to_send = """\
052:	DESCRIBE rtsp://""" + self.config['ip'] + self.config['request'] + """ RTSP/1.0\r
053:	CSeq: 2\r
054:	Accept: application/sdp\r
055:	User-Agent: Python MJPEG Client\r
056:	\r
057:	"""
058:	            elif 'cseq: 2' in data_ln or self.wait_description:
059:	                # CSeq 2 -> Parse SDP and then SETUP
060:	                data_sp = data.lower().strip().split('\r\n\r\n', 1)
061:	                # wait_description is used when SDP is sent in another UDP
062:	                # packet
063:	                if len(data_sp) == 2 or self.wait_description:
064:	                    # SDP parsing
065:	                    video = audio = False
066:	                    is_MJPEG = False
067:	                    video_track = ''
068:	                    audio_track = ''
069:	                    if len(data_sp) == 2:
070:	                        s = data_sp[1].lower()
071:	                    elif self.wait_description:
072:	                        s = data.lower()
073:	                    for line in s.strip().split('\r\n'):
074:	                        if line.startswith('m=video'):
075:	                            video = True
076:	                            audio = False
077:	                            if line.endswith('26'):
078:	                                is_MJPEG = True
079:	                        if line.startswith('m=audio'):
080:	                            video = False
081:	                            audio = True
082:	                            self.config['udp_port_audio'] = int(line.split(' ')[1])
083:	                        if video:
084:	                            params = line.split(':', 1)
085:	                            if params[0] == 'a=control':
086:	                                video_track = params[1]
087:	                        if audio:
088:	                            params = line.split(':', 1)
089:	                            if params[0] == 'a=control':
090:	                                audio_track = params[1]
091:	                    if not is_MJPEG:
092:	                        print "Stream", self.config['ip'] + self.config['request'], 'is not an MJPEG stream!'
093:	                    if video_track: self.config['video_track'] = 'rtsp://' + self.config['ip'] + self.config['request'] + '/' + basename(video_track)
094:	                    if audio_track: self.config['audio_track'] = 'rtsp://' + self.config['ip'] + self.config['request'] + '/' + basename(audio_track)
095:	                    to_send = """\
096:	SETUP """ + self.config['video_track'] + """ RTSP/1.0\r
097:	CSeq: 3\r
098:	Transport: RTP/AVP;unicast;client_port=""" + str(self.config['udp_port']) + """-"""+ str(self.config['udp_port'] + 1) + """\r
099:	User-Agent: Python MJPEG Client\r
100:	\r
101:	"""
102:	                    self.wait_description = False
103:	                else:
104:	                    # Do not have SDP in the first UDP packet, wait for it
105:	                    self.wait_description = True
106:	            elif "cseq: 3" in data_ln and 'audio_track' in self.config:
107:	                # CSeq 3 -> SETUP audio if present
108:	                self.session = data_ln[5].strip().split(' ')[1]
109:	                to_send = """\
110:	SETUP """ + self.config['audio_track'] + """ RTSP/1.0\r
111:	CSeq: 4\r
112:	Transport: RTP/AVP;unicast;client_port=""" + str(self.config['udp_port_audio']) + """-"""+ str(self.config['udp_port_audio'] + 1) + """\r
113:	Session: """ + self.session + """\r
114:	User-Agent: Python MJPEG Client\r
115:	\r
116:	"""
117:	                reactor.listenUDP(self.config['udp_port_audio'], rtp_audio_client.RTP_AUDIO_Client(self.config))
118:	                reactor.listenUDP(self.config['udp_port_audio'] + 1, rtcp_client.RTCP_Client()) # RTCP
119:	            elif "cseq: "+str(3+cseq_audio) in data_ln:
120:	                # PLAY
121:	                to_send = """\
122:	PLAY rtsp://""" + self.config['ip'] + self.config['request'] + """/ RTSP/1.0\r
123:	CSeq: """ + str(4+cseq_audio) + """\r
124:	Session: """ + self.session + """\r
125:	Range: npt=0.000-\r
126:	User-Agent: Python MJPEG Client\r
127:	\r
128:	"""
129:	            elif "cseq: "+str(4+cseq_audio) in data_ln:
130:	                if debug:
131:	                    print 'PLAY'
132:	                pass
133:	                
134:	            elif "cseq: "+str(5+cseq_audio) in data_ln:
135:	                if debug:
136:	                    print 'TEARDOWN'
137:	                pass
138:	
139:	            if to_send:
140:	                self.transport.write(to_send)
141:	                if debug:
142:	                    print 'We say:\n', to_send

If an audio track is present, this module also launches rtp_audio_client.py and the corresponding RTCP client.

After a successful connection, rtp_mjpeg_client.py takes over, processing the incoming data stream.
rtp_mjpeg_client.py
08:	class RTP_MJPEG_Client(DatagramProtocol):
09:	    def __init__(self, config):
10:	        self.config = config
11:	        # Previous fragment sequence number
12:	        self.prevSeq = -1
13:	        self.lost_packet = 0
14:	        # Object that deals with JPEGs
15:	        self.jpeg = rfc2435jpeg.RFC2435JPEG()
16:	
17:	    def datagramReceived(self, datagram, address):
18:	        # When we get a datagram, parse it
19:	        rtp_dg = rtp_datagram.RTPDatagram()
20:	        rtp_dg.Datagram = datagram
21:	        rtp_dg.parse()
22:	        # Check for lost packets
23:	        if self.prevSeq != -1:
24:	            if (rtp_dg.SequenceNumber != self.prevSeq + 1) and rtp_dg.SequenceNumber != 0:
25:	                self.lost_packet = 1
26:	        self.prevSeq = rtp_dg.SequenceNumber
27:	        # Handle Payload
28:	        if rtp_dg.PayloadType == 26: # JPEG compressed video
29:	            self.jpeg.Datagram = rtp_dg.Payload
30:	            self.jpeg.parse()
31:	            # Marker = 1 if we just received the last fragment
32:	            if rtp_dg.Marker:
33:	                if not self.lost_packet:
34:	                    # Obtain complete JPEG image and give it to the
35:	                    # callback function
36:	                    self.jpeg.makeJpeg()
37:	                    self.config['callback'](self.jpeg.JpegImage)
38:	                else:
39:	                    #print "RTP packet lost"
40:	                    self.lost_packet = 0
41:	                    self.jpeg.JpegPayload = ""

It is easy to understand. Each time we receive another datagram, we parse it with the rtp_datagram.py module and feed the result to the rfc2435jpeg.py module, which assembles a complete JPEG image. Then we wait for the rtp_dg.Marker flag, and as soon as it appears, we call the callback function with the reconstructed image.
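One subtlety of the lost-packet check is that RTP sequence numbers wrap at 16 bits; the client above treats a jump to 0 as a wrap, but the check can be expressed more directly (a sketch, helper name mine):

```python
def is_next_seq(prev, cur):
    """True when `cur` is the RTP sequence number directly following
    `prev`, taking 16-bit wraparound into account."""
    return cur == (prev + 1) & 0xFFFF
```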

The RTP datagram parser looks like this:
rtp_datagram.py
26:	    def parse(self):
27:	        Ver_P_X_CC, M_PT, self.SequenceNumber, self.Timestamp, self.SyncSourceIdentifier = unpack('!BBHII', self.Datagram[:12])
28:	        self.Version =      (Ver_P_X_CC & 0b11000000) >> 6
29:	        self.Padding =      (Ver_P_X_CC & 0b00100000) >> 5
30:	        self.Extension =    (Ver_P_X_CC & 0b00010000) >> 4
31:	        self.CSRCCount =     Ver_P_X_CC & 0b00001111
32:	        self.Marker =       (M_PT & 0b10000000) >> 7
33:	        self.PayloadType =   M_PT & 0b01111111
34:	        # CSRC list: CSRCCount entries of 4 bytes each
35:	        for n in range(self.CSRCCount):
36:	            self.CSRS.append(unpack('!I', self.Datagram[12+n*4:16+n*4])[0])
37:	        i = self.CSRCCount * 4
38:	        if self.Extension:
39:	            (self.ExtensionHeaderID, self.ExtensionHeaderLength) = unpack('!HH', self.Datagram[12+i:16+i])
40:	            # ExtensionHeaderLength counts 32-bit words (RFC 3550)
41:	            self.ExtensionHeader = self.Datagram[16+i:16+i+self.ExtensionHeaderLength*4]
42:	            i += 4 + self.ExtensionHeaderLength*4
43:	        self.Payload = self.Datagram[12+i:]

The JPEG recovery module is fairly large, since it contains several tables and a rather long header-generation function. I omit those here, giving only the functions that parse the RTP payload and create the final JPEG image.
rfc2435jpeg.py
287:	    def parse(self):
288:	        HOffset = 0
289:	        LOffset = 0
290:	        # Straightforward parsing
291:	        (self.TypeSpecific,
292:	        HOffset, #3 byte offset
293:	        LOffset,
294:	        self.Type,
295:	        self.Q,
296:	        self.Width,
297:	        self.Height) = unpack('!BBHBBBB', self.Datagram[:8])
298:	        self.Offset = (HOffset << 16) + LOffset
299:	        self.Width = self.Width << 3
300:	        self.Height = self.Height << 3
301:	        
302:	        # Check if we have Restart Marker header
303:	        if 64 <= self.Type <= 127:
304:	            # TODO: make use of that header
305:	            self.RM_Header = self.Datagram[8:12]
306:	            rm_i = 4 # Make offset for JPEG Header
307:	        else:
308:	            rm_i = 0
309:	        
310:	        # Check if we have Quantization Tables embedded into JPEG Header
311:	        # Only the first fragment will have it
312:	        if self.Q > 127 and not self.JpegPayload:
313:	            self.JpegPayload = self.Datagram[rm_i+8+132:]
314:	            QT_Header = self.Datagram[rm_i+8:rm_i+140]
315:	            (self.QT_MBZ,
316:	             self.QT_Precision,
317:	             self.QT_Length) = unpack('!BBH', QT_Header[:4])
318:	            self.QT_luma = string2list(QT_Header[4:68])
319:	            self.QT_chroma = string2list(QT_Header[68:132])
320:	        else:
321:	            self.JpegPayload += self.Datagram[rm_i+8:]
322:	        # Clear tables. Q might be dynamic.
323:	        if self.Q <= 127:
324:	            self.QT_luma = []
325:	            self.QT_chroma = []
326:	            
327:	    def makeJpeg(self):
328:	        lqt = []
329:	        cqt = []
330:	        dri = 0
331:	        # Use existing tables or generate our own
332:	        if self.QT_luma:
333:	            lqt=self.QT_luma
334:	            cqt=self.QT_chroma
335:	        else:
336:	            MakeTables(self.Q,lqt,cqt)        
337:	        JPEGHdr = []
338:	        # Make a complete JPEG header
339:	        MakeHeaders(JPEGHdr, self.Type, int(self.Width), int(self.Height), lqt, cqt, dri)
340:	        self.JpegHeader = list2string(JPEGHdr)
341:	        # And a complete JPEG image
342:	        self.JpegImage = self.JpegHeader + self.JpegPayload
343:	        self.JpegPayload = ''
344:	        self.JpegHeader = ''
345:	        self.Datagram = ''

I also implemented a module for receiving audio data, rtp_audio_client.py, but I did not convert the data into playable form. If anyone needs it, that file contains a sketch of how everything should work; you just need to implement parsing along the lines of rfc2435jpeg.py. Audio data is simpler because it is not fragmented: each packet carries enough data for playback. I will not include this module here, since the article is already very long.
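Since the audio is PCMU (G.711 mu-law, per the SDP's rtpmap line), turning it into playable 16-bit PCM is a one-formula affair. A sketch of the standard per-sample decode (my own helper, not part of the downloadable code):

```python
def ulaw_to_linear(byte):
    """Decode one 8-bit G.711 mu-law sample to 16-bit linear PCM,
    using the standard CCITT expansion formula."""
    u = ~byte & 0xFF                      # mu-law bytes are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    t = ((mantissa << 3) + 0x84) << exponent
    return (0x84 - t) if (u & 0x80) else (t - 0x84)
```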

For correct operation we need to receive and send RTCP packets: accept Sender's Reports and send Receiver's Reports. To simplify the task, we send our RR immediately after receiving an SR from the camera, filling it with idealized data saying that everything is fine.
rtcp_client.py
09:	class RTCP_Client(DatagramProtocol):
10:	    def __init__(self):
11:	        # Object that deals with RTCP datagrams
12:	        self.rtcp = rtcp_datagram.RTCPDatagram()
13:	    def datagramReceived(self, datagram, address):
14:	        # SSRC Report received
15:	        self.rtcp.Datagram = datagram
16:	        self.rtcp.parse()
17:	        # Send back our Receiver Report
18:	        # saying that everything is fine
19:	        RR = self.rtcp.generateRR()
20:	        self.transport.write(RR, address)

The rtcp_datagram.py module works directly with RTCP datagrams. It also turned out fairly large.
rtcp_datagram.py
049:	    def parse(self):
050:	        # RTCP parsing is complete
051:	        # including SDES, BYE and APP
052:	        # RTCP Header
053:	        (Ver_P_RC,
054:	        PacketType,
055:	        Length) = unpack('!BBH', self.Datagram[:4])
056:	        Version = (Ver_P_RC & 0b11000000) >> 6
057:	        Padding = (Ver_P_RC & 0b00100000) >> 5
058:	        # Byte offset
059:	        off = 4
060:	        # Sender's Report
061:	        if PacketType == 200:
062:	            # Sender's information
063:	            (self.SSRC_sender,
064:	            self.NTP_TimestampH,
065:	            self.NTP_TimestampL,
066:	            self.RTP_Timestamp,
067:	            self.SenderPacketCount,
068:	            self.SenderOctetCount) = unpack('!IIIIII', self.Datagram[off: off + 24])
069:	            off += 24
070:	            ReceptionCount = Ver_P_RC & 0b00011111
071:	            if debug:
072:	                print 'RTCP: SR from', str(self.SSRC_sender)
073:	            # Included Receiver Reports
074:	            self.Reports = []
075:	            i = 0
076:	            for i in range(ReceptionCount):
077:	                self.Reports.append(Report())
078:	                self.Reports[i].SSRC,
079:	                self.Reports[i].FractionLost,
080:	                self.Reports[i].CumulativeNumberOfPacketsLostH,
081:	                self.Reports[i].CumulativeNumberOfPacketsLostL,
082:	                self.Reports[i].ExtendedHighestSequenceNumberReceived,
083:	                self.Reports[i].InterarrivalJitter,
084:	                self.Reports[i].LastSR,
085:	                self.Reports[i].DelaySinceLastSR = unpack('!IBBHIIII', self.Datagram[off: off + 24])
086:	                off += 24
087:	        # Source Description (SDES)
088:	        elif PacketType == 202:
089:	            # RC now is SC
090:	            SSRCCount = Ver_P_RC & 0b00011111
091:	            self.SourceDescriptions = []
092:	            i = 0
093:	            for i in range(SSRCCount):
094:	                self.SourceDescriptions.append(SDES())
095:	                SSRC, = unpack('!I', self.Datagram[off: off + 4])
096:	                off += 4
097:	                self.SourceDescriptions[i].SSRC = SSRC
098:	                SDES_Item = -1
099:	                # Go on the list of descriptions
100:	                while SDES_Item != 0:
101:	                    SDES_Item, = unpack('!B', self.Datagram[off])
102:	                    off += 1
103:	                    if SDES_Item != 0:
104:	                        SDES_Length, = unpack('!B', self.Datagram[off])
105:	                        off += 1
106:	                        Value = self.Datagram[off: off + SDES_Length]
107:	                        off += SDES_Length
108:	                        if debug:
109:	                            print 'SDES:', SDES_Item, Value
110:	                    if SDES_Item == 1:
111:	                        self.SourceDescriptions[i].CNAME = Value
112:	                    elif SDES_Item == 2:
113:	                        self.SourceDescriptions[i].NAME = Value
114:	                    elif SDES_Item == 3:
115:	                        self.SourceDescriptions[i].EMAIL = Value
116:	                    elif SDES_Item == 4:
117:	                        self.SourceDescriptions[i].PHONE = Value
118:	                    elif SDES_Item == 5:
119:	                        self.SourceDescriptions[i].LOC = Value
120:	                    elif SDES_Item == 6:
121:	                        self.SourceDescriptions[i].TOOL = Value
122:	                    elif SDES_Item == 7:
123:	                        self.SourceDescriptions[i].NOTE = Value
124:	                    elif SDES_Item == 8:
125:	                        self.SourceDescriptions[i].PRIV = Value
126:	                        # Extra parsing for PRIV is needed
127:	                    elif SDES_Item == 0:
128:	                        # End of list. Padding to 32 bits
129:	                        while (off % 4):
130:	                            off += 1
131:	        # BYE Packet
132:	        elif PacketType == 203:
133:	            SSRCCount = Ver_P_RC & 0b00011111
134:	            i = 0
135:	            for i in range(SSRCCount):
136:	                SSRC, = unpack('!I', self.Datagram[off: off + 4])
137:	                off += 4
138:	                print 'SDES: SSRC ' + str(SSRC) + ' is saying goodbye.'
139:	        # Application specific packet
140:	        elif PacketType == 204:
141:	            Subtype = Ver_P_RC & 0b00011111
142:	            SSRC, = unpack('!I', self.Datagram[off: off + 4])
143:	            Name = self.Datagram[off + 4: off + 8]
144:	            AppData = self.Datagram[off + 8: off + Length]
145:	            print 'SDES: APP Packet "' + Name + '" from SSRC ' + str(SSRC) + '.'
146:	            off += Length
147:	        # Check if there is something else in the datagram        
148:	        if self.Datagram[off:]:
149:	            self.Datagram = self.Datagram[off:]
150:	            self.parse()
151:	    
152:	    def generateRR(self):
153:	        # Ver 2, Pad 0, RC 1
154:	        Ver_P_RC = 0b10000001
155:	        # PT 201, Length 7, SSRC 0xF00F - let it be our ID
156:	        Header = pack('!BBHI', Ver_P_RC, 201, 7, 0x0000F00F)
157:	        NTP_32 = (self.NTP_TimestampH & 0x0000FFFF) + ((self.NTP_TimestampL & 0xFFFF0000) >> 16)
158:	        # No lost packets, no delay in receiving data, RR sent right after receiving SR
159:	        # Instead of self.SenderPacketCount should be proper value
160:	        ReceiverReport = pack('!IBBHIIII', self.SSRC_sender, 0, 0, 0, self.SenderPacketCount, 1, NTP_32, 1)
161:	        return Header + ReceiverReport

Parsing follows the RFC strictly. I use the unpack function to convert raw bytes into numeric variables, and I move through the data array with the off variable, which holds the current offset.
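The same unpack-and-advance pattern works for any RTCP packet header, and it also shows the one subtlety of the Length field: it counts 32-bit words minus one, not bytes. A small Python 3 sketch (the function name is mine):

```python
from struct import pack, unpack

def parse_rtcp_header(datagram):
    # Common 4-byte RTCP header: V(2) P(1) RC(5) | PT(8) | Length(16)
    ver_p_rc, packet_type, length = unpack('!BBH', datagram[:4])
    return {
        'version': (ver_p_rc & 0b11000000) >> 6,
        'padding': (ver_p_rc & 0b00100000) >> 5,
        'count':   ver_p_rc & 0b00011111,
        'packet_type': packet_type,
        # Length is in 32-bit words minus one (RFC 3550),
        # so the whole packet takes (length + 1) * 4 bytes
        'bytes': (length + 1) * 4,
    }

# RR header: version 2, one report block, PT 201, length 7 (32 bytes)
header = parse_rtcp_header(pack('!BBH', 0b10000001, 201, 7))
```

After consuming 'bytes' bytes, whatever remains in the datagram is the next chained packet of the RTCP compound, which is exactly why parse() calls itself on the leftover slice.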

And here is the link: Python MJPEG over RTSP client.

I could no longer keep up a version of the listings with Russian comments, so my apologies if that is inconvenient for anyone.

Useful to read

  1. Multimedia over the internet
  2. List of RTP profiles for audio and video
That is all for this article. Well done to everyone who made it to the end!