Silverlight + nginx = resumable browser file uploads
This article describes our experience implementing a Silverlight client for resumable file uploads in the Files@mail.ru project.
Why is this needed? It hardly needs saying that a very large number of web projects, from small to very large, now let users upload files to the server and store them there. The upload is usually implemented with a regular HTML form, less often with Flash, and even less often by other means (we do not consider FTP uploads in this article). The problem is that HTTP was originally designed as a text protocol and is not well suited to transferring large amounts of binary data. As a result, when the user's connection drops, the computer reboots, and so on, a half-transferred file has to be uploaded all over again, and on a slow channel this turns into real torment.
What to do?
How we got here
The first rumors about Adobe Flash 10 and its new FileReference.load() method for reading file contents got us excited. But no such luck: Adobe "outsmarted everyone". FileReference.load() loads the entire contents of the file into memory, effectively freezing the machine when reading a large file (in our experiments, a file of about 500 MB was already "large" on a computer with 2 GB of RAM). In addition, Flash does not support files larger than 2 GB.
We were disappointed. On top of that, partial uploads also needed support on the server side, and we did not feel like writing that ourselves.
Then one day we thought: "Why not look at Silverlight? Maybe it offers more than Flash." And we were right.
In Silverlight, file access is designed better and is easier to use than in Flash: we can read the file the user selected in the dialog at an arbitrary offset, in buffers of arbitrary size. File sizes in Silverlight are 64-bit values, so we can upload files of practically unlimited size (theoretically up to 2^63 - 1 bytes, about 8 EiB, since file lengths are signed long values).
In addition, Valery Kholodkov's repository (in case anyone doesn't know, he is the author of the excellent nginx_upload_module for handling file uploads) had a branch called partial-upload that had appeared and was slowly being developed; its name alone filled us with awe.
With Valery's support, we started writing a Silverlight client and integrating it with the server module...
Happy ending
After many hours of rewriting and testing the client code, we finally had the first working version.
With bated breath we started uploading the first file. Oh my, what bliss it is to pull the network cable out in the middle of an upload, plug it back in, and watch the upload miraculously resume almost from the point where it broke off. But the bliss did not last long: quick testing turned up bugs both in the client code and in the server module.
Many thanks to Valery for fixing the bugs in his module quite quickly, and to ourselves for battling Silverlight and C#.
One fine August day we had finally tested everything and fixed all the bugs we had found, and we did not miss the chance to make the users of Mail.Ru Files happy, that is, we released it into production.
And now, the solution itself!
A bit about client-server interaction
The upload works as follows.
The client generates a unique session identifier for each uploaded file.
SessionId = (1100000000 + new Random().Next(10000000, 99999999)).ToString();
For each file we also compute a hash whose purpose is to identify the file uniquely on the user's computer.
UniqueKey = "";
try
{
    // Files below Constants.MinFilesizeToAdd get no key (UniqueKey stays "")
    if (FileLength < Constants.MinFilesizeToAdd)
    {
        throw new Exception();
    }
    // Adler32 version to compute "unique" file hash:
    // read Constants.NumPoints evenly spaced parts of the file,
    // each at most Constants.MaxPartSize bytes long.
    // UniqueKey will be Constants.NumPoints * sizeof(uint) chars long
    int part_size = (int)Math.Min(file.Length / Constants.NumPoints, Constants.MaxPartSize);
    byte[] buffer = new byte[part_size];
    int current_point = 0;
    int bytesRead = 0;
    AdlerChecksum a32 = new AdlerChecksum();
    // Dispose the stream even if reading fails
    using (Stream fs = file.OpenRead())
    {
        while (current_point < Constants.NumPoints && (bytesRead = fs.Read(buffer, 0, part_size)) != 0)
        {
            a32.MakeForBuff(buffer, bytesRead);
            // Append the 4 checksum bytes, least significant first;
            // each step shifts by a whole byte (8 bits)
            for (int i = 0; i < sizeof(uint); i++)
            {
                UniqueKey += (char)((a32.ChecksumValue >> (i * 8)) & 0xFF);
            }
            // Jump to the start of the next sample point
            fs.Position = ++current_point * file.Length / Constants.NumPoints;
        }
    }
}
catch (Exception) { }
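The AdlerChecksum class used in the code above is not listed in the article. For readers who want to see what it computes, here is a minimal sketch of the standard Adler-32 checksum (RFC 1950) in C#. The Adler32 class and its Compute method are hypothetical names chosen for illustration, not MrUploader's API.

```csharp
// Hypothetical stand-in for the AdlerChecksum class used above:
// the standard Adler-32 checksum as defined in RFC 1950.
public static class Adler32
{
    // Largest prime smaller than 2^16
    private const uint ModAdler = 65521;

    // Computes Adler-32 over the first `count` bytes of `data`.
    public static uint Compute(byte[] data, int count)
    {
        uint a = 1, b = 0;
        for (int i = 0; i < count; i++)
        {
            a = (a + data[i]) % ModAdler; // running sum of bytes
            b = (b + a) % ModAdler;       // running sum of the sums
        }
        // High 16 bits hold b, low 16 bits hold a
        return (b << 16) | a;
    }
}
```

For example, Adler32.Compute over the nine ASCII bytes of "Wikipedia" returns 0x11E60398. Each byte only updates two running sums modulo 65521, so the whole computation is one cheap pass over the data, which is why it is much faster than MD5. Production implementations such as zlib's also defer the modulo over blocks of up to 5552 bytes to make it faster still.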
After the user selects a file in the dialog and its hash is computed, we check whether the local Silverlight storage holds information about this file; if it does, the upload starts from the first "hole" in the already uploaded byte ranges.
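Finding that first "hole" amounts to a single pass over the uploaded ranges sorted by start offset. A minimal sketch, assuming ranges are inclusive [start, end] byte offsets as in the X-Content-Range header; UploadResume.FirstHole is a hypothetical helper, not part of MrUploader, and it uses C# 7 value tuples for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UploadResume
{
    // Returns the offset of the first byte not covered by any uploaded range,
    // or fileLength if the whole file is already on the server.
    // Ranges are inclusive [Start, End] byte offsets; they may overlap
    // and need not be sorted.
    public static long FirstHole(IEnumerable<(long Start, long End)> uploaded, long fileLength)
    {
        long next = 0; // first offset not yet known to be uploaded
        foreach (var r in uploaded.OrderBy(r => r.Start))
        {
            if (r.Start > next)
                break; // gap between `next` and this range
            next = Math.Max(next, r.End + 1);
        }
        return Math.Min(next, fileLength);
    }
}
```

With ranges 0-99 and 200-299 of a 1000-byte file, FirstHole returns 100, so the next chunk would start at byte 100.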
The client then sends a chunk of the file, specifying the byte range being sent in the X-Content-Range header (because of Silverlight restrictions this header is used instead of the standard HTTP Content-Range header, although the server module supports both) and the session identifier in the Session-ID header. The request body contains the raw binary data of the chunk.
UriBuilder ub = new UriBuilder(UploadUrl);
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(ub.Uri);
webrequest.Method = "POST";
webrequest.ContentType = "application/octet-stream";
// Some russian letters in filename lead to exception, so we do uri encode on client side
// and uri decode on server side
webrequest.Headers["Content-Disposition"] = "attachment; filename=\"" + HttpUtility.UrlEncode(File.Name) + "\"";
webrequest.Headers["X-Content-Range"] = "bytes " + currentChunkStartPos + "-" + currentChunkEndPos + "/" + FileLength;
webrequest.Headers["Session-ID"] = SessionId;
webrequest.BeginGetRequestStream(new AsyncCallback(WriteCallback), webrequest);
The server response carries, in its Range header, the list of byte ranges of this file that have already been uploaded to the server. The same list is duplicated in the response body (the reason for the duplication is explained below).
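Parsing that list could look like the sketch below. The exact syntax is an assumption: we take it to be comma-separated inclusive ranges followed by "/" and the total size, for example 0-99,200-299/1000, which is consistent with the ^\d+-\d+/\d+ pattern used in the client's response check. RangeList.TryParse is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class RangeList
{
    // Parses a body such as "0-99,200-299/1000" (assumed format) into
    // inclusive ranges plus the total file size. Returns false otherwise.
    public static bool TryParse(string text, out List<(long Start, long End)> ranges, out long total)
    {
        ranges = new List<(long Start, long End)>();
        total = 0;
        if (string.IsNullOrEmpty(text))
            return false;

        // Everything after the last '/' is the total size
        int slash = text.LastIndexOf('/');
        if (slash < 0 || !long.TryParse(text.Substring(slash + 1).Trim(),
                NumberStyles.None, CultureInfo.InvariantCulture, out total))
            return false;

        // Everything before it is a comma-separated list of "start-end"
        foreach (string part in text.Substring(0, slash).Split(','))
        {
            string[] bounds = part.Trim().Split('-');
            if (bounds.Length != 2
                || !long.TryParse(bounds[0], NumberStyles.None, CultureInfo.InvariantCulture, out long start)
                || !long.TryParse(bounds[1], NumberStyles.None, CultureInfo.InvariantCulture, out long end)
                || start > end)
                return false;
            ranges.Add((start, end));
        }
        return true;
    }
}
```

A body that does not parse can then be treated as the backend's final response, which is the same decision the regular-expression check in the client makes.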
After each chunk is uploaded successfully, the information about uploaded ranges is stored or updated in Silverlight local storage, keyed by the file hash. This lets the upload resume even after the browser has been closed. After each chunk the server module responds with HTTP status 201, and the request is not proxied to the backend.
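A minimal sketch of that bookkeeping, keyed by the sampled file hash. To stay runnable outside the browser it uses an in-memory Dictionary as a stand-in for Silverlight's IsolatedStorageSettings.ApplicationSettings (which would also need a call to its Save() method). ResumeStore and its methods are hypothetical; storing the server's range string verbatim and forgetting the entry once the upload completes are assumptions made for this sketch.

```csharp
using System.Collections.Generic;

// Hypothetical store of upload progress, keyed by the sampled file hash.
// The value is the range list last returned by the server, e.g. "0-99/1000".
public class ResumeStore
{
    // In-memory stand-in for IsolatedStorageSettings.ApplicationSettings
    private readonly Dictionary<string, string> settings = new Dictionary<string, string>();

    // Called after every successfully uploaded chunk.
    public void Update(string fileHash, string serverRanges)
    {
        if (string.IsNullOrEmpty(fileHash))
            return; // small files get an empty hash and are not tracked
        settings[fileHash] = serverRanges;
    }

    // Called after the user selects a file; false means "start from byte 0".
    public bool TryGetRanges(string fileHash, out string serverRanges)
    {
        serverRanges = null;
        return !string.IsNullOrEmpty(fileHash) && settings.TryGetValue(fileHash, out serverRanges);
    }

    // Called once the upload has completed (assumed cleanup step).
    public void Forget(string fileHash)
    {
        if (!string.IsNullOrEmpty(fileHash))
            settings.Remove(fileHash);
    }
}
```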
When the module determines that the file has been uploaded completely, it passes the request to the backend with a reference to a temporary file (just as the standard upload module does). For the backend, switching from the standard upload module to the partial-upload module is completely transparent: the backend code does not need to change at all.
Silverlight limitations we had to work around:
1. It is not possible to set the Content-Range header, so we use the X-Content-Range header.
2. We cannot reliably determine the server response code: with the browser HTTP stack in Silverlight we only ever see 200 or 404.
3. With the client HTTP stack in Silverlight we can read the response code accurately, but we lose proxy authorization and have to set cookies manually. So we use the browser HTTP stack, with a small trick to detect the 201 response code:
if (ResponseText != null && ResponseText.Length != 0)
{
// We cannot check response.StatusCode, see comments in constructor of FileUploadControl
if (Regex.IsMatch(ResponseText, @"^\d+-\d+/\d+")) // we got 201 response
{
...
}
else // we got 200 response
{
BytesUploaded = FileLength;
}
}
4. Computing a "proper" file hash (for example, MD5) takes a long time on large files, tens of seconds, which is unacceptable. Instead we take 50 parts of the file, 100 KB each, compute an Adler-32 checksum for each part (we chose this algorithm for its high speed, on the advice of a hacker we know), and concatenate the individual checksums; the result is the "unique" hash of the file.
5. If the file name contained certain Russian letters (the letter "з" has evidently fallen out of Microsoft's favor), Silverlight threw an exception on this line...
webrequest.Headers["Content-Disposition"] = "attachment; filename=\"" + File.Name + "\"";
... so we had to modify it: the file name is URL-encoded on the client before uploading and decoded on the server:
webrequest.Headers["Content-Disposition"] = "attachment; filename=\"" + HttpUtility.UrlEncode(File.Name) + "\"";
6. Even though the buffers are flushed after a certain number of bytes is written, Silverlight buffers the whole POST request and sends it in one piece. This makes it impossible to upload the entire file in a single request (without chunks), because on large files the client does not have enough memory to buffer the request. It also makes it impossible to display meaningful upload progress.
Therefore we try to split the file into 100 chunks so that progress can be shown from 0% to 100%, but we also clamp the chunk size from above and below for very large and very small files respectively, which can result in more than 100 chunks in the first case and fewer in the second.
public long FileLength
{
get { return fileLength; }
set
{
fileLength = value;
ChunkSize = (long)(fileLength / (100 / Constants.PercentPrecision));
if (ChunkSize < Constants.MinChunkSize)
ChunkSize = Constants.MinChunkSize;
if (ChunkSize > Constants.MaxChunkSize)
ChunkSize = Constants.MaxChunkSize;
}
}
7. There is a nasty bug in Opera (we no longer remember which version): if the server response has a zero-length body, Silverlight never calls the response-reading handler. That is why we asked Valery to duplicate the list of uploaded byte ranges in the response body.
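The chunk-size rule from limitation 6 and the resulting X-Content-Range values can be sketched together. The constants below (100 chunks, 64 KB minimum, 4 MB maximum) are illustrative assumptions rather than MrUploader's actual Constants values, and ChunkPlanner is a hypothetical class. As in the FileLength setter above, the chunk size is the file length divided by the target chunk count, clamped to [MinChunkSize, MaxChunkSize].

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // Illustrative values only; not taken from MrUploader's Constants.
    public const int TargetChunks = 100;
    public const long MinChunkSize = 64 * 1024;
    public const long MaxChunkSize = 4 * 1024 * 1024;

    // Aim for TargetChunks chunks (1% progress steps), clamped to [Min, Max].
    public static long ChunkSize(long fileLength)
    {
        long size = fileLength / TargetChunks;
        return Math.Min(Math.Max(size, MinChunkSize), MaxChunkSize);
    }

    // Yields X-Content-Range values "bytes start-end/total" (end inclusive)
    // for every chunk from `resumeFrom` to the end of the file.
    public static IEnumerable<string> ContentRanges(long fileLength, long resumeFrom)
    {
        long chunk = ChunkSize(fileLength);
        for (long start = resumeFrom; start < fileLength; start += chunk)
        {
            long end = Math.Min(start + chunk, fileLength) - 1; // last chunk may be shorter
            yield return "bytes " + start + "-" + end + "/" + fileLength;
        }
    }
}
```

With these values a 1,000,000,000-byte file gets 4 MB chunks (239 of them, well over 100), while ContentRanges(100000, 0) yields just two values: "bytes 0-65535/100000" and "bytes 65536-99999/100000".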
We stepped on a lot of nasty rakes along the way, and we want other developers' path to be less thorny. So we decided to open-source the client code. Meet MrUploader. Together with Valery Kholodkov's nginx-upload module, it is especially tasty.