Quirks of cross-domain scripting across subdomains with XML in Opera and some other browsers
I recently wrote a user script (for Firefox, Chrome, and Opera) that has to access an XML document located on the parent second-level domain from a page on a third-level subdomain. The work exposed some peculiarities of browser behavior, especially in Opera, whose causes I still do not fully understand. Since this kind of scripting (reading from and writing to XML documents across subdomains) is sometimes necessary, I would like to share the practical results and point out the open questions.
Problem statement.
There is an HTML page with a non-strict (transitional) Doctype that must receive data from an XML document. We cannot modify that XML to put our own scripts into it (as we could if it were XHTML). We cannot modify the HTML page either, for example by embedding the XML in it; everything is controlled from a user script that runs at onload. The first thought: nothing could be simpler, a classic task for XMLHttpRequest followed by rebuilding the DOM to suit our needs.
Yes, if the XML were on the same domain, nothing would be easier. But the domain is different, so an AJAX request would be possible only in a roundabout, redundant way: 1) load some existing simple HTML page from the other domain into a frame, 2) inject a script with the AJAX call into it, 3) read the XML via AJAX, 4) read back what that script received. Why do two reads when one is enough, by loading the XML directly into a single frame? No code has to be injected into the frame, and plain reading is simpler too. So we drop AJAX as the more complicated route and try the simple one.
As is well known, JS data and the DOM of a document on a related domain can be read if document.domain in the subdomain is set to the parent domain:
document.domain = 'сайт.ру';
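The rule behind this assignment can be illustrated with a small check, sketched here under assumptions: a page may relax document.domain only to its own hostname or to a parent suffix of it. The function name canRelaxTo and the hostnames in the usage note are hypothetical, and the check is deliberately simplified compared with what browsers actually enforce.

```javascript
// Simplified sketch of the document.domain relaxation rule.
// canRelaxTo is a hypothetical helper, not a browser API.
function canRelaxTo(hostname, target) {
  if (hostname === target) return true;
  // The target must contain a dot and be a proper suffix of the
  // hostname that starts on a label boundary.
  return target.indexOf('.') !== -1 &&
    hostname.length > target.length &&
    hostname.slice(-(target.length + 1)) === '.' + target;
}
```

For a page on user.example.ru, canRelaxTo('user.example.ru', 'example.ru') is true, while a bare top-level domain or an unrelated domain is rejected.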
Indeed, in Firefox and Chrome this scheme works without any surprises, and there would be nothing to write about. Opera, however, showed strange, unresolved problems (perhaps caused by XML?) that I simply worked around but that can be reproduced experimentally. This article is about them and about some side effects in Firefox.
Since these are user scripts, IE was not part of the study.
A working example can be seen in Firefox + Greasemonkey, Chrome, or Opera by installing the script described in the article (latest version 1.3).
How the script works.
We will discuss only the essentials: the part of the script that deals with cross-domain access.
Having dropped AJAX via XMLHttpRequest entirely, we now need to create a frame into which the XML is loaded.
if (!document.getElementsByName('ifr').length) { // create the frame
  var ifr = document.createElement('iframe');
  ifr.setAttribute('name', 'ifr');
  ifr.src = 'http://habrahabr.ru/api/profile/' + username + '/';
  ifr.style.display = 'none';
  document.body.appendChild(ifr);
}
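The XML has the following structure (a habrauser root element whose children include login and karma; only the element values are shown, in document order):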
spmbt
24
59.3
1038
Then we start a timer that periodically tries to read the XML structure, because we cannot catch the onload event of a foreign frame. (In Opera it actually is possible; the code and explanation are given below, but reading at that moment gave even worse results, as described later.)
win.habrKarmView.ii = 20; // number of attempts to read the frame
win.habrKarmView.ww = setInterval(showValue, 300); // retry every 300 ms
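The retry loop can be sketched as a small countdown poller. This is an illustrative sketch, not the script's actual showValue: makePoller, tryRead, onDone, and onGiveUp are hypothetical names, and tryRead is assumed to return the data once the frame is readable and null otherwise.

```javascript
// Hypothetical helper: builds a tick function suitable for setInterval.
// tryRead() should return the data when the frame is readable, else null.
function makePoller(attempts, tryRead, onDone, onGiveUp) {
  var left = attempts;
  var timer = null;
  function stop() {
    if (timer !== null) { clearInterval(timer); timer = null; }
  }
  function tick() {
    var data = tryRead();
    if (data !== null) { stop(); onDone(data); return; }
    left -= 1;                       // one attempt used up
    if (left <= 0) { stop(); onGiveUp(); }
  }
  return {
    tick: tick,
    start: function (intervalMs) { timer = setInterval(tick, intervalMs); },
    stop: stop
  };
}
```

With the values used above, such a poller would be started with 20 attempts and start(300). Stopping the timer both on success and on exhaustion keeps the interval from running forever when the frame never becomes readable.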
Here the fun begins. First, to avoid errors, the tree has to be checked step by step, like a sapper probing a minefield (alternatively, the errors can be caught with try-catch).
var f = document.getElementsByName('ifr');
// Probe the tree step by step so that a missing link does not throw
if (f && f[0] && f[0].contentDocument
    && f[0].contentDocument.getElementsByTagName('login')
    && f[0].contentDocument.getElementsByTagName('login')[0]
    && f[0].contentDocument.getElementsByTagName('karma')[0]) {
  // Opera: rely on the variable u written into the frame document earlier;
  // other browsers: compare the text of the login element with username
  if ( ((f[0].contentDocument.u == username || !f[0].contentDocument.u) && self.opera)
      || (!self.opera && f[0].contentDocument.getElementsByTagName('login')[0].childNodes[0].nodeValue == username) ) {
    // ...body of showValue: display the data obtained from the XML...
  }
}
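The try-catch alternative mentioned above could look roughly like this. It is a sketch under assumptions: readTagText is a hypothetical helper, not part of the original script, and it relies only on the standard DOM members getElementsByTagName, firstChild, and nodeValue.

```javascript
// Hypothetical helper: returns the text of the first <tagName> element,
// or null if the document, the element, or its text node is unavailable.
function readTagText(doc, tagName) {
  try {
    var el = doc.getElementsByTagName(tagName)[0];
    var text = el.firstChild;
    return text ? text.nodeValue : null;
  } catch (e) {
    // Covers a missing document or element (TypeError) and access errors
    return null;
  }
}
```

The non-Opera branch of the condition would then reduce to comparing readTagText(f[0].contentDocument, 'login') with username, at the cost of hiding which link in the chain was missing.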
Why did the second if statement have to separate the Opera and non-Opera branches? Because in Opera we were able to write the variable u into the XML document:
if(self.opera) document.getElementsByName('ifr')[0].contentDocument.u = username;
Firefox could not do this (why is an open question), but it did not really need to, because of the second branch of the condition: a long expression that reads the text node inside the login element, the same remembered username, which Firefox and Chrome read without any problems. So what went wrong in Opera?
The problems (question two) were strange and hard to explain. When we tried to read the login the second way, from the nodes rather than from the variable written in advance, the login element itself existed (getElementsByTagName('login')[0]), but it had no childNodes, that is, no text inside login. It was as if the XML had this form, with the login value missing:
24
59.3
1038
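Neither delays nor tricks such as reading getElementsByTagName('habrauser')[0].childNodes[1] ([1] because the line breaks produce text nodes) helped: the login element seemed empty from Opera's point of view. Why is the second open question. (In the frame itself it was not empty, and the text was visible if ifr.style.display = 'none' was omitted.)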
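The index [1] works only while the whitespace layout of the XML stays the same. A layout-independent way to pick child elements is to filter by nodeType, sketched below. childElements is a hypothetical helper, and this sketch has not been checked against the Opera behavior described here.

```javascript
// Hypothetical helper: returns only element children (nodeType 1),
// skipping the text nodes (nodeType 3) produced by line breaks.
function childElements(node) {
  var result = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    if (node.childNodes[i].nodeType === 1) result.push(node.childNodes[i]);
  }
  return result;
}
```

With such a helper, the first element under habrauser is childElements(root)[0] regardless of how many line breaks precede it.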
Despite the unexplained behavior of the first node, I had to leave the Opera branch of the script in this form: by a lucky coincidence of workable alternatives, we still had a substitute, the variable u in the document. But in the general case, if the element we needed came first, the task would be unsolvable in Opera (at least when approached this way).
Finally, the third question, a peculiarity of Opera and Firefox. In the document "Web Technologies for Opera Web Applications" I found a hack for attaching onload to a frame. Interestingly, the DOM visible at the moment of onload looked even worse: not only the login nodes but also the karma nodes were missing. The effect resembles a badly chosen onload moment, but not quite, because the login[0].firstChild node does not appear for a long time, if ever. What should be done about all this, and how can it be avoided? Perhaps Opera "chokes" on the long chain of node checks and they should be done differently? Has anyone run into this situation?
How an arbitrary document will look through its nodes in Opera remains a theoretical question. I have no wish to answer it yet: it is unclear what is actually happening, so there is no basis for predicting the behavior or for designing tests.
Useful knowledge and conclusions.
1. Opera can write new variables into the frame's document on the other domain through contentDocument and contentWindow. Firefox cannot; through contentWindow it throws no error, and the value simply reads back as undefined. (The frame is obtained as an element of the document, document.getElementsByName('ifr'), so going through contentDocument would be more correct, yet in Firefox that does not work and produces an uncaught error.)
if (self.opera)
  document.getElementsByName('ifr')[0].contentDocument.u = username;
2. Opera can hook the onload of another domain's document in order to run code once that document has been built; Firefox cannot. (Although for XML this was of little use in Opera.)
var ifr = document.createElement('iframe');
ifr.src = 'http://habrahabr.ru/api/profile/' + username + '/';
ifr.style.display = 'none';
ifr.onload = function () {
  ...;
};
document.body.appendChild(ifr);
3. Either Opera itself or the code that checks for the existence of a node makes the first text node unreadable even though it exists. It remains to work out how to access the nodes correctly, whether this is a cross-domain effect, and whether other developers have encountered similar behavior.