On the issue of AVR and world records

    Do it well; it will turn out badly all by itself


    The occasion for this post was a recent publication on Habr (recent when I started writing; the draft then sat in the Unfinished folder for a long time) about implementing a software UART on AVR. The questions it raises are not without interest, but the answers given are so strange that I considered it my duty to make the necessary clarifications. The topic is stated; those who want to read about "kings, cabbages and shoes", that is, the requirements of standards, reading technical documentation (correctly) and records in AVR assembly programming, are welcome to read on.

    Let us state the question in more detail: is it possible to implement IRPS (the name I am used to for the interface better known as UART) on an AVR-type MC (specifically, the Tiny13) running from its internal oscillator? The point is that this oscillator does not hold its frequency very accurately, hence the question. I will note at once that it does not matter whether we consider a software implementation (as proposed in the original post) or the hardware blocks of the MC: results obtained for one approach, as far as timing accuracy is concerned, carry over almost completely to the other.

    The key question is whether the internal oscillator can provide the required accuracy, since a negative answer makes further research pointless. To compare two independent values we need to know both, so we start by determining the required frequency accuracy and what this particular MC can offer in that respect. An important remark: "this particular MC" means not a specific specimen "given to us in sensations" but a specific type of MC, as represented by its technical description.

    Let us start with what is easier to find (or so I thought): the requirements for the interface timing. Open the RS232 standard and see everything you need right away. It turned out that "one does not simply...", because the standard is paid and all copies on the web are illegal. Fine, we take the domestic equivalent, the GOST for the C2 junction, and find no timing parameters there at all, apart from the rise and fall times of the pulse. At first this came as a slight shock (how can that be?), but then I realized that the C2 junction describes only the electrical interface part of IRPS and that the timing requirements must live in the latter. In principle everything is logical; it is only unclear why the GOST does not say so explicitly, but in the end one can sometimes think for oneself, although it still "comes out rather untidy".

    Of course, knowing the transmission protocol, one can find the maximum allowable mismatch between the transmitter and receiver rates from general considerations (0.5 / 9.5 ≈ 5.2%: half a bit of drift by the time the stop bit is sampled, 9.5 bit intervals after the start edge), but that would be a study of a spherical horse you-know-where, because:

    1. the requirements of the standard can and should be more stringent than a similar theoretical calculation of the maximum permissible mismatch;
    2. knowledge of the final mismatch figure will not give us the transmitter and receiver budget.
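
    The 0.5 / 9.5 figure can be reproduced with a few lines of Python. This is only a sketch under an assumed receiver model (resynchronisation on the start edge, one sample at the centre of each bit); the function name and its parameters are illustrative and do not come from any standard.

```python
def max_clock_mismatch(data_bits: int = 8, parity_bits: int = 0) -> float:
    """Largest relative TX/RX clock mismatch that still samples the stop bit.

    Assumed model: the receiver resynchronises on the start edge and samples
    every bit at its centre, so the stop bit is sampled
    1 + data_bits + parity_bits + 0.5 bit intervals later. The accumulated
    drift must stay below half a bit interval.
    """
    last_sample = 1 + data_bits + parity_bits + 0.5  # in bit intervals
    return 0.5 / last_sample


print(f"8N1 frame: {max_clock_mismatch():.2%}")
print(f"8 data bits + parity: {max_clock_mismatch(parity_bits=1):.2%}")
```

    The result is the total mismatch between the two ends; it says nothing about how that total is shared between transmitter and receiver, which is exactly the second objection in the list above.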

    Wandering the expanses of the Internet turned up an Atmel AppNote (fitting, since we are using that company's MC anyway), which states explicitly that the allowable mismatch is 2% with the budget shared equally, which leads to the requirement that the transmitter hold its frequency to within 1%. Let us trust a respected company and assume that it has access to the secret materials and that this figure is correct, especially since it looks plausible. I understand how vulnerable this position is, but frankly I got tired of searching for an exact answer to such a simple question and cannot wait to move on to the next part.

    The other half of the answer lies inside the MC and is determined by its technical documentation. First, a few words about the structure of the internal oscillator, especially since it is more or less described. The oscillator uses an RC network as its timing element and, since making an accurate integrated capacitor and resistor is far from trivial, the resulting frequency varies from one MC specimen to another. To make this parameter more predictable, the manufacturer added a hardware block controlled by a calibration byte. This block allows the oscillator frequency to be adjusted over a wide range and thus the desired value to be obtained with much better accuracy.

    It would be interesting to know exactly how the adjustment is implemented in hardware; I can see either controlling the capacitor charging voltage through a DAC or controlling the reference voltage at the comparator. Both options are easy to implement, but both lead to a significantly non-linear adjustment characteristic. However, establishing the internal design of the oscillator is not our task; we are interested in its external parameters.

    So we open the documentation (you can open the file in a viewer; I have a printed copy of the description published by the manufacturer itself - yes, that used to happen) and look for the relevant section. We need the parameters in the section "Calibrated Internal RC Oscillator", following the links from there as needed. And here the first disappointment awaited us (or me, at least): I have worked with Atmel products for a long time (about 15 years) and have always believed that their MC documentation is good. As psychiatrists say, "there are no healthy people, only under-examined ones", and a close look at the relevant section confirmed this truth; how could I have overlooked such gaps in the documentation before? In my defense I can only say that:

    1. I have never used the internal oscillator in these MCs, so I did not study it especially carefully;
    2. when I started working with these MCs (much more than 10 years ago), I was young (well, definitely younger than now) and stupid and didn’t understand the need for good (understandable, comprehensive and unambiguous) documentation;
    3. I am ready to forgive myself a lot, simply because I forgive myself a lot, and all my flaws are not fatal (the last argument is especially convincing, isn't it?).

    So, having finished sprinkling ashes on my head, I will state my complaints about the documentation, and for the manufacturer there can be no excuse. Open the section mentioned above and study it carefully, jumping to other pages where needed (you just click the links). Together we will look for the following parameters of the oscillator's timing: nominal accuracy, the influence of supply voltage, the influence of temperature, and aging - the minimum set needed to assess the accuracy of any oscillator.

    Part one of the Marlezon ballet: nominal accuracy.

    We find the necessary parameter at once - the table of oscillator calibration accuracy, in which we see two rows: “Factory Calibrated” with a specified value of ±10% and “Manual Calibrating” with ±2%.

    A number of questions arise at once about these figures: what they mean and how the parameter is measured. For the first row the table itself gives the temperature (whether ambient or of the MC itself is not clear, but I am nitpicking) and the supply voltage; in addition, a note says (unnecessarily, in my opinion) that the measurement is taken at one specific point of external conditions. One can guess that this case uses the calibration value written at the factory, although it would be better to state this explicitly in a note. Everything is more or less clear and can be read almost unambiguously (although, since we are studying technical documentation, one ought to say that everything is unclear and open to interpretation, which is unacceptable, but if we do that,

    The second row is worse: it gives ranges of temperature and supply voltage and claims that some magic calibration procedure can achieve a result significantly better than the factory one over the whole range. A question arises at once: if this can be achieved everywhere (at any temperature and supply point) and the manufacturer knows how, why did it not do so itself during factory calibration at a single point of conditions? We turn to the description of the calibration byte and see that it takes 128 values covering the range from 50% to 200% of nominal, which corresponds to 150/128 ≈ 1.17% of frequency change per calibration step and should give an expected accuracy better than 1%. But then we must take into account that the adjustment characteristic is clearly non-linear, and in the region of large calibration values the step is 60%/32 ≈ 2% (the data are read off the graph; I have repeatedly expressed my attitude to this way of presenting technical parameters, and I repeat: it is unacceptable, though of course better than nothing), which gives an accuracy of 1%. If we also take into account the non-monotonicity of the adjustment characteristic (yes, the documentation says exactly that, not drawn in a graph but spelled out in the text; I categorically refuse to understand how, and above all why, the company managed to produce such an adjustment law, but it succeeded), which the recommendations mention explicitly, an accuracy of 2% should be considered achievable. I do not like that I had to look at the graph, but strictly speaking it was not necessary and the tabular data suffice.
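
    The step arithmetic above is easy to check. The sketch below only restates the figures quoted in the text (a 50% to 200% span over 128 codes, and 60% over the top 32 codes as read off the graph); the half-step residual assumes the nearest code is chosen and deliberately ignores the non-monotonicity.

```python
def step_percent(span_percent: float, codes: int) -> float:
    """Average frequency change per calibration code, in percent of nominal."""
    return span_percent / codes


average_step = step_percent(200 - 50, 128)  # whole calibration range
upper_step = step_percent(60, 32)           # upper region, read off the graph

# Assumption: the nearest code is chosen, so the residual error after
# trimming is at most half a step.
for name, step in (("average", average_step), ("upper region", upper_step)):
    print(f"{name}: step {step:.2f}%, residual up to {step / 2:.2f}%")
```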

    Part two of the Marlezon ballet: the influence of external conditions.

    And here begins "trash, frenzy and sodomy". Instead of tables of values we are invited to look at pictures (which the documentation for some reason calls graphs of typical values), and, as is well known, “the main advantage of graphical presentation of information is its clarity, and it has no other advantages.” One could even use such information and read the limit values off the graph (“although it is an insult to the team”), had the graph not been placed in the “Typical Characteristics” section. I do not know about others, but I am deeply convinced that to specify typical values (or typical-looking ones; in one film they say “typical appearance”), whether as a graph or as a table, is to specify nothing at all. They cannot be relied on in design,

    All right, off we go: we try to extract at least some information and see that as the temperature changes from -40 to +80 °C the oscillator frequency changes by ±4%. The picture for the supply voltage is similar - only typical graphs, and a resulting error of -6% to +2% over 3.3 to 5.5 V. Data on oscillator aging is simply not given, which is logical enough, since against the background of the parameters above nobody cares about one percent over 5 years (a characteristic figure for silicon).

    Now we have all the data to answer our original question: with factory calibration the oscillator does not meet the interface accuracy requirements; with calibration for the specific conditions of use it meets the limiting requirement but not the standard. Note also that while calibration for the supply voltage and for a specific MC can be done when the device is manufactured, in the hope that neither changes over time, temperature can only be accounted for “on the fly”, which requires an external time reference of adequate accuracy. Since device development should follow the rule “in God we trust, everything else requires proof”, and we have not proven that compliance is possible, the correct answer is: an IRPS implementation meeting the requirements of the standard cannot be guaranteed on this MC with the internal oscillator. Note that we reached this conclusion by analyzing the documentation and worded it so as to stress that on a specific MC specimen everything may well work if the stars align. So our conclusion contradicts the post mentioned earlier; how can that be, when everything works fine for its author? Let us figure it out.
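
    The comparison behind this verdict can be written down as a small check. It is a sketch only: the 1% per-side requirement is the application-note figure adopted earlier, the 5.2% bound is the theoretical one, and splitting that bound equally between the two ends is an assumption of this example, as are all the names.

```python
REQUIRED_PER_SIDE = 1.0              # %, application-note figure adopted in the text
THEORETICAL_TOTAL = 100 * 0.5 / 9.5  # %, total mismatch bound for an 8N1 frame


def verdict(osc_error_percent: float) -> str:
    """Classify a worst-case oscillator error against both limits."""
    if osc_error_percent <= REQUIRED_PER_SIDE:
        return "meets the requirement"
    # Assumption: the theoretical bound is shared equally by both ends.
    if osc_error_percent <= THEORETICAL_TOTAL / 2:
        return "boundary case only"
    return "fails"


# Figures quoted in the text (the drift values come from "typical" graphs).
quoted_figures = {
    "factory calibration": 10.0,
    "calibration for the conditions of use": 2.0,
    "temperature drift": 4.0,
    "supply voltage drift": 6.0,
}

for source, error in quoted_figures.items():
    print(f"{source}: {error}% -> {verdict(error)}")
```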

    Now for the criticism of the post in question. First, let us think about how a device can be checked for compliance with the requirements of a specific interface. I can suggest the following ways:

    1. A good way is to measure the critical parameters of the device interface and compare them with the requirements of the standard. This can be done with general-purpose instruments (in our case, an oscilloscope and the length of a bit interval or of a whole frame) or with a specialized instrument certified for testing this interface.
    2. A so-so way is to set up communication with another device that implements the other end of the interface and is known to be good (meets the requirements of the standard). Such a test is, of course, completely insufficient and serves rather to confirm that the device under test is faulty, but it does give at least something.
    3. A bad way is to implement the other end of the interface yourself (in the same device or in another one) and communicate with it. Since both devices are obviously unproven, the benefit of such a test is very, very doubtful. A good example of this approach is an “echo” on the serial channel, which proves nothing except that the device is not fundamentally broken and can transmit something; about the transmission rate it tells us little more than nothing.
    4. A terrible way is to take a device that does not meet the requirements of the standard at all (or, better yet, contradicts them) and proceed as in the previous item.

    It is the last method that the post under consideration uses: it implements a software serial receiver which, contrary to the requirements of the standard, changes its own rate by adjusting to the input signal (specifically, to the length of the start bit), which lets it reliably receive a signal with poor timing. One cannot say that this must never be done; analog modems, for instance, adapted to the incoming speed in much the same way, but that was rate switching by changing a divider, which is obviously not our case. And in this arrangement everything works perfectly and information is transmitted reliably under any external conditions. So, if we are talking about transferring information between two MCs running from internal oscillators over an interface remotely resembling IRPS, the answer is yes. If we are talking about interaction with external devices that meet the requirements of the standard and nothing more, many unpleasant surprises await us.

    The general conclusion from the above:

    1. When designing devices, rely on the documentation (RTFM);
    2. study the documentation and interpret what you read correctly (RTFMF);
    3. keep in mind that documentation these days may contain omissions, inaccuracies (and even errors), therefore
    4. check the information obtained for consistency and plausibility, and
    5. use experimentally obtained information only to confirm conclusions drawn from the analysis of the documentation, while
    6. choosing the test methods with particular care, so that the result is reliable.

    Well, in conclusion, as promised, a little assembler. I allowed myself to rewrite the code snippet given by the author in a normal way, since the assembler built into GCC is nothing but a mockery of a programmer. No, I, of course, understand that the developers of the compiler were guided by weighty considerations, but the result painfully resembles the phrase “well, it works.”

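    .equ delay=15
    TX_Byte:
    	cli				; no interrupts while the frame is being shaped
    ;	ld 	r18,Z+
    ;	cp 	r18,r1
    ;	breq	Exit_Transmit
    ;	dec 	r1
    	cbi 	port,TX_line		; start bit: drive the line low
    Delay_TX:
    	ldi 	r16,delay
    Do_Delay_TX:				; 4-clock delay loop
    	nop
    	dec 	r16
    	brne	Do_Delay_TX
    TX_Bit:
    	sbrc 	r18,0			; data bit is 1: set the line
    	sbi 	port,TX_line
    	sbrs	r18,0			; data bit is 0: clear the line
    	cbi 	port,TX_line
    	lsr	r18			; next data bit
    	lsr	r17			; bit counter, carry stays set while bits remain
    	brcs	Delay_TX
    	sbi	port,TX_line		; stop bit: drive the line high
    	ldi	r16,delay
    Stop_Bit_TX:
    	nop
    	dec	r16
    	brne	Stop_Bit_TX
    	sei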
    

    An error in the program immediately catches the eye: in line 3 (commented out) register r1 must hold zero, but nothing in the function sets it explicitly. After one byte has been transmitted this value is guaranteed by line 12, but not on the first pass. So initialization has to be added, which increases the code size.

    The second flaw is how the line level is formed in lines 4-7: the author's way of outputting the next bit makes the edge position differ by 2 clocks between the two transitions (0-1 and 1-0), which tightens the requirements for frequency accuracy. Not that the effect is very strong, but if the flaw can be fixed without lengthening the program, why not - see the epigraph. The original version took 4 words and ran in 4 clocks; the new one takes 4 words and runs in the same 4 clocks. Yes, the corrected version requires a deeper study of the MC architecture, but nobody said it would be easy. On the other hand, in the first variant the port modification is atomic and in the second it is not; in this case that does not matter (we have explicitly disabled interrupts), but an aftertaste remains.
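
    Where the 2 clocks come from can be seen by counting cycles up to the port write in each path. The sketch below assumes classic AVR core timing (a skip instruction takes 1 clock when it falls through and 2 when it skips a one-word instruction; sbi and cbi take 2 clocks; in, bld and out take 1 each) and, as a simplification, treats the pin as changing when the writing instruction completes.

```python
SKIP_FALLS_THROUGH = 1  # sbrc/sbrs, condition not met (assumed classic AVR timing)
SKIP_TAKEN = 2          # sbrc/sbrs skipping a one-word instruction
SBI_CBI = 2             # sbi/cbi on the classic core
SINGLE = 1              # in, bld, out


def edge_clock_skip_version(bit: int) -> int:
    """Clocks from the start of sbrc/sbi/sbrs/cbi to the port write."""
    if bit:
        # sbrc falls through, sbi writes the 1
        return SKIP_FALLS_THROUGH + SBI_CBI
    # sbrc skips sbi, sbrs falls through, cbi writes the 0
    return SKIP_TAKEN + SKIP_FALLS_THROUGH + SBI_CBI


def edge_clock_t_flag_version(bit: int) -> int:
    """Clocks from the start of in/bld/out to the port write."""
    return 3 * SINGLE  # the same path whatever the bit value


jitter_skip = abs(edge_clock_skip_version(1) - edge_clock_skip_version(0))
jitter_t_flag = abs(edge_clock_t_flag_version(1) - edge_clock_t_flag_version(0))
print(f"skip-based output: {jitter_skip} clocks between the two edge positions")
print(f"T-flag output:     {jitter_t_flag} clocks")
```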

    The third flaw is more a matter of style. I have repeatedly expressed my attitude towards magic constants, one of which we see in the preamble of this program. I stress once again: the fact that the author defines the constant in the preamble rather than directly in the instruction does not make the “ordinary street magic” go away. We must show the reader explicitly how a specific value is formed, not create a synonym for a value obtained in some unknown way. One can, of course, add a comment to the line giving the calculation formula, but it is better to use the formula explicitly when defining the constant; then the comment is simply not needed (provided, of course, that the constants involved have self-explanatory names). This is done in the text below, and note that

    There is one more error: the duration of the start bit differs somewhat from the bit interval for data. The deviation is not large (3 clocks), but at high rates, where the bit interval is about 90 clocks, that is already an error of a few percent, which is unacceptable. It is easily fixed by adding extra delay instructions, but that would lengthen the program, so for now we just note its presence and then make sure that a proper program architecture (yes, the concept applies even to a program this short) eliminates it automatically.

    Well, now that we have corrected the errors (except the last one), let us try to improve the program by the main criterion (for setting a record, in this particular case): code length. The first thing that catches the eye is the presence of two delay loops, which is bad because it violates the DRY principle (a general requirement) and increases code size (a specific one). This fragment could be made a subroutine and we would still gain in length, since we add 3 code words (1 for each of the two calls and 1 for the return) and save 4, but there is a much more beautiful way - a neat organization of the byte transmission loop, as seen in the following text.

    .equ delay=15
    TX_Byte:
    	cli
    	sec		; needed for the stop bit
    	clt		; set the start bit
    TransBit:		; the actual bit output
    	in 	r17,port
    	bld 	r17,Tx_line
    	out	port,r17
    Delay_TX:		; form the bit interval length
    	ldi 	r17,delay
    Do_Delay_TX:
    	nop
    	dec 	r17
    	brne	Do_Delay_TX
    TX_Bit:
    	bst 	r16,0
    	ror	r16
    	clc
    	brne	TransBit ; there is still something to send
    	brcs	TransBit ; send the stop bit
    Exit_Transmit:
    	sei
    

    Note how we use the transmitted byte together with the carry flag as a bit counter. A beautiful solution, but it has one drawback: the last data bit will be slightly (2 clocks) longer than the others because of the branch delay. If it were the stop bit we could “spit and forget”, since no minimum interval between transmissions has been specified, but this is a data bit, and we have just criticized the original program for exactly this behavior. Let us not be like the biblical character from the parable of the mote in another's eye, and take steps to eliminate it. The effect could easily be compensated by adding a 2-clock delay, but the code would grow, and code length is the key parameter.

    The next improvement concerns the formation of the bit interval, which the original program does with a 4-clock loop. If we make it a 3-clock loop (the minimum possible on this MC), we save one instruction (the nop) and can potentially improve the accuracy, since the delay granularity becomes finer (with proper rounding the deviation does not exceed half a step). Bear in mind, though, that in a particular case we may lose accuracy; it all depends on the source data. Another circumstance that may have influenced the choice of the original loop length: with a one-byte counter the delay takes at most 256 values, so the existing 4-clock variant supports rates from 9600 baud up, whereas with a 3-clock loop 9600 baud is out of reach. It would be very nice to reflect this circumstance (the minimum port speed) in the program comments and to emit a warning message when the requirement is violated. And, of course, to modify the macros that form the delay parameter accordingly, not forgetting to give the variables “speaking” names.
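
    The trade-off between loop length, rounding and minimum rate can be tried out on a host machine. The sketch below is an illustration under stated assumptions: an 8 MHz clock, 9 clocks per bit spent outside the delay loop, and a one-byte counter giving at most 256 iterations; the function names are made up for this example.

```python
def delay_count(freq: int, baud: int, payload: int, cycle_time: int) -> int:
    """Delay-loop count rounded to nearest, using integer arithmetic only."""
    return ((freq * 2 // baud - payload * 2) + cycle_time) // (cycle_time * 2)


def min_baud(freq: int, payload: int, cycle_time: int, max_count: int = 256) -> float:
    """Slowest rate reachable when the loop counter is a single byte."""
    return freq / (payload + max_count * cycle_time)


FREQ = 8_000_000  # Hz, assumed clock frequency
PAYLOAD = 9       # clocks per bit outside the delay loop (assumed)
BAUD = 115200

for cycle_time in (4, 3):
    count = delay_count(FREQ, BAUD, PAYLOAD, cycle_time)
    bit_clocks = PAYLOAD + count * cycle_time
    error = (FREQ / bit_clocks - BAUD) / BAUD
    print(f"{cycle_time}-clock loop: count={count}, rate error={error:+.2%}, "
          f"minimum rate={min_baud(FREQ, PAYLOAD, cycle_time):.0f} baud")
```

    Under these assumptions 9600 baud lies above the minimum rate of the 4-clock loop and below that of the 3-clock loop, which is the limitation described above.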

    .equ Freq = 8000000
    .equ BaudRate = 115200
    .equ PayLoad = 9  ; number of clocks outside the loop
    .equ CycleTime = 3 ; number of clocks per loop iteration
    .equ delay=((Freq*2/BaudRate - PayLoad*2)+CycleTime)/(CycleTime*2)
    TX_Byte:
    	cli
    	ldi	r18,10
    	sec		; needed for the stop bit
    	clt		; set the start bit
    TransBit:
    	in	r17,port
    	bld	r17,Tx_line
    	out	port,r17
    Delay_TX:
    	ldi	r17,delay
    Do_Delay_TX:
    	dec	r17
    	brne	Do_Delay_TX
    TX_Bit:
    	bst	r16,0
    	ror	r16
    	dec	r18
    	brne	TransBit
    Exit_Transmit:
    	sei
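
    How the ten-slot loop produces a complete frame can be traced with a small model of the registers and flags. This is a sketch of the listing above as I read it (r18 counts ten slots, the T flag holds the level to output, the carry set by sec travels through r16 and arrives as the stop bit); it models logic only, not timing, and the function name is made up for the example.

```python
def tx_bits(byte: int) -> list[int]:
    """Line levels produced by the loop: start bit, 8 data bits (LSB first), stop bit."""
    r16 = byte & 0xFF
    carry = 1  # sec
    t = 0      # clt: the first slot is the start bit
    line = []
    for _ in range(10):      # ldi r18,10 ... dec r18 / brne
        line.append(t)       # in / bld / out: copy T to the TX pin
        t = r16 & 1          # bst r16,0
        shifted_out = r16 & 1
        r16 = (carry << 7) | (r16 >> 1)  # ror r16: rotate right through carry
        carry = shifted_out
    return line


print(tx_bits(0xA5))
```

    The 1 loaded into carry by sec enters bit 7 on the first rotation and reaches bit 0 after eight rotations, so the ninth bst picks it up as the stop bit; dec does not touch the carry flag, which is why the counter in r18 can coexist with the rotation.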

    Now let us look at the result: the code size has dropped from 20 to 16 words (counting only the transmission itself the result is even more striking: from 18 to 14), the edge jitter has disappeared (only the component caused by the program, of course; the rest we have not touched), the bit slots are timed more accurately, and the program has become clearer and easier to understand (thanks to the comments, because even a well-written assembler program is, as a rule, not self-documenting).

    The conclusion of this last part: if we are going to set a world record in assembly programming, we had better study the architecture of the particular MC very deeply and apply that knowledge to get a perfect result, paying attention to all the subtleties.

    And finally: the task of writing minimum-size code looks a bit contrived these days, yet quite unexpectedly its relevance has been confirmed. At the end of last year (2016; that is how long this post waited its turn) a new MC from the MSP430 family was announced which, along with a uniquely low price (26 cents - we await Chinese devices based on it), has a uniquely small program memory: 512 bytes (no, I am not mistaken, there is no letter "k" after the number). So code size can be critical with this device, and indeed writing such extreme programs requires an in-depth study of the MC, and “work is a blessing in itself”.
