The story of a small legacy code study
It’s good to have someone more experienced on the team who can show you what to do and how, which rakes not to step on and at what angle, and where to download the best bicycle blueprints of 2007 on DVD. This is a story about how wishful thinking was passed off as reality, what came of it, and how the crisis was overcome.
This happened back when, with what seemed to me a mediocre amount of development experience, I was looking for a place where I could evolve (or mutate) from a not-quite-junior into a confident junior. In some mysterious way such a place was found, a project came attached to it, and so did an “old-school” programmer who had written more systems over his career than I could count. “Excellent! There is a project, which means there is money for a salary, and a mentor comes with it, we’re set!” I thought, but then, as in the blurb of a typical horror film, the heroes, in the dark darkness, came face to face with a terrible horror...
First things first:
1. Size matters
We were developing on a once-proprietary PHP engine, and to store the data (here you might expect MySQL / PostgreSQL / SQLite / MongoDB / something-else-but-necessarily-with-a-DB-suffix-or-the-guys-won’t-get-it, but you would not have guessed) an API gateway was used.
“Haha, you use PHP, and you bolt an API gateway onto it and store the data there? Isn’t it easier to call the API directly from the JS code? Or to use a DBMS plus PHP?” asks the seasoned reader. And he would be right. But back then I, still green, did not think so: who knows, maybe that is how the cool guys do it, and the “old-school” programmers know better.
As it was further explained to me:
- Gateway = security: no one gets in or out just like that
- Gateway = reliable data storage: you simply can’t get at it, plus there are backups
- Gateway = speed: it works fast, without failures, and is time-tested
- The authoritative view of “old-school” programmers: your PHP is full of holes, any web application gets hacked by default, so the data has no business being stored next to it
A characteristic feature of the API gateway was that JSON data was transmitted in a GET request. Yes, yes, those lovely JSON objects were URL-encoded and put into the query string. And everything would have been fine, until one day... the length of the GET request simply stopped being enough. The URL-encoded JSON, the rascal, just would not fit in there! The “old-school” programmer, scratching his head, asked:
“What are we going to do? Our JSON has grown, and we didn’t even notice...”
“Well, uh, maybe we should send it in a POST then?” I suggested; that seemed more correct anyway.
“OK, switch it to POST.”
That was red flag number one.
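To make the problem concrete, here is a minimal Python sketch (the URL, the method name, and the payload fields are invented; only the mechanism of URL-encoding JSON into the query string is from the story):

import json
import urllib.parse

# A payload of the kind the gateway received: a user object plus a
# free-form part that keeps growing over time (field names are made up).
payload = {
    "email": "ivanov@mail.ru",
    "name_last": "Ivanov",
    "name_first": "Ivan",
    "extra": {f"field_{i}": "some reasonably long value" for i in range(300)},
}

encoded = urllib.parse.quote(json.dumps(payload, ensure_ascii=False))
url = "https://gateway.example/api?method=user.add&data=" + encoded

# Apache's default LimitRequestLine is 8190 bytes, and other servers and
# proxies cap the request line similarly, so a growing JSON blob in the
# query string eventually just stops fitting.
print(len(url), len(url) > 8190)

# The eventual fix: send the same JSON in the POST body instead, e.g.
#   requests.post("https://gateway.example/api", json=payload)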
2. Time and backup management
To bolt new functionality onto the project, the corresponding CRUD requests had to be implemented on the gateway, which is exactly what our “old-school” comrade did. The problem was that he did it about once every three days, announcing “Done, check it.” The checks regularly showed that not everything worked: say, fetching the list is OK, adding a new item is not. Fixing and polishing took more time, after which the functionality could finally be released to everyone. My proposal to implement the gateway queries myself, if only because it would be faster, was rejected with “it’s complicated in there, you won’t figure it out.” The result of this approach was that all the work stayed locked onto him. If, for example, something had to be fixed en masse in the database, then, choosing between a three-day wait and pushing the fixes through the existing API requests myself, I chose the second option. The customers didn’t like waiting, and new requests kept arriving steadily. One of them, namely setting a certain flag on users en masse, was handed to me to implement; there was an hour for the whole thing, and the management was waiting for a pretty report. And here red flag number two awaits us.
The thing is, the format of the JSON data passed in requests required only a few fields; all the others were arbitrary, and no clear, final structure existed. For example, to add a user I would send a JSON like this:
POST /api/users
{
    "email": "ivanov@mail.ru",
    "password": "myEmailIsVeryBig",
    "name_last": "Иванов",
    "name_first": "Иван",
    "name_middle": "Иваныч",
    "birth": "01.01.1961",
    // from here on comes the free-form part: whatever we consider necessary is what we send
    "living_at": "ул.Сусаниа, д.3 к.4 кв.24",
    "phone_num": "+70000000000"
}
The optional part passed in add/update requests was saved and returned in its entirety (I will explain how that was implemented below). Meanwhile the clock was ticking and the task had to be solved: update the users and set their flags. But surely I don’t have to send the whole structure every time? Had to check! I tested it on myself: I sent only one field in the update request, checked, the field appeared, and the rest of the data was still in place. The rest was trivial: loop over the users and update them all.
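The bulk update itself looked roughly like the sketch below (the endpoint, parameter names, and the flag are invented; only the idea of sending a single field per user is from the story):

import requests

GATEWAY = "https://gateway.example/api"   # hypothetical address
session_id = "0123abcd"                   # obtained from a prior login; value invented
user_ids = [101, 102, 103]                # in reality, every user in the system

def set_flag(user_id):
    """Partial update: send only the one field we want to change."""
    resp = requests.post(f"{GATEWAY}/users/update",
                         params={"session": session_id},
                         json={"id": user_id, "some_flag": True},
                         timeout=10)
    resp.raise_for_status()

# The test on my own account suggested that omitted fields were preserved,
# so the loop happily sent the flag alone for every user...
for uid in user_ids:
    set_flag(uid)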
The script chugged along quietly, receiving and sending data, and everything seemed to be going fine... when suddenly, a call. “We can’t see users’ names in the system!” came the report from the other end of the line. “Come on! It worked fine for me!” An unpleasant chill ran down my spine. Further investigation showed that, indeed, the name field now contained an empty string "", although all the other data was in place. What do you do in a situation like this? Restore from backup!
"Comrade" old-school "programmer, wi hev er problems hir! Need backup! When is the latest relevant done? ”- I ask.
“Uhh ... I’ll see now .... No, there’s no bakapa. ”
The situation was saved by the fact that a couple of hours earlier I finalized and tested the module with reports, I had a csv-box with all the necessary data, the order was restored within another hour.
Lack of intelligible documentation, descriptions of work algorithms, input validity checks, and most importantly - database backups - atas number two-s.
Since then, backups have been taken every day.
3. Digging deeper
Shakily, but the work was moving along, problems got resolved, some faster, some slower, when all of a sudden... the customers realized that the system was running on who-knows-whose servers, and that nobody would pat them on the head for treating personal data that way, or for how information-security measures were organized in a personal-data information system. The server had to be brought in-house.
Why wasn’t the system hosted there from the start? The leadership had one passion: centralization. Management dreamed of a system that would do everything! Need to enroll a kid in school? You log into the system, go to a special section, and submit an application there. Need to, say, order a pizza? You log into the system, open another special section, and apply for pizza. Fancy chatting with beautiful ladies or gentlemen? At your service is a third special section, where you also submit an application. And so on, ad infinitum.
The advantages: one login and password for everything, and the data is stored safely and securely on the gateway. There are even backups. And, mind you, no one can take this system away from us! And even if someone did, then what? They would never figure out the protection system built by the “old-school” programmer anyway; it’s all very complicated in there.
The VDS with the system was exported, delivered to the customers, they deployed it, everyone sings and dances, beautiful!
And then a wave of curiosity, and a certain suspicion, washed over me.
If our web application is full of holes, then where exactly does the data live? Did it really stay on other servers? And if they decide to cut the system off from the outside, will everything collapse?
A simple check showed that the data, as well as the gateway’s handlers themselves, lived on that very same server. And no, they had not been moved there as part of the server migration; they had always been there.
So now I had at my disposal that very secret “old-school” creation, and I set about studying it. Of course, there was no cool reverse engineering in the style of Hacker magazine articles, with OllyDbg, offsets, and other fun stuff, so I am sharing what I have.
The thing itself was written in Python; all that was there were .pyc files, which could easily be decompiled back into readable code. Frankly, it took an enormous amount of time, a whole 25 minutes, to figure out how it worked.
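For completeness, that kind of recovery is about as mundane as the sketch below (the paths are invented; it assumes the uncompyle6 decompiler is installed and supports the bytecode version in question):

import glob
import subprocess

# Turn every .pyc found under the gateway directory back into readable .py source.
for pyc in glob.glob("gateway/**/*.pyc", recursive=True):
    out = pyc[:-1]  # foo.pyc -> foo.py
    with open(out, "w", encoding="utf-8") as f:
        subprocess.run(["uncompyle6", pyc], stdout=f, check=True)
    print(f"decompiled {pyc} -> {out}")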
So, the complex system developed by the “old-school” programmer, the one few could ever understand, consisted of:
- The script served by Apache, the one that actually received the request. What did this script do? It opened a connection to a specific localhost port and forwarded the request there with all its data. That’s all. The interesting part comes next.
- The server part that processed requests from that script. Its logic was rather interesting. First, there was no data manipulation in the code, and no SQL queries either; instead, database functions written in PL/SQL were called, and all the logic, checks, and so on lived inside those functions. Half of the script was a dictionary containing the request name, the function bound to it, and the names of the function parameters that should match the data passed in the GET query string; JSON data, when needed, was passed as a separate parameter. A peculiarity of the server part was that a connection was reserved during user authentication: if the login and password were found in the database, a session ID was generated and the open connection object was put into a dictionary keyed by that session ID (the connection was killed after a 10-minute timeout; to avoid being killed, there was a special method for prolonging the session). The session ID itself was not stored in the database. So how exactly was the session ID associated with the user’s data? After all, there are requests that do not pass the user ID at all, yet they work, which means something is off here. (A rough sketch of the scheme follows this list.)
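Reconstructed, the server part behaved roughly like this sketch (the function names, the driver, and the schema are my assumptions; the dictionary dispatch to database functions and the reserved per-session connection with a 10-minute timeout are what the decompiled code actually did):

import time
import uuid
import psycopg2  # driver assumed for illustration; the post never names the DBMS

# Request name -> (database function to call, expected query-string parameters).
# The real dictionary of this kind made up about half of the script.
DISPATCH = {
    "user.add":    ("fn_user_add",    ["session", "data"]),
    "user.update": ("fn_user_update", ["session", "data"]),
    "doc.list":    ("fn_doc_list",    ["session"]),
}

SESSIONS = {}            # session_id -> (open DB connection, last-used timestamp)
SESSION_TTL = 10 * 60    # connections were dropped after 10 idle minutes

def login(user, password):
    """Authenticate and reserve a dedicated DB connection for this session."""
    conn = psycopg2.connect(dbname="gateway", user="gw", password="***")
    with conn.cursor() as cur:
        cur.execute("SELECT fn_check_auth(%s, %s)", (user, password))
        ok = cur.fetchone()[0]
    if not ok:
        conn.close()
        return None
    session_id = uuid.uuid4().hex        # never written to the database
    SESSIONS[session_id] = (conn, time.time())
    return session_id

def handle(request_name, params):
    """Route a request to its DB function over the caller's reserved connection."""
    fn, expected = DISPATCH[request_name]
    conn, _ = SESSIONS[params["session"]]
    SESSIONS[params["session"]] = (conn, time.time())  # the "prolong the session" trick
    args = [params[name] for name in expected if name != "session"]
    placeholders = ", ".join(["%s"] * len(args))
    with conn.cursor() as cur:
        cur.execute(f"SELECT {fn}({placeholders})", args or None)
        return cur.fetchone()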
The very complicated creation yielded to comprehension only with difficulty and was in no hurry to reveal the long-lost secrets of the masters of the past.
Through incredible effort (Go to > Definition, thanks to PhpStorm for understanding PL/SQL), the True Knowledge of the Lost Civilization of Old-School Programmers, incomprehensible to the mind of a mere mortal, was nevertheless obtained. In short: on connection, the function that verified the authentication data created a temporary table, and the user id was stored in it.
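In other words, the user’s identity travelled with the reserved connection itself. A sketch of the idea (table, column, and function names are invented; the real logic sat in PL/SQL inside the database):

def authenticate_on_connection(conn, email, password):
    """Bind the user's id to this particular connection via a temporary table."""
    with conn.cursor() as cur:
        # A temporary table is visible only within the current session/connection,
        # so any later function call made over the same connection can read it.
        cur.execute("CREATE TEMPORARY TABLE session_ctx (user_id integer)")
        cur.execute(
            "INSERT INTO session_ctx "
            "SELECT id FROM users WHERE email = %s AND password = %s",
            (email, password),
        )

def list_my_documents(conn):
    """Later requests never pass a user id: the query reads it from the temp table."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT d.* FROM documents d "
            "JOIN session_ctx c ON d.owner_id = c.user_id"
        )
        return cur.fetchall()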
And this was only the beginning. An illustrative list of the serious vulnerabilities found:
- DDoS via mass authentication (a connection was reserved per session, so the system ran into the DBMS connection limit; combined with the ability to keep extending a session’s lifetime, this made it possible to clog memory with connections entirely, leaving new users unable to work in the system);
- no protection against brute force (the number of failed login attempts was not detected, not stored, and not checked);
- no control over actions on entities (for example, the list of documents returned to a user was filtered by the organization the user belonged to, yet knowing a document’s ID you could successfully run a request to update or delete it; and the list of users, thankfully at least without the passwords, which, by the way, were stored in the database in plain text and unhashed, could be obtained by absolutely anyone).
And one more serious problem: the data storage scheme was not formalized. As promised earlier, here is how the “arbitrary fields” from the JSON were stored. No, they were not kept as a single row in a table. They were split into key-value pairs and stored in a separate table. For users, for example, there were two tables: users, and users_data (string key, string value), which is where the data actually lived. The price of this approach was ever-growing query times on any complex selection from the database.
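A minimal illustration of why such a key-value layout gets expensive on complex selections (the schema details and filters are invented; SQLite is used here purely for the demonstration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    -- every optional JSON field becomes one row per user here
    CREATE TABLE users_data (user_id INTEGER, key TEXT, value TEXT);
""")

# "Users at a given address born before 1980": every attribute in the filter
# costs another join against users_data, and all comparisons are on strings
# (dates are stored as 'dd.mm.yyyy'), so there is little for the planner to use.
rows = conn.execute("""
    SELECT u.id, u.email
    FROM users u
    JOIN users_data addr  ON addr.user_id  = u.id AND addr.key  = 'living_at'
    JOIN users_data birth ON birth.user_id = u.id AND birth.key = 'birth'
    WHERE addr.value LIKE '%Сусани%'
      AND substr(birth.value, 7, 4) < '1980'
""").fetchall()
print(rows)   # empty here; in production every extra attribute multiplied the cost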
This, in fact, was enough to make, and then carry out, the decision to move the system to a new API: understandable, documented, and maintained.
The moral
Perhaps this system is “legacy”, and the “old-school” programmer who created it is the very essence of legacy.
Nevertheless, the conclusions are as follows:
- If you are told “it’s complicated in there, you won’t figure it out”, it means things in there are a complete mess
- If someone leans on authority to pressure you, something fishy is going on
- Trust, but verify: security is not a state, it is a continuous process, so it is better to check the declared qualities against reality than to find out later that all users have suddenly become “Ivanov Ivan Ivanych” and there are no backups.