
Distributed Information Flow Control
Recently, at the request of my supervisor, I put together a review for my group of a few topical areas in OS security and systems security in general: automated trust negotiation (automatic "negotiation" of access rights; I am not sure how to translate the term properly) and information flow control. I wanted to post that review, but to my surprise I found, in my opinion, unreasonably little information about DIFC (distributed information flow control) in the Russian-language web (.RU), and so I decided to write this short article on DIFC.
Motivation
Authentication (answering the question "Who said that?") and authorization ("What is this party allowed to do with the data?") are considered practically the only way to ensure the security and privacy of data in a system. That is, if a program needs access to some data, we really have only two options: refuse or trust. If we do not trust the program, we lose the ability to work with it and possibly lose important functionality. If we decide that we do trust the program (and/or its developers), the program effectively becomes the "sovereign master" of that information (or of a copy of it). In the literature this principle is called All-or-Nothing.
Naturally, this principle is not flexible enough and, moreover, it is a major reason why vulnerabilities such as buffer overflows are so damaging: a compromised trusted program carries all of its rights with it. More generally, this principle does not allow building more interesting applications in which access rights go beyond the traditional "no access", "read only" and "read/write". It turns out there are systems that allow much more flexible control over access to data: systems with support for information flow control. Their most important feature is that they follow the data throughout its entire life cycle in the system. Recall that traditionally the system is responsible only for the initial access to data, for example checking whether a program may open a file; what the program then does with that data no longer interests the system.
A classic example. Suppose there are two users in the system, Alice and Bob. They want to schedule a meeting, but without revealing too much information about their weekly schedules to each other. Is it possible, in a multi-user Linux/Unix/Windows system, to write a program that has simultaneous access to both Alice's and Bob's calendars and still guarantees the confidentiality of both users?
The easiest way is to ask the "superuser" to write such a program, or at least to assign the rights to an existing solution correctly. But this approach creates at least two problems:
1. There is no guarantee that the program contains no logic errors and does not, for example, copy Alice's data somewhere else (or that the administrator will not assign the rights incorrectly).
2. You have to trust the "superuser" completely, and besides, the process is non-interactive: you have to wait until the admin writes such a program or sets up the rights.
The first problem is addressed by systems with support for information flow control.
In general, systems with support for information flow control are conventionally divided into two categories: centralized and distributed (decentralized). The well-known SELinux and AppArmor are both centralized. In this article I will try to talk about decentralized systems, using Flume OS as the example; it is a research system (and therefore, unfortunately, completely unsuitable for real use), and I have had some experience "communicating" with it. Decentralized systems let you get rid of the second problem, the dependence on the superuser.
(Distributed) Information Flow Control
In short, the idea of information flow control is trivial: track how data "flows" through the system from sender to receiver. The main task of the system is to prevent unauthorized leaks of data out of it. In general, no program (except "privileged" ones) may have simultaneous (over the lifetime of the program) access both to private data and to any "information sink", such as a monitor, a printer, or a network socket (AF_INET). That is, if a program has read my personal files even once, the system will no longer allow that program to access the network.
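To make this rule concrete, here is a minimal sketch in Python (a toy model I made up for illustration, not Flume's or any system's real API): a process accumulates secrecy tags as it reads labeled data, and once its label is non-empty the subset check blocks writes to an unlabeled sink such as a network socket.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.label = set()   # secrecy tags the process has been "tainted" with

    def read(self, data_label):
        # Reading labeled data adds its tags to the process label.
        self.label |= data_label

    def write(self, sink_label):
        # A write is allowed only if the sink carries at least the same tags.
        if not self.label <= sink_label:
            raise PermissionError(f"{self.name}: flow to a less-labeled sink denied")

editor = Process("editor")
editor.read({"personal"})     # the program reads my personal files once...
editor.write({"personal"})    # ...writing to equally labeled storage is still fine
try:
    editor.write(set())       # ...but the unlabeled network sink is now off limits
except PermissionError as e:
    print(e)
```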
To make data private, you have to mark it explicitly, for example with special flags/tags. This is where the main difference between centralized and distributed systems lies. In the centralized case there is a special user, the "security manager", who is responsible for correctly "tagging" data and for defining the access rights of different programs to such data. For example, you can assign a "top secret" tag to the files with your passwords or personal income information and allow access to them only for Vim/Emacs, without the rights to (1) export this data anywhere or (2) remove these tags. Then, even if your text editor is compromised, the system (assuming the system itself is secure and works without errors) will not let it save these files somewhere else (say, /tmp) with different, more permissive tags, or send them to the Internet in any way. I have not worked with SELinux, so I refer you to the official manuals for further details.
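As a rough illustration of the centralized case, here is another toy Python model (this is not SELinux policy syntax; names such as FILE_TAGS and PROGRAM_RIGHTS are invented): a single policy table, maintained by the security manager, decides which tags each file carries and which programs may read them, and no program gets the right to declassify.

```python
# Policy set once by the "security manager"; ordinary programs cannot change it.
FILE_TAGS = {"/home/me/passwords.txt": {"top_secret"}}
PROGRAM_RIGHTS = {
    "vim":  {"read": {"top_secret"}, "declassify": set()},  # may read, may not untag
    "curl": {"read": set(),          "declassify": set()},  # no access at all
}

def may_read(program, path):
    return FILE_TAGS.get(path, set()) <= PROGRAM_RIGHTS[program]["read"]

def may_export(program, path):
    # Exporting (e.g. to the network or /tmp) requires declassifying every tag.
    return FILE_TAGS.get(path, set()) <= PROGRAM_RIGHTS[program]["declassify"]

print(may_read("vim", "/home/me/passwords.txt"))    # True
print(may_export("vim", "/home/me/passwords.txt"))  # False: even a compromised
                                                    # editor cannot ship the file out
```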
In distributed systems, any program/entity can create its own tags, assign rights to them and grant other programs access to its data.

In Flume OS you can create a tag to protect some personal data, and you have a choice: you can make adding this tag publicly available, and/or removing it. Suppose we created the tag tag1 and made the right {tag1+} public; then any program can put this tag into its own set of tags (its label). If we now create a file F and associate it with tag1, then any process p1 can add tag1 to its label and after that will be able to read all data marked with tag1. However, since {tag1-} is not public, this process cannot remove tag1 from its label and from that moment on can only communicate with processes whose set of tags is a superset of p1's.
In essence, the system must ensure two things: a process can send a message to another process only if the receiver has at least the same set of tags as the sender (or more), and no process with a non-empty set of tags has access to an "information sink" (by mathematical induction it can be shown that a system with these conditions is safe). Disclaimer: the original paper gives a more formal formulation and, besides the secrecy tag, also introduces the notion of an integrity tag, but I will not discuss it in this article.
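The tag1 example and the flow rule can be written down in the same toy Python notation (the names PUBLIC_PLUS, PUBLIC_MINUS, add_tag, drop_tag and can_send are invented for illustration and are not Flume's API): {tag1+} is public and {tag1-} is not, so any process may raise its label to read F, but afterwards it cannot drop the tag and may only send to processes whose label is a superset of its own.

```python
PUBLIC_PLUS = {"tag1"}   # tags anyone may add to their own label ({tag1+} is public)
PUBLIC_MINUS = set()     # tags anyone may remove ({tag1-} is NOT public)

def add_tag(label, tag):
    if tag not in PUBLIC_PLUS:
        raise PermissionError(f"no capability to add {tag}")
    return label | {tag}

def drop_tag(label, tag):
    if tag not in PUBLIC_MINUS:
        raise PermissionError(f"no capability to drop {tag}")
    return label - {tag}

def can_send(sender_label, receiver_label):
    # The flow rule: data may only move to an equal or larger label.
    return sender_label <= receiver_label

p1 = add_tag(set(), "tag1")             # p1 raises its label and may now read F
print(can_send(p1, {"tag1", "other"}))  # True: the receiver's label is a superset
print(can_send(p1, set()))              # False: an unlabeled process or network sink
# drop_tag(p1, "tag1")                  # would raise: {tag1-} was never made public
```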
Flume is one of the systems developed to enforce the "correctness" of information flows. At the system level, Flume is Linux with a modified LSM (Linux Security Modules) layer that intercepts the basic system calls, stores information about tags and labels, and checks that data flows correctly from one process to another.
Now back to the Alice and Bob calendar example. In Flume OS, Alice assigns tag A to her calendar, and Bob assigns tag B to his. Alice makes {A+} public, and Bob makes {B-} public. Bob runs a program with the label {A, B}, i.e. with the ability to read both calendars. The program finds several convenient time slots when neither Alice nor Bob is busy, removes tag B ({B-} is public) and writes the result to a file F, which automatically receives the tag A ({A-} is not public). Alice opens file F, since she is the owner of tag A, and picks a specific date from the list "suggested by Bob". Just in case, let me remind you that {B+} is not public, so Alice cannot read Bob's calendar.
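For completeness, the same toy model from the sketches above walks through the calendar scenario (an illustration of the label arithmetic only, not actual Flume code):

```python
PUBLIC_PLUS = {"A"}       # Alice made {A+} public
PUBLIC_MINUS = {"B"}      # Bob made {B-} public

scheduler = {"A", "B"}            # Bob starts the program with the label {A, B}
# ...the program reads both calendars and computes the free slots...
assert "B" in PUBLIC_MINUS
scheduler = scheduler - {"B"}     # allowed, because {B-} is public
file_F = set(scheduler)           # F is written with the remaining label, {A}
alice = {"A"}                     # Alice owns tag A
assert file_F <= alice            # so the flow F -> Alice is permitted
# Alice cannot read Bob's calendar: that would require B in her label,
# and {B+} was never made public.
```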
Conclusion
Unfortunately, I cannot cover all the areas where DIFC ideas apply (not even the ones I used to motivate the problem). There are many excellent papers on this topic, from the most classic (Jif) to the fairly recent HiStar/DStar and Resin. If there is interest in the topic, I can write in more detail, and more formally, about, for example, the Resin framework from MIT. At one point I was lucky enough to talk with Barbara Liskov (who is probably one of the main authorities in this area) about information flow control and the applicability of its principles to other problems, and I simply got hooked on the topic. There are several interesting "visions" for developing this idea further: W5 (World Wide Web Without Walls) and Fabric. But that is a completely different story...