Increasing trust in the ROI while preserving the confidentiality of personal data

This post is a response to habrahabr.ru/post/244753.

To increase trust in the Russian Public Initiative (ROI) and make it more transparent, the ROI can apply the fairly simple solution described in this post. Whenever a user votes for or against an initiative, or withdraws a vote, the ROI should generate a special verification code. This code must contain no personal information about the user. The list of such codes should be publicly available, so that everyone can check that their vote was recorded correctly. The solution is technically simple. It makes the vote count verifiable to some extent and so increases citizens' confidence in the ROI.

Proposed protocol.

First, the ROI generates a 2048-bit RSA key pair (SK, PK) for use as an electronic digital signature (EDS). SK is the secret key and PK is the public key. The public key is published, while the secret key is stored and used only on the ROI server. There can be a single key for the entire ROI or separate keys for different initiatives. For example, the ROI can generate a separate key for each initiative, or renew the ROI-wide key from time to time. To identify a key, we use the notion of a “key version” (an index or key number). Optionally, though this is not needed in the first version of the system, the ROI can also publish a certificate for the key.
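Below is a minimal sketch of the key setup in Python. It assumes the third-party cryptography package, which is itself built on OpenSSL. The KeyRing class and its sequential version numbers are hypothetical illustrations of the “key version” idea and are not part of the proposal. Secret keys stay inside the object, and only the public key is exported in PEM form for publication.

```python
# Sketch only: assumes the third-party `cryptography` package.
# KeyRing is a HYPOTHETICAL helper illustrating "key versions".
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa


class KeyRing:
    """Maps key version -> RSA private key; lives only on the ROI server."""

    def __init__(self) -> None:
        self._keys = {}

    def new_version(self) -> int:
        # Generate a fresh 2048-bit RSA key pair (SK, PK) under the next version number.
        version = len(self._keys) + 1
        self._keys[version] = rsa.generate_private_key(public_exponent=65537, key_size=2048)
        return version

    def secret_key(self, version: int) -> rsa.RSAPrivateKey:
        return self._keys[version]

    def public_pem(self, version: int) -> bytes:
        # Only PK is exported, as a PEM SubjectPublicKeyInfo block for publication.
        return self._keys[version].public_key().public_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PublicFormat.SubjectPublicKeyInfo,
        )


ring = KeyRing()
v1 = ring.new_version()
print(v1, ring.public_pem(v1).decode().splitlines()[0])  # 1 -----BEGIN PUBLIC KEY-----
```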

Structure and generation of the publicly available codes.

1. When a user votes, the ROI forms the following 49-byte vector V:
Key version (number): 4 bytes
Initiative number: 4 bytes
Event time (UTC): 8 bytes
Event type: 1 byte (FOR, AGAINST, WITHDRAW)
User hash, computed as H = SHA256(UserSecret; SNILS; Initiative number): 32 bytes.

UserSecret is a secret that originates with the user. For example, it can be a voting password that the user enters on the ROI website when voting. If the ROI technically allows it, it can be the user's login password or its hash. In that case, the ROI can fill in the field automatically. In any case, it must come from the user and be known to the user alone. The reasons for this change are explained in the update below.
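Here is a sketch of how the ROI might assemble V, using only the Python standard library. The proposal fixes the field sizes but not their encoding, so the following are assumptions:

- integers are big-endian;
- the event time is stored as Unix seconds;
- the event-type byte values are made up;
- the three hash inputs are length-prefixed rather than literally joined with ";", so that different inputs cannot produce the same byte string.

The password and SNILS values are fictitious.

```python
# Sketch of the 49-byte vector V.
# ASSUMPTIONS (not fixed by the proposal): big-endian integers, event time as
# Unix seconds, hypothetical event-type bytes, length-prefixed hash inputs.
import hashlib
import struct
from datetime import datetime, timezone

EVENT_CODES = {"FOR": 1, "AGAINST": 2, "WITHDRAW": 3}  # hypothetical byte values


def user_hash(user_secret: str, snils: str, initiative: int) -> bytes:
    """H = SHA256(UserSecret; SNILS; Initiative number), 32 bytes."""
    h = hashlib.sha256()
    for part in (user_secret.encode("utf-8"), snils.encode("utf-8"), str(initiative).encode("ascii")):
        # Length-prefix each field so that ("ab", "c") and ("a", "bc") hash differently.
        h.update(struct.pack(">I", len(part)))
        h.update(part)
    return h.digest()


def build_vector(key_version: int, initiative: int, when: datetime, event: str, h: bytes) -> bytes:
    """V = key version (4) | initiative number (4) | UTC time (8) | event type (1) | H (32)."""
    if len(h) != 32:
        raise ValueError("H must be a 32-byte SHA-256 digest")
    seconds = int(when.timestamp())  # expects a timezone-aware datetime
    return struct.pack(">IIQB", key_version, initiative, seconds, EVENT_CODES[event]) + h


# Fictitious example data.
h = user_hash("my vote password", "112-233-445 95", 12345)
v = build_vector(1, 12345, datetime(2014, 12, 1, 12, 0, tzinfo=timezone.utc), "FOR", h)
print(len(v))  # 49
```

The ">" prefix in struct disables alignment padding. As a result, the header is exactly 4 + 4 + 8 + 1 = 17 bytes and V is 49 bytes.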
UPD: Changes since the first edition, and the motivation
Originally: H = SHA256(SK; SNILS; Initiative number): 32 bytes.
The discussion showed that this design had a hole. Consider the following scenario. The ROI caches results and publishes them, say, once every 5 minutes. A user votes, and the correct code (Vx, Sx) is generated and sent to them. Someone else votes within the same 5 minutes, and the ROI sends this second user the previous user's code (Vx, Sx) instead of a new one. Later, both users check the published list and find their code in it. Yet the ROI's counter has grown by only 1, and nobody can detect this. In other words, a vote can silently disappear at the ROI. The weak link here is the field Hx. The user must be able to verify the inputs from which Hx was computed, while no one else can. This makes the system slightly more complex.

Note that a user may perform several actions, for example FOR, then WITHDRAW, then AGAINST. UserSecret must stay the same for all of them, so that Hx in the log is identical for all of that user's operations on the selected initiative. Where a vote cannot be withdrawn (as is currently the case at the ROI), the issue does not arise.

Corollary: the user now needs to do two checks. First, that their code (Vx, Sx) appears in the published log. Second, that the field Hx in Vx was generated correctly.
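The user-side check from the corollary can be sketched as follows. It reuses the same assumed encoding of H: length-prefixed inputs, with the initiative number in bytes 4 to 7 of V. In the substitution scenario above, the second user would get False when checking the pair they received. The reason is that the Hx in that pair was computed from the first user's secret and SNILS.

```python
# User-side check of Hx. Repeats the ASSUMED length-prefixed encoding of
# H = SHA256(UserSecret; SNILS; Initiative number) and the assumed layout of V.
import hashlib
import struct


def user_hash(user_secret: str, snils: str, initiative: int) -> bytes:
    h = hashlib.sha256()
    for part in (user_secret.encode("utf-8"), snils.encode("utf-8"), str(initiative).encode("ascii")):
        h.update(struct.pack(">I", len(part)))
        h.update(part)
    return h.digest()


def hx_is_mine(v: bytes, user_secret: str, snils: str) -> bool:
    """True if the last 32 bytes of V were computed from this user's own inputs."""
    if len(v) != 49:
        return False
    (initiative,) = struct.unpack(">I", v[4:8])  # initiative number sits at bytes 4..7
    return v[17:] == user_hash(user_secret, snils, initiative)


# Fictitious data: a vector issued for initiative 777.
mine = struct.pack(">IIQB", 1, 777, 1417435200, 1) + user_hash("pw", "112-233-445 95", 777)
print(hx_is_mine(mine, "pw", "112-233-445 95"))     # True
print(hx_is_mine(mine, "other", "112-233-445 95"))  # False
```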


2. Next, the ROI uses the corresponding RSA secret key SK to produce the second part of the code, the digital signature (EDS):
S = RSA_Sign(SK; V). The result is 256 bytes.
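Here is a sketch of step 2, assuming the Python cryptography package. The proposal only says RSA_Sign. PKCS#1 v1.5 padding with SHA-256 is one standard choice, and RSA-PSS would serve equally well. The random 49 bytes stand in for a real V. For a 2048-bit key the signature is always 256 bytes, matching the figure above.

```python
# Sketch of S = RSA_Sign(SK; V). Assumes the `cryptography` package;
# PKCS#1 v1.5 + SHA-256 is one possible choice of signature scheme.
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

sk = rsa.generate_private_key(public_exponent=65537, key_size=2048)


def rsa_sign(secret_key: rsa.RSAPrivateKey, v: bytes) -> bytes:
    """Sign the 49-byte vector V; the library hashes V with SHA-256 internally."""
    return secret_key.sign(v, padding.PKCS1v15(), hashes.SHA256())


v = os.urandom(49)  # stand-in for a real vector V
s = rsa_sign(sk, v)
print(len(s))  # 256
```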

3. The pair (V; S) is e-mailed to the user who voted. It is also published, for example in PEM text format.
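The proposal mentions PEM only as an example of a text format. One way to produce it with the Python standard library:

1. Concatenate V and S.
2. Base64-encode the resulting 305 bytes.
3. Wrap the output at 64 characters.
4. Add BEGIN/END lines.

The label ROI VOTE CODE and the V-then-S layout are hypothetical.

```python
# Sketch: PEM-style text armour for a (V; S) pair.
# The "ROI VOTE CODE" label and the V-then-S layout are HYPOTHETICAL.
import base64

LABEL = "ROI VOTE CODE"
V_LEN, S_LEN = 49, 256


def to_pem(v: bytes, s: bytes) -> str:
    body = base64.b64encode(v + s).decode("ascii")
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]  # PEM wraps at 64 chars
    return "\n".join([f"-----BEGIN {LABEL}-----", *lines, f"-----END {LABEL}-----"])


def from_pem(text: str) -> tuple:
    lines = [line.strip() for line in text.strip().splitlines()]
    if lines[0] != f"-----BEGIN {LABEL}-----" or lines[-1] != f"-----END {LABEL}-----":
        raise ValueError("not an ROI vote code")
    raw = base64.b64decode("".join(lines[1:-1]), validate=True)
    if len(raw) != V_LEN + S_LEN:
        raise ValueError("unexpected length")
    return raw[:V_LEN], raw[V_LEN:]


v, s = bytes(range(49)), bytes(256)  # placeholder values
text = to_pem(v, s)
print(text.splitlines()[0])  # -----BEGIN ROI VOTE CODE-----
```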

Pros:

• Anyone can use the public list of pairs {V; S} to compute the total number of votes for a given initiative. To do so, they first check each Vx with the ROI public key: RSA_Verify(PK; Sx; Vx) returns either “success” or “failure”. In essence, the function applies the public key PK to the signature Sx and compares the result with the padded hash of Vx. If the two match, it returns success.

• Anyone can find their own voting code in the list {V; S}, since it is e-mailed to them immediately after the vote. If the code is missing from the public list, the user can present the pair (Vx, Sx) as evidence of an uncounted vote. In addition, the user must verify that the field Hx in Vx was generated correctly. This rules out the scenario in which the ROI sends the same pair (Vx, Sx) to two users and counts only one vote.

• Third parties cannot claim that a vote Vx went uncounted without presenting the matching signature Sx. Producing Sx requires the ROI secret key. This protects the ROI from unfounded claims of this kind.

• Field H is included in V to uniquely identify the actions of the same user within a given initiative. For example, a FOR-WITHDRAW sequence must be tied to a specific user so that it can be traced in the {V; S} event list. At the same time, the user's SNILS is not exposed, because the hash also includes UserSecret, which only the user knows. Neither UserSecret nor a person's SNILS can be recovered from the hash. Even if the SNILS is known, UserSecret cannot be derived from the hash. Nor can anyone find out how a particular person voted from their SNILS, because the link between SNILS and H is not public. It is known only to the person who voted, as is the vote itself, which is currently sent to the user by e-mail. Thus, this design keeps the current level of protection of personal information (how a person voted), and nothing leaks through SNILS or UserSecret.

• When a specific user publicly presents a lost pair (Vx, Sx), the vote becomes linked to that person, and everyone learns how they voted. But the situation is the same today: if I voted and my vote was not counted, saying so publicly reveals how I voted. The advantage, however, is that the lost vote (Vx, Sx) can be made public anonymously, and with it the claim against the ROI, without revealing which user it belongs to.
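The checks described in the first two points can be combined into a small audit script. This is a sketch under several assumptions:

- the Python cryptography package is used;
- the signature scheme is PKCS#1 v1.5 with SHA-256;
- V has the field layout and event byte values assumed earlier;
- only a voter's most recent event on an initiative counts, so FOR followed by WITHDRAW counts as nothing.

The keys and voters in the demo are throwaway values.

```python
# Sketch of the public audit: verify every (V; S) pair, then count votes.
# ASSUMPTIONS: `cryptography` package, PKCS#1 v1.5 + SHA-256, the V layout and
# hypothetical event bytes used earlier, and "latest event per voter wins".
import struct
from collections import Counter

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

EVENTS = {1: "FOR", 2: "AGAINST", 3: "WITHDRAW"}  # hypothetical byte values


def is_valid(public_keys: dict, v: bytes, s: bytes) -> bool:
    """RSA_Verify(PK; S; V), using the PK whose version is stored in V."""
    if len(v) != 49:
        return False
    (version,) = struct.unpack(">I", v[:4])
    pk = public_keys.get(version)
    if pk is None:
        return False
    try:
        pk.verify(s, v, padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return False
    return True


def tally(public_keys: dict, pairs: list, initiative: int) -> Counter:
    latest = {}  # H -> (event time, event name) of that voter's newest valid event
    for v, s in pairs:
        if not is_valid(public_keys, v, s):
            continue  # forged or damaged entries are skipped (and could be reported)
        _, init, ts, ev = struct.unpack(">IIQB", v[:17])
        h = v[17:]
        if init == initiative and (h not in latest or ts > latest[h][0]):
            latest[h] = (ts, EVENTS.get(ev))
    return Counter(ev for _, ev in latest.values() if ev in ("FOR", "AGAINST"))


# Demo with a throwaway key and fictitious voters.
sk = rsa.generate_private_key(public_exponent=65537, key_size=2048)
keys = {1: sk.public_key()}


def signed(init: int, ts: int, ev: int, h: bytes) -> tuple:
    v = struct.pack(">IIQB", 1, init, ts, ev) + h
    return v, sk.sign(v, padding.PKCS1v15(), hashes.SHA256())


alice, bob = b"\xaa" * 32, b"\xbb" * 32
log = [signed(7, 100, 1, alice), signed(7, 110, 1, bob), signed(7, 120, 3, bob)]
log.append((log[0][0], bytes(256)))  # a forged signature: rejected by is_valid
print(tally(keys, log, 7))  # Counter({'FOR': 1})
```

Votes are grouped by H, so a pair that appears twice in the list is still counted once. This is exactly why the user must also check Hx: it is the only field that ties a pair to one particular voter.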

Cons:

• The ROI could add FOR or AGAINST votes on behalf of non-existent SNILS numbers, and this scenario cannot be detected. But the same is possible today.

Technical implementation:

To implement the idea, the ROI can install OpenSSL and call it from its scripts for all of the operations above: RSA key generation (for the digital signature), signing, and SHA256 hashing. OpenSSL is a free, open-source cryptographic library used in many systems, including for establishing encrypted network connections, in browsers and in many other applications. Key generation is slow but rare: it is done once, or when a new initiative opens. Signing with the private key and hashing are fast. OpenSSL can be used from the command line or from scripts, as well as from compiled languages such as C/C++. The implementation requires no infrastructure changes or other complex steps and could well fit into a few lines of script or code.
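As an illustration of “a few lines of script”, the sketch below drives the openssl command-line tool from Python. It assumes an openssl binary is on the PATH, and the file names are arbitrary. Only the verification step needs pk.pem, so any citizen could run that part.

```python
# Sketch: calling the OpenSSL command-line tool from a script.
# ASSUMES an `openssl` binary on PATH; file names are arbitrary.
import os
import subprocess
import tempfile


def openssl(*args: str) -> subprocess.CompletedProcess:
    """Run one openssl command, raising CalledProcessError on a non-zero exit status."""
    return subprocess.run(["openssl", *args], check=True, capture_output=True)


def sign_and_verify(v: bytes) -> tuple:
    with tempfile.TemporaryDirectory() as d:
        sk, pk, v_file, s_file = (os.path.join(d, n) for n in ("sk.pem", "pk.pem", "v.bin", "s.bin"))
        # Rare, slow step: generate a 2048-bit RSA key pair, then export the public key.
        openssl("genpkey", "-algorithm", "RSA", "-pkeyopt", "rsa_keygen_bits:2048", "-out", sk)
        openssl("pkey", "-in", sk, "-pubout", "-out", pk)
        with open(v_file, "wb") as f:
            f.write(v)
        # Fast step: S = RSA_Sign(SK; V) over SHA-256.
        openssl("dgst", "-sha256", "-sign", sk, "-out", s_file, v_file)
        # Anyone holding pk.pem can verify; exit status 0 means the signature is valid.
        check = subprocess.run(
            ["openssl", "dgst", "-sha256", "-verify", pk, "-signature", s_file, v_file],
            capture_output=True,
        )
        return os.path.getsize(s_file), check.returncode == 0


print(sign_and_verify(bytes(49)))  # (256, True) when the openssl CLI is available
```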

UPD: Clarifications and additions at readers' request.
I have a feeling that if all the opinions were added up, they would cancel out to zero. But I will try:

1. I clarified in the text that the RSA key is generated on the ROI side for use as an EDS. I also replaced RSA_Encrypt / RSA_Decrypt with RSA_Sign / RSA_Verify, respectively.

2. Someone asked why not 256-bit ECDSA (Elliptic Curve Digital Signature Algorithm). Yes, it wins on the size of the signature S: about 72 bytes instead of RSA's 256. But it loses on speed, since RSA_Verify is many times faster than ECDSA verification. And if we simply replace RSA_Sign / RSA_Verify with RSA_Encrypt / RSA_Decrypt and publish SK instead of PK, we get a server that signs quickly, many times faster than ECDSA.

3. Why not GOST? I know our Soviet standards only by hearsay. I just know that some of them were copied from foreign designs, with something added to make them look different. An example is the “GOST” cipher, created in the KGB (at least, that is how I know it from the scientific community), which follows DES with slight variations. As for the GOST hash algorithm, I have no idea either.

4. To pre-empt further questions like those in points 2–3 (the Chinese, for example, have their own standards too), let us generalize right away. Let the asymmetric algorithm be X and the hashing algorithm be Y, and choose whichever you like. Just make sure the algorithms provide a security level of at least 128 bits, and preferably 256. Beyond that, it is a matter of choice and does not greatly affect the essence.
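To make the size trade-off in point 2 concrete, here is a short sketch using the Python cryptography package. It signs the same placeholder V with 2048-bit RSA and with ECDSA on the P-256 curve. The ECDSA signature is DER-encoded, so its length varies slightly from one signature to the next.

```python
# Signature sizes for the two options in point 2.
# Assumes the `cryptography` package; ECDSA uses the P-256 (secp256r1) curve.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec, padding, rsa

v = bytes(49)  # placeholder vector V

rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
rsa_sig = rsa_key.sign(v, padding.PKCS1v15(), hashes.SHA256())

ec_key = ec.generate_private_key(ec.SECP256R1())
ec_sig = ec_key.sign(v, ec.ECDSA(hashes.SHA256()))  # DER-encoded (r, s) pair

print(len(rsa_sig), len(ec_sig))  # 256 and, typically, 70-72
```

For point 4, the common NIST SP 800-57 estimates are:

- 2048-bit RSA gives about 112 bits of security;
- 3072-bit RSA is needed to reach 128 bits;
- P-256 already provides about 128 bits.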

Thanks!
