How I built a Blakecoin miner

I don’t know about anyone else, but 2017 amazed me with the rapid rise of Bitcoin. The excitement has died down by now, of course, but back in 2017 everyone was talking and writing about cryptocurrencies.
I watched people try to make money on cryptocurrencies in every way imaginable. Some spent all their savings on video cards and started mining in their garages. Some invested in cloud mining. Some are trying to run their own pool. Someone launched production of chocolate bitcoins, and someone else produces mineral water:

I also started studying what these bitcoins actually are. I even did some research of my own on the SHA256 algorithm and wrote an article about it here on Habr: "Is it possible to compute bitcoins faster, simpler or easier?". My research on hashing algorithms is still ongoing and far from finished... Maybe someday I’ll write a separate article about it. For now, here is this one.
I had tried running a Bitcoin miner on an FPGA. I realized its time had already passed, but I still wanted to get hands-on with the technology. At the end of last year I suddenly remembered that I had a Terasic DE10-Standard board sitting completely idle. It carries an Intel Cyclone V 5CSXFC6D6F31C6 FPGA, a chip with an integrated ARM processor. I thought it would be interesting to run some kind of altcoin miner on this board. Why? I don't need to invest in hardware, since I already have it. All that matters is that the board earns more than the electricity it consumes.
Choosing an altcoin turned out to be simple: I looked for ready-made FPGA projects that I could adapt to my board, and there were not many of them. As far as I can tell, only a handful of people worldwide have built FPGA miner projects and, most importantly, published them openly, for example on GitHub.
So I took the github.com/kramble/FPGA-Blakecoin-Miner project, adapted it to my existing Mars rover3 board, and then adapted it to the DE10-Standard as well.
How I adapted the project to the Mars rover3 board is described here. For Cyclone V everything is essentially the same, except that there is a separate Quartus project revision, blake_cv; my sources are here.
Unfortunately, only three Blake hash units fit into my Cyclone V.

The FPGA falls just short of the capacity for a fourth one. I run the design at 120 MHz, and each unit computes one Blake hash per clock cycle, so the throughput of my project is 120 * 3 = 360 MH/s. Honestly, that's not much, but as I said, I already had the board and don't need to recoup its cost... Quartus also reports Fmax = 150 MHz. I could try raising the clock, but I'm afraid I would then have to install a fan, and it would buzz; I don't need these cryptos badly enough to listen to yet another buzz in the room.
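The throughput figure is simple arithmetic: if each core is fully pipelined and finishes one hash per clock, as described above, the aggregate rate is clock × cores. A small Python sketch (the helper names are my own) also shows what the 150 MHz Fmax reported by Quartus, or a hypothetical fourth core, would give, and how long a sweep of the full 32-bit nonce range takes at 360 MH/s:

```python
def throughput_mhs(clock_mhz: float, cores: int, hashes_per_clock: float = 1.0) -> float:
    """Aggregate hash rate in MH/s, assuming each core is fully pipelined."""
    return clock_mhz * cores * hashes_per_clock


def nonce_space_seconds(rate_mhs: float, nonce_bits: int = 32) -> float:
    """Seconds needed to try every value of a nonce_bits-wide nonce at rate_mhs."""
    return 2 ** nonce_bits / (rate_mhs * 1e6)


current = throughput_mhs(120, 3)
print(current)                                 # 360.0
print(throughput_mhs(150, 3))                  # 450.0, at the reported Fmax
print(throughput_mhs(120, 4))                  # 480.0, if a fourth core fit
print(round(nonce_space_seconds(current), 1))  # 11.9
```

So at the current clock the board exhausts the whole nonce range of one work unit in about 12 seconds.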
The general idea of the project is this: the board's chip contains both an FPGA and a dual-core ARM processor:

When the board boots, U-Boot first loads the FPGA, then Linux starts and runs the cgminer mining program. At first I thought I could set up a virtual communication channel between the ARM and the FPGA; that is actually possible, but it didn’t work out that way. The thing is, cgminer talks to hardware miners over USB using the libusb library. So it was easier for me to connect the FPGA to the Linux system through an FTDI USB-to-COM converter than to go to all the trouble of attaching the FPGA to the ARM bus. I had done something like that before, and it was not very simple.
This is what my “miner” looks like now (I mounted a heatsink on the Cyclone V with thermal paste; otherwise it gets very hot):

To tell the truth, my main problems were not with the FPGA project but with cgminer.
The problems are as follows:
1) Which cgminer should I take as the basis? And the related question: “Where do I connect to start mining?” How are these two questions related? It would seem there is no problem here: just take the freshest cgminer you can find. But excuse me: there are 98 forks of cgminer on GitHub. They all differ in something. Which one is good, which one is bad, which one even works at all? That’s open source for you. Each author added and fixed something for themselves, or broke something... or made their own coin. It’s not easy to sort out. I found a site where a single page links both to a GitHub project and to a GitHub project for the FPGA. So these two projects can, and should, fit together somehow.
2) Since I based my work on the FPGA project by kramble, it would of course be logical to use his patches, which he included with his project. But even that was not without problems. He has patches for cgminer-3.1.1 and cgminer-3.4.3. I decided it was better to take the newer one, 3.4.3, but only wasted time on it. It seems the author started adapting the miner to this version but never finished, and this version is completely raw. I had to take 3.1.1, even though it is a rather old version.
3) Authors who modify cgminer in their forks for their altcoins don’t keep the comments and function names in the code accurate. The word “bitcoin” turns up here and there in the code, even though that particular cgminer fork apparently can no longer mine Bitcoin at all, only the altcoin.
4) Tests. WHERE ARE THE TESTS? I don’t understand how anyone can build a complex product without tests. I didn’t find any.
To tell the truth, even getting started was not easy. Imagine that you need to bring up a project on an FPGA, but it is not very clear what it is supposed to do, how it receives data, what data, and in what form it must return the result. The FPGA project must be accompanied by some program; it is not clear exactly where to get it, but it must detect the miner board, send it something (what exactly is unknown) and get something back. In what format, in what blocks, how often: nothing is known.
In fact, even after studying kramble’s cgminer patches, I can only barely picture how it is all supposed to work.
The usbutils.c file lists the devices that cgminer can treat as external hardware miners on the USB bus:
static struct usb_find_devices find_dev[] = {
#ifdef USE_BFLSC
    {
        .drv = DRV_BFLSC,
        .name = "BAS",
        .ident = IDENT_BAS,
        .idVendor = IDVENDOR_FTDI,
        .idProduct = 0x6014,
        //.iManufacturer = "Butterfly Labs",
        .iProduct = "BitFORCE SHA256 SC",
        .kernel = 0,
        .config = 1,
        .interface = 0,
        .timeout = BFLSC_TIMEOUT_MS,
        .latency = LATENCY_STD,
        .epcount = ARRAY_SIZE(bas_eps),
        .eps = bas_eps },
#endif
    ...
    {
        .drv = DRV_ICARUS,
        .name = "BLT",
        .ident = IDENT_BLT,
        .idVendor = IDVENDOR_FTDI,
        .idProduct = 0x6010,
        //.iProduct = "Dual RS232-HS",
        .iProduct = "USB <-> Serial Cable",
        .kernel = 0,
        .config = 1,
        .interface = 1,
        .timeout = ICARUS_TIMEOUT_MS,
        .latency = LATENCY_STD,
        .epcount = ARRAY_SIZE(ftdi2232h_eps),
        .eps = ftdi2232h_eps },
I added a descriptor for my FTDI FT2232H USB-to-COM converter to this array. Now, if cgminer detects a device with VendorId:ProductId = 0x0403:0x6010, it will try to work with it as an Icarus board, even though it is not one.
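To make the effect of that entry concrete, here is a simplified Python model of a first-match lookup over a table like find_dev[]. It is not cgminer's actual matching code: the FindDev class and match_driver() are hypothetical, and treating .iProduct as a string that must also match is an assumption suggested by the field being set in both entries.

```python
from dataclasses import dataclass
from typing import Optional

IDVENDOR_FTDI = 0x0403


@dataclass(frozen=True)
class FindDev:
    """Hypothetical mirror of a few usb_find_devices fields."""
    drv: str
    name: str
    id_vendor: int
    id_product: int
    i_product: Optional[str] = None  # None means "do not compare the product string"


FIND_DEV = [
    FindDev("DRV_BFLSC", "BAS", IDVENDOR_FTDI, 0x6014, "BitFORCE SHA256 SC"),
    FindDev("DRV_ICARUS", "BLT", IDVENDOR_FTDI, 0x6010, "USB <-> Serial Cable"),
]


def match_driver(vid: int, pid: int, product: str) -> Optional[FindDev]:
    """Return the first entry whose VID:PID (and product string, if set) matches."""
    for entry in FIND_DEV:
        if (entry.id_vendor, entry.id_product) != (vid, pid):
            continue
        if entry.i_product is not None and entry.i_product != product:
            continue
        return entry
    return None


print(match_driver(0x0403, 0x6010, "USB <-> Serial Cable"))  # the DRV_ICARUS entry
```

Under this model, a stock FT2232H that reports a different product string would not be claimed by the Icarus entry, which is why the entry's .iProduct has to match what the converter actually reports.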
Next, driver-icarus.c contains the icarus_detect_one function:
static bool icarus_detect_one(struct libusb_device *dev, struct usb_find_devices *found)
{
    int this_option_offset = ++option_offset;
    struct ICARUS_INFO *info;
    struct timeval tv_start, tv_finish;
    /* Blakecoin detection hash
       N.B. golden_ob MUST take less time to calculate than the timeout set in icarus_open()
       0000007002685447273026edebf62cf5e17454f35cc7b1f2da57caeb008cf4fb00000000dad683f2975c7e00a8088275099c69a3c589916aaa9c7c2501d136c1bf78422d5256fbaa1c01d9d1b48b4600000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
       { midstate, data } = { 256'h553bf521cf6f816d21b2e3c660f29469f8b6ae935291176ef5dda6fe442ca6e4, 96'hd1d9011caafb56522d4278bf };
    */
    const char golden_ob[] =
        // "553bf521cf6f816d21b2e3c660f29469"
        // "f8b6ae935291176ef5dda6fe442ca6e4"
        // "00000000000000000000000000000000"
        // "00000000d1d9011caafb56522d4278bf";
        //-----------
        "a8c369073d7dc0a63168f5fcf0246e4f"
        "eb916bda12787ad1607d2303186ed8f1"
        "00000000000000000000000000000000"
        "0142b9a0e7b4001cf8b35852a3accab0";
    const char golden_nonce[] = "0142b9b1"; //"000187a2";
    const uint32_t golden_nonce_val = 0x0142b9b1; //0x000187a2;
    unsigned char ob_bin[64];
    unsigned char nonce_bin[ICARUS_READ_SIZE];
    char *nonce_hex;
    int baud, uninitialised_var(work_division), uninitialised_var(fpga_count);
    struct cgpu_info *icarus;
    int ret, err, amount, tries;
    bool ok;
    char tmpbuf[256]; //lancelot52
    unsigned char* wr_buf = ob_bin;
    int bufLen = sizeof(ob_bin);
    icarus = usb_alloc_cgpu(&icarus_drv, 1);
    if (!usb_init(icarus, dev, found))
        goto shin;
    usb_buffer_enable(icarus);
    get_options(this_option_offset, icarus, &baud, &work_division, &fpga_count);
    hex2bin(ob_bin, golden_ob, sizeof(ob_bin));
    tries = 2;
    ok = false;
    while (!ok && tries-- > 0) {
        icarus_initialise(icarus, baud);
        err = usb_write_ica(icarus, (char *)wr_buf, bufLen, &amount, C_SENDTESTWORK);
        if (err != LIBUSB_SUCCESS || amount != bufLen)
            continue;
        memset(nonce_bin, 0, sizeof(nonce_bin));
        ret = icarus_get_nonce(icarus, nonce_bin, &tv_start, &tv_finish, NULL, 500);
        ...
The idea is this. The program sends the board a hash-search task whose answer is known in advance. The task specifies the nonce at which to start the search, and this nonce is slightly smaller than the real GOLDEN nonce. So the board starts computing from that point and almost instantly, within seconds at most, stumbles upon the GOLDEN nonce and returns it. The program receives this result, compares it with the correct answer, and it immediately becomes clear whether this really is a hardware miner it can work with.
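This handshake can be modelled in a few lines of Python. It is a sketch, not cgminer code: the board is replaced by a hypothetical transact() callable that takes the 64-byte work unit and returns the 4 bytes read back, comparing the reply's hex against the golden_nonce string is assumed from how that constant is written, and reading bytes 48..51 of golden_ob as the start nonce is my interpretation of the author's values (0x0142b9a0 sits just below the golden nonce 0x0142b9b1).

```python
from typing import Callable

# Work unit and expected answer copied from the author's icarus_detect_one().
GOLDEN_OB = ("a8c369073d7dc0a63168f5fcf0246e4f"
             "eb916bda12787ad1607d2303186ed8f1"
             "00000000000000000000000000000000"
             "0142b9a0e7b4001cf8b35852a3accab0")
GOLDEN_NONCE = "0142b9b1"


def start_nonce(work: bytes) -> int:
    """Assumed layout: 32-byte midstate, 16 zero bytes, 4-byte start nonce, 12 bytes of data."""
    return int.from_bytes(work[48:52], "big")


def detect(transact: Callable[[bytes], bytes]) -> bool:
    """Send the golden work unit; accept the device only if it answers with the golden nonce."""
    work = bytes.fromhex(GOLDEN_OB)  # same role as hex2bin(ob_bin, golden_ob, 64)
    reply = transact(work)
    return len(reply) == 4 and reply.hex() == GOLDEN_NONCE


work = bytes.fromhex(GOLDEN_OB)
print(len(work))                                    # 64
print(int(GOLDEN_NONCE, 16) - start_nonce(work))    # 17 nonces to try
print(detect(lambda w: bytes.fromhex("0142b9b1")))  # True: behaves like a miner
print(detect(lambda w: b""))                        # False: no answer
```

With a start only 17 nonces before the answer, even a slow board replies well within the 500 ms timeout passed to icarus_get_nonce().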
And here I ran into a nasty problem. The project has patches in C, a test program in Python, and a testbench for the FPGA.
In the C patches, the test data looks like this:
1) patch for cgminer-3.1.1
const char golden_ob[] =
"553bf521cf6f816d21b2e3c660f29469"
"f8b6ae935291176ef5dda6fe442ca6e4"
"00000000000000000000000000000000"
"00000000d1d9011caafb56522d4278bf";
const char golden_nonce[] = "00468bb4";
const uint32_t golden_nonce_val = 0x00468bb4;
2) patch for cgminer-3.4.3
const char golden_ob[] =
"553bf521cf6f816d21b2e3c660f29469"
"f8b6ae935291176ef5dda6fe442ca6e4"
"00000000000000000000000000000000"
"00000000d1d9011caafb56522d4278bf";
const char golden_nonce[] = "000187a2";
const uint32_t golden_nonce_val = 0x000187a2;
So which is right and which is wrong? The input data is the same, yet the declared golden nonces differ!!! It’s a paradox... (I’ll say up front that the error is in the cgminer-3.4.3 patch: its nonce 0x000187a2 is wrong. But how much time I spent on it...)
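The quoted strings themselves are enough to check this. The comment inside icarus_detect_one() quotes the full header behind kramble's golden_ob (its { midstate, data } pair matches the golden_ob of both patches). Splitting that header with the usual 80-byte layout (version, previous hash, merkle root, time, bits, nonce) shows that bytes 64..75, fully reversed, give the data word d1d9011caafb56522d4278bf, and that the nonce field, byte-reversed, reads 00468bb4: the value in the cgminer-3.1.1 patch, not 000187a2. This is a consistency check on the hex strings as printed; the byte-reversal rule is an observation on these values, not documentation of the kramble format.

```python
import struct

# 80-byte header quoted in the "Blakecoin detection hash" comment (padding omitted).
VERSION = "00000070"
PREV = "02685447273026edebf62cf5e17454f35cc7b1f2da57caeb008cf4fb00000000"
MERKLE = "dad683f2975c7e00a8088275099c69a3c589916aaa9c7c2501d136c1bf78422d"
TIME = "5256fbaa"
BITS = "1c01d9d1"
NONCE = "b48b4600"
header = bytes.fromhex(VERSION + PREV + MERKLE + TIME + BITS + NONCE)

# golden_ob shared by both kramble patches, and its { midstate, data } pair.
MIDSTATE = "553bf521cf6f816d21b2e3c660f29469f8b6ae935291176ef5dda6fe442ca6e4"
DATA = "d1d9011caafb56522d4278bf"
KRAMBLE_OB = ("553bf521cf6f816d21b2e3c660f29469"
              "f8b6ae935291176ef5dda6fe442ca6e4"
              "00000000000000000000000000000000"
              "00000000d1d9011caafb56522d4278bf")


def data_from_header(hdr: bytes) -> str:
    """Bytes 64..75 (last merkle word, time, bits) in reverse order, as hex."""
    return hdr[64:76][::-1].hex()


def nonce_from_header(hdr: bytes) -> int:
    """Nonce field (bytes 76..79) read as a little-endian 32-bit integer."""
    return struct.unpack("<I", hdr[76:80])[0]


print(len(header))                                # 80
print(KRAMBLE_OB == MIDSTATE + "00" * 20 + DATA)  # True
print(data_from_header(header) == DATA)           # True
print(f"{nonce_from_header(header):08x}")         # 00468bb4 (3.1.1 patch)
```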
The project also has a Python test program that reads a text file, extracts data from it and sends it to the board over the serial port... The test data there is completely different! Later I realized that this is not the data sent to the board: data is only extracted from these lines, converted into a task in a special way, and then sent to the board. Still, among the test data for the Python program there is NO task resembling the one described in the C program!!! Here is that test data:
0000007057711b0d70d8682bd9eace78d4d1b42f82da7d934fac0db4001124d600000000cfb48fb35e8c6798b32e0f08f1dc3b6819faf768e1b23cc4226b944113334cc45255cc1f1c085340967d6c0e000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
0000007057711b0d70d8682bd9eace78d4d1b42f82da7d934fac0db4001124d6000000008fa40da64f312f0fa4ad43e2075558faf4e6d910020709bb1f79d0fe94e0416f5255cc521c085340df6b6e01000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
0000007095696e4529ae6568e4b2a0057a18e82ccf8d370bf87e358900f8ab5000000000253c6078c7245036a36c8e25fb2c1f99c938aeb8fac0be157c3b2fe34da2fa0952587a471c00fa391d2e5b02000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
000000704445e0446fcf2a84c47ce7305722c76507ba74796eaf39fe0007d44d00000000cac961f63513134a82713b172f45c9b5e5eea25d63e27851fac443081f453de1525886fe1c01741184a5c70e000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070a3ac7627ca52f2b9d9a5607ac8212674e50eb8c6fb1219c80061ccd500000000ed5222b4f77e0d1b434e1e1c70608bc5d8cd9d363a59cbeb890f6cd433a6bd8d5258a0141c00b4e770777200000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
000000706c90b789e84044d5be8b2fac01fafe3933ca3735269671e90043f8d900000000d74578c643ab8e267ab58bf117d61bb71a04960a10af9a649c0060cdb0caaca35258b3f81c00b4e7b1b94201000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070171d2644781cccf873ce3b6e54967afda244c47fc963bb240141b4ad00000000d56c4fbdc326e8f672834c8dbca53a087147fe0996d0c3a908a860e3db0589665258da3d1c016a2a14603a0a000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070d03c78cb0bb0b41a5a2c6ce75402e5be8a705a823928a5640011110400000000028fb80785a6310685f66a4e81e8f38800ea389df7f16cf2ffad16bb98e0c4855258dda01c016a2ae026d404000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
0000007091a7eef446c4cb686aff8908ab5539d03a9ab2e975b9fe5700ed4ca9000000000f83bb385440decc66c10c0657fcd05f94c0bc844ebc744bba25b5bc2a7a557b5258e27c1c016a2a6ce1900a000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070856bd0a3fda5dac9ede45137e0c5648d82e64fbe72477f5300e96aec0000000026ca273dbbd919bdd13ba1fcac2106e1f63b70f1f5f5f068dd1da94491ed0aa45258e51b1c017a7644697709000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
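Each of these lines appears to use the same layout as the detection header in the C comment: an 80-byte block header (version, previous hash, merkle root, time, bits, nonce) followed by padding that starts with 00000080. A minimal Python sketch of splitting one line into named fields; the split_header() helper is hypothetical, and only the first 80 bytes of the first line are used:

```python
# Field names and widths (in bytes) of an 80-byte block header.
HEADER_FIELDS = (("version", 4), ("prev_hash", 32), ("merkle_root", 32),
                 ("time", 4), ("bits", 4), ("nonce", 4))


def split_header(line_hex: str) -> dict:
    """Split the first 80 bytes of a hex work line into named hex fields."""
    raw = bytes.fromhex(line_hex[:160])
    fields, pos = {}, 0
    for name, size in HEADER_FIELDS:
        fields[name] = raw[pos:pos + size].hex()
        pos += size
    return fields


# First 80 bytes of the first line of the Python test data.
LINE1 = ("0000007057711b0d70d8682bd9eace78d4d1b42f82da7d934fac0db4001124d600000000"
         "cfb48fb35e8c6798b32e0f08f1dc3b6819faf768e1b23cc4226b944113334cc4"
         "5255cc1f1c085340967d6c0e")

for name, value in split_header(LINE1).items():
    print(f"{name:12} {value}")
```

Laid out this way, the lines are recognisably block headers, while the C patch sends a precomputed midstate plus 12 bytes of data, which is why the two data sets look nothing alike.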
Then I look at the Verilog testbench:
blakeminer #(.comm_clk_frequency(comm_clk_frequency)) uut
(clk, RxD, TxD, led, extminer_rxd, extminer_txd, dip, TMP_SCL, TMP_SDA, TMP_ALERT);
// TEST DATA (diff=1) NB target, nonce, data, midstate (shifted from the msb/left end) - GENESIS BLOCK
reg [415:0] data = 416'h000007ffffbd9207ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;
// ALSO test starting at -1 and -2 nonce to check for timing issues
// reg [415:0] data = 416'h000007ffffbd9206ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;
// reg [415:0] data = 416'h000007ffffbd9205ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;
reg serial_send = 0;
wire serial_busy;
reg [31:0] data_32 = 0;
reg [31:0] start_cycle = 0;
serial_transmit #(.comm_clk_frequency(comm_clk_frequency), .baud_rate(baud_rate)) sertx (.clk(clk), .TxD(RxD), .send(serial_send), .busy(serial_busy), .word(data_32));
This is the expected data packet that the board should accept. But again, it looks nothing like the packet in the C program or the data for the Python test program.
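The testbench comment does spell out the packet layout: target, nonce, data and midstate, from the most significant end. The data (96 bits) and midstate (256 bits) widths match the { midstate, data } comment in the C code, and the remaining 64 bits are taken here as a 32-bit target plus a 32-bit nonce, inferred from the "-1 and -2 nonce" variants, which differ only in the second 32-bit word. A Python sketch of that split, with a hypothetical helper name:

```python
# Field order per the testbench comment (msb first); widths in bits, 416 in total.
FIELDS = (("target", 32), ("nonce", 32), ("data", 96), ("midstate", 256))


def split_testbench_word(value: int) -> dict:
    """Cut a 416-bit testbench reg value into its fields, most significant first."""
    out = {}
    shift = sum(width for _, width in FIELDS)  # 416
    for name, width in FIELDS:
        shift -= width
        out[name] = (value >> shift) & ((1 << width) - 1)
    return out


GENESIS = int("000007ffffbd9207ffff001e11f35052d554469e"
              "3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b", 16)

f = split_testbench_word(GENESIS)
print(hex(f["target"]))  # 0x7ff
print(hex(f["nonce"]))   # 0xffbd9207
print(hex(f["data"]))    # 0xffff001e11f35052d554469e
```

Thirteen 32-bit words in all, which fits the 32-bit data_32 register that the testbench feeds to serial_transmit.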
This lack of shared test data between the Python program, the C code and the Verilog really spoils the picture. There are no common reference points and no shared tests between the components, and that is sad.
On top of that, the Verilog project of the Blakecoin miner hid yet another outright mockery of me.
If you simulate the project with the Verilog testbench, then with this test data, 416'h000007ffffbd9207ffff001e11f35052d5544 ..., the simulator happily finds the GOLDEN nonce.
Then I compile the project for the real FPGA board, feed it the same data from the Python program, and... the board does not find the GOLDEN nonce...
It turns out that the test data in the Verilog testbench is “slightly off”: it is for low difficulty, where the resulting hash has only 24 leading zero bits instead of the required 32.
The file experimental/LX150-FourPiped/BLAKE_CORE_FOURPIPED.v contains this code:
reg gn_match_d = 1'b0;
always @(posedge clk)
`ifndef SIM
    gn_match_d <= (IV7 ^ b76 ^ d74) == 0;
`else
    gn_match_d <= (IV7[23:0] ^ b76[23:0] ^ d74[23:0]) == 0;
`endif
So the Verilog simulation does not check what will actually run in hardware! On a real FPGA board the design checks for 32 leading zero bits, while in simulation it checks only 24. Just lovely. I want to beat the author.
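The gap between the two branches is easy to quantify. Below is a Python model of the gn_match_d expression (the function is hypothetical, and IV7, b76 and d74 are simply treated as 32-bit inputs): under SIM only bits [23:0] of IV7 ^ b76 ^ d74 must be zero, so for a random hash a match is 2^8 = 256 times more likely than in hardware, and a nonce that passes in simulation can fail on the board.

```python
MASK32 = 0xFFFFFFFF
MASK24 = 0x00FFFFFF


def gn_match(iv7: int, b76: int, d74: int, sim: bool) -> bool:
    """Model of gn_match_d: SIM compares bits [23:0], hardware all 32 bits."""
    word = (iv7 ^ b76 ^ d74) & MASK32
    mask = MASK24 if sim else MASK32
    return (word & mask) == 0


# A result whose top byte is non-zero passes the SIM check but not the real one.
print(gn_match(0x5A000000, 0, 0, sim=True))   # True
print(gn_match(0x5A000000, 0, 0, sim=False))  # False
# Chance that a random 32-bit word passes each check, as "1 in N", and their ratio.
print(2 ** 24, 2 ** 32, 2 ** 32 // 2 ** 24)   # 16777216 4294967296 256
```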
Of course, I beat all of this. At least the Python test program now prints cheerful messages:

Okay, so what’s the result? How much have I mined? Unfortunately, nothing at all.
Just as I was ready to start mining, literally at the end of January, the Blakecoin difficulty rose sharply:

Now I could leave the board running for a whole day, and although it did find solutions, the pool did not accept them: they still had too few leading zeros.
I tried switching to another currency, VCASH. With it, the pool at least occasionally sent me encouraging messages like this:

But the VCASH pool still doesn’t credit me anything. Sad.
I’d like to take this opportunity to ask knowledgeable people. I have an Nvidia 1060 video card. It does 1.25 GHash/s on Blakecoin, and every two or three hours it produces a nonce that the pool accepts (and credits a pretty penny for). I figured that if my FPGA board does 360 MHash/s, that is, about 3 times slower than the video card, then within two hours I would get at least one nonce accepted by the pool. But that doesn’t happen. Not a penny even over a whole day... Where is the catch? It’s a mystery to me...
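A back-of-the-envelope check of that expectation, as a Python sketch under simple assumptions: accepted shares arrive as a Poisson process, both devices work at the same share difficulty, and "two or three hours" is taken as 2.5 hours. The helper names are mine. Under these assumptions the FPGA should average one share roughly every 8.7 hours, and a whole day without a single share would be unlucky (about a 6% chance) but not impossible.

```python
import math


def expected_hours(ref_rate_ghs: float, ref_hours: float, rate_ghs: float) -> float:
    """Mean hours per share, scaled by hash rate from a reference device."""
    return ref_hours * ref_rate_ghs / rate_ghs


def p_no_share(hours: float, mean_hours: float) -> float:
    """Poisson probability of finding zero shares in `hours`."""
    return math.exp(-hours / mean_hours)


mean = expected_hours(1.25, 2.5, 0.36)   # GPU: 1.25 GH/s, ~2.5 h per share; FPGA: 0.36 GH/s
print(round(mean, 1))                    # 8.7
print(round(p_no_share(24, mean), 2))    # 0.06
```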
Now, in my spare time, I’m trying to figure out whether I can optimize the existing FPGA project somehow, say, by using the built-in memory or something else. With luck, maybe I’ll come up with something.