The problems of developing really fast software today

    Firewood is being sawn, the saws keep improving, the boards get longer and longer,
    but the speed of our programs is nowhere near the size of those boards...
    Back when I was 18 I got the idea of writing a big, ambitious compiler, and I filled a whole notebook with ideas for it.
    And so it died in the endless optimization of its own code... =)

    I have decided to put some of my ideas before the public,
    and if any of them interest you, please contact me so we can work out a joint effort.
    Simply put: I am looking for like-minded people to develop a self-optimizing compiler based on data mining and genetic algorithms, plus a lot of fun goodies in the standard library.

    And so begins the small introduction of my first post on Habr.
    This write-up does not claim to cover the topic fully; it simply explains my position
    on the existing compilation and code-processing systems that I use in my own work.

    Well, let's begin ...

    We all know that threads are a good thing: they let us “parallelize” our programs
    and load multi-core architectures to “100%”.
    People often think: “Hmm, maybe I’ll spawn a few threads to handle this loop;
    after all, it might some day run on a 64-core machine.”
    And they don’t realize how paradoxical that sounds.
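
    To make the target of the criticism below concrete, here is a minimal C++ sketch of
    that pattern. The count of 16 threads and the helper names are my own illustration,
    not taken from any real code base:

        #include <cstddef>
        #include <functional>
        #include <thread>
        #include <vector>

        // Hypothetical per-element work, just to have something to parallelize.
        void process_chunk(std::vector<int>& data, std::size_t begin, std::size_t end) {
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= 2;
        }

        // The habit in question: a thread count picked "just in case" the
        // program ever lands on a big machine, unrelated to the actual hardware.
        void parallel_double(std::vector<int>& data) {
            const std::size_t kThreads = 16;  // static number for dynamic data
            std::vector<std::thread> pool;
            const std::size_t chunk = data.size() / kThreads;
            for (std::size_t t = 0; t < kThreads; ++t) {
                std::size_t begin = t * chunk;
                std::size_t end = (t + 1 == kThreads) ? data.size() : begin + chunk;
                pool.emplace_back(process_chunk, std::ref(data), begin, end);
            }
            for (auto& th : pool)
                th.join();
        }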

    First: with all due respect to our grandfather coders, hearing this would give them a second heart attack.
    Why declare a static number of structures to handle a dynamic number of other structures?

    Second: is it even profitable? Now picture 16 threads on a 4-core processor, a standard situation, isn't it? It turns out we chop everything into 16 small pieces and bolt the lion's share of recurring threading-API overhead onto each of them. Wouldn't it be simpler to declare only 4 threads and win performance by cutting the time spent on thread creation, initialization and destruction?
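
    For contrast, a sketch of asking the hardware instead of guessing.
    std::thread::hardware_concurrency is real C++11; the fallback value of 4 is my own
    arbitrary choice, since the call is allowed to return 0:

        #include <cstddef>
        #include <thread>

        // Ask the runtime how many hardware threads actually exist instead
        // of hard-coding 16 on a 4-core machine. hardware_concurrency may
        // return 0 when the count is unknown, hence the fallback.
        std::size_t sensible_thread_count() {
            const unsigned n = std::thread::hardware_concurrency();
            return n != 0 ? n : 4;
        }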

    Third: who said the work will be distributed evenly across all the threads? Doesn't it turn out that three threads end up waiting for the result of one?
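
    OpenMP (which comes up in the next point) already has a knob for exactly this
    situation. A minimal sketch, with a made-up task of deliberately uneven cost:

        // Compile with e.g.: g++ -fopenmp dynamic.cpp
        void heavy_task(int i) {
            // Made-up work whose cost varies wildly from iteration to iteration.
            volatile long sink = 0;
            for (long k = 0; k < 100000L * (i % 7 + 1); ++k)
                sink += k;
        }

        void run_tasks(int n) {
            // schedule(dynamic) hands iterations to whichever thread is free,
            // so three fast threads do not sit waiting for one slow one.
            #pragma omp parallel for schedule(dynamic)
            for (int i = 0; i < n; ++i)
                heavy_task(i);
        }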

    Fourth: memory corruption, memory poisoning, deadlocks and so on... Every user of OpenMP and
    Pthreads has run into problems like these, even though Erlang solved them long ago.
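
    The Erlang recipe is message passing instead of shared state, and the idea transplants
    into C++ as a sketch: the threads share nothing but a mailbox, so there is no memory
    to corrupt and no lock ordering to get wrong. The Channel type below is my own toy,
    not a standard facility:

        #include <condition_variable>
        #include <iostream>
        #include <mutex>
        #include <queue>
        #include <thread>

        // A toy one-way "mailbox": the only thing the two threads share.
        template <typename T>
        class Channel {
            std::queue<T> q_;
            std::mutex m_;
            std::condition_variable cv_;
        public:
            void send(T v) {
                {
                    std::lock_guard<std::mutex> lock(m_);
                    q_.push(std::move(v));
                }
                cv_.notify_one();
            }
            T receive() {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !q_.empty(); });
                T v = std::move(q_.front());
                q_.pop();
                return v;
            }
        };

        int main() {
            Channel<int> ch;
            std::thread worker([&ch] {
                for (int i = 0; i < 3; ++i)
                    ch.send(i * i);  // the worker never touches shared data directly
            });
            for (int i = 0; i < 3; ++i)
                std::cout << ch.receive() << '\n';  // prints 0, 1, 4
            worker.join();
        }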

    Fifth: CUDA, OpenCL, DirectCompute. Everyone suddenly remembered that the video card is a processor too...
    Not much time has passed since then. And what is the point of these technologies? Raising performance, and so on.
    Yet a profiling dump of good old Photoshop CS4 shows that 40-60% of its time can be
    spent on “switching CUDA kernel modes”, as Nvidia’s developers call it.
    People forgot that the shader pipeline has limited precision and a limited instruction set,
    so it sometimes has to hand part of the code back to the CPU for execution.
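
    I cannot reproduce that Photoshop profile here, but the trade-off behind it is easy to
    sketch: offloading pays off only when the work outweighs the fixed cost of transfers
    and mode switches. The threshold below is invented for illustration, and the GPU path
    is a stub:

        #include <cstddef>

        void scale_on_cpu(float* data, std::size_t n, float k) {
            for (std::size_t i = 0; i < n; ++i)
                data[i] *= k;
        }

        void scale_on_gpu(float* data, std::size_t n, float k) {
            // Stub: a real version would copy the buffer to the device,
            // launch a kernel, and copy the result back. Falling through to
            // the CPU keeps the sketch self-contained.
            scale_on_cpu(data, n, k);
        }

        void scale(float* data, std::size_t n, float k) {
            // Invented threshold: below it, transfer and mode-switch
            // overhead eats any speedup the GPU could deliver.
            const std::size_t kGpuWorthwhile = std::size_t(1) << 20;
            if (n < kGpuWorthwhile)
                scale_on_cpu(data, n, k);
            else
                scale_on_gpu(data, n, k);
        }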

    Sixth: how many extensions for drawing graphics do we know?.. Plenty.
    And how many of them work with CUDA?

    Yes, maybe you are an advanced mega-coder, whereas I am just a simple student.
    And maybe juggling threads (and deadlocks) has been routine for you for the last three years or so.
    Fine, suppose you know your stuff... You really do...
    But is the game worth the candle? How much harder does debugging become?
    How complicated does synchronizing threads, or keeping them asynchronous, get?
    Or do you not bother yourself with that?.. It works, the customer’s word is law, and so on.

    So I began to rake up this mess: for fun, not for gain, and somewhat recklessly...
    It needs to be automated and optimized.

    To be continued; maybe this is not much, but it is to the point...
    In the second post I will describe what I want to build.
    I hope for understanding and gentle criticism “on topic”.

    Thanks for your attention.
