franzose April 20, 2014 at 07:30

One opinion on the future of PHP

Translation

Recently there has been a very lively discussion in the developer community about everything related to PHP and its future. Happily, most of these conversations have been positive. PHP 6 and what it might look like are popular topics, and people ask a lot of questions about HHVM and its role in the future of the language and its community. So let me share some of my thoughts on this.

About backward compatibility

I believe that every new release must maintain backward compatibility with the previous one, whether it is called 6, 7, 99 or “elephant enthusiast”; call it whatever you like. I should say “mostly”, because some incompatibilities will still occur. But these incompatibilities must be justified and controlled, and they should be limited to revising the behavior of edge cases and the like. This does not rule out serious internal restructuring or a drive toward cleaner, simpler design. It means that incompatibilities should not get in developers' way.

There is a simple test for this approach:

The code you write should run without problems in both PHP 5.x and PHP 6.x (and any two consecutive major releases).
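One common way to keep a single codebase running on two versions is feature detection with function_exists() plus a fallback (a "polyfill"). The sketch below uses hash_equals(), which only exists from PHP 5.6 onward, purely as an example; the fallback body is a simplified constant-time comparison and does not reproduce the built-in's exact error handling for non-string arguments.

```php
<?php
// Only define hash_equals() when the running engine lacks it, so the same
// file loads unchanged on versions before and after PHP 5.6.
if (!function_exists('hash_equals')) {
    function hash_equals($known, $user)
    {
        if (!is_string($known) || !is_string($user)) {
            return false;
        }
        $length = strlen($known);
        if ($length !== strlen($user)) {
            return false;
        }
        // Accumulate differences instead of returning early, so the loop
        // takes the same time no matter where the strings differ.
        $diff = 0;
        for ($i = 0; $i < $length; $i++) {
            $diff |= ord($known[$i]) ^ ord($user[$i]);
        }
        return $diff === 0;
    }
}

var_dump(hash_equals('secret', 'secret')); // bool(true)
var_dump(hash_equals('secret', 'secreT')); // bool(false)
```

Calling code never needs to know which implementation it got; that is the kind of "runs on both" property described above.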

Why is this important? Look at the transition from PHP 4 to PHP 5. It was easy for programmers to write code that worked on both versions, and even so the complete move to PHP 5 took about 10 years. Now imagine how it would have gone if that had been hard.

As it turns out, we don't have to imagine: this is exactly what happened to Python. The first release of Python 3 came out about 5 years ago, and today, in 2014, it has still not been fully adopted. Not because it is bad, but because it is very difficult to maintain a single codebase that runs without problems on both versions. You either use Python 2, or you restrict yourself to the subset of features that also work in Python 3 (and so lose the advantages of both). And if the libraries or frameworks you need have no Python 3 version, you have to port them yourself, or... you are simply out of luck. In fact, this is exactly what happens.

I am not saying this approach is wrong: the language gains a great many good things from such changes. But in my view, for the community and the average user, such a transition is unnecessarily drastic.

About rewriting the engine

Many people say the PHP engine needs to be rewritten. Although I definitely see advantages in that (yes, the engine is very complicated), I have to ask: is it really necessary? Where is the fundamental problem? Sure, the PHP engine has architectural flaws, but by and large it works well.

So I would prefer to see the engine move toward a component-based design, split into subsystems. This has already been partially done, but I would like to see changes that make the engine truly modular. Why does this matter? Because with this approach, individual improvements can make a significant contribution to the engine's development.

For example, the most tangled part of PHP at the moment is the parser and the compiler. They are so tightly coupled and intertwined that this causes a lot of problems during development. If instead they were separate components of the engine, both the parser and the compiler would be much easier to replace. Their shared interface could be an Abstract Syntax Tree (AST). Why an AST? Because it is a common representation that both components could use. Yes, doing this well would take a lot of work, but the benefits would come quickly: from more consistent and predictable syntax to the ability to define your own syntax in PHP itself (imagine defining DSLs in PHP that are actually part of the language).
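To make the parser/AST/compiler split concrete, here is a deliberately tiny sketch for arithmetic expressions. It is a toy, not a model of the Zend engine: all function names are made up for illustration, and it only understands integer literals, "+" and "*". The point is that the parser's only output and the compiler's only input is the AST, so either stage could be swapped out without touching the other.

```php
<?php
// Toy pipeline (hypothetical names): source -> tokens -> AST -> instructions.

function tokenizeExpression($source)
{
    preg_match_all('/\d+|[+*]/', $source, $matches);
    return $matches[0];
}

// Parser: tokens -> AST. "*" binds tighter than "+".
function parseExpression(array &$tokens)
{
    $node = parseTerm($tokens);
    while ($tokens && $tokens[0] === '+') {
        array_shift($tokens);
        $node = ['op' => '+', 'left' => $node, 'right' => parseTerm($tokens)];
    }
    return $node;
}

function parseTerm(array &$tokens)
{
    $node = ['num' => (int) array_shift($tokens)];
    while ($tokens && $tokens[0] === '*') {
        array_shift($tokens);
        $node = ['op' => '*', 'left' => $node, 'right' => ['num' => (int) array_shift($tokens)]];
    }
    return $node;
}

// Compiler: AST -> stack-machine instructions. It never sees source text or tokens.
function compileAst(array $node, array &$code)
{
    if (array_key_exists('num', $node)) {
        $code[] = ['push', $node['num']];
        return;
    }
    compileAst($node['left'], $code);
    compileAst($node['right'], $code);
    $code[] = [$node['op'] === '+' ? 'add' : 'mul'];
}

// A minimal stack machine that executes the compiled instructions.
function runBytecode(array $code)
{
    $stack = [];
    foreach ($code as $instruction) {
        if ($instruction[0] === 'push') {
            $stack[] = $instruction[1];
            continue;
        }
        $right = array_pop($stack);
        $left = array_pop($stack);
        $stack[] = $instruction[0] === 'add' ? $left + $right : $left * $right;
    }
    return array_pop($stack);
}

$tokens = tokenizeExpression('1 + 2 * 3');
$ast = parseExpression($tokens);
$code = [];
compileAst($ast, $code);
var_dump(runBytecode($code)); // int(7)
```

A different front end (say, one accepting a new syntax) would only need to produce the same array shape for compileAst() to keep working.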

So there is no need to rewrite it from scratch. Refactor and clean up.

About moving the standard library to an object-oriented design

Some people suggest moving PHP's standard library to an object-oriented approach in which even scalar types would behave like objects. That is, you could write something like this:

// Hypothetical syntax: this does not work in current PHP
$string = "Foo";
var_dump($string->length); // int(3)
var_dump($string->toLower()); // string(3) "foo"
// etc
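For comparison, the closest userland approximation today is a wrapper object. This sketch is only an illustration under stated assumptions: the class name Str is made up, it is not a proposal from the text, and the wrapped value stops being a plain PHP string, which is exactly the gap the proposal wants the engine to close.

```php
<?php
// Hypothetical wrapper class: approximates the syntax above in userland.
final class Str
{
    private $value;

    public function __construct($value)
    {
        $this->value = (string) $value;
    }

    // Read-only "length" pseudo-property via the __get() magic method.
    public function __get($name)
    {
        if ($name === 'length') {
            return strlen($this->value);
        }
        throw new LogicException("Unknown property: $name");
    }

    public function toLower()
    {
        return strtolower($this->value);
    }
}

$string = new Str("Foo");
var_dump($string->length);    // int(3)
var_dump($string->toLower()); // string(3) "foo"
```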

I don’t think this should happen, although I admit it sounds cool.

The reason is simple: scalars are not objects. More importantly, they have no fixed type. PHP relies on a type system in which a string can be treated as an integer. Part of the system's flexibility is that any scalar type can easily be converted to any other scalar type. Of course, this is not always good: a great many bugs arise precisely because of it.

However, such situations could be handled with stricter behavior. For example, a "dirty" type conversion could raise a catchable warning or an exception, so that if someone tried to cast "123abc" to an integer, they would be told that data was partially lost.
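A userland sketch of that stricter behavior, for illustration only: strictToInt() is a hypothetical helper, not something PHP provides. It leans on filter_var() with FILTER_VALIDATE_INT, which accepts only complete integer strings, and throws instead of silently truncating the way an (int) cast does.

```php
<?php
// Hypothetical helper: refuse "dirty" conversions instead of losing data.
function strictToInt($value)
{
    if (is_int($value)) {
        return $value;
    }
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new InvalidArgumentException(
            sprintf('Cannot convert "%s" to int without losing data', $value)
        );
    }
    return $result;
}

var_dump((int) "123abc");     // int(123): "abc" is silently discarded
var_dump(strictToInt("123")); // int(123)
try {
    strictToInt("123abc");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Cannot convert "123abc" to int without losing data
}
```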

Even more importantly, with such a type system you cannot know with 100% certainty what type a variable holds at a given moment. You can guess at the possibilities, but you cannot know what is really there. That would not change much even after an explicit conversion, or if the language supported scalar type hints, because the type can still change later.
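A small illustration of that last point, assuming PHP 7 or later (scalar type declarations such as int did not exist when this text was written): the declaration is checked when the function is called, but nothing stops the variable from holding a different type a line later. The function name is made up.

```php
<?php
// Requires PHP 7+. The int declaration is enforced only at the call boundary.
function describeCount(int $count)
{
    $before = gettype($count);  // "integer", guaranteed by the declaration
    $count = $count . " items"; // reassignment is allowed; now a string
    return [$before, gettype($count)];
}

echo implode(' -> ', describeCount(3)), PHP_EOL; // integer -> string
```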

All of this means that with an object-oriented approach, every scalar operation would have to be bound to every scalar type. That would lead to an object model in which scalars had not only mathematical methods but also string methods. What nonsense...
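To see why, note that one and the same PHP value is routinely used both as a number and as a string; the operation, not the value, decides. An object model for scalars would therefore have to hang both sets of methods on every scalar.

```php
<?php
// One value, two roles: PHP picks the interpretation from the operation.
$value = "42";
var_dump($value + 8);            // int(50): treated as a number
var_dump(strlen($value));        // int(2): treated as a string
var_dump(str_repeat($value, 2)); // string(4) "4242"
```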

About HHVM

Today, at the time of writing, I do not recommend using HHVM in production. There are several reasons for this. All of them are well known, and none of them is fundamental. Time will tell whether they can be resolved, but I really hope so.

HHVM is controlled by a single company. Don't get me wrong: the problem is not that Facebook spends a lot of money on its development. The problem is that the project is controlled by a company whose business does not depend on whether you use HHVM or not. It would be one thing if they offered paid support and made HHVM a full-fledged product. As it is, HHVM is neither an open source project nor a commercial one, but something in between. In that situation I would be very nervous about moving production to HHVM.

HHVM has no public specification, so in practice you program against it the same way you do against the Zend engine: by trial and error. That works fine until you try to support multiple implementations. As a library developer, I have already experienced this firsthand. If HHVM and PHP eventually converged on a common specification, many things would become much simpler...

HHVM is a closed-source project, although it accepts code from third-party developers (which is already good). But a stream of pull requests and patches does not make a project open source. Where is the transparency of the process? Where is the clarity about its direction? Where is open participation? Where is the leadership?
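Without a shared specification, library code tends to end up branching on which engine it is running under. A minimal sketch of that kind of check: HHVM defines the HHVM_VERSION constant and the Zend engine does not; the function name currentEngine() is made up for illustration.

```php
<?php
// Hypothetical helper: report which engine is executing this script.
function currentEngine()
{
    return defined('HHVM_VERSION')
        ? 'HHVM ' . HHVM_VERSION
        : 'Zend PHP ' . PHP_VERSION;
}

echo currentEngine(), PHP_EOL;
```

Every such branch is a place where behavior was discovered by testing rather than read from a specification.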

I know I am not alone in these opinions. HHVM will be a strong contender in the future, but I believe that until the issues above are resolved, its time in commercial production has not yet come.

Can PHP and HHVM coexist?

Naturally. Although some benchmarks look convincing, JIT compilers are not magic. They make trade-offs against the real world, and many benchmarks reveal this. In fact, if you look closely at the vast majority of benchmarks, you will notice that they do not execute "real" code. Wait, you are comparing performance on HelloWorld or a Fibonacci generator?! Good luck with that; now calm down, please, and throw all those useless results away.

Let me repeat: benchmarks that do not use real systems are useless. They are nonsense, and worse, they are dangerous.

In practice, there are tasks that HHVM handles much faster than PHP, and there are also tasks where PHP is faster. The only way to find out is to benchmark your own application.
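A minimal sketch of such a measurement, assuming you run the same script once under php and once under hhvm and compare the numbers. medianRuntime() is a hypothetical helper; the median is used because it is less sensitive to one-off outliers than a single run or the mean. The sample workload below is only a placeholder and is exactly the kind of synthetic code the text warns about: replace it with a call into real application code.

```php
<?php
// Hypothetical helper: median wall-clock time of a workload over N runs.
function medianRuntime(callable $workload, $iterations = 20)
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('Need at least one iteration');
    }
    $timings = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = microtime(true);
        $workload();
        $timings[] = microtime(true) - $start;
    }
    sort($timings);
    $n = count($timings);
    $mid = (int) ($n / 2);
    // Odd count: middle element; even count: mean of the two middle ones.
    return $n % 2 ? $timings[$mid] : ($timings[$mid - 1] + $timings[$mid]) / 2;
}

// Placeholder: swap in something like rendering a real page against real data.
$seconds = medianRuntime(function () {
    $rows = range(1, 10000);
    shuffle($rows);
    sort($rows);
});
printf("median: %.6f s\n", $seconds);
```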

But HHVM runs my code natively! How can PHP be faster?
Remember how I said JIT is not magic? That is literally true. You cannot compile PHP directly ahead of time, because it is an interpreted language: you cannot know exactly what code needs compiling until you run it. So a JIT does exactly that. It analyzes the code as it executes and, once it has gathered enough information, generates native code. This process has overhead, which is why HHVM is slow on the command line.

More importantly, a JIT does not generate generic code. It generates code tailored to the conditions that held when the code was generated. So if your function adds two integers, it may compile down to a simple add instruction. However, the compiler also adds checks (guards) that the parameters really are integers. If you later pass something other than a number to the function (which is perfectly normal in PHP), one of those checks fails.

When a check fails, something like a “failover” happens (usually called deoptimization). Simply put, the engine throws away everything it compiled for that method and switches back to interpreter mode. This operation is much more expensive than simply running in interpreter mode all along.
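The profile, specialize, guard and fall-back cycle described above can be modeled as a toy in PHP. This is a sketch under obvious assumptions, not how HHVM is implemented: PHP cannot emit machine code, so a closure stands in for the "compiled" fast path; the class name ToyJit and the call-count threshold are made up; and the model says nothing about the real cost of deoptimization.

```php
<?php
// Toy model of JIT type specialization with guards and deoptimization.
final class ToyJit
{
    private $threshold;
    private $hotCalls = 0;      // profiling data: calls seen with two ints
    private $fastPath = null;   // the "compiled", int-only version
    public $deoptimizations = 0;

    public function __construct($threshold = 3)
    {
        $this->threshold = $threshold;
    }

    public function add($a, $b)
    {
        if ($this->fastPath !== null) {
            // Guard: the specialized code is only valid for two ints.
            if (is_int($a) && is_int($b)) {
                $fast = $this->fastPath;
                return $fast($a, $b);
            }
            // Guard failed: discard the specialized code and fall back.
            $this->fastPath = null;
            $this->hotCalls = 0;
            $this->deoptimizations++;
            return $this->genericAdd($a, $b);
        }
        // Profiling phase: after enough int-only calls, "compile" a fast path.
        if (is_int($a) && is_int($b) && ++$this->hotCalls >= $this->threshold) {
            $this->fastPath = function ($x, $y) {
                return $x + $y; // assumes ints; the guard above enforces that
            };
        }
        return $this->genericAdd($a, $b);
    }

    public function isSpecialized()
    {
        return $this->fastPath !== null;
    }

    // "Interpreter" path: full PHP semantics for any operand types.
    private function genericAdd($a, $b)
    {
        return $a + $b;
    }
}

$jit = new ToyJit(3);
$jit->add(1, 2);
$jit->add(3, 4);
$jit->add(5, 6);                 // third int-only call installs the fast path
var_dump($jit->isSpecialized()); // bool(true)
var_dump($jit->add("1.5", 2));   // float(3.5), after a failed guard
var_dump($jit->isSpecialized()); // bool(false)
```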

And this is just one reason JIT compilers are not magic.

I don’t want you to think I am against JIT compilers. On the contrary, for most tasks they deliver a significant performance boost. But they are not perfect.

Look at other communities and you will see interpreter-based virtual machines alongside JIT compilers. CPython and PyPy are good examples. It is also worth noting that Python has a language specification, so you can easily swap one implementation for another.

But Hack is cool!
Hack is a new programming language developed by Facebook and shipped with HHVM. Roughly speaking, it is a statically typed version of PHP with some additional features...

And Hack really is awesome! I very much want the HHVM problems I described to be solved somehow, so that I could contribute!

After all, it is an interesting idea. There are now several metalanguages built on top of PHP; the leaders are Hack and Zephir. But there is a problem: each one targets a specific runtime. Hack runs on HHVM, and Zephir runs on PHP. How do we resolve this?

Honestly, I would simply drop Zephir and build a compiler from Hack to PECL extensions. Since Hack is statically typed, cross-compiling from Hack to a PECL extension should be possible. And considering that Hack already supports C++ bindings (for connecting system libraries), in theory the compiler could handle those too. Then there would be no need to write PECL extensions by hand: you would write your extension in Hack (which has static analyzers and debuggers) and generate a fully compatible PECL extension. This would of course be very nontrivial to implement, but it would be great to try! Incidentally, this is one more argument in favor of a language specification.

About a language specification

You have probably noticed that I have mentioned the need for a language specification several times...
That is a hint: I believe it is the most important thing that could improve the future of PHP as a language, platform, ecosystem and community.

To summarize

PHP is entering a very interesting phase of its development. People are building cool things and driving progress. So if we want PHP to keep growing, I think we need to understand very clearly what we are doing whenever we make one choice or another.
