Methods on primitive types in PHP

Original author: Nikita Popov
  • Translation
Some time ago, Anthony Ferrara shared his thoughts on the future of PHP. I agree with most of his views, but not all of them. In this article, I will focus on one specific aspect: turning primitive data types, such as strings or arrays, into “pseudo-objects” that allow method calls on them.

Let's start with a few examples:

$str = "test foo bar";
$str->length();       // == strlen($str)        == 12
$str->indexOf("foo"); // == strpos($str, "foo") == 5
$str->split(" ");     // == explode(" ", $str)  == ["test", "foo", "bar"]
$str->slice(5, 3);    // == substr($str, 5, 3)  == "foo"
$array = ["test", "foo", "bar"];
$array->length();       // == count($array)             == 3
$array->join(" ");      // == implode(" ", $array)      == "test foo bar"
$array->slice(1, 2);    // == array_slice($array, 1, 2) == ["foo", "bar"]
$array->flip();         // == array_flip($array)        == ["test" => 0, "foo" => 1, "bar" => 2]

Here $str is an ordinary string and $array is a simple array; they are not objects. We just give them some object-like behavior by letting methods be called on them.
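
Without any extension, the same call style can be approximated in plain PHP with a small wrapper class. The sketch below is only an illustration: the class name Str and its methods are hypothetical, and they simply delegate to the standard functions mentioned above.

```php
<?php
// Hypothetical wrapper (not part of PHP or of any extension): each method
// only delegates to the standard string function named in its comment.
final class Str
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function length(): int
    {
        return strlen($this->value);                   // strlen
    }

    public function indexOf(string $needle)
    {
        return strpos($this->value, $needle);          // strpos (false if not found)
    }

    public function split(string $delimiter): array
    {
        return explode($delimiter, $this->value);      // explode
    }

    public function slice(int $offset, int $length): string
    {
        return substr($this->value, $offset, $length); // substr
    }
}

$str = new Str("test foo bar");
var_dump($str->length());       // int(12)
var_dump($str->indexOf("foo")); // int(5)
var_dump($str->split(" "));
var_dump($str->slice(5, 3));    // string(3) "foo"
```

The obvious drawback of such a wrapper is that every string has to be wrapped explicitly, which is exactly what methods on the primitive types themselves would avoid.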

Please note that this behavior is not far-fetched. It is no longer just a dream: the scalar objects PHP extension already exists and lets you define methods for primitive types.



Support for calling methods on primitive types has several advantages, which I will discuss below:

An opportunity for a cleaner API


Probably the most common complaint of anyone who has heard anything about PHP is the inconsistent and incomprehensible naming of functions in the standard library, as well as the equally inconsistent and incomprehensible order of parameters. Typical examples:

// different naming conventions
strpos
str_replace
// totally incomprehensible names
strcspn                  // STRing Complement SPaN
strpbrk                  // STRing Pointer BReaK
// inverted parameter order
strpos($haystack, $needle)
array_search($needle, $haystack)

Although this problem is often overstated (we have IDEs, after all), it is hard to deny that the current situation is far from optimal. It should also be noted that many functions have problems that go beyond a strange name. Often they do not handle all edge cases properly, so these have to be special-cased in the calling code. For string functions, this usually means checks for empty strings or for offsets at the very end of the string.
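
A well-known instance of such special casing is strpos(), which returns false when nothing is found but 0 when the match is at the very start of the string, so a loose truthiness check gives the wrong answer. A minimal illustration (the variable names are mine):

```php
<?php
$haystack = "foo bar";

// strpos() returns int(0) here, which is falsy, so this check is wrong.
$looseCheck = (bool) strpos($haystack, "foo");

// The caller has to special-case the result with a strict comparison.
$strictCheck = strpos($haystack, "foo") !== false;

var_dump($looseCheck);  // bool(false)
var_dump($strictCheck); // bool(true)
```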

The logical solution would seem to be to add a large number of aliases in PHP6 that unify function names and parameter order. We would end up with string\pos(), string\replace(), string\complement_span() or something like that. Personally (and many php-src developers seem to share this opinion), I don't think this makes much sense. The current function names are deeply rooted in the muscle memory of every PHP programmer, and there seems to be little reason to make a few trivial cosmetic changes.

On the other hand, introducing an OO API for primitive types makes it possible to redesign the API as a side effect of moving to a new paradigm. It also lets us start with a truly clean slate, without having to meet any expectations of the old procedural API. Two examples:

  • I would really like to have the methods $string->split($delimiter) and $array->join($delimiter), which are the generally accepted names for these operations (as opposed to explode and implode). On the other hand, a string\split($delimiter) function would be very awkward, given that there is already a str_split function that does something completely different (it splits a string into fixed-length chunks).
  • I would like the new API to use exceptions for error reporting. In an OO API this is taken for granted, but in a renamed procedural API it would go against the current convention that all procedural functions report errors with warnings. Of course, this is not set in stone, but I would not want to start a holy war over it ;)
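
To make the second point concrete, here is a minimal sketch of what exception-based error reporting could look like. The function splitString() is hypothetical and exists only for this illustration; it validates its input and throws instead of leaving the problem to explode().

```php
<?php
// Hypothetical helper, not an existing PHP function: it reports misuse
// with an exception rather than with a warning.
function splitString(string $string, string $delimiter): array
{
    if ($delimiter === '') {
        throw new InvalidArgumentException('Delimiter must not be empty');
    }
    return explode($delimiter, $string);
}

$message = null;
try {
    splitString("test foo bar", "");
} catch (InvalidArgumentException $e) {
    $message = $e->getMessage();
}

$parts = splitString("test foo bar", " ");

var_dump($message); // string(27) "Delimiter must not be empty"
var_dump($parts);
```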


My main goal with an OO API for primitive types is to start from scratch, which would allow us to implement a set of properly designed solutions. But, of course, this is not the only advantage of such a step. OO syntax offers a number of additional benefits, which are discussed below.

Improved readability


Procedural calls usually do not chain well. Consider the following example:

$output = array_map(function($value) {
    return $value * 42;
}, array_filter($input, function($value) {
    return $value > 10;
}));

At first glance it is not clear what array_map and array_filter are applied to, or in what order. The variable $input is hidden in the middle between the two closures, and the function calls are written in the reverse of the order in which they are actually applied. Now the same example using OO syntax:

$output = $input->filter(function($value) {
    return $value > 10;
})->map(function($value) {
    return $value * 42;
});

I believe that in this case the order of operations (first filter, then map) and the original array $input are shown more clearly.
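
The same left-to-right reading order can be tried out in plain PHP with a wrapper object. This is only a sketch under my own assumptions: the class Arr and its methods filter(), map() and toArray() are hypothetical names that delegate to array_filter() and array_map().

```php
<?php
// Hypothetical wrapper class, used only to demonstrate method chaining.
final class Arr
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function filter(callable $predicate): self
    {
        // array_values() reindexes, because array_filter() preserves keys.
        return new self(array_values(array_filter($this->items, $predicate)));
    }

    public function map(callable $transform): self
    {
        return new self(array_map($transform, $this->items));
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

$input = [5, 20, 11];
$output = (new Arr($input))
    ->filter(function ($value) { return $value > 10; })
    ->map(function ($value) { return $value * 42; })
    ->toArray();

var_dump($output); // [840, 462]
```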

The example is, of course, a little contrived, because you can always store the closures in variables or rely on autocompletion and syntax highlighting in the IDE. Another example (this time from real code) shows roughly the same situation:

substr(strtr(rtrim($className, '_'), '\\', '_'), 15);

In this case, the trailing run of arguments '_'), '\\', '_'), 15 is completely confusing: it is hard to associate the values with the corresponding function calls. Compare with this version:

$className->trimRight('_')->replace('\\', '_')->slice(15);

Here, the operations and their arguments are tightly grouped and the order in which the methods are called corresponds to the order in which they are performed.

Another bonus that comes from this syntax is the absence of the needle / haystack problem. While aliases allow us to eliminate this by introducing a naming convention, there is simply no such problem in the OO API:

$string->contains($otherString)
$array->contains($someValue)
$string->indexOf($otherString)
$array->indexOf($someValue)

There can be no confusion as to which part plays which role.

Polymorphism


PHP currently provides the Countable interface, which classes can implement in order to customize the result of count($obj). Why is this necessary? Because we do not have polymorphism for functions. We do, however, have polymorphism for methods.

If arrays implemented $array->count() as a (pseudo-)method, the calling code would not have to care whether $array is actually an array. It could just as well be any other object that implements a count() method. In principle, we get the same behavior as with Countable, only without any extra machinery.
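
For reference, this is how the Countable mechanism works in PHP today. The class name Playlist is made up for the illustration; Countable and count() are the real built-ins.

```php
<?php
// Illustrative class (hypothetical name) implementing the built-in
// Countable interface so that count() also works on its instances.
final class Playlist implements Countable
{
    private $tracks;

    public function __construct(array $tracks)
    {
        $this->tracks = $tracks;
    }

    public function count(): int
    {
        return count($this->tracks);
    }
}

$array = ["a", "b", "c"];
$object = new Playlist(["x", "y"]);

var_dump(count($array));    // int(3)
var_dump(count($object));   // int(2)
var_dump($object->count()); // int(2), the method form that arrays lack today
```

With a count() pseudo-method on arrays, calling code could use the method form for both values and the interface would no longer be needed for this purpose.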

In fact, this is a much more general solution. For example, you could implement a UnicodeString class that provides all the methods of the string type, and then use regular strings and UnicodeString objects interchangeably. Well, at least in theory. Obviously, this works only as long as usage is limited to string methods, and it fails as soon as the concatenation operator is used, since full operator overloading is currently supported only for internal classes.

Nevertheless, I hope it is clear that this is a rather powerful concept. The same applies to arrays: for example, an SplFixedArray could behave just like an array by implementing the same interface.

Now that we’ve looked at some of the benefits of this approach, let's also look at some of the problems that you will encounter:

Loose typing


Quote from Anthony's blog:
Scalars are not objects, but, more importantly, they are not any type. PHP relies on a type system that truly believes that strings are integers. A lot of the flexibility of the system is based on the fact that any scalar can be converted to any other scalar with ease. [...]

More importantly, however, because of this loose type system you cannot know with 100% certainty what type a variable will be. You can say how you want it to be treated, but you cannot really know what it is under the hood. Even with casting the situation is not perfect, because there are cases where the type can still change.


To illustrate this problem, consider the following example:

$num = 123456789;
$sumOfDigits = array_sum(str_split($num));

Here $num is treated as a string of digits, which is split by str_split and then summed by array_sum. Now try to do the same using methods:

$num = 123456789;
$sumOfDigits = $num->chunk()->sum();

Here the method chunk(), which belongs to string, is called on a number. What happens? Anthony suggests one solution:

This means that all scalar operations would have to be implemented for all scalar types. This leads to an object model in which scalars have all the math methods as well as all the string methods. What a nightmare.

The quote already says that such a solution is unacceptable. However, I think we can avoid such cases entirely by simply throwing an error (an exception!). To explain why this idea is viable, let's look at which types in PHP could meaningfully have methods.
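
Stock PHP already behaves roughly this way when no handler exists: calling a method on a non-object throws an Error that can be caught. The sketch below assumes PHP 7 or later with no scalar objects handler registered for integers; the exact message text varies between versions, so it is not shown.

```php
<?php
$num = 123456789;
$caught = null;

try {
    $num->chunk(); // stock PHP has no methods on an int
} catch (Error $e) {
    $caught = $e;  // an Error is thrown instead of silently converting
}

var_dump($caught instanceof Error); // bool(true)
```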

Primitive types in PHP


In addition to objects, PHP has the following types of variables:

null
bool
int
float
string
array
resource

Now let's think about which of these could actually have meaningful methods. We can immediately drop resource (a legacy type) and look at the rest. null and bool obviously don't need methods, unless you want to come up with abominations like $bool->invert().

The vast majority of mathematical functions do not look very good as methods. Consider:

log($n)        $n->log()
sqrt($n)       $n->sqrt()
acosh($n)      $n->acosh()

I hope you agree that the math functions read much more nicely in the current notation. There are, of course, a few methods that could sensibly be applied to a number type. For example, $num->format(10) reads rather nicely. But all in all there is no real need for an OO number API, as there are only a few functions that could be included. In addition, the current math API is not as problematic in terms of naming, because the names of mathematical operations are fairly standardized.

Only strings and arrays remain. We have already seen that there are many good APIs for these two types. But what does all this have to do with a weak typing problem? An important point is the following:

Although we very often treat strings as numbers (for example, values arriving via HTTP or from a database), the opposite is not true: it is very rare to need to treat a number as a string. The following code would confuse me:

strpos(54321, 32, 1);

Treating a number as a string is a rather strange thing to do. I think it is perfectly reasonable to require a cast in that case. Take the original example with the sum of the digits:

$num = 123456789;
$sumOfDigits = ((string) $num)->chunk()->sum();

Here we have made it clear that, yes, we really do want to treat the number as a string. I find it acceptable to require this kind of thing for such hacks.
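
The same explicit cast can already be written with today's procedural functions, which shows that requiring it is not much of a burden. A minimal runnable version (the intermediate variable $digits is mine):

```php
<?php
$num = 123456789;

// Make the "treat this number as a string" intent explicit with a cast,
// then split it into single characters and add them up.
$digits = str_split((string) $num);
$sumOfDigits = array_sum($digits);

var_dump($sumOfDigits); // int(45)
```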

The situation with arrays is even simpler: it makes no sense to apply array operations to something that is not an array.

Another factor that mitigates this issue is scalar type hints (which I assume will be present in any PHP version that gets this feature). If you use a string type hint, the function will always receive a string as input (even if the value passed to it was not a string, depending on the details of the type hint implementation).

But this does not mean that there is no problem at all. Because of poor function design, an unexpected type can sometimes sneak into the code. For instance, for substr($str, strlen($str)) someone very “wise” decided to return bool(false) instead of string(0) "". However, this issue concerns only substr. It has nothing to do with the method API, so you would not run into it there.

Object passing semantics


Besides the problem with loose typing, there is another semantic issue with pseudo-methods on primitive types: objects and primitive types in PHP have different passing semantics. If we allow methods to be called on strings and arrays, they will start to look like objects, and some people may therefore expect them to have object passing semantics. This problem applies to both strings and arrays:

function change($arg) {
    echo $arg->length(); // $arg looks like an object
    $arg[0] = 'x';       // but now it doesn't :3
}
$str = 'foo';
change($str); // $str remains unchanged
$array = ['f', 'o', 'o'];
change($array); // $array remains unchanged

We could, of course, change the passing semantics. In my eyes, passing large structures such as arrays by value was a rather bad idea in the first place, and I would prefer them to be passed by object. However, changing this would be a rather large backward compatibility break (at least I assume so; I did not run tests to determine the actual impact of such a change). For strings, on the other hand, passing by object would be disastrous unless we also made strings fully immutable. Personally, I find the current approach, which lets you change a specific character in a string at any time, very convenient (try to do the same in Python).
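
The difference between the two passing semantics is easy to observe in current PHP without any pseudo-methods. In this sketch the function names changeFirst() and changeProperty() are mine:

```php
<?php
function changeFirst($arg)
{
    $arg[0] = 'x'; // modifies only the local copy of the string or array
    return $arg;
}

function changeProperty(stdClass $obj)
{
    $obj->first = 'x'; // modifies the object that the caller also sees
}

$str = 'foo';
$array = ['f', 'o', 'o'];
$obj = new stdClass();
$obj->first = 'f';

changeFirst($str);
changeFirst($array);
changeProperty($obj);

var_dump($str);        // string(3) "foo"
var_dump($array[0]);   // string(1) "f"
var_dump($obj->first); // string(1) "x"
```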

I don’t know if there is a good way to solve this problem, except for the explicit mention in our documentation that strings and arrays are only pseudo-objects with methods, and not real objects.

The problem also extends to other object-related features. For example, you could ask whether something like $string instanceof string should be true. I am not sure how far this should go. Maybe it is best to stick strictly to methods and explicitly state that these are not real objects. Maybe it is better to support more features of the object system. This needs more thought.

Current state


In conclusion, this approach has a number of problems, but I do not consider them particularly serious. At the same time, it offers a great opportunity to introduce clean APIs for our basic types and to improve the readability (and writability) of code that operates on them.

What is the state of this idea? As far as I can tell, the internals developers are not particularly opposed to this approach and prefer it to adding aliases everywhere. The main thing missing to move forward is a complete API specification.

I created a scalar objects project that is implemented as a PHP extension. It allows you to register a class that will process method calls for the corresponding primitive type. Example:

class StringHandler {
    public function length() {
        return strlen($this);
    }
    public function contains($str) {
        return false !== strpos($this, $str);
    }
}
register_primitive_type_handler('string', 'StringHandler');
$str = "foo bar baz";
var_dump($str->length());          // int(11)
var_dump($str->contains("bar"));   // bool(true)
var_dump($str->contains("hello")); // bool(false)

Work has begun on a string handler that includes an API specification, but I have not finished the project. I hope to find the motivation to continue developing this idea someday. There are already a number of projects working on similar APIs.

This is one of the things I would like to see in a future PHP.
