Fai December 3, 2012 at 20:46

A crash course on typing in programming languages


This article contains the bare minimum you need to know about typing so that you never again call dynamic typing evil, Lisp a typeless language, or C a strongly typed language.

The full version describes every kind of typing in detail, with code examples, references to popular programming languages, and illustrative pictures.

I recommend reading the short version first and then, if you like, the full one.

Short version

By typing, programming languages are usually divided into two large camps: typed and untyped (typeless). The former include, for example, C, Python, Scala, PHP, and Lua; the latter include assembly language, Forth, and Brainfuck.

Since “typeless typing” is inherently dead simple, it is not subdivided any further. Typed languages, however, fall into several overlapping categories:

Static / dynamic typing. Static typing means that the final types of variables and functions are fixed at compile time, i.e. the compiler already knows for certain which type is where. With dynamic typing, all types are determined only at runtime.

Examples:
Static: C, Java, C#;
Dynamic: Python, JavaScript, Ruby.
Strong / weak typing (sometimes called strict / non-strict). Strong typing means that the language does not allow mixing different types in expressions and does not perform automatic implicit conversions; for example, you cannot subtract a set from a string. Weakly typed languages perform many implicit conversions automatically, even when precision may be lost or the conversion is ambiguous.

Examples:
Strong: Java, Python, Haskell, Lisp;
Weak: C, JavaScript, Visual Basic, PHP.
Explicit / implicit typing. Explicitly typed languages require the types of new variables, functions, and their arguments to be written out explicitly. Languages with implicit typing, accordingly, delegate this task to the compiler or interpreter.

Examples:
Explicit: C++, D, C#
Implicit: PHP, Lua, JavaScript

Note also that all these categories intersect: for example, C has static weak explicit typing, while Python has dynamic strong implicit typing.
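A minimal Python sketch of those three properties (the helper name describe is made up for this illustration): no types are written anywhere (implicit), the same variable can refer to values of different types over time (dynamic), and mixing a string with a number raises an error instead of being converted silently (strong).

```python
def describe(value):
    # Implicit: no type is written for the parameter or the result.
    return type(value).__name__

x = 42                  # x currently refers to an int
first = describe(x)
x = "forty-two"         # dynamic: the same name now refers to a str
second = describe(x)

# Strong: Python refuses to combine a str and an int implicitly.
try:
    result = "1" + 1
except TypeError:
    result = "TypeError"

print(first, second, result)  # -> int str TypeError
```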

Even so, there are no languages that are static and dynamic at the same time. Although, looking ahead, I'll admit that I'm lying here: such languages do exist, but more on that later.

Let's go further.

Detailed version

If the short version did not seem enough to you, good. That is why I wrote a detailed one! The point is that the short version simply could not fit all the useful and interesting information, while the detailed one may be too long for everyone to read without effort.

Typeless typing

In typeless programming languages, all entities are considered simply sequences of bits of various lengths.
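Python itself is typed, but its standard struct module lets us imitate the typeless view for a moment. In this minimal sketch the same four bytes are read first as a 32-bit IEEE 754 float and then as a 32-bit unsigned integer, and nothing objects. It only illustrates the idea of "just bits"; it is not how any particular typeless language is implemented.

```python
import struct

# Pack 1.0 as a little-endian single-precision float: the result is just 4 bytes.
raw = struct.pack("<f", 1.0)

# Reinterpret exactly the same bytes as a little-endian unsigned 32-bit integer.
(as_int,) = struct.unpack("<I", raw)

print(raw.hex(), hex(as_int))  # -> 0000803f 0x3f800000
```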

Typeless typing is usually found in low-level languages (assembly language, Forth) and esoteric ones (Brainfuck, HQ9+, Piet). However, along with its disadvantages, it also has some advantages.

Benefits

Allows you to write at an extremely low level without the compiler or interpreter getting in the way with type checks. You are free to perform any operation on any kind of data.
The resulting code is usually more efficient.
Transparency of instructions. If you know the language, there is usually no doubt about what a particular piece of code does.

Disadvantages

Complexity. There is often a need to represent complex values such as lists, strings, or structures, which can be inconvenient.
Lack of checks. Any meaningless action, such as subtracting a pointer to an array from a character, is considered perfectly normal, which is fraught with subtle bugs.
Low level of abstraction. Working with any complex data type is no different from working with numbers, which of course creates a lot of difficulties.

Strong typeless typing?

Yes, it exists. For example, in x86/x86-64 assembly language (I don't know about other architectures), you cannot assemble a program that tries to load data from the 32-bit eax register into the 16-bit cx register:

mov cx, eax ; assembly-time error: operand size mismatch

So there is typing in assembler after all? I believe these checks are not enough to call it that. But your opinion, of course, is up to you.

Static and dynamic typing

The main thing that distinguishes static typing from dynamic typing is that all type checks are performed at compile time rather than at runtime.
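A minimal Python sketch of what "checks at runtime" means in practice (the function name risky is made up for illustration): the type error is in the code from the start, yet defining the function and calling it along the safe path both succeed. The error only surfaces when the faulty line actually executes, whereas a compiler for a statically typed language would reject the equivalent code before the program ever ran.

```python
def risky(flag, count=10):
    if flag:
        # Mixing str and int: Python only notices when this line runs.
        return "total: " + count
    return "ok"

# The definition above and this call both succeed.
safe_result = risky(False)

# The type error appears only at runtime, when the bad branch executes.
try:
    risky(True)
    error_raised = False
except TypeError:
    error_raised = True

print(safe_result, error_raised)  # -> ok True
```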

Some people may feel that static typing is too restrictive (and it is, but this was overcome long ago with certain techniques). To others, dynamically typed languages are playing with fire. But what features set them apart? Do both kinds really deserve to exist? And if not, why are there so many statically and dynamically typed languages?

Let's figure it out.

Benefits of static typing

Type checks happen only once, at compile time. This means we do not need to keep finding out at runtime whether we are trying to divide a number by a string (and then either raise an error or perform a conversion).
Execution speed. As follows from the previous point, statically typed languages are almost always faster than dynamically typed ones.
Under some additional conditions, potential errors can be detected already at compile time.
Faster development with IDE support (the IDE can screen out options that obviously do not fit the type).

Benefits of dynamic typing

Ease of creating heterogeneous collections, heaps of anything and everything (such a need rarely arises, but when it does, dynamic typing helps out).
Convenience of describing generic algorithms (for example, sorting an array that works not only on a list of integers, but also on a list of real numbers and even on a list of strings).
Easy to learn: dynamically typed languages are usually very good for getting started with programming.
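The sorting example from the list above can be sketched in Python as follows (insertion sort is chosen only for brevity, and the function name is made up). One function sorts integers, floats, and strings alike, because it only requires the elements to support the < operator.

```python
def insertion_sort(items):
    """Return a new sorted list; elements only need to support <."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and current < result[j]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

print(insertion_sort([3, 1, 2]))                 # -> [1, 2, 3]
print(insertion_sort([2.5, -1.0, 0.5]))          # -> [-1.0, 0.5, 2.5]
print(insertion_sort(["pear", "apple", "fig"]))  # -> ['apple', 'fig', 'pear']
```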

Generic programming

Well, the most important argument for dynamic typing is the convenience of describing generic algorithms. Let's imagine a problem: we need a search function over several arrays (or lists): an array of integers, an array of real numbers, and an array of characters.

How will we solve it? Let's solve it in three different languages: one dynamically typed and two statically typed.

I'll take one of the simplest search algorithms, brute force. The function will receive the element to search for and the array (or list) itself, and return the index of the element or, if the element is not found, -1.

Dynamic solution (Python):

# Linear search: return the index of required_element, or -1 if it is absent.
def find(required_element, items):
    for index, element in enumerate(items):
        if element == required_element:
            return index
    return -1

As you can see, everything is simple, and it does not matter whether the list contains numbers, lists, or other arrays. Very good. Let's go further and solve the same problem in C!

Static solution (C):

/* Each function returns the index of the element, or -1 if it is not found.
   The return type is int rather than unsigned int, so that -1 is not
   silently converted to a huge unsigned value. */
int find_int( int required_element, int array[], unsigned int size ) {
    for (unsigned int i = 0; i < size; ++i)
        if (required_element == array[i])
            return (int)i;
    return -1;
}
int find_float( float required_element, float array[], unsigned int size ) {
    for (unsigned int i = 0; i < size; ++i)
        if (required_element == array[i])
            return (int)i;
    return -1;
}
int find_char( char required_element, char array[], unsigned int size ) {
    for (unsigned int i = 0; i < size; ++i)
        if (required_element == array[i])
            return (int)i;
    return -1;
}

Well, each function on its own resembles the Python version, but why are there three of them? Has static typing lost?

Yes and no. There are several programming techniques, one of which we will now consider. It is called generic programming, and C++ supports it quite well. Let's look at the new version:

Static solution (generic programming, C++):

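#include <cstddef>
#include <vector>

// One template covers every element type T that supports ==.
template <typename T>
int find( const T& required_element, const std::vector<T>& array ) {
    for (std::size_t i = 0; i < array.size(); ++i)
        if (required_element == array[i])
            return static_cast<int>(i);
    return -1;
}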

Good! It doesn't look much more complicated than the Python version, and there isn't much to write. In addition, we got an implementation for vectors of any element type that supports ==, not just for the three needed to solve the problem!

This version seems to be exactly what we need: we get both the benefits of static typing and some of the benefits of dynamic typing.

It's great that this is possible at all, but it could be even better. First, generic programming can be more convenient and more elegant (for example, in Haskell). Second, besides generic programming you can also use polymorphism (the result will be worse), function overloading (likewise), or macros.

Static and dynamic typing combined

It should also be mentioned that many statically typed languages allow dynamic typing, for example:

C#: supports the pseudo-type dynamic.
F#: supports syntactic sugar in the form of the ? operator, on the basis of which dynamic typing can be imitated.
Haskell: dynamic typing is provided by the Data.Dynamic module.
Delphi: through the special type Variant.

Conversely, some dynamically typed languages let you take advantage of static typing:

Common Lisp: type declarations.
Perl: since version 5.6, in a rather limited form.
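Python, which is not in the list above, later took a similar route: since Python 3.5 the typing module supports optional type hints. A minimal sketch (the function mean is made up for illustration): the interpreter records the hints, but does not enforce them at runtime; checking them is left to external tools such as mypy.

```python
from typing import List, get_type_hints

def mean(values: List[float]) -> float:
    # The annotations document the intended types, much like a declaration.
    return sum(values) / len(values)

# The hints are recorded and can be inspected...
hints = get_type_hints(mean)

# ...but not enforced: a list of ints is accepted without complaint.
print(mean([1, 2, 3]))  # -> 2.0
```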

So, shall we move on?

Strong and weak typing

Strongly typed languages do not allow mixing entities of different types in expressions and do not perform any automatic conversions. They are also called languages with strict typing. The English term for this is strong typing.

Weakly typed languages, on the contrary, go out of their way to let the programmer mix different types in one expression, and the compiler converts everything to a single type. They are also called languages with non-strict typing. The English term for this is weak typing.

Weak typing is often confused with dynamic typing, which is completely wrong. A dynamically typed language can be either weakly or strongly typed.
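Python illustrates the point: it is dynamically typed, yet strong. In this minimal sketch (the helper try_add is made up for illustration) the interpreter refuses to combine a string with a number, so the programmer has to say which conversion is meant. A weakly typed language such as JavaScript would instead silently evaluate "2" + 2 to "22".

```python
def try_add(a, b):
    """Return a + b, or the name of the exception Python raises."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__

print(try_add("2", 2))        # mixing str and int is rejected -> TypeError
print(try_add(int("2"), 2))   # explicit conversion to int -> 4
print(try_add("2", str(2)))   # explicit conversion to str -> 22
```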

However, few people attach importance to the strictness of typing. It is often claimed that if a language is statically typed, many potential errors can be caught at compile time. They are lying to you!

The language must also have strong typing. Indeed, if instead of reporting an error the compiler simply adds a string to a number or, even worse, subtracts one array from another, what is the use of all the type checking at compile time? That's right: weak static typing is even worse than strong dynamic typing! (Well, that's my opinion.)

So does weak typing have any advantages at all? It may look as if it doesn't, but even though I am an ardent supporter of strong typing, I must admit that weak typing has its advantages too.

Want to know which ones?

Benefits of strong typing

Reliability: you get an exception or a compile error instead of incorrect behavior.
Speed: instead of hidden conversions, which can be quite expensive, strong typing makes you write them explicitly, so the programmer at least knows that this piece of code may be slow.
Understanding how the program works: again, instead of implicit type casts, the programmer writes everything themselves and therefore understands that comparing a string with a number does not happen by itself or by magic.
Certainty: when you write conversions manually, you know exactly what you are converting and into what. You will also always be aware that such conversions can lose precision and produce incorrect results.

Benefits of weak typing

Convenience of mixed expressions (for example, of integers and real numbers).
Abstraction from types and focus on the task.
Concise code.

Well, we've figured it out: weak typing has advantages too! Are there ways to carry the advantages of weak typing over to strong typing?

It turns out there are even two.

Implicit type conversion in unambiguous situations and without data loss

Wow... That's a rather long name. Let me shorten it to “limited implicit conversion”. So what do an unambiguous situation and data loss mean?

An unambiguous situation is a conversion or operation whose meaning is immediately clear. For example, adding two numbers is an unambiguous situation. Converting a number to an array is not (you could create an array of one element, or an array of that length filled with default elements, or convert the number to a string and then to an array of characters).

Data loss is even simpler. If we convert the real number 3.5 to an integer, we lose part of the data (in fact, this operation is also ambiguous: how is the rounding done? Up? Down? By discarding the fractional part?).
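The ambiguity is easy to see in Python, which offers several explicit float-to-integer conversions that disagree with each other. A minimal sketch using only the standard library; note in particular that Python 3's built-in round uses round-half-to-even, so round(2.5) gives 2.

```python
import math

x = 3.5
# Several reasonable ways to turn 3.5 (and -3.5) into an integer:
truncated = (int(x), int(-x))              # toward zero: (3, -3)
floored = (math.floor(x), math.floor(-x))  # down: (3, -4)
ceiled = (math.ceil(x), math.ceil(-x))     # up: (4, -3)
rounded = (round(x), round(2.5))           # half to even: (4, 2)

print(truncated, floored, ceiled, rounded)
```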

Conversions in ambiguous situations and conversions with data loss are very, very bad. There is nothing worse in programming.

If you don't believe me, study PL/I or just look up its specification. It has conversion rules between ALL data types! This is sheer hell!

Okay, let's think about limited implicit conversion. Are there languages that have it? Yes: for example, in Pascal an integer can be implicitly converted to a real number, but not vice versa. C#, Groovy, and Common Lisp have similar mechanisms.
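Python behaves in a similar spirit, as a minimal sketch shows (the example is illustrative, not part of the list above): in arithmetic an int is silently widened to a float, but where an integer is required, a float is not silently narrowed, even when its value happens to be whole.

```python
items = [10, 20, 30]

# Widening is implicit: the int 1 becomes a float without any cast.
mixed = 1 + 0.5

# Narrowing is refused: a float is not accepted as a list index.
try:
    items[1.0]
    narrowing_allowed = True
except TypeError:
    narrowing_allowed = False

# The programmer has to choose the conversion explicitly.
explicit = items[int(1.0)]

print(mixed, narrowing_allowed, explicit)  # -> 1.5 False 20
```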

Okay, I said there is one more way to get a couple of the advantages of weak typing in a strong language. There is, and it is called constructor polymorphism.

I will explain it using the wonderful Haskell language as an example.

Polymorphic constructors came about from the observation that safe implicit conversions are most often needed when using numeric literals.

For example, in the expression pi + 1 you don't want to write pi + 1.0 or pi + float(1). You want to write simply pi + 1!

And Haskell allows this, because the literal 1 does not have a specific type. It is neither an integer, nor a real, nor a complex number. It is just a number! (Formally, its type is Num a => a, and the concrete type is chosen from context.)

As a result, when we write a simple function sum x y that adds up all numbers from x to y (in steps of 1), we get several versions at once: sum for integers, for reals, for rationals, and even for numeric types you define yourself.
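Python has no polymorphic literals (its literal 1 is always an int), but a rough analogue of the effect can be sketched with runtime coercion. Because Python's numeric types accept an int operand, one function containing the literal 1 works for ints, floats, and fractions. The name sum_range is hypothetical, and the mechanism (dynamic coercion at runtime) is different from Haskell's compile-time type inference.

```python
from fractions import Fraction

def sum_range(x, y):
    """Add up x, x + 1, x + 2, ... while the value does not exceed y."""
    total = x - x      # a zero of the same type as x
    while x <= y:
        total += x
        x += 1         # the int literal 1 is combined with x's type
    return total

print(sum_range(1, 4))                            # int: 10
print(sum_range(0.5, 2.5))                        # float: 4.5
print(sum_range(Fraction(1, 2), Fraction(5, 2)))  # Fraction: 9/2
```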

Of course, this technique only helps with mixed expressions involving numeric literals, and that is just the tip of the iceberg.

Thus, we can say that balancing on the edge between strong and weak typing is the best way out. But so far no language holds the perfect balance, so I lean more towards strongly typed languages (such as Haskell, Java, C#, Python) than towards weakly typed ones (such as C, JavaScript, Lua, PHP).

Okay, shall we move on?

Explicit and implicit typing

An explicitly typed language requires the programmer to specify the types of all variables and functions they declare. The English term for this is explicit typing.

A language with implicit typing, on the contrary, lets you forget about types and leaves type inference to the compiler or interpreter. The English term for this is implicit typing.

At first you might decide that implicit typing is equivalent to dynamic typing and explicit typing to static typing, but later on we will see that this is not so.

Does each kind have its advantages, and again, are there combinations of them and languages that support both approaches?

Benefits of explicit typing

Having a signature for each function (for example, int add(int, int)) makes it easy to tell what the function does.
The programmer writes down right away what type of values can be stored in a particular variable, which removes the need to remember it.

Benefits of implicit typing

Shorter code: def add(x, y) is clearly shorter than int add(int x, int y).
Resilience to change. For example, if a temporary variable in a function had the same type as the input argument, then in an explicitly typed language, changing the type of the input argument also requires changing the type of the temporary variable.

Well, clearly both approaches have their pros and cons (did anyone expect otherwise?), so let's look for ways to combine them!

Explicit typing on demand

There are languages with implicit typing by default and the ability to specify the types of values when necessary. The compiler automatically infers the actual type of each expression. One of these languages is Haskell; here is a simple example for clarity:

-- Without an explicit type signature
add (x, y) = x + y
-- With an explicit type signature (an alternative to the version above)
add :: (Integer, Integer) -> Integer
add (x, y) = x + y

Note: I intentionally used an uncurried function and intentionally wrote a specific signature instead of the more general add :: (Num a) => a -> a -> a*, because I wanted to show the idea without explaining Haskell's syntax.

* Thanks to int_index for finding the error.

Hmm. As we can see, it is very elegant and short. The definition takes only 18 characters on one line, including spaces!

However, automatic type inference is a rather complicated thing, and even in a cool language like Haskell it sometimes fails (the monomorphism restriction is one example).

Are there languages with explicit typing by default and implicit typing when needed? Of course.

Implicit typing on demand

The new C++ language standard, C++11 (previously called C++0x), introduced the auto keyword, which lets the compiler infer a type from context:

Let's compare:
// Manual type specification
unsigned int a = 5;
unsigned int b = a + 3;
// Automatic type inference
unsigned int a = 5;
auto b = a + 3;

Not bad, but the code did not get much shorter. Let's look at an example with iterators (if you don't understand it, don't worry; the main thing is to notice that automatic type inference shortens the code considerably):

// Manual type specification (assuming randomVector returns std::vector<int>)
std::vector<int> vec = randomVector( 30 );
for ( std::vector<int>::const_iterator it = vec.cbegin(); ... ) { 
    ...
}
// Automatic type inference
auto vec = randomVector( 30 );
for ( auto it = vec.cbegin(); ... ) { 
    ...
}

Wow! Now that's a reduction. Okay, but is it possible to do something in the spirit of Haskell, where the return type depends on the types of the arguments?

And again the answer is yes, thanks to the decltype keyword in combination with auto:

// Manual type specification
int divide( int x, int y ) {
    ...
}
// Automatic type inference (trailing return type)
auto divide( int x, int y ) -> decltype(x / y) {
    ...
}

This notation may not look very pretty, but in combination with generic programming (templates/generics), implicit typing or automatic type inference works wonders.

Some programming languages by this classification

Here is a short list of popular languages and how they fall into each category of typing.

Language    | Static/Dynamic | Strong/Weak | Explicit/Implicit
JavaScript  | Dynamic        | Weak        | Implicit
Ruby        | Dynamic        | Strong      | Implicit
Python      | Dynamic        | Strong      | Implicit
Java        | Static         | Strong      | Explicit
PHP         | Dynamic        | Weak        | Implicit
C           | Static         | Weak        | Explicit
C++         | Static         | Weak        | Explicit
Perl        | Dynamic        | Weak        | Implicit
Objective-C | Static         | Weak        | Explicit
C#          | Static         | Strong      | Explicit
Haskell     | Static         | Strong      | Implicit
Common Lisp | Dynamic        | Strong      | Implicit
D           | Static         | Strong      | Explicit
Delphi      | Static         | Strong      | Explicit

Notes on the table (thanks to qxfusion for the idea and for the reminder about C#):

C#: supports dynamic typing through the special pseudo-type dynamic since version 4.0. Supports implicit typing with var.
C++: since the C++11 standard, supports implicit typing using the auto and decltype keywords. Supports dynamic typing with the Boost library (boost::any, boost::variant). Has features of both strong and weak typing.
Common Lisp: the standard provides type declarations, which some implementations can also use for static type checking.
D: also supports implicit typing.
Delphi: supports dynamic typing through the special type Variant.

Maybe I made a mistake somewhere, especially with CL, PHP, and Obj-C; if you have a different opinion about some language, write in the comments.

Conclusion

Okay. It will be dawn soon, and I feel there is nothing more to say about typing. Oh, what's that? The topic is bottomless? A lot is left unsaid? Please share useful information in the comments.

And good luck!
