ph_piter July 31, 2015 at 10:07

Proper API Design: What Are “One,” “Many,” “Null,” and “Nothing”

Translation

Hello, our regular and occasional readers.

Today we want to offer you an interesting article on API design and related pitfalls. Do not ask how we came upon it; creative search is a very non-linear matter.

Enjoy reading

Overview

There are many factors to consider when designing an API: security, consistency, state management, style; the list seems endless. However, one factor is often overlooked: scale. If you take the scale of the system into account from the very beginning of API design, you can save hundreds of hours of work later, when the system grows.

Introduction

It is sometimes hard to pin down what constitutes an application programming interface (API). From a technical point of view, any function called by another programmer's code can be considered part of an API. Debates about which code counts as an API are beyond the scope of this article, so we will simply treat plain functions as APIs.

The examples in this article were deliberately kept simple and serve only to illustrate its main theme. The functions are written in C#, but the basic principles set out here apply to almost any language, framework, or system. The data structures are modeled in the common relational style used in many production databases. Again, the examples are illustrations only and should not be taken as recommendations.

Requirements

Suppose you are writing a very simple order processing system for a client, and you have already defined three main classes (or, if you prefer, “data structures”). The Customer class has a “foreign key” (in database terminology) to the Address class, and the Order class has foreign keys to the Address and Customer classes. Your task is to create a library that can be used to process orders. The first business rule: the State of the customer's HomeAddress must match the State of the order's BillingAddress. Don't ask why; business rules usually defy understanding :)

public class Address
{
    public int AddressId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zipcode { get; set; }
}
public class Customer
{
    public Address HomeAddress { get; set; }
    public int CustomerId { get; set; }
    public int HomeAddressId { get; set; }
    public string CustomerName { get; set; }
}
public class Order
{
    public Customer MainCustomer { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public int ShippingAddressId { get; set; }
    public int BillingAddressId { get; set; }
    public decimal OrderAmount { get; set; }
    public DateTime OrderDate { get; set; }
}

Implementation

Checking whether two fields match is obviously a simple task. Hoping to impress the boss, you knock out a solution in less than 10 minutes. The VerifyStatesMatch function returns a Boolean value that tells the caller whether the business rule is satisfied. You run your library through a few simple tests and confirm that the code takes about 50 ms on average to execute, with no visible flaws. The boss is very pleased and hands your library to other developers to use in their applications.

public bool VerifyStatesMatch(Order order)
{
    bool retVal = false;
    try
    {
        // Assume this operation takes 25 ms.
        Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
        // Assume this operation takes 25 ms.
        Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
        retVal = customer.HomeAddress.State == shippingAddress.State;
    }
    catch (Exception ex)
    {
        SomeLogger.LogError(ex);
    }
    return retVal;
}

Problem

The next day you come to work and find a sticky note on your monitor: “See me urgently. - Boss.” You figure that yesterday's library was such a success that today the boss has decided to entrust you with an even more important task. However, it soon turns out that there are serious problems with your code.

You: Good morning, boss, what happened?
Boss: It's your library, it's causing nothing but problems in the code!
You: What? How?
Boss: Bob says your algorithm is too slow, John complains that nothing works correctly, and Steve said this: “Object reference not set to an instance of an object.”
You: I can't imagine why. I tested it yesterday, and everything was fine.
Boss: I don't want to hear it. Go and figure it out!

Not the best start to the day, right? I suspect most developers have been in a similar situation at some point. You thought you had written the library “perfectly,” and it caused a whole host of problems. But once you understand what “One”, “Many”, “Null” and “Nothing” are, you will learn to see where your API fails to meet your colleagues' expectations.

One

en.wikipedia.org/wiki/The_Matrix

The first guiding principle is to understand what “One” is and how to handle it. By this I mean that your API must, in every case, process a single unit of expected input without errors. Errors may still occur internally, but they should be handled rather than passed on to the caller. “Isn't that obvious?” you might think. Well, let's take an example and see what errors could occur while processing an Order.

Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
// What if customer.HomeAddress did not load correctly or turned out to be null?
retVal = customer.HomeAddress.State == shippingAddress.State;

As the comment above indicates, we are assuming that the HomeAddress property was loaded correctly from the data source. Although that will probably be the case 99.99% of the time, a truly robust API must also account for the scenario where it is not. In addition, depending on the language, comparing the two State properties may fail if either of them did not load correctly. The key point is that we cannot assume anything about the input we receive, or about data retrieved by code we do not control.

This is the simplest example, so let's fix our code and move on.

Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
if(customer.HomeAddress != null)
{
    retVal = customer.HomeAddress.State == shippingAddress.State;
}
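
The fix above guards only HomeAddress. As a minimal sketch of the same idea carried through to every link in the chain, assuming C# 6 null-conditional operators are available, the comparison could be isolated in a helper. The StateRule class and StatesMatch method are hypothetical names used for illustration, and the Address and Customer types are trimmed to the properties the check needs.

```csharp
using System;

public class Address { public string State { get; set; } }
public class Customer { public Address HomeAddress { get; set; } }

public static class StateRule
{
    // Hypothetical helper: returns false instead of throwing
    // when the customer, either address, or either State is missing.
    public static bool StatesMatch(Customer customer, Address shippingAddress)
    {
        // ?. yields null as soon as any object in the chain is null.
        string homeState = customer?.HomeAddress?.State;
        string shippingState = shippingAddress?.State;

        if (homeState == null || shippingState == null)
        {
            return false;
        }
        return homeState == shippingState;
    }
}
```

Whether a missing address should count as "no match" or as an error is a business decision; this sketch simply picks the first option.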

Many

msdn.microsoft.com/en-us/library/w5zay9db.aspx

Let's return to the scenario above. We need to talk to Bob. Bob complained that the code was slow, yet 50 ms is in line with the execution time expected in a system with this architecture. It turns out, however, that Bob processes 100 orders from your largest customer in a single batch, so in Bob's loop your method takes 5 seconds in total.

// Bob's code:
foreach(Order order in bobsOrders)
{
    ...
    bool success = OrderProcess.VerifyStatesMatch(order);
    ....
}

You: Bob, what makes you think my code is too slow? It takes only 50 ms to process an order.
Bob: Our customer Acme Inc. requires their batch orders to be processed as fast as possible. I have to handle 100 orders, so 5 seconds is too long.
You: Oh, I didn't know we had to process orders in batches.
Bob: Well, it's only for Acme; they are our largest client.
You: Nobody told me anything about Acme or about batch orders.
Bob: Shouldn't your code handle several orders at once efficiently?
You: Ah ... yes, of course.

It is clear what happened and why the code seems “too slow” to Bob. Nobody told you anything about Acme or batch processing. Bob's loop loads the same Customer record, and most likely the same Address record, 100 times. This problem is easy to solve if you accept an array of orders rather than a single one, and add some simple caching. The params keyword in C# exists for exactly such situations.

public bool VerifyStatesMatch(params Order[] orders)
{
    bool retVal = false;
    try
    {
        // Per-call caches, so each customer and address is fetched only once.
        var customerMap = new Dictionary<int, Customer>();
        var addressMap = new Dictionary<int, Address>();
        foreach (Order order in orders)
        {
            Customer customer = null;
            if (customerMap.ContainsKey(order.CustomerId))
            {
                customer = customerMap[order.CustomerId];
            }
            else
            {
                customer = SomeDataSource.GetCustomer(order.CustomerId);
                customerMap.Add(order.CustomerId, customer);
            }
            Address shippingAddress = null;
            if (addressMap.ContainsKey(order.ShippingAddressId))
            {
                shippingAddress = addressMap[order.ShippingAddressId];
            }
            else
            {
                shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
                addressMap.Add(order.ShippingAddressId, shippingAddress);
            }
            retVal = customer.HomeAddress.State == shippingAddress.State;
            // Stop at the first order that violates the rule.
            if (!retVal)
            {
                break;
            }
        }
    }
    catch (Exception ex)
    {
        SomeLogger.LogError(ex);
    }
    return retVal;
}

If you modify the function in this way, Bob’s batch processing will accelerate dramatically. Most data calls will disappear, because you can simply find an entry by its ID in the temporary cache (dictionary).
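
The two lookup-or-load branches in the function follow the same pattern, so the caching step can be sketched once as a generic helper. This is only an illustration under assumptions: LookupCache and GetOrLoad are hypothetical names, and the load delegate stands in for a data-source call such as SomeDataSource.GetCustomer.

```csharp
using System;
using System.Collections.Generic;

public static class LookupCache
{
    // Returns the cached value for the key, calling load only
    // the first time that key is requested.
    public static TValue GetOrLoad<TKey, TValue>(
        Dictionary<TKey, TValue> cache, TKey key, Func<TKey, TValue> load)
    {
        TValue value;
        if (!cache.TryGetValue(key, out value))
        {
            value = load(key);      // the expensive data-source call
            cache.Add(key, value);  // remember it for the rest of the batch
        }
        return value;
    }
}
```

With a helper like this, a batch of 100 orders for one customer triggers a single customer load instead of 100, which is where the savings for Bob come from.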

Once you open your API up to “Many”, you immediately need some kind of bounds checking. What if, for example, someone sends a million orders to your method? Is such a large number beyond what this architecture can handle? This is where an understanding of both the system architecture and the business processes pays off. If you know that in practice at most 10,000 orders will ever need processing, you can confidently set the limit at 50,000. That way you guarantee that nobody can bring the system down with one giant invalid call.
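
A minimal sketch of such a limit, using the 50,000 figure from the paragraph above. BatchGuard, MaxOrdersPerCall, and EnsureWithinLimit are hypothetical names, and throwing ArgumentOutOfRangeException is just one possible policy; a real system might instead split the batch or reject it with a status code.

```csharp
using System;

public static class BatchGuard
{
    // Assumed hard ceiling: five times the 10,000 orders expected in practice.
    public const int MaxOrdersPerCall = 50000;

    // Throws when a batch is larger than the system is designed to handle.
    public static void EnsureWithinLimit(int orderCount)
    {
        if (orderCount > MaxOrdersPerCall)
        {
            throw new ArgumentOutOfRangeException(
                nameof(orderCount),
                orderCount,
                $"At most {MaxOrdersPerCall} orders can be verified in one call.");
        }
    }
}
```

A call such as BatchGuard.EnsureWithinLimit(orders.Length) at the top of VerifyStatesMatch would apply the check before any data-source work starts.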

Of course, the list of possible optimizations is not limited to this, but the example shows how you can get rid of unnecessary work if you count on “many” instances from the very beginning.

Null

You: Steve, are you passing null into my code?
Steve: I don't think so. Why?
You: The boss says the system is complaining, “Object reference not set ...”.
Steve: Ah, that's probably the legacy system. I don't control its output; we just feed it into the new system through the pipeline as is.
You: That's ridiculous. Why don't you deal with those nulls?
Steve: I do; I null-check in my code. Don't you?
You: Oh ... yes, of course.

“Object reference not set to an instance of an object.” Does this error need any explanation? Many of us have lost more than an hour of our lives fighting it. In most languages, null (or an empty set, and so on) is a perfectly valid state for any non-value type. Therefore, any serious API must account for null, even if the caller is technically not supposed to pass it.

Of course, checking every reference for null is tedious and sometimes excessive. However, you should never trust input that comes from a source you do not control. So we must check the “orders” parameter for null, as well as each Order instance inside it.

Proper null checking spares you annoying support calls from customers asking what an “instance of an object” is. I always prefer to be proactive: better that my function returns a default value and logs a message (or sends an alert) than throws a rather useless “Object reference not set to an instance of an object” error. Of course, that decision depends entirely on the type of system, on whether the code runs on the client or on the server, and so on. The point is that null can be ignored, but only until it comes back to bite you.

CLARIFICATION: To be clear, I am not saying that a function should silently do nothing when it encounters an invalid state. If null parameters are unacceptable in your system, throw an exception (such as ArgumentNullException in .NET). However, in some situations it is perfectly acceptable to return a meaningful default, with no need to throw. For example, fluent methods usually return the value that was passed to them if they cannot do anything with it. There are too many factors involved to give a general recommendation for handling null.
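
The two policies discussed here, failing fast versus returning a default, can be sketched side by side. Everything below is illustrative: NullPolicy and its methods are hypothetical names, counting orders stands in for the real processing, and the logging call mentioned earlier is omitted to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;

public class Order { public int OrderId { get; set; } }

public static class NullPolicy
{
    // Strict policy: null input is a caller bug, so fail fast
    // with an exception that names the offending parameter.
    public static int CountOrdersStrict(IEnumerable<Order> orders)
    {
        if (orders == null)
        {
            throw new ArgumentNullException(nameof(orders));
        }
        int count = 0;
        foreach (Order order in orders)
        {
            if (order == null)
            {
                throw new ArgumentException(
                    "The batch contains a null order.", nameof(orders));
            }
            count++;
        }
        return count;
    }

    // Lenient policy: treat null as "nothing to do", return a
    // meaningful default, and skip null elements inside the batch.
    public static int CountOrdersLenient(IEnumerable<Order> orders)
    {
        if (orders == null)
        {
            return 0;
        }
        int count = 0;
        foreach (Order order in orders)
        {
            if (order != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

Either way, the caller gets a predictable result or a named exception instead of a bare NullReferenceException from deep inside the library.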

Nothing

youtu.be/CrG-lsrXKRM

You: John, what are you passing into my code? It looks like an incomplete Order.
John: Oh, sorry. I don't need your method, but another library requires me to pass an Order parameter. I think that library calls your code. I don't work with orders, but I have to use that other library.
You: That library needs to be fixed; it's badly designed!
John: Well, that library grew organically as the business requirements kept changing. Matt wrote it, but he's out this week, and I don't really know how to change it. But shouldn't your code check whether its input is valid?
You: Yes ... you're right.

Of all four principles, Nothing is probably the hardest to describe. Null, although it seems to mean “nothing” and “emptiness”, has a definition and can be quantified; most languages even have a dedicated keyword for it. Nothing, by contrast, means that your API must deal with input that is essentially garbage. In our example, that would be an Order that has no CustomerId, or one whose OrderDate is five centuries in the past. A more illustrative example is a collection that contains no elements. The collection is not null, so it should fall into the “Many” category, but the caller never filled it with any data. You must always account for scenarios in which “nothing” shows up.

Let's adjust our example so that it handles “nothing” as well. The caller cannot simply pass in something that merely looks like an Order; the order has to satisfy a minimum set of requirements. Otherwise, it is treated as “nothing.”

...
// Yes, I cut a corner here.  ;-)
if (order != null && order.IsValid)
...
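
The article leaves IsValid undefined, so the following is only a guess at what a minimal version might check, based on the two symptoms named above (a missing CustomerId and an implausibly old OrderDate). The rule that IDs are positive and the cutoff date EarliestPlausibleDate are assumptions made for this sketch, not requirements taken from the original system.

```csharp
using System;

public class Order
{
    // Hypothetical cutoff: anything dated earlier is treated as garbage.
    public static readonly DateTime EarliestPlausibleDate = new DateTime(2000, 1, 1);

    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public int ShippingAddressId { get; set; }
    public DateTime OrderDate { get; set; }

    // True only when the fields the state check depends on carry
    // meaningful values (assumes database IDs start at 1).
    public bool IsValid
    {
        get
        {
            return CustomerId > 0
                && ShippingAddressId > 0
                && OrderDate >= EarliestPlausibleDate;
        }
    }
}
```

A freshly constructed Order, with zero IDs and the default DateTime, fails this check, which is the kind of “nothing” John's library was passing along.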

Conclusion

I hope I have conveyed the main idea of this article: no code can accept arbitrary input without problems. When implementing any function or API, you have to consider how it will be used. In our example, the original function grew from 12 to 50 lines, although we made no fundamental changes to it. All the added code is there to provide scaling and bounds checking, and to make the function handle any input correctly and efficiently.
The volume of stored data has grown exponentially in recent years, so the scale of input data will keep increasing while its quality can only decline. Writing the API correctly from the very beginning can be crucial for business growth, for adapting to an expanding customer base, and, in the long run, for saving on technical support costs (and sparing yourself some headaches).
