What you do not need to code yourself

    Recently I reinvented the wheel and posted the result on Habr: "The simplest Connection pool without a DataSource in Java." The article was not among my most successful ones; please don't downvote it any further. To avoid repeating such mistakes myself, and perhaps to warn others against them, I decided to translate the article “Seven Things You Should Never Code Yourself” by Andy Lester, a well-known open-source developer. If you are interested, read on.

    We programmers love to solve problems. We love it when ideas form in our heads, flow to our fingers, and turn into great solutions.

    But sometimes we jump in too quickly and start cranking out code without considering the consequences. We forget that someone may have already solved this problem, and that code is available that has already been written, tested, and debugged by someone else. Sometimes we just need to stop and think before we start typing.

    For example, if you come across one of these seven programming tasks, then almost always you are better off looking for an existing solution than trying to implement something yourself:

    1. Parsing HTML or XML

    HTML and XML parsing is a task whose complexity is routinely underestimated, at least judging by how often it is asked about on StackOverflow. Extracting data from arbitrary HTML looks deceptively simple, but it is a job for a library. Say you want to extract a URL from an attribute of a tag.

    A simple regular expression that matches the tag seems to be enough: the matched URL comes back in the match results and can be assigned to a string variable. But will that code still find the value when the tag has other attributes?

    Once you change the code to handle that case, will it work when the attribute uses a different style of quotation marks, or no quotes at all?

    What if the tag spans multiple lines and is self-closing?

    And will your code know to ignore tags that are commented out?

    By the time you have gone through yet another round of finding cases your code cannot handle, fixing it, and testing it again, you could have picked up the right library and solved all of these problems.

    The story is the same for every example in this list: you will spend far less time finding an existing library and learning it than reinventing the wheel and then extending it to cover all the cases you did not even think of when you started writing it.
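
    As a rough illustration of the cases listed above, here is a minimal sketch that uses Python's standard html.parser module instead of a regular expression. The choice of an image tag, the file names, and the names ImageSourceCollector and extract_image_sources are all assumptions made for the example; they are not taken from the original article.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collects the src attribute of every <img> tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quotes are already stripped.
        # Self-closing tags also end up here, because the default
        # handle_startendtag() delegates to handle_starttag().
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def extract_image_sources(html):
    parser = ImageSourceCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical input covering the tricky cases: extra attributes, single
# quotes, no quotes, a multi-line self-closing tag, and a commented-out tag.
SAMPLE = """
<img src="a.jpg">
<img id="bar" src="b.jpg">
<img src='c.jpg'>
<img src=d.jpg>
<img id="bar"
     src="e.jpg" />
<!-- <img src="hidden.jpg"> -->
"""

print(extract_image_sources(SAMPLE))
```

    The parser reports the commented-out tag as a comment rather than as a start tag, so it never reaches handle_starttag and no special handling is needed for it.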

    2. Parsing CSV and JSON

    CSV files are deceptively simple, yet fraught with danger. Files of comma-separated values are trivial to parse, right?

    # ID, name, city
    1, Queen Elizabeth II, London

    Of course, until you have to deal with commas enclosed in double quotes:

    2, J. R. Ewing, "Dallas, Texas"

    Once you have handled those double quotes, what happens when a field contains embedded quotes that have to be escaped:

    3, "Larry \"Bud\" Melman", "New York, New York"

    You can handle this as well until you have to deal with line breaks in the middle of a record.

    JSON has the same data type hazards as CSV, with the added complication that it can store nested data structures.

    Save yourself the hassle and the inaccuracy. Any data that cannot be handled by simply splitting on commas should be handled by a library.

    If reading structured data in an unstructured way is bad practice, modifying it in place is even worse. People often say something like “I want to change all tags with such-and-such URLs so that they have a new attribute.” But even something as seemingly simple as “in this CSV, I want to change every fifth field containing the name Bob to Steve” is fraught with danger because, as noted above, you cannot just count commas. To do it correctly, read the data into an internal structure with a proper library, modify the data, and then write it back out with the same library. Anything less risks corrupting the data whenever its structure does not match your expectations.
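
    The read, modify, write-back cycle described above can be sketched with Python's standard csv module. The sample data and the function name rename_person are assumptions for illustration; note also that the sample uses doubled quotes for embedded quotes (the csv module's default dialect) rather than the backslash escaping shown earlier.

```python
import csv
import io

# Hypothetical data: a quoted comma, an embedded quote, and a line break
# inside a quoted field.
RAW = (
    'id,name,city\n'
    '1,Queen Elizabeth II,London\n'
    '2,J. R. Ewing,"Dallas, Texas"\n'
    '3,"Larry ""Bud"" Melman","New York, New York"\n'
    '4,Bob,"Line one\nline two"\n'
)


def rename_person(raw, old, new):
    """Parse the CSV text, rename matching names, and serialize it again."""
    rows = list(csv.reader(io.StringIO(raw)))
    for row in rows[1:]:  # skip the header row
        if row[1] == old:
            row[1] = new
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return rows, out.getvalue()


rows, rewritten = rename_person(RAW, "Bob", "Steve")
for row in rows:
    print(row)
print(rewritten)
```

    Because the same library does both the reading and the writing, the quoting of fields that contain commas, quotes, or line breaks is restored on output without any manual handling.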

    3. Checking Email Addresses

    There are two ways to validate an email address. The simple way is to say, “I need some characters before the @ sign and some characters after it,” an idea that can be implemented with a short regular expression.

    It is certainly not complete, and it lets invalid addresses through, but at least we know there is an @ sign in the middle.

    Or you can validate against RFC 822. Its rules cover all the cases that are rare but still valid. A simple regular expression will not cut it. You will have to use a library written by someone else.

    If you are not going to validate against RFC 822, then everything you do relies on rules that may seem reasonable but might not be correct. That approach is a compromise, so don't fool yourself into thinking you have covered every case unless you went back to the RFC or used a library written by someone else.

    (For further discussion of email validation, see StackOverflow.)
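
    To make the trade-off concrete, here is a sketch of the “something before the @, something after it” check in Python. The exact pattern and the name looks_like_email are assumptions for the example; the point is only to show how much such a check lets through.

```python
import re

# Deliberately permissive: at least one character, an @ sign, and at least
# one more character. This is a sanity check, not RFC validation.
SIMPLE_EMAIL = re.compile(r".+@.+")


def looks_like_email(text):
    return SIMPLE_EMAIL.fullmatch(text) is not None


CANDIDATES = [
    "user@example.com",
    "no-at-sign",
    "@example.com",
    "a@b@c",
    "two words@example.com",
]

for candidate in CANDIDATES:
    print(repr(candidate), looks_like_email(candidate))
```

    The last two candidates pass even though most mail systems would reject them, which is exactly the compromise described above.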

    4. Working with URLs

    URLs are not as nasty as email addresses, but they are still full of annoying little rules that you have to remember. Which characters must be encoded? How do you handle spaces? What about + signs? Which characters may follow the # sign?

    Whatever language you use, there is existing code for splitting URLs into components and for assembling URLs from properly formatted components.
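
    In Python, for instance, the standard urllib.parse module does both jobs. The URL, the host name, and the helper name build_url below are made up for this sketch.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit, urlunsplit

# Splitting an (invented) URL into its components.
parts = urlsplit("https://example.com/search%20results?q=a+b&lang=en#section-2")
print(parts.scheme, parts.netloc, parts.path, parts.fragment)

# parse_qs decodes percent-escapes and treats "+" in the query as a space.
print(parse_qs(parts.query))


def build_url(host, path, params, fragment=""):
    """Assemble an https URL, letting the library do the encoding."""
    return urlunsplit(("https", host, quote(path), urlencode(params), fragment))


# Spaces in the path become %20; in the query a space becomes "+" and a
# literal "+" becomes %2B.
print(build_url("example.com", "/my files/report.pdf", {"q": "a+b c", "page": 2}))
```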

    5. Working with dates and times

    Date and time manipulation is the prime example of a problem you will almost certainly never get entirely right on your own. Handling dates and times means accounting for time zones, daylight saving time, leap years, and even leap seconds. The contiguous United States has only four time zones, all exactly one hour apart. In the rest of the world, things are not so simple.

    Whether you need date arithmetic, such as working out the date three days after a given date, or need to check that an input string is a valid date, use an existing library.
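
    Both tasks mentioned above, adding three days and validating an input string, can be sketched with Python's standard datetime module. The function names and the choice of the YYYY-MM-DD format are assumptions for the example.

```python
from datetime import date, datetime, timedelta


def three_days_after(day):
    """Date arithmetic: the library handles month ends and leap years."""
    return day + timedelta(days=3)


def is_valid_iso_date(text):
    """Return True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(three_days_after(date(2024, 2, 27)))  # 2024 is a leap year
print(three_days_after(date(2023, 2, 27)))  # 2023 is not
print(is_valid_iso_date("2024-02-29"))
print(is_valid_iso_date("2023-02-29"))
```

    Time zones and daylight saving time need zone data on top of this, which is one more reason to rely on a maintained library rather than hand-written rules.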

    6. Template systems

    It is almost a rite of passage. A junior programmer has to generate a large amount of boilerplate text and comes up with some simple format like:

    Dear #user#,
    Thank you for your interest in #product#...

    This format works for a while, but sooner or later it needs output formats, numeric formatting, structured data rendered as tables, and so on, until it has become a monster that requires endless care and feeding.

    If you are doing anything more complicated than replacing one string with another, take a step back and find a good template library. Things are even easier if you write in PHP, since the language itself is a template system (although these days people often forget that).
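
    For the simplest case, plain placeholder substitution, even a standard library may already have what you need. The sketch below uses Python's string.Template, which uses $name placeholders rather than the #name# format shown above; the letter text and the values are invented for the example. Anything involving loops, conditionals, or formatting is a reason to reach for a full template library instead.

```python
from string import Template

LETTER = Template("Dear $user,\nThank you for your interest in $product.")

# substitute() raises KeyError when a placeholder has no value, which catches
# typos that a hand-rolled search-and-replace would silently leave in place.
print(LETTER.substitute(user="Alice", product="Widgets"))

try:
    LETTER.substitute(user="Alice")
except KeyError as missing:
    print("missing placeholder:", missing)

# safe_substitute() leaves unknown placeholders untouched instead.
print(LETTER.safe_substitute(user="Alice"))
```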

    7. Logging frameworks

    Logging tools are another example of projects that start small and grow into monsters. A small function for writing to a log file soon needs to write to several files, or send an e-mail on completion, or support logging levels, and so on. Whatever language you use, there are at least three ready-made logging packages that have been in use for years and will save you from all of these problems.
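
    As one example of such a package, Python's standard logging module already covers levels and multiple destinations. In the sketch below the logger name, the message texts, and the helper make_logger are assumptions; in-memory streams stand in for the log files so the example is self-contained.

```python
import io
import logging


def make_logger(name, detail_stream, alert_stream):
    """Send everything to one destination and only errors to another."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False   # keep the example's output out of the root logger
    logger.handlers.clear()    # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(levelname)s %(message)s")

    detail = logging.StreamHandler(detail_stream)
    detail.setLevel(logging.DEBUG)
    detail.setFormatter(formatter)

    alerts = logging.StreamHandler(alert_stream)
    alerts.setLevel(logging.ERROR)
    alerts.setFormatter(formatter)

    logger.addHandler(detail)
    logger.addHandler(alerts)
    return logger


detail_log = io.StringIO()
alert_log = io.StringIO()
log = make_logger("demo", detail_log, alert_log)
log.debug("starting")
log.error("disk full")

print(detail_log.getvalue())
print(alert_log.getvalue())
```

    Swapping a StreamHandler for logging.FileHandler or logging.handlers.SMTPHandler covers the “several files” and “send an e-mail” cases without changing the calling code.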

    Is a library overkill?

    Before you dismiss the idea of pulling in a third-party module, take a close look at your objections. The first one is usually: “Why do I need a whole library just to do this (validate this date, parse this HTML, and so on)?” My answer is: “What's the harm?” You are probably not writing microcontroller code for a toaster, where every byte of code space has to be squeezed out.

    If you are concerned about speed, keep in mind that avoiding a library may be premature optimization. Loading an entire date/time library may make your validation 10 times slower than your homegrown solution, but profile your code to find out whether that actually matters.

    We programmers are proud of our skills, and we like the process of creating code. This is normal. Just remember that your responsibility as a programmer is not just to write code, but to solve problems, and often the best way to solve a problem is to write as little code as possible.

    Translator's note:
    By the way, the last paragraph echoes the main idea of the article “How to improve your programming style?”.
    UPD1. A categorized list of tools for the major programming languages: awesome-awesomeness (link provided by hell0w0rd in the comments, special thanks to him).
