10 common SQL mistakes that programmers make

Original author: Lukas Eder
The original article is called "10 SQL errors that Java developers make", but by and large the principles in it apply to any language.



Java programmers mix object-oriented and imperative thinking, depending on their:
- skill (anyone can program imperatively)
- dogma (some apply the "pattern pattern", i.e. the pattern of applying patterns everywhere and giving them names)
- mood (true object-oriented programming is a little trickier than imperative programming)

But everything changes when Java developers write SQL code.
SQL is a declarative language that has nothing to do with object-oriented or imperative thinking. It is very easy to express a query in SQL, but rather difficult to express it correctly and optimally. Developers not only need to rethink their programming paradigm, they also need to think in terms of set theory.

The following are common mistakes that Java developers make when using SQL through JDBC or jOOQ (in no particular order). For another 10 mistakes, see this article.

1. Forgetting about NULL


Misunderstanding NULL is probably the biggest mistake a Java developer can make when writing SQL. Perhaps this is because NULL is also called UNKNOWN; if it were simply called UNKNOWN, it would be easier to understand. Another reason is that when you fetch data or bind variables, JDBC maps SQL NULL to Java null. This can create the impression that NULL = NULL (SQL) behaves the same way as null == null (Java).

One of the clearest examples of the NULL misconception is when NULL predicates are used with row value expressions.

Another, more subtle problem arises from misunderstanding the NULL value in NOT IN anti-joins.

Remedy:
Train yourself. There is nothing complicated about it; just think about NULL every time you write SQL:
- Is this predicate correct with respect to NULL?
- Does NULL affect the result of this function?
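Both traps can be demonstrated in a few lines. The article targets JDBC, but the SQL semantics are identical everywhere, so this sketch uses Python's built-in sqlite3 for a self-contained, runnable example; the `authors` table and its columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER, year_of_birth INTEGER)")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, 1981), (2, None), (3, 1977)])

# NULL = NULL evaluates to UNKNOWN, not TRUE, so this matches no rows:
eq_null = conn.execute(
    "SELECT id FROM authors WHERE year_of_birth = NULL").fetchall()

# The correct predicate is IS NULL:
is_null = conn.execute(
    "SELECT id FROM authors WHERE year_of_birth IS NULL").fetchall()

# NOT IN against a list that contains a NULL can never be TRUE,
# so this anti-join silently returns nothing at all:
not_in = conn.execute(
    "SELECT id FROM authors "
    "WHERE id NOT IN (SELECT year_of_birth FROM authors)").fetchall()

print(eq_null, is_null, not_in)  # [] [(2,)] []
```

The empty result of the NOT IN query is the dangerous case: nothing fails loudly, the rows just disappear.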

2. Processing data in Java memory


Not many Java programmers know SQL very well. An occasional JOIN, an odd UNION, and that's it. What about window functions? Grouping sets? Many Java developers load SQL data into memory, transform it into some suitable collection type, and perform the necessary calculations on those collections with verbose loop structures (at least before the collection improvements in Java 8).

But some SQL databases support advanced (and SQL-standard!) OLAP features that are better suited to this and much easier to write. One (non-standard) example is Oracle's excellent MODEL clause. Just let the database do the processing and fetch only the results into Java memory. After all, some smart person has already optimized these expensive operations. So by using OLAP in the database, you get two things:
- Simplicity. It is usually easier to write correctly in SQL than in Java.
- Performance. The database is likely to be faster than your algorithm. And, more importantly, you do not have to transfer millions of records over the wire.

Remedy:
Every time you implement a data-oriented algorithm in Java, ask yourself: "Can I let the database do this work instead?"
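As a minimal sketch of the idea (again using Python's stdlib sqlite3 as a stand-in for JDBC, with an invented `sales` table): the loop-based version fetches every row over the wire, while the SQL version ships back only the aggregated result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100), ("EU", 200), ("US", 50)])

# In-memory approach: fetch every detail row, then loop in the application.
totals_in_app = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_in_app[region] = totals_in_app.get(region, 0) + amount

# Database approach: one query; only the aggregated rows travel back.
totals_in_db = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

print(totals_in_db)  # {'EU': 300, 'US': 50}
```

With three rows the difference is invisible; with millions of rows the in-application loop pays for every record crossing the wire.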

3. Using UNION instead of UNION ALL


It is a shame that UNION ALL requires an extra keyword compared to UNION. It would have been much better if the SQL standard had been defined to support:
- UNION (allowing duplicates)
- UNION DISTINCT (removing duplicates)

Removing duplicates is not only needed less often, it is also rather slow on large result sets, because the two subselects must be ordered and each tuple compared with its subsequent tuple.

Remember that even though the SQL standard defines INTERSECT ALL and EXCEPT ALL, not every database implements these rarely used set operations.

Remedy:
Every time you write UNION, ask yourself whether you actually meant UNION ALL.
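The behavioral difference is easy to see in a runnable sketch (Python's stdlib sqlite3; the two one-column tables are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER)")
conn.execute("INSERT INTO a VALUES (1)")
conn.execute("INSERT INTO b VALUES (1)")

# UNION deduplicates -- and pays for a sort/hash step to do so:
dedup = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b").fetchall()

# UNION ALL simply concatenates the two results:
keep_all = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b").fetchall()

print(dedup, keep_all)  # [(1,)] [(1,), (1,)]
```

If the two branches cannot produce overlapping rows anyway (for example, they select from disjoint key ranges), the deduplication step of UNION is pure wasted work.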

4. Using JDBC to paginate a large result set


Most databases support some way of paginating results through LIMIT .. OFFSET, TOP .. START AT, or OFFSET .. FETCH clauses. Even without support for these clauses, you can still filter on ROWNUM (Oracle) or ROW_NUMBER() OVER() (DB2, SQL Server 2008 and others), which is much faster than paginating in memory. This is especially true for large offsets!

Remedy:
Just use those clauses, or a tool (such as jOOQ) that can emulate them for you.
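A minimal pagination sketch, using SQLite's LIMIT .. OFFSET through Python's stdlib sqlite3 (the `posts` table and page numbers are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO posts VALUES (?)",
                 [(i,) for i in range(1, 11)])  # ids 1..10

page_size, page_number = 3, 2  # fetch the second page of three rows

# The database skips the earlier rows; only one page crosses the wire.
# Note the ORDER BY: pagination without a stable order is meaningless.
page = conn.execute(
    "SELECT id FROM posts ORDER BY id LIMIT ? OFFSET ?",
    (page_size, (page_number - 1) * page_size)).fetchall()

print(page)  # [(4,), (5,), (6,)]
```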

5. Joining data in Java memory


From the early days of SQL to the present, some Java programmers still write JOINs with a heavy heart. They harbor an outdated fear that JOINs are slow. That may be true if the cost-based optimizer chooses a nested loop, loading entire tables into memory before producing the joined result. But that rarely happens. With suitable predicates, constraints, and indexes, MERGE JOIN and HASH JOIN operations are very fast; it all comes down to correct metadata (Tom Kyte wrote well about this). Nevertheless, there are probably still quite a few Java developers who load two tables with two separate queries and join them in Java memory one way or another.

Remedy:
If you are selecting from several tables in several steps, think again: can you express your query in a single statement?
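A minimal sketch of a single JOIN statement replacing the two-query-plus-merge-loop pattern (Python's stdlib sqlite3 standing in for JDBC; the `author` and `book` tables are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE book (title TEXT, author_id INTEGER)")
conn.execute("INSERT INTO author VALUES (1, 'Karwin')")
conn.execute("INSERT INTO book VALUES ('SQL Antipatterns', 1)")

# Instead of fetching both tables separately and merging them with a
# hand-written loop in application memory, let the database join them:
rows = conn.execute("""
    SELECT a.name, b.title
    FROM author a
    JOIN book b ON b.author_id = a.id
""").fetchall()

print(rows)  # [('Karwin', 'SQL Antipatterns')]
```

With an index on `book.author_id`, the database can pick a fast join strategy; the hand-written merge loop has no such option.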

6. Using DISTINCT or UNION to remove duplicates from an accidental Cartesian product


With complex joins, any developer can lose track of all the relations at play in an SQL query. Specifically, when relationships involve compound foreign keys, it is possible to forget to add the relevant predicates to the JOIN ... ON clause. This can produce duplicate rows, either always or only in exceptional situations. Some developers then add a DISTINCT keyword to suppress the duplicates. This is wrong for three reasons:
- It (somewhat) cures the symptom but not the cause, and may still fail to cure it in edge cases.
- It is slow for large result sets, since DISTINCT needs a sort operation to remove duplicates.
- It is slow for large Cartesian products, which still have to be loaded into memory.

Remedy:
As a rule, if you get unwanted duplicates, review your JOIN predicates. There is probably a subtle Cartesian product lurking somewhere.
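Here is a deliberately broken join showing how a forgotten predicate creates the duplicates that DISTINCT would then paper over (Python's stdlib sqlite3; the `dept`/`emp` schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE emp (dept_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)",
                 [(1, "R&D"), (2, "Sales")])
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob")])

# Forgotten join predicate: every dept pairs with every emp (2 x 2 rows).
cartesian = conn.execute(
    "SELECT d.name, e.name FROM dept d, emp e").fetchall()

# Correct predicate: no duplicates, no DISTINCT band-aid needed.
joined = conn.execute(
    "SELECT d.name, e.name FROM dept d "
    "JOIN emp e ON e.dept_id = d.id").fetchall()

print(len(cartesian), len(joined))  # 4 2
```

Slapping DISTINCT onto the first query would shrink the output but still force the database to build and then discard the full Cartesian product.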

7. Avoiding the MERGE statement


This is not really a mistake, but rather a lack of knowledge of, or fear of, the powerful MERGE statement. Some databases know other forms of UPSERT statement, for example MySQL's ON DUPLICATE KEY UPDATE. MERGE is really powerful, especially in databases that heavily extend the SQL standard, such as SQL Server.

Remedy:
If you are implementing UPSERT through chains of INSERT and UPDATE or of SELECT ... FOR UPDATE followed by INSERT or UPDATE, think again. Besides risking race conditions, you could probably express a simpler MERGE statement.
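SQLite has no MERGE, but its UPSERT clause (INSERT ... ON CONFLICT, available since SQLite 3.24) illustrates the same one-atomic-statement idea; this sketch uses Python's stdlib sqlite3 with an invented `stock` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('A-1', 10)")

# One atomic statement instead of a racy SELECT-then-INSERT-or-UPDATE.
# (Requires SQLite 3.24+; MySQL's analogue is ON DUPLICATE KEY UPDATE,
# and databases with MERGE express the same thing with WHEN MATCHED /
# WHEN NOT MATCHED branches.)
conn.execute("""
    INSERT INTO stock (sku, qty) VALUES ('A-1', 5)
    ON CONFLICT (sku) DO UPDATE SET qty = qty + excluded.qty
""")

qty = conn.execute(
    "SELECT qty FROM stock WHERE sku = 'A-1'").fetchone()[0]
print(qty)  # 15
```

Because the conflict check and the update happen inside one statement, there is no window for another connection to slip between the "check" and the "act".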

8. Using aggregate functions instead of window functions


Before the advent of window functions, the only way to aggregate data in SQL was GROUP BY combined with aggregate functions in the projection. This works well in most cases, and when the aggregated data needs to be enriched with regular rows, the grouped query can be placed in a subquery.
But SQL:2003 defines window functions, which are implemented by many database vendors. Window functions can aggregate data over result sets that are not grouped. In fact, each window function supports its own, independent PARTITION BY clause, which makes them a great tool for reporting.

Using window functions will:
- lead to more readable SQL (fewer dedicated GROUP BY expressions in subqueries)
- improve performance, as an RDBMS is likely to optimize window functions more easily

Remedy:
When you write a GROUP BY expression in a subquery, ask yourself whether it could be expressed with a window function instead.
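A minimal sketch of the difference: the window function keeps every detail row while adding a per-partition aggregate, with no grouped subquery joined back. (Python's stdlib sqlite3; window functions require SQLite 3.25+, and the `sales` table is invented for illustration.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100), ("EU", 200), ("US", 50)])

# Each detail row is preserved; SUM() OVER (PARTITION BY ...) attaches
# the regional total to every row of that region.
rows = conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
""").fetchall()

print(rows)  # [('EU', 100, 300), ('EU', 200, 300), ('US', 50, 50)]
```

The GROUP BY equivalent would need a grouped subquery joined back to `sales` on `region` to achieve the same three-column result.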

9. Sorting in Java memory using indirect sort criteria


The ORDER BY clause supports many kinds of expressions, including CASE, which can be very useful for indirect sort criteria. You should never sort data in Java memory just because you think:
- SQL sorting is too slow
- SQL sorting cannot do it

Remedy:
If you are sorting any SQL data in Java memory, ask yourself whether you can push that sorting down into the database. This pairs well with database pagination.
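An indirect sort criterion expressed with ORDER BY CASE, sketched with Python's stdlib sqlite3 (the `tasks` table and priority ranking are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (name TEXT, priority TEXT)")
conn.executemany("INSERT INTO tasks VALUES (?, ?)",
                 [("backup", "low"), ("deploy", "high"),
                  ("review", "medium")])

# Map each priority label to a sortable rank right inside ORDER BY,
# instead of fetching all rows and sorting with a comparator in code:
rows = conn.execute("""
    SELECT name FROM tasks
    ORDER BY CASE priority
               WHEN 'high'   THEN 1
               WHEN 'medium' THEN 2
               ELSE 3
             END
""").fetchall()

print(rows)  # [('deploy',), ('review',), ('backup',)]
```

Because the ranking lives in the query, it composes with LIMIT/OFFSET pagination, which an in-memory comparator cannot do without fetching everything first.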

10. Inserting many records one at a time


JDBC knows batches, and you should use them. Do not INSERT thousands of records one after another, creating a new PreparedStatement every time. If all your records go to the same table, create a batch INSERT with a single SQL statement and multiple bound value sets. Depending on your database and its configuration, you may need to commit after a certain number of inserted records to keep the UNDO log small.

Remedy:
Always use batch inserts for large data sets.
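The same one-statement-many-value-sets idea, sketched with sqlite3's `executemany` as a stand-in for JDBC's `PreparedStatement.addBatch()` / `executeBatch()` (the `log` table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER, message TEXT)")

records = [(i, f"event {i}") for i in range(1000)]

# One prepared statement, a thousand bound value sets -- instead of
# preparing and executing a fresh INSERT for every single record:
conn.executemany("INSERT INTO log VALUES (?, ?)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(count)  # 1000
```

For very large loads, committing every N batches (as the section notes) keeps the transaction log from growing without bound.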

Some interesting books on the subject

- SQL Antipatterns by Bill Karwin
- SQL Performance Explained by Markus Winand
