One of my favorite database books is SQL Antipatterns by Bill Karwin. The book lists many approaches taken in database design, querying and development that may be suboptimal, and gives different approaches to solve a particular issue.
For each issue, the author will list the objective, describe the antipattern and how to recognize it, legitimate use cases for the antipattern, and a suggestion on an alternate way of achieving the objective.
Here is a list of the antipatterns described, along with the solution to each. You’ll want to read the book to get more detail on each item.
Logical Database Design
1. Storing comma separated list of values.
Solution: Use an intersection table for a many to one relationship
2. Hierarchy – storing a parent ID – Adjacency list
Solution: Use a path structure (1/5/7/)
3. Always having an incrementing integer as the PK
Solution: Consider natural and compound keys
4. Not using constraints, thinking that leads to simplier design
Solution: Use constraints
5. Using the Entity-Attribute-Value design
Solution: Each entity has its own table
6. Using a “Dual-Purpose foreign key” – Storing an ID that could refer to values from multiple tables, plus a 2nd column to store the table name.
Solution: Set up a proper foreign key relationship
7. Repeating columns – Column1, Column2, Column3
Solution: Set up one to many relationship
8. Splitting data in separate tables based on values – Sales2015, Sales2016, Sales2017
Solution: Use partitioning
Physical Database Design
1. Using the Float data type to store fractional values when exact values are important
Solution: Solution: Use the Numeric type
2. Specifying valid values for a column in a check constraint
Solution: Solution: Put valid values in a table and create a foreign key
3. Storing paths to external files – Stored outside the database where they can’t be managed by the database system.
Solution: Store data in a Blob, or use Filestream to manage the files
4. No indexes or creating too many indexes
Solution: Analyze usage data to understand where indexes are needed
Query
1. Not allowing NULLs and using a different value to represent missing data
Solution: Use NULL to represent missing data
2. Listing columns in a SELECT clause that aren’t either listed in a GROUP BY clause or contained in an aggregate function (Not valid in SQL Server, which will raise an error)
Solution: Follow the rules for GROUP BY
3. In trying to pick a record at random, sorting the dataset by a random number (Like RAND or NEWID) and taking the first record.
Solution: Find other ways to pick a random value. In SQL Server, use TABLESAMPLE (1 ROW).
4. Simulating a search engine by using LIKE ‘%SearchTerm%’.
Solution: In SQL Server, use full-test search
5. Trying to return complex datasets all in one query – Lots of times there are unintended results, like Cartesian joins.
Solution: Break a complex problem into smaller steps
6. Depending on column order, with SELECT * or INSERT without explicitly listing columns.
Solution: Avoid wildcard queries and explicitly list columns for INSERTs
Application Development
1. Storing passwords in plain text
Solution: Store a salted hash of the password
2. Added unverified text to a SQL query – Open to SQL injection.
Solution: Filter input or use parameterized queries.
3. Backfilling any gaps in identity columns.
Solution: Treat keys as a uniwue indentifier, not as a row number.
4. Ignoring return values from a database API in an effort to have simpler or less code.
Solution: Store any dynamically generated SQL and return values from executing queries to aid in troubleshooting.
5. Not treating SQL as code – No source control, tests, etc.
Solution: Treat SQL as any other application code.
6. MVC – Using the Model as an Active Record.
Solution: Decouple the model from the tables.