知识库

BetaThis is a live doc! Anyone with edit access can make updates in real time without having to publish.

/

SQLite

Edited Dec 11, 2024

1 注意点
- 1.1 dynamic typing 动态类型
- 1.2 rowid
- 1.3 autoincrement
- 1.4 primary key
- 1.5 特殊关键字值
2 技巧
- 2.1 optimize / analyze
- 2.2 常用操作函数
- 2.3 .mode
- 2.4 .dump / .recover
3 值得阅读的重要文档

注意点

dynamic typing 动态类型

SQLite 中实际存储的值的数据类型可以和定义时指定的类型不同，The declared type of a column is used to determine the affinity of the column only.

Flexible typing is a feature of SQLite, not a bug. As of version 3.37.0 (2021-11-27), SQLite provides STRICT tables that do rigid type enforcement, for developers who prefer that kind of thing.

sqlite 的内部值类型有（注意这是 sqlite 内部存储值的方式，不是定义时声明列的类型）

NULL
INTEGER
REAL
TEXT
BLOB

所有值都会按照上述的五种类型之一存储

「特殊值」—— 某些内部函数可能期待特殊的值约定，他们仍然以五种类型之一存储，但是满足特殊的值规则

布尔值：使用 0 代表 false、1 代表 true
- sqlite 有特殊的 TRUE FALSE 值，但实际上仅仅是 1 / 0 的别名
时间日期
- TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
  - 注意 date 和 time 之间没有 T 分隔 —— 这是早期 ISO8601 的标准
- REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
- INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

sqlite 的列声明类型允许任意字符串！同样也允许完全不声明类型。当一个列存在声明类型时，插入的值会先尝试被转换为列的声明类型，如果转换成功则存储转换后的值，但如果转换失败，SQLite 不会报错，而是以传入的数据的原始数据类型存储内容。（唯一的例外：NULL 不可以被赋值到 NOT NULL 列 —— 但这实际上是约束的内容而不是类型转换）

这个转换规则叫做「Type Affinity」，具体可以参见文档

可以使用 select typeof(xxx) from table 来查看对应的值具体类型

如果想开启严格的类型检查，可以将表声明为 STRICT

在严格类型检查下

列的声明类型只能为下面之一
- INT
- INTEGER
- REAL
- TEXT
- BLOB
- ANY
插入的值必须和类型所匹配，否则报错 SQLITE_CONSTRAINT_DATATYPE
- ANY 可以接受任何类型
- NULL 可以赋值给任何没有 NOT NULL 约束的列
  - 注意如果 ANY NOT NULL 也是不可以赋值为 NULL 的
  - 特殊：如果 NULL 赋值给了 INTEGER PRIMARY KEY 的列，不会报错，而是会生成一个 id

另外注意一个特殊点 ANY，它仅在严格类型下是一个专门的类型，会原样保留任何传入的类型，在默认情况下（非严格类型检查），根据 Type Affinity 规则，它属于 NUMERIC 亲和，这带来的问题是如果传入一个数字字符串，在严格类型检查下它会被保留为字符串，但是在默认情况下它会被转换为数字保存

rowid

与其他主流数据库不同，sqlite3 的主键可能是「假的」，它存在 rowid 这个神奇的东西

为了表述方便，这里用「定义主键」表述用户定义的 primary key，用「索引主键」表述 sqlite3 内部索引与主表关联用的键

默认情况下（即不指定 WITHOUT ROWID），sqlite3 使用 rowid 作为索引主键（64-bit signed int）
1. 当定义主键的类型为 INTEGER 时，sqlite3 依然使用 rowid 作为索引主键，但是定义主键将成为 rowid 的 alias（别名）
  1. 类型必须为 INTEGER，任何其他变形（包括 INT 都不行）
  2. [BUG] 当定义主键的类型为 INTEGER + 利用 PRIMARY KEY DESC 定义时，它不是 rowid 的别名，甚至不再被当成主键
    1. 这一 bug 不影响 CREATE TABLE t(x INTEGER, y, z, PRIMARY KEY(x DESC)); 中的 x 成为 rowid 的别名
2. 如果需要访问或修改 rowid，可以利用 rowid oid _rowid_ 三种表述中的任意一种 —— 但是如果这个名称本就在表的 schema 中存在则它不可用来指代 rowid
3. rowid 的数据类型只能是数字 —— 与 sqlite3 中的弱/动态类型不同，当写入的类型不匹配时会报错；在插入数据时，如果不指定或指定 rowid 为 null，则 rowid 会自动创建；修改数据时，注意不可修改 rowid 为 null
4. 因为 rowid 的存在，primary key（除 integer primary key 以外）只是一个 unique index，因此利用 primary key 获取行的其他字段需要两次寻址
指定 WITHOUT ROWID 时，sqlite3 使用定义主键作为索引主键（完全不存在 rowid 这个东西）
1. 没错，这意味着，默认情况下 sqlite3 可以没有主键，只有在 WITHOUT ROWID 模式下才存在主键
2. 在 WITHOUT ROWID 时，AUTOINCREMENT 不可用
3. 在 WITHOUT ROWID 时，sqlite3_update_hook() 不会被触发
4. 注意尽量避免为 INTEGER PRIMARY KEY 定义 WITHOUT ROWID —— 官方文档：WITHOUT ROWID tables will work correctly (that is to say, they provide the correct answer) for tables with a single INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in that case. Hence, it is good design to avoid creating WITHOUT ROWID tables with single-column PRIMARY KEYs of type INTEGER.（更快的原因看下面一条）
5. A good rule-of-thumb is that the average size of a single row in a WITHOUT ROWID table should be less than about 1/20th the size of a database page. 原因：rowid tables are implemented as B*-Trees where all content is stored in the leaves of the tree, whereas WITHOUT ROWID tables are implemented using ordinary B-Trees with content stored on both leaves and intermediate nodes. Storing content in intermediate nodes causes each intermediate node entry to take up more space on the page and thus reduces the fan-out, increasing the search cost.

注意，sqlite3 的 rowid 不是什么「创新」，而是「设计缺陷」

autoincrement

因为有 rowid 的存在，sqlite3 一般不需要 autoincrement，官方文档也明确写了不建议

AUTOINCREMENT is not allowed on WITHOUT ROWID tables or on any table column other than INTEGER PRIMARY KEY.

rowid 一般是表中的上一个 +1，如果有表创建后删除了会存在复用已有 rowid 的情况

如果找到的 rowid 重复，再 +1，知道找到不重复的
如果表中没有记录，使用 1
如果没有手动指定的负值的 rowid，rowid 将始终大于 0
如果最后一条 rowid 为 int64 max，则下一个值将从所有正数中随机选取，如果经过 x 次仍然无法找到没有被使用过的则报错 SQLITE_FULL

而 autoincrement 保证了所有的值都是未被使用过的

利用 internal sqlite_sequence 表存储历史最大的 rowid，删除记录不会导致 rowid 复用
- 只有插入语句会影响 sequence 表，后续对记录 rowid 的 update 不会影响 sequence 表
- 如果一个插入的记录是被 rollback 的，则它对 sequence 表的影响也会被复原
如果试图使用的值已经存在了，尝试 +1 直至不重复（与很多其他 DB 系统会报错不同）
如果已经达到了 int64 max，报错 SQLITE_FULL

primary key

primary key 的主要内容在 rowid 中叙述了，这里不再重复

这里主要说明一个 sqlite 的 bug：According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a bug in some early versions, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY or the table is a WITHOUT ROWID table or a STRICT table or the column is declared NOT NULL, SQLite allows NULL values in a PRIMARY KEY column. SQLite could be fixed to conform to the standard, but doing so might break legacy applications. Hence, it has been decided to merely document the fact that SQLite allows NULLs in most PRIMARY KEY columns.

特殊关键字值

某些关键字（而非函数）直接代表了某种特殊值


1sqlite> select TRUE, FALSE, CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
2+------+-------+---------------------+--------------+--------------+
3| TRUE | FALSE |  CURRENT_TIMESTAMP  | CURRENT_DATE | CURRENT_TIME |
4+------+-------+---------------------+--------------+--------------+
5| 1    | 0     | 2024-12-11 05:07:04 | 2024-12-11   | 05:07:04     |
6+------+-------+---------------------+--------------+--------------+

特别注意 CURRENT_TIMESTAMP 输出的不是数字而是 ISO 8601 字符串

如果需要时间戳，使用 unixepoch()

技巧

optimize / analyze

https://www.sqlite.org/lang_analyze.html

https://www.sqlite.org/pragma.html#pragma_optimize

https://github.com/sqlite/sqlite/tree/master/ext/expert https://blog.sqlitecloud.io/the-sqlite-expert-extension

常用操作函数

https://www.sqlite.org/lang_datefunc.html

https://www.sqlite.org/json1.html

.mode

默认情况下 sqlite-cli 的展示格式为 list，即多个值以 | 分隔、不输出表头

为了好看，一般设置 .mode table

可以通过文档查看更多 mode

.dump / .recover

Use the ".dump" command to convert the entire contents of a database into a single UTF-8 text file. This file can be converted back into a database by piping it back into sqlite3.

Like the ".dump" command, ".recover" attempts to convert the entire contents of a database file to text. The difference is that instead of reading data using the normal SQL database interface, ".recover" attempts to reassemble the database based on data extracted directly from as many database pages as possible. If the database is corrupt, ".recover" is usually able to recover data from all uncorrupted parts of the database, whereas ".dump" stops when the first sign of corruption is encountered. .

If the ".recover" command recovers one or more rows that it cannot attribute to any database table, the output script creates a "lost_and_found" table to store the orphaned rows.

值得阅读的重要文档

https://www.sqlite.org/cli.html

https://www.sqlite.org/limits.html

https://www.sqlite.org/fileformat2.html https://fly.io/blog/sqlite-internals-btree/ https://torymur.github.io/sqlite-repr/

https://www.sqlite.org/lang_expr.html

https://www.sqlite.org/pragma.html

https://www.sqlite.org/optoverview.html https://www.sqlite.org/queryplanner-ng.html https://www.sqlite.org/queryplanner.html https://www.sqlite.org/eqp.html

https://www.sqlite.org/atomiccommit.html

https://www.sqlite.org/wal.html