JSON improvements in SQLite 3.38.0
SQLite 3.38.0 introduced improvements to JSON query syntax using ->
and ->>
operators that are similar to PostgreSQL JSON functions. In this post we will look into how this simplifies the query syntax.
Installation
The JSON functions are now built-ins. It is no longer necessary to use the -DSQLITE_ENABLE_JSON1 compile-time option to enable JSON support. JSON is on by default. Disable the JSON interface using the new -DSQLITE_OMIT_JSON compile-time option.
With the release of 3.38.0 JSON support is on by default. SQLite also provides pre-built binaries for Linux, Windows and Mac. For Linux you can get started with below. For other platforms please visit downloads page.
wget https://www.sqlite.org/2022/sqlite-tools-linux-x86-3380000.zip
unzip sqlite-tools-linux-x86-3380000.zip
cd sqlite-tools-linux-x86-3380000
rlwrap ./sqlite3
I found that support for readline is missing in the prebuilt binaries. If readline and other build utilities are installed in your machine you can download and compile SQLite binary.
wget https://www.sqlite.org/2022/sqlite-autoconf-3380000.tar.gz
tar -xvf sqlite-autoconf-3380000.tar.gz
cd sqlite-autoconf-3380000
./configure
make
./sqlite3
JSON enhancements proposal thread in SQLite forum
Sample data
In this post we will be using a sample data that stores the interests of a user as a JSON and see how the new operators provide support in querying. The table has an auto incrementing id as primary key with name as text and a JSON storing interests of the user. Let’s also assume the array of likes are stored in the order of preferences such that for John he likes skating the most and swimming as the last.
./sqlite3
SQLite version 3.38.0 2022-02-22 18:58:40
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table user(id integer primary key, name text, interests json);
sqlite> insert into user values(null, "John", '{"likes": ["skating", "reading", "swimming"], "dislikes": ["cooking"]}');
sqlite> insert into user values(null, "Kate", '{"likes": ["reading", "swimming"], "dislikes": ["skating"]}');
sqlite> insert into user values(null, "Jim", '{"likes": ["reading", "swimming"], "dislikes": ["cooking"]}');
sqlite> .mode column
sqlite> select * from user;
id name interests
-- ---- ------------------------------------------------------------
1 John {"likes": ["skating", "reading", "swimming"], "dislikes": ["cooking"]}
2 Kate {"likes": ["reading", "swimming"], "dislikes": ["skating"]}
3 Jim {"likes": ["reading", "swimming"], "dislikes": ["cooking"]}
Operator semantics
With the above schema in place we can now use the JSON operators to query. With ->
we can have the left part as a JSON component and the right part as a path expression. ->
also provides a JSON representation in return so it can be chained in nested lookups. In the below example we look for users who have reading as their first preference. Since the interests column is of JSON type we can use ->
operator along with $.likes
which is a path expression to get the likes attribute. Then we can pass it to ->>
which also takes a path expression to its right but returns value as SQLite datatype like text, integer etc. instead of JSON. Here $[0]
is used to access the first element of the array. As per docs path also supports negative indexing where $[#-1]
can be used to access last element of the array.
select id, name, interests from user
where interests->'$.likes'->>'$[0]' = 'reading';
id name interests
-- ---- -----------------------------------------------------------
2 Kate {"likes": ["reading", "swimming"], "dislikes": ["cooking"]}
3 Jim {"likes": ["reading", "swimming"], "dislikes": ["cooking"]}
select id, name, interests from user
where interests->'$.likes'->>'$[0]' = 'skating';
id name interests
-- ---- ------------------------------------------------------------
1 John {"likes": ["skating", "reading", "swimming"], "dislikes": ["cooking"]}
The above query can be further simplified removing $
and for integers even removing quotes.
select id, name, interests from user
where interests->'likes'->>'[0]' = 'skating';
id name interests
-- ---- ------------------------------------------------------------
1 John {"likes": ["skating", "reading", "swimming"], "dislikes": ["cooking"]}
-- As per reddit comment in r/sql for integers even [] is optional
-- Rules https://github.com/sqlite/sqlite/blob/a0318fd7b4fbedbce74f133fb0f84ff4a19ea075/src/json.c#L1554
select id, name, interests from user
where interests->'likes'->>0 = 'skating';
id name interests
-- ---- ------------------------------------------------------------
1 John {"likes": ["skating", "reading", "swimming"], "dislikes": ["cooking"]}
The catch with ->
and ->>
is that ->
returns a JSON representation and it also expects the right side to be a JSON. This might lead to some confusion like below example where using ->[0]
doesn’t return any value since we are comparing JSON representation with text value of “skating”
-- Returns no value
select id, name, interests from user
where interests->'likes'->'[0]' = 'skating';
-- Returns value
select id, name, interests from user
where interests->'likes'->'[0]' = '"skating"';
id name interests
-- ---- ------------------------------------------------------------
1 John {"likes": ["skating", "reading", "swimming"], "dislikes": ["cooking"]}
-- Returns value too with the string represented as JSON
select id, name, interests from user
where interests->'likes'->'[0]' = json_quote('skating');
id name interests
-- ---- ------------------------------------------------------------
1 John {"likes": ["skating", "reading", "swimming"], "dislikes": ["cooking"]}
We can also the operators to get specific fields like queries.
-- List of first and last preference of users with skating as their first preference
select id, name,
interests->'likes'->>'[0]' as first_preference,
interests->'likes'->>'$[#-1]' as last_preference
from user where interests->'likes'->>'[0]' = 'skating';
id name first_preference last_preference
-- ---- ---------------- ---------------
1 John skating swimming
We can also use the operators in sorting and grouping
-- List of users sorted by their first preference
select id, name,
interests->'likes'->>'[0]' as first_preference
interests->'likes'->>'$[#-1]' as last_preference
from user order by interests->'likes'->>'[0]';
id name first_preference last_preference
-- ---- ---------------- ---------------
2 Kate reading swimming
3 Jim reading swimming
1 John skating swimming
-- List of users grouped by their first preference
select id, name,
interests->'likes'->>'[0]' as first_preference,
interests->'likes'->>'$[#-1]' as last_preference
from user group by interests->'likes'->>'[0]';
id name first_preference last_preference
-- ---- ---------------- ---------------
2 Kate reading swimming
1 John skating swimming
Indexing
Index can also be created for a given expression thus making the query efficient. Once we create an index for querying by first preference the index is used for
-- Query plan without index
explain query plan select id, name, interests from user
where interests->'likes'->>'[0]' = 'skating';
QUERY PLAN
--SCAN user
-- Create index on first preference of a user
create index idx_first_preference on user(interests->'likes'->>'[0]');
explain query plan select id, name, interests from user
where interests->'likes'->>'[0]' = 'skating';
QUERY PLAN
--SEARCH user USING INDEX idx_first_preference (<expr>=?)
explain query plan select id, name, interests from user
where interests->'likes'->>'[1]' = 'skating';
QUERY PLAN
--SCAN user
Conclusion
Enabling JSON by default and the new operators in 3.38.0 improve adoption and ergonomics of using JSON in SQLite. PostgreSQL has more rich query support which will be hopefully added in future releases.