Vous n'êtes pas identifié(e).
Pages : 1
Effectivement le jit=off améliore beaucoup les performances merci.
Dans quel contexte la vue matérialisée aurait-elle eu du sens ?
Optimisation filtre géo distance avec PostGIS
Bonjour!
Je suis en train d'essayer d'optimiser une requête PostGIS et j'aurais voulu avoir quelques conseils/indications sur comment améliorer encore les performances.
Le but est de filtrer une table avec environ 800k items, afin d'obtenir les éléments à moins de X km d'un point (tous en France), et de calculer leur distance par rapport à ce point.
La table ressemble à ça:
CREATE TABLE IF NOT EXISTS general.worker
(
id varchar(20) PRIMARY KEY,
location point
...
);
Dans ce cas d'exemple, j'essaie de récupérer tous les éléments à moins de 5km d'un point donné.
=> 1ère requête:
select
id,
ST_Distance(location::geometry, ST_Point(2.34, 48.85)) as distance
from general.worker
where ST_DWithin(
ST_Transform(ST_SetSRID(location::geometry, 4326), 2154),
ST_Transform(ST_SetSRID(ST_Point(2.34, 48.85), 4326), 2154),
5000) is True
Ce qui me donne l'EXPLAIN suivant
"Gather (cost=1000.00..10204378.61 rows=88 width=8) (actual time=199.272..952.684 rows=32183 loops=1)"
" Output: (st_distance((location)::geometry, '0101000000B81E85EB51B80240CDCCCCCCCC6C4840'::geometry))"
" Workers Planned: 2"
" Workers Launched: 2"
" JIT for worker 0:"
" Functions: 4"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 1.469 ms, Inlining 155.731 ms, Optimization 110.002 ms, Emission 67.503 ms, Total 334.705 ms"
" JIT for worker 1:"
" Functions: 4"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 1.506 ms, Inlining 143.655 ms, Optimization 109.081 ms, Emission 49.523 ms, Total 303.766 ms"
" Buffers: shared hit=791 read=13298"
" -> Parallel Seq Scan on general.worker (cost=0.00..10203369.81 rows=37 width=8) (actual time=317.516..861.207 rows=10728 loops=3)"
" Output: st_distance((location)::geometry, '0101000000B81E85EB51B80240CDCCCCCCCC6C4840'::geometry)"
" Filter: (st_dwithin(st_transform(st_setsrid((worker.location)::geometry, 4326), 2154), '01010000206A08000021C9F8845FE223416E6108347F2C5A41'::geometry, '5000'::double precision) IS TRUE)"
" Rows Removed by Filter: 283335"
" Buffers: shared hit=791 read=13298"
" Worker 0: actual time=415.216..832.513 rows=8414 loops=1"
" Buffers: shared hit=385 read=3708"
" Worker 1: actual time=338.313..828.633 rows=9486 loops=1"
" Buffers: shared hit=352 read=3946"
"Planning Time: 19.797 ms"
"JIT:"
" Functions: 12"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 4.590 ms, Inlining 320.388 ms, Optimization 330.759 ms, Emission 164.167 ms, Total 819.903 ms"
"Execution Time: 956.960 ms"
=> 2e requête:
Pour optimiser j'ai tenté de créer une materialized view avec des indexes sur la location et de CLUSTER le tout.
CREATE MATERIALIZED VIEW general.worker_geo AS
(
select
id,
location,
ST_Transform(ST_SetSRID(location::geometry, 4326), 2154) as location_geom
from general.worker
);
CREATE INDEX worker_geo_location_geom_index
ON general.worker_geo USING GIST(location_geom);
CREATE UNIQUE INDEX worker_geo_worker_id_index
ON general.worker_geo (id);
CLUSTER general.worker_geo USING worker_geo_location_geom_index;
Pour ensuite faire
select
ST_Distance(location_geom, ST_Transform(ST_SetSRID(ST_Point(2.34, 48.85), 4326), 2154))
from general.worker_geo w
where ST_DWithin(
location_geom,
ST_Transform(ST_SetSRID(ST_Point(2.34, 48.85), 4326), 2154),
5000) is True
Ce qui me donne l'EXPLAIN suivant
Gather (cost=1000.00..8740426.28 rows=84 width=8) (actual time=153.547..505.123 rows=29478 loops=1)
" Output: (st_distance(location_geom, '01010000206A08000021C9F8845FE223416E6108347F2C5A41'::geometry))"
Workers Planned: 2
Workers Launched: 2
JIT for worker 0:
Functions: 4
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 1.156 ms, Inlining 118.552 ms, Optimization 94.700 ms, Emission 43.580 ms, Total 257.987 ms"
JIT for worker 1:
Functions: 4
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 1.289 ms, Inlining 117.147 ms, Optimization 90.090 ms, Emission 42.265 ms, Total 250.791 ms"
Buffers: shared read=9115 written=1
-> Parallel Seq Scan on general.worker_geo w (cost=0.00..8739417.88 rows=35 width=8) (actual time=220.390..403.979 rows=9826 loops=3)
" Output: st_distance(location_geom, '01010000206A08000021C9F8845FE223416E6108347F2C5A41'::geometry)"
" Filter: (st_dwithin(w.location_geom, '01010000206A08000021C9F8845FE223416E6108347F2C5A41'::geometry, '5000'::double precision) IS TRUE)"
Rows Removed by Filter: 269404
Buffers: shared read=9115 written=1
Worker 0: actual time=257.416..364.960 rows=8246 loops=1
Buffers: shared read=2163
Worker 1: actual time=250.417..363.704 rows=8226 loops=1
Buffers: shared read=2272
Planning Time: 40.821 ms
JIT:
Functions: 12
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 3.848 ms, Inlining 248.722 ms, Optimization 269.845 ms, Emission 140.212 ms, Total 662.626 ms"
Execution Time: 508.428 ms
Est-il possible d'améliorer encore les performances ?
Merci!
Ah d'accord, je dois avouer que je suis novice là dessus. Je dois jouer avec jit_optimize_above_cost et trouver le point qui correspond le mieux ?
Quelles sont les meilleurs pratiques ?
Après enquête, c'est le
jit=on
par défaut de PG 12 qui me ralentissait. Mes requêtes sont toutes beaucoup plus rapides maintenant.
En effet je suis confus...
En fait ce n'est pas la requête en elle même qui prend du temps mais mon client (une lib python) à priori pour récupérer les différents éléments...
Désolé du dérangement du coup !
Bonjour à tous,
Je me permets de vous poser une question concernant une requête qui me prend beaucoup plus de temps que raisonnable (env 3 secondes).
Je dois avouer que je ne comprends pas pourquoi elle prend si longtemps. J'ai l'impression que c'est le "lateral" qui pose problème sans vraiment être sûr à 100% ni savoir pourquoi.
Si vous avez des pistes d'améliorations je suis preneur !
Voici la requête en question:
with _test as (
SELECT ex.id,
ex.user_id,
COALESCE(tags.tags, '{}') as tags,
COALESCE(muscles.muscles, '{}') as muscles
from exercises ex,
lateral (
SELECT array_agg(t.id) as tags
from tags t
inner join exercise_tags e on t.id = e.tag_id
where e.exercise_id = ex.id
) tags,
lateral (
SELECT array_agg(t.id) as muscles
FROM muscles t
inner join exercise_muscles e on t.id = e.muscle_id
where e.exercise_id = ex.id
) muscles
)
SELECT * from _test ex
where user_id = $1
Les différentes tables (simplifiées) sont:
Exercises (~500 éléments)
CREATE TABLE IF NOT EXISTS exercises
(
id UUID PRIMARY KEY NOT NULL,
user_id UUID NOT NULL,
name VARCHAR(255) NOT NULL,
deleted BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_exercises_name_lower_unique
ON exercises (user_id, lower(name)) where deleted is false;
Muscles (~18 éléments)
CREATE TABLE IF NOT EXISTS muscles
(
id SERIAL PRIMARY KEY NOT NULL,
body_part VARCHAR(30) NOT NULL,
muscle_group VARCHAR(30) NOT NULL,
UNIQUE (body_part, muscle_group)
);
Tags (~100 éléments)
CREATE TABLE IF NOT EXISTS tags
(
id UUID PRIMARY KEY NOT NULL,
label VARCHAR(255) NOT NULL,
user_id UUID NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT now(),
data_type varchar(50) NOT NULL,
UNIQUE (label, data_type, user_id)
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_tags_label_lower_unique
ON general.tags (lower(label), data_type, user_id);
ainsi que les différentes tables de jointure:
CREATE TABLE IF NOT EXISTS exercise_tags (
tag_id UUID NOT NULL REFERENCES tags,
exercise_id UUID NOT NULL REFERENCES exercises,
PRIMARY KEY(tag_id, exercise_id)
};
CREATE TABLE IF NOT EXISTS exercise_muscles (
muscle_id INTEGER NOT NULL REFERENCES muscles,
exercise_id UUID NOT NULL REFERENCES exercises,
PRIMARY KEY(muscle_id, exercise_id)
};
Voici à quoi ressemble graphiquement le EXPLAIN
| Nested Loop (cost=10.53..1312.70 rows=123 width=96) (actual time=0.128..8.321 rows=123 loops=1) |
| -> Nested Loop (cost=4.29..541.54 rows=123 width=64) (actual time=0.057..3.378 rows=123 loops=1) |
| -> Seq Scan on exercises ex (cost=0.00..10.09 rows=123 width=32) (actual time=0.011..0.068 rows=123 loops=1) |
| Filter: (user_id = 'd0792a59-686c-461c-bf6f-6f886e86af0a'::uuid) |
| Rows Removed by Filter: 124 |
| -> Aggregate (cost=4.29..4.30 rows=1 width=32) (actual time=0.026..0.026 rows=1 loops=123) |
| -> Hash Join (cost=2.30..4.28 rows=2 width=16) (actual time=0.021..0.024 rows=1 loops=123) |
| Hash Cond: (t.id = e.tag_id) |
| -> Seq Scan on tags t (cost=0.00..1.77 rows=77 width=16) (actual time=0.002..0.009 rows=76 loops=40) |
| -> Hash (cost=2.28..2.28 rows=2 width=16) (actual time=0.016..0.016 rows=1 loops=123) |
| Buckets: 1024 Batches: 1 Memory Usage: 8kB |
| -> Seq Scan on exercise_tags e (cost=0.00..2.28 rows=2 width=16) (actual time=0.013..0.014 rows=1 loops=123) |
| Filter: (exercise_id = ex.id) |
| Rows Removed by Filter: 101 |
| -> Aggregate (cost=6.24..6.25 rows=1 width=32) (actual time=0.039..0.039 rows=1 loops=123) |
| -> Hash Join (cost=4.99..6.23 rows=4 width=4) (actual time=0.037..0.037 rows=0 loops=123) |
| Hash Cond: (t_1.id = e_1.muscle_id) |
| -> Seq Scan on muscles t_1 (cost=0.00..1.18 rows=18 width=4) (actual time=0.001..0.003 rows=18 loops=10) |
| -> Hash (cost=4.94..4.94 rows=4 width=4) (actual time=0.035..0.035 rows=0 loops=123) |
| Buckets: 1024 Batches: 1 Memory Usage: 8kB |
| -> Seq Scan on exercise_muscles e_1 (cost=0.00..4.94 rows=4 width=4) (actual time=0.033..0.034 rows=0 loops=123) |
| Filter: (exercise_id = ex.id) |
| Rows Removed by Filter: 235 |
| Planning Time: 1.316 ms |
| Execution Time: 8.388 ms |
Pages : 1