POSTGRESQL
PostgreSQL Full-Text Search: Concepts, SQL Usage, and Integration with Java and Python
PostgreSQL provides a powerful, native Full-Text Search (FTS) engine designed to efficiently index and search large volumes of textual data. Unlike simple string matching using LIKE or ILIKE, PostgreSQL Full-Text Search performs linguistic analysis, tokenization, stemming, stop-word removal, and relevance ranking.
This article starts by introducing Full-Text Search directly at the SQL level, then explains its internal concepts, and finally demonstrates recommended, production-grade usage from Java and Python applications using Spring Data JPA and Peewee.
Getting Started with PostgreSQL Full-Text Search (Pure SQL)
PostgreSQL Full-Text Search is built into the database and can be used without any external libraries or frameworks.
A Basic Example
Consider a simple table storing articles:
CREATE TABLE article (
id SERIAL PRIMARY KEY,
title TEXT,
body TEXT
);
A basic full-text search can be performed by converting text into a tsvector and comparing it with a tsquery:
SELECT *
FROM article
WHERE to_tsvector('portuguese', title || ' ' || body)
@@ plainto_tsquery('portuguese', 'postgres database');
This query finds rows whose text semantically matches the words “postgres” and “database”, regardless of word order, case, or inflection.
Why Not LIKE?
Traditional SQL searches often rely on:
WHERE body ILIKE '%postgres%'
This approach:
- Cannot use linguistic analysis
- Does not scale well on large datasets
- Fails to rank results by relevance
PostgreSQL Full-Text Search solves these limitations.
Using plainto_tsquery
plainto_tsquery is designed for user input. It automatically:
- Splits text into terms
- Applies the language dictionary
- Combines terms using logical AND
SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
@@ plainto_tsquery('portuguese', 'powerful database');
Using to_tsquery (Advanced Syntax)
When the query syntax is controlled, to_tsquery allows explicit logical operators:
SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
@@ to_tsquery('portuguese', 'postgres & (database | data)');
Prefix matching is also supported:
WHERE to_tsvector('portuguese', body)
@@ to_tsquery('portuguese', 'postgre:*');
Core Concepts of PostgreSQL Full-Text Search
PostgreSQL Full-Text Search relies on two specialized data types:
- tsvector
- A normalized and indexed representation of a document. Text is tokenized, lowercased, stripped of stop words, and stemmed according to a language dictionary.
- tsquery
- A structured search query containing terms and logical operators such as AND (&), OR (|), NOT (!), and prefix matching (:*).
Example Transformation
The text:
PostgreSQL is a powerful database
may be transformed into a tsvector similar to:
'databas':4 'power':3 'postgresql':1
A user query such as:
powerful database
becomes a tsquery equivalent to:
'power' & 'databas'
The @@ Operator
The @@ operator is the central operator of PostgreSQL Full-Text Search:
tsvector @@ tsquery
It evaluates whether a document matches a full-text query and returns a boolean result.
Conceptually, it can be read as:
"Does this indexed document match this search query?"
The @@ operator:
- Executes the full-text match
- Enables the use of GIN or GiST indexes
- Is required for ranking and highlighting
- Has no equivalent in standard SQL or JPQL
Without @@, PostgreSQL Full-Text Search cannot function.
Indexing with GIN
For performance, PostgreSQL Full-Text Search relies on inverted indexes, most commonly GIN (Generalized Inverted Index).
A recommended approach is to store a dedicated tsvector column and index it:
ALTER TABLE article
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('portuguese', coalesce(title,'')), 'A') ||
setweight(to_tsvector('portuguese', coalesce(body,'')), 'B')
) STORED;
CREATE INDEX idx_article_fts
ON article USING GIN (fts);
This approach avoids triggers and guarantees consistency.
Ranking and Highlighting
PostgreSQL provides ranking functions such as ts_rank and ts_rank_cd to score documents by relevance.
Highlighting is handled by ts_headline, which extracts and emphasizes relevant fragments from the original text.
Using PostgreSQL Full-Text Search from Java (Spring Data JPA)
Spring Data JPA does not support PostgreSQL Full-Text Search operators directly in JPQL. The recommended approach is to use native SQL queries while keeping entity management and CRUD operations within JPA.
Entity Mapping
@Entity
@Table(name = "article")
public class Article {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String title;
@Column(columnDefinition = "text")
private String body;
@Column(name = "fts", insertable = false, updatable = false)
private String fts;
}
Repository with Full-Text Search
public interface ArticleRepository extends JpaRepository<Article, Long> {
@Query(value = """
SELECT a.*
FROM article a
WHERE a.fts @@ plainto_tsquery('portuguese', :query)
ORDER BY ts_rank(a.fts, plainto_tsquery('portuguese', :query)) DESC
""", nativeQuery = true)
List<Article> search(@Param("query") String query);
}
Using PostgreSQL Full-Text Search from Python (Peewee)
Peewee is a lightweight ORM that allows direct access to PostgreSQL operators and functions, making it well-suited for Full-Text Search when used with explicit SQL expressions.
Model Definition
from peewee import *
from playhouse.postgres_ext import PostgresqlExtDatabase
db = PostgresqlExtDatabase("example_db")
class BaseModel(Model):
class Meta:
database = db
class Article(BaseModel):
title = TextField()
body = TextField()
fts = TextField(null=True)
Full-Text Search Query
from peewee import fn, SQL
query_text = "postgres database"
query = (
Article
.select(
Article,
fn.ts_rank(
Article.fts,
fn.plainto_tsquery('portuguese', query_text)
).alias('rank')
)
.where(
Article.fts.op('@@')(
fn.plainto_tsquery('portuguese', query_text)
)
)
.order_by(SQL('rank').desc())
)
for article in query:
print(article.title, article.rank)
ORM vs Search Engines
PostgreSQL Full-Text Search is sufficient for most applications, including:
- Blog and article search
- Documentation systems
- Administrative and internal tools
- Moderate autocomplete requirements
Dedicated search engines such as Elasticsearch or OpenSearch are recommended only when requirements include advanced linguistic analysis, distributed indexing, or large-scale search analytics.
Best Practices
- Always use a dedicated
tsvectorcolumn - Always create a GIN index on the
tsvector - Prefer
plainto_tsqueryorwebsearch_to_tsqueryfor user input - Use
to_tsqueryonly when the query syntax is controlled - Keep Full-Text Search logic inside PostgreSQL
- Use ORM abstractions for CRUD and native SQL for search
Conclusion
PostgreSQL Full-Text Search is a mature, efficient, and production-ready solution that integrates naturally with both Java and Python ecosystems.
By starting with SQL fundamentals and layering ORM integration on top, applications can achieve high-performance, semantically rich search without the operational complexity of external search engines.
Web-Style Search with websearch_to_tsquery
PostgreSQL provides websearch_to_tsquery, a function designed to accept search strings written in a format similar to modern web search engines (such as Google).
Unlike to_tsquery, which requires explicit operators, and plainto_tsquery, which always combines terms with AND, websearch_to_tsquery supports a more expressive and user-friendly syntax.
Supported Syntax
websearch_to_tsquery understands:
- Space-separated terms (implicit AND)
- OR using the keyword
OR - Quoted phrases
- Unary minus for negation
Examples
Basic search (implicit AND):
SELECT *
FROM article
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres database');
OR operator:
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres OR mysql');
Quoted phrase:
WHERE fts @@ websearch_to_tsquery('portuguese', '"full text search"');
Negation:
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres -mysql');
Why websearch_to_tsquery Matters
This function is particularly well-suited for direct user input because:
- It avoids syntax errors common with
to_tsquery - It feels natural to end users
- It reduces the need for custom query parsing in the application layer
For most user-facing search boxes, websearch_to_tsquery is the recommended choice.
FTS vs pg_trgm
PostgreSQL offers two powerful but fundamentally different approaches to text search: Full-Text Search (FTS) and trigram similarity via the pg_trgm extension.
Understanding their differences is essential for choosing the correct tool.
pg_trgm Overview
The pg_trgm extension indexes text by breaking it into overlapping sequences of three characters (trigrams). It enables fast similarity searches and partial matching.
To enable it:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
A typical trigram index:
CREATE INDEX idx_article_trgm
ON article
USING GIN (body gin_trgm_ops);
Example query:
SELECT *
FROM article
WHERE body ILIKE '%postgre%';
Key Differences
| Feature | PostgreSQL FTS (@@) | pg_trgm |
|---|---|---|
| Linguistic analysis | Yes (stemming, stop words) | No |
| Word-based search | Yes | No |
| Partial word matching | Limited (:*) | Yes |
| Typo tolerance | Low | High |
| Ranking by relevance | Yes (ts_rank) | Limited (similarity) |
| Language awareness | Yes | No |
| Index size | Moderate | Larger |
| Best use case | Semantic text search | Fuzzy / partial matching |
= When to Use Full-Text Search
Use PostgreSQL Full-Text Search when:
- Searching long documents or articles
- Language semantics matter
- You need relevance ranking
- Queries are word-based
- You want deterministic, explainable results
= When to Use pg_trgm
Use pg_trgm when:
- Implementing autocomplete
- Supporting typo-tolerant search
- Matching substrings or partial words
- Searching identifiers, codes, or names
Combining FTS and pg_trgm
In many real-world systems, the best solution is to combine both approaches.
Example:
- Use FTS (
@@) for main content search - Use
pg_trgmfor autocomplete or fallback fuzzy matching - Merge or rank results at the application layer
This hybrid approach often delivers search quality comparable to dedicated search engines while keeping all logic inside PostgreSQL.