POSTGRESQL

De Wiki Clusterlab.com.br
Revisão de 18h13min de 22 de dezembro de 2025 por Damato (discussão | contribs)
(dif) ← Edição anterior | Revisão atual (dif) | Versão posterior → (dif)
Ir para navegação Ir para pesquisar

PostgreSQL Full-Text Search: Concepts, SQL Usage, and Integration with Java and Python

PostgreSQL provides a powerful, native Full-Text Search (FTS) engine designed to efficiently index and search large volumes of textual data. Unlike simple string matching using LIKE or ILIKE, PostgreSQL Full-Text Search performs linguistic analysis, tokenization, stemming, stop-word removal, and relevance ranking.

This article starts by introducing Full-Text Search directly at the SQL level, then explains its internal concepts, and finally demonstrates recommended, production-grade usage from Java and Python applications using Spring Data JPA and Peewee.

Getting Started with PostgreSQL Full-Text Search (Pure SQL)

PostgreSQL Full-Text Search is built into the database and can be used without any external libraries or frameworks.

A Basic Example

Consider a simple table storing articles:

CREATE TABLE article (
  id SERIAL PRIMARY KEY,
  title TEXT,
  body TEXT
);

A basic full-text search can be performed by converting text into a tsvector and comparing it with a tsquery:

SELECT *
FROM article
WHERE to_tsvector('portuguese', title || ' ' || body)
      @@ plainto_tsquery('portuguese', 'postgres database');

This query finds rows whose text semantically matches the words “postgres” and “database”, regardless of word order, case, or inflection.

Why Not LIKE?

Traditional SQL searches often rely on:

WHERE body ILIKE '%postgres%'

This approach:

  • Cannot use linguistic analysis
  • Does not scale well on large datasets
  • Fails to rank results by relevance

PostgreSQL Full-Text Search solves these limitations.

Using plainto_tsquery

plainto_tsquery is designed for user input. It automatically:

  • Splits text into terms
  • Applies the language dictionary
  • Combines terms using logical AND
SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
      @@ plainto_tsquery('portuguese', 'powerful database');

Using to_tsquery (Advanced Syntax)

When the query syntax is controlled, to_tsquery allows explicit logical operators:

SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
      @@ to_tsquery('portuguese', 'postgres & (database | data)');

Prefix matching is also supported:

WHERE to_tsvector('portuguese', body)
      @@ to_tsquery('portuguese', 'postgre:*');

Core Concepts of PostgreSQL Full-Text Search

PostgreSQL Full-Text Search relies on two specialized data types:

tsvector
A normalized and indexed representation of a document. Text is tokenized, lowercased, stripped of stop words, and stemmed according to a language dictionary.
tsquery
A structured search query containing terms and logical operators such as AND (&), OR (|), NOT (!), and prefix matching (:*).

Example Transformation

The text:

PostgreSQL is a powerful database

may be transformed into a tsvector similar to:

'databas':4 'power':3 'postgresql':1

A user query such as:

powerful database

becomes a tsquery equivalent to:

'power' & 'databas'

The @@ Operator

The @@ operator is the central operator of PostgreSQL Full-Text Search:

tsvector @@ tsquery

It evaluates whether a document matches a full-text query and returns a boolean result.

Conceptually, it can be read as:

"Does this indexed document match this search query?"

The @@ operator:

  • Executes the full-text match
  • Enables the use of GIN or GiST indexes
  • Is required for ranking and highlighting
  • Has no equivalent in standard SQL or JPQL

Without @@, PostgreSQL Full-Text Search cannot function.

Indexing with GIN

For performance, PostgreSQL Full-Text Search relies on inverted indexes, most commonly GIN (Generalized Inverted Index).

A recommended approach is to store a dedicated tsvector column and index it:

ALTER TABLE article
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (
  setweight(to_tsvector('portuguese', coalesce(title,'')), 'A') ||
  setweight(to_tsvector('portuguese', coalesce(body,'')), 'B')
) STORED;

CREATE INDEX idx_article_fts
ON article USING GIN (fts);

This approach avoids triggers and guarantees consistency.

Ranking and Highlighting

PostgreSQL provides ranking functions such as ts_rank and ts_rank_cd to score documents by relevance.

Highlighting is handled by ts_headline, which extracts and emphasizes relevant fragments from the original text.

Using PostgreSQL Full-Text Search from Java (Spring Data JPA)

Spring Data JPA does not support PostgreSQL Full-Text Search operators directly in JPQL. The recommended approach is to use native SQL queries while keeping entity management and CRUD operations within JPA.

Entity Mapping

@Entity
@Table(name = "article")
public class Article {

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  private String title;

  @Column(columnDefinition = "text")
  private String body;

  @Column(name = "fts", insertable = false, updatable = false)
  private String fts;
}

Repository with Full-Text Search

public interface ArticleRepository extends JpaRepository<Article, Long> {

  @Query(value = """
    SELECT a.*
    FROM article a
    WHERE a.fts @@ plainto_tsquery('portuguese', :query)
    ORDER BY ts_rank(a.fts, plainto_tsquery('portuguese', :query)) DESC
    """, nativeQuery = true)
  List<Article> search(@Param("query") String query);
}

Using PostgreSQL Full-Text Search from Python (Peewee)

Peewee is a lightweight ORM that allows direct access to PostgreSQL operators and functions, making it well-suited for Full-Text Search when used with explicit SQL expressions.

Model Definition

from peewee import *
from playhouse.postgres_ext import PostgresqlExtDatabase

db = PostgresqlExtDatabase("example_db")

class BaseModel(Model):
    class Meta:
        database = db

class Article(BaseModel):
    title = TextField()
    body = TextField()
    fts = TextField(null=True)

Full-Text Search Query

from peewee import fn, SQL

query_text = "postgres database"

query = (
    Article
    .select(
        Article,
        fn.ts_rank(
            Article.fts,
            fn.plainto_tsquery('portuguese', query_text)
        ).alias('rank')
    )
    .where(
        Article.fts.op('@@')(
            fn.plainto_tsquery('portuguese', query_text)
        )
    )
    .order_by(SQL('rank').desc())
)

for article in query:
    print(article.title, article.rank)

ORM vs Search Engines

PostgreSQL Full-Text Search is sufficient for most applications, including:

  • Blog and article search
  • Documentation systems
  • Administrative and internal tools
  • Moderate autocomplete requirements

Dedicated search engines such as Elasticsearch or OpenSearch are recommended only when requirements include advanced linguistic analysis, distributed indexing, or large-scale search analytics.

Best Practices

  • Always use a dedicated tsvector column
  • Always create a GIN index on the tsvector
  • Prefer plainto_tsquery or websearch_to_tsquery for user input
  • Use to_tsquery only when the query syntax is controlled
  • Keep Full-Text Search logic inside PostgreSQL
  • Use ORM abstractions for CRUD and native SQL for search

Conclusion

PostgreSQL Full-Text Search is a mature, efficient, and production-ready solution that integrates naturally with both Java and Python ecosystems.

By starting with SQL fundamentals and layering ORM integration on top, applications can achieve high-performance, semantically rich search without the operational complexity of external search engines.

Web-Style Search with websearch_to_tsquery

PostgreSQL provides websearch_to_tsquery, a function designed to accept search strings written in a format similar to modern web search engines (such as Google).

Unlike to_tsquery, which requires explicit operators, and plainto_tsquery, which always combines terms with AND, websearch_to_tsquery supports a more expressive and user-friendly syntax.

Supported Syntax

websearch_to_tsquery understands:

  • Space-separated terms (implicit AND)
  • OR using the keyword OR
  • Quoted phrases
  • Unary minus for negation

Examples

Basic search (implicit AND):

SELECT *
FROM article
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres database');

OR operator:

WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres OR mysql');

Quoted phrase:

WHERE fts @@ websearch_to_tsquery('portuguese', '"full text search"');

Negation:

WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres -mysql');

Why websearch_to_tsquery Matters

This function is particularly well-suited for direct user input because:

  • It avoids syntax errors common with to_tsquery
  • It feels natural to end users
  • It reduces the need for custom query parsing in the application layer

For most user-facing search boxes, websearch_to_tsquery is the recommended choice.

FTS vs pg_trgm

PostgreSQL offers two powerful but fundamentally different approaches to text search: Full-Text Search (FTS) and trigram similarity via the pg_trgm extension.

Understanding their differences is essential for choosing the correct tool.

pg_trgm Overview

The pg_trgm extension indexes text by breaking it into overlapping sequences of three characters (trigrams). It enables fast similarity searches and partial matching.

To enable it:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

A typical trigram index:

CREATE INDEX idx_article_trgm
ON article
USING GIN (body gin_trgm_ops);

Example query:

SELECT *
FROM article
WHERE body ILIKE '%postgre%';

Key Differences

Feature PostgreSQL FTS (@@) pg_trgm
Linguistic analysis Yes (stemming, stop words) No
Word-based search Yes No
Partial word matching Limited (:*) Yes
Typo tolerance Low High
Ranking by relevance Yes (ts_rank) Limited (similarity)
Language awareness Yes No
Index size Moderate Larger
Best use case Semantic text search Fuzzy / partial matching

= When to Use Full-Text Search

Use PostgreSQL Full-Text Search when:

  • Searching long documents or articles
  • Language semantics matter
  • You need relevance ranking
  • Queries are word-based
  • You want deterministic, explainable results

= When to Use pg_trgm

Use pg_trgm when:

  • Implementing autocomplete
  • Supporting typo-tolerant search
  • Matching substrings or partial words
  • Searching identifiers, codes, or names

Combining FTS and pg_trgm

In many real-world systems, the best solution is to combine both approaches.

Example:

  • Use FTS (@@) for main content search
  • Use pg_trgm for autocomplete or fallback fuzzy matching
  • Merge or rank results at the application layer

This hybrid approach often delivers search quality comparable to dedicated search engines while keeping all logic inside PostgreSQL.