POSTGRESQL: mudanças entre as edições

De Wiki Clusterlab.com.br
Ir para navegação Ir para pesquisar
(Criou página com '=PostgreSQL= <pre> \l /dt create database zproxy create user zproxy with password 'xpto'; grant all privileges on database zproxy to zproxy; \connect zproxy \password zproxy <...')
 
Sem resumo de edição
 
Linha 1: Linha 1:
=PostgreSQL=
= PostgreSQL Full-Text Search: Concepts, SQL Usage, and Integration with Java and Python =
<pre>
 
\l
PostgreSQL provides a powerful, native Full-Text Search (FTS) engine designed to efficiently index and search large volumes of textual data. Unlike simple string matching using LIKE or ILIKE, PostgreSQL Full-Text Search performs linguistic analysis, tokenization, stemming, stop-word removal, and relevance ranking.
/dt
 
create database zproxy
This article starts by introducing Full-Text Search directly at the SQL level, then explains its internal concepts, and finally demonstrates recommended, production-grade usage from Java and Python applications using Spring Data JPA and Peewee.
create user zproxy with password 'xpto';
 
grant all privileges on database zproxy to zproxy;
== Getting Started with PostgreSQL Full-Text Search (Pure SQL) ==
\connect zproxy
 
\password zproxy
PostgreSQL Full-Text Search is built into the database and can be used without any external libraries or frameworks.
</pre>
 
=== A Basic Example ===
 
Consider a simple table storing articles:
 
<syntaxhighlight lang="sql">
CREATE TABLE article (
  id SERIAL PRIMARY KEY,
  title TEXT,
  body TEXT
);
</syntaxhighlight>
 
A basic full-text search can be performed by converting text into a <code>tsvector</code> and comparing it with a <code>tsquery</code>:
 
<syntaxhighlight lang="sql">
SELECT *
FROM article
WHERE to_tsvector('portuguese', title || ' ' || body)
      @@ plainto_tsquery('portuguese', 'postgres database');
</syntaxhighlight>
 
This query finds rows whose text semantically matches the words “postgres” and “database”, regardless of word order, case, or inflection.
 
=== Why Not LIKE? ===
 
Traditional SQL searches often rely on:
 
<syntaxhighlight lang="sql">
WHERE body ILIKE '%postgres%'
</syntaxhighlight>
 
This approach:
* Cannot use linguistic analysis
* Does not scale well on large datasets
* Fails to rank results by relevance
 
PostgreSQL Full-Text Search solves these limitations.
 
=== Using plainto_tsquery ===
 
<code>plainto_tsquery</code> is designed for user input. It automatically:
* Splits text into terms
* Applies the language dictionary
* Combines terms using logical AND
 
<syntaxhighlight lang="sql">
SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
      @@ plainto_tsquery('portuguese', 'powerful database');
</syntaxhighlight>
 
=== Using to_tsquery (Advanced Syntax) ===
 
When the query syntax is controlled, <code>to_tsquery</code> allows explicit logical operators:
 
<syntaxhighlight lang="sql">
SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
      @@ to_tsquery('portuguese', 'postgres & (database | data)');
</syntaxhighlight>
 
Prefix matching is also supported:
 
<syntaxhighlight lang="sql">
WHERE to_tsvector('portuguese', body)
      @@ to_tsquery('portuguese', 'postgre:*');
</syntaxhighlight>
 
== Core Concepts of PostgreSQL Full-Text Search ==
 
PostgreSQL Full-Text Search relies on two specialized data types:
 
; tsvector
: A normalized and indexed representation of a document. Text is tokenized, lowercased, stripped of stop words, and stemmed according to a language dictionary.
 
; tsquery
: A structured search query containing terms and logical operators such as AND (&), OR (|), NOT (!), and prefix matching (:*).
 
=== Example Transformation ===
 
The text:
 
<code>PostgreSQL is a powerful database</code>
 
may be transformed into a <code>tsvector</code> similar to:
 
<code>'databas':4 'power':3 'postgresql':1</code>
 
A user query such as:
 
<code>powerful database</code>
 
becomes a <code>tsquery</code> equivalent to:
 
<code>'power' & 'databas'</code>
 
== The @@ Operator ==
 
The <code>@@</code> operator is the central operator of PostgreSQL Full-Text Search:
 
<syntaxhighlight lang="sql">
tsvector @@ tsquery
</syntaxhighlight>
 
It evaluates whether a document matches a full-text query and returns a boolean result.
 
Conceptually, it can be read as:
 
<blockquote>
"Does this indexed document match this search query?"
</blockquote>
 
The <code>@@</code> operator:
* Executes the full-text match
* Enables the use of GIN or GiST indexes
* Is required for ranking and highlighting
* Has no equivalent in standard SQL or JPQL
 
Without <code>@@</code>, PostgreSQL Full-Text Search cannot function.
 
== Indexing with GIN ==
 
For performance, PostgreSQL Full-Text Search relies on inverted indexes, most commonly GIN (Generalized Inverted Index).
 
A recommended approach is to store a dedicated <code>tsvector</code> column and index it:
 
<syntaxhighlight lang="sql">
ALTER TABLE article
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (
  setweight(to_tsvector('portuguese', coalesce(title,'')), 'A') ||
  setweight(to_tsvector('portuguese', coalesce(body,'')), 'B')
) STORED;
 
CREATE INDEX idx_article_fts
ON article USING GIN (fts);
</syntaxhighlight>
 
This approach avoids triggers and guarantees consistency.
 
== Ranking and Highlighting ==
 
PostgreSQL provides ranking functions such as <code>ts_rank</code> and <code>ts_rank_cd</code> to score documents by relevance.
 
Highlighting is handled by <code>ts_headline</code>, which extracts and emphasizes relevant fragments from the original text.
 
== Using PostgreSQL Full-Text Search from Java (Spring Data JPA) ==
 
Spring Data JPA does not support PostgreSQL Full-Text Search operators directly in JPQL. The recommended approach is to use native SQL queries while keeping entity management and CRUD operations within JPA.
 
=== Entity Mapping ===
 
<syntaxhighlight lang="java">
@Entity
@Table(name = "article")
public class Article {
 
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;
 
  private String title;
 
  @Column(columnDefinition = "text")
  private String body;
 
  @Column(name = "fts", insertable = false, updatable = false)
  private String fts;
}
</syntaxhighlight>
 
=== Repository with Full-Text Search ===
 
<syntaxhighlight lang="java">
public interface ArticleRepository extends JpaRepository<Article, Long> {
 
  @Query(value = """
    SELECT a.*
    FROM article a
    WHERE a.fts @@ plainto_tsquery('portuguese', :query)
    ORDER BY ts_rank(a.fts, plainto_tsquery('portuguese', :query)) DESC
    """, nativeQuery = true)
  List<Article> search(@Param("query") String query);
}
</syntaxhighlight>
 
== Using PostgreSQL Full-Text Search from Python (Peewee) ==
 
Peewee is a lightweight ORM that allows direct access to PostgreSQL operators and functions, making it well-suited for Full-Text Search when used with explicit SQL expressions.
 
=== Model Definition ===
 
<syntaxhighlight lang="python">
from peewee import *
from playhouse.postgres_ext import PostgresqlExtDatabase
 
db = PostgresqlExtDatabase("example_db")
 
class BaseModel(Model):
    class Meta:
        database = db
 
class Article(BaseModel):
    title = TextField()
    body = TextField()
    fts = TextField(null=True)
</syntaxhighlight>
 
=== Full-Text Search Query ===
 
<syntaxhighlight lang="python">
from peewee import fn, SQL
 
query_text = "postgres database"
 
query = (
    Article
    .select(
        Article,
        fn.ts_rank(
            Article.fts,
            fn.plainto_tsquery('portuguese', query_text)
        ).alias('rank')
    )
    .where(
        Article.fts.op('@@')(
            fn.plainto_tsquery('portuguese', query_text)
        )
    )
    .order_by(SQL('rank').desc())
)
 
for article in query:
    print(article.title, article.rank)
</syntaxhighlight>
 
== ORM vs Search Engines ==
 
PostgreSQL Full-Text Search is sufficient for most applications, including:
 
* Blog and article search
* Documentation systems
* Administrative and internal tools
* Moderate autocomplete requirements
 
Dedicated search engines such as Elasticsearch or OpenSearch are recommended only when requirements include advanced linguistic analysis, distributed indexing, or large-scale search analytics.
 
== Best Practices ==
 
* Always use a dedicated <code>tsvector</code> column
* Always create a GIN index on the <code>tsvector</code>
* Prefer <code>plainto_tsquery</code> or <code>websearch_to_tsquery</code> for user input
* Use <code>to_tsquery</code> only when the query syntax is controlled
* Keep Full-Text Search logic inside PostgreSQL
* Use ORM abstractions for CRUD and native SQL for search
 
== Conclusion ==
 
PostgreSQL Full-Text Search is a mature, efficient, and production-ready solution that integrates naturally with both Java and Python ecosystems.
 
By starting with SQL fundamentals and layering ORM integration on top, applications can achieve high-performance, semantically rich search without the operational complexity of external search engines.
 
== Web-Style Search with websearch_to_tsquery ==
 
PostgreSQL provides <code>websearch_to_tsquery</code>, a function designed to accept search strings written in a format similar to modern web search engines (such as Google).
 
Unlike <code>to_tsquery</code>, which requires explicit operators, and <code>plainto_tsquery</code>, which always combines terms with AND, <code>websearch_to_tsquery</code> supports a more expressive and user-friendly syntax.
 
=== Supported Syntax ===
 
<code>websearch_to_tsquery</code> understands:
 
* Space-separated terms (implicit AND)
* OR using the keyword <code>OR</code>
* Quoted phrases
* Unary minus for negation
 
=== Examples ===
 
Basic search (implicit AND):
 
<syntaxhighlight lang="sql">
SELECT *
FROM article
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres database');
</syntaxhighlight>
 
OR operator:
 
<syntaxhighlight lang="sql">
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres OR mysql');
</syntaxhighlight>
 
Quoted phrase:
 
<syntaxhighlight lang="sql">
WHERE fts @@ websearch_to_tsquery('portuguese', '"full text search"');
</syntaxhighlight>
 
Negation:
 
<syntaxhighlight lang="sql">
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres -mysql');
</syntaxhighlight>
 
=== Why websearch_to_tsquery Matters ===
 
This function is particularly well-suited for direct user input because:
 
* It avoids syntax errors common with <code>to_tsquery</code>
* It feels natural to end users
* It reduces the need for custom query parsing in the application layer
 
For most user-facing search boxes, <code>websearch_to_tsquery</code> is the recommended choice.
 
== FTS vs pg_trgm ==
 
PostgreSQL offers two powerful but fundamentally different approaches to text search: Full-Text Search (FTS) and trigram similarity via the <code>pg_trgm</code> extension.
 
Understanding their differences is essential for choosing the correct tool.
 
=== pg_trgm Overview ===
 
The <code>pg_trgm</code> extension indexes text by breaking it into overlapping sequences of three characters (trigrams). It enables fast similarity searches and partial matching.
 
To enable it:
 
<syntaxhighlight lang="sql">
CREATE EXTENSION IF NOT EXISTS pg_trgm;
</syntaxhighlight>
 
A typical trigram index:
 
<syntaxhighlight lang="sql">
CREATE INDEX idx_article_trgm
ON article
USING GIN (body gin_trgm_ops);
</syntaxhighlight>
 
Example query:
 
<syntaxhighlight lang="sql">
SELECT *
FROM article
WHERE body ILIKE '%postgre%';
</syntaxhighlight>
 
=== Key Differences ===
 
{| class="wikitable"
! Feature !! PostgreSQL FTS (@@) !! pg_trgm
|-
| Linguistic analysis || Yes (stemming, stop words) || No
|-
| Word-based search || Yes || No
|-
| Partial word matching || Limited (:*) || Yes
|-
| Typo tolerance || Low || High
|-
| Ranking by relevance || Yes (ts_rank) || Limited (similarity)
|-
| Language awareness || Yes || No
|-
| Index size || Moderate || Larger
|-
| Best use case || Semantic text search || Fuzzy / partial matching
|}
 
=== When to Use Full-Text Search ==
 
Use PostgreSQL Full-Text Search when:
 
* Searching long documents or articles
* Language semantics matter
* You need relevance ranking
* Queries are word-based
* You want deterministic, explainable results
 
=== When to Use pg_trgm ==
 
Use <code>pg_trgm</code> when:
 
* Implementing autocomplete
* Supporting typo-tolerant search
* Matching substrings or partial words
* Searching identifiers, codes, or names
 
=== Combining FTS and pg_trgm ===
 
In many real-world systems, the best solution is to combine both approaches.
 
Example:
 
* Use FTS (<code>@@</code>) for main content search
* Use <code>pg_trgm</code> for autocomplete or fallback fuzzy matching
* Merge or rank results at the application layer
 
This hybrid approach often delivers search quality comparable to dedicated search engines while keeping all logic inside PostgreSQL.

Edição atual tal como às 18h13min de 22 de dezembro de 2025

PostgreSQL Full-Text Search: Concepts, SQL Usage, and Integration with Java and Python

PostgreSQL provides a powerful, native Full-Text Search (FTS) engine designed to efficiently index and search large volumes of textual data. Unlike simple string matching using LIKE or ILIKE, PostgreSQL Full-Text Search performs linguistic analysis, tokenization, stemming, stop-word removal, and relevance ranking.

This article starts by introducing Full-Text Search directly at the SQL level, then explains its internal concepts, and finally demonstrates recommended, production-grade usage from Java and Python applications using Spring Data JPA and Peewee.

Getting Started with PostgreSQL Full-Text Search (Pure SQL)

PostgreSQL Full-Text Search is built into the database and can be used without any external libraries or frameworks.

A Basic Example

Consider a simple table storing articles:

CREATE TABLE article (
  id SERIAL PRIMARY KEY,
  title TEXT,
  body TEXT
);

A basic full-text search can be performed by converting text into a tsvector and comparing it with a tsquery:

SELECT *
FROM article
WHERE to_tsvector('portuguese', title || ' ' || body)
      @@ plainto_tsquery('portuguese', 'postgres database');

This query finds rows whose text semantically matches the words “postgres” and “database”, regardless of word order, case, or inflection.

Why Not LIKE?

Traditional SQL searches often rely on:

WHERE body ILIKE '%postgres%'

This approach:

  • Cannot use linguistic analysis
  • Does not scale well on large datasets
  • Fails to rank results by relevance

PostgreSQL Full-Text Search solves these limitations.

Using plainto_tsquery

plainto_tsquery is designed for user input. It automatically:

  • Splits text into terms
  • Applies the language dictionary
  • Combines terms using logical AND
SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
      @@ plainto_tsquery('portuguese', 'powerful database');

Using to_tsquery (Advanced Syntax)

When the query syntax is controlled, to_tsquery allows explicit logical operators:

SELECT *
FROM article
WHERE to_tsvector('portuguese', body)
      @@ to_tsquery('portuguese', 'postgres & (database | data)');

Prefix matching is also supported:

WHERE to_tsvector('portuguese', body)
      @@ to_tsquery('portuguese', 'postgre:*');

Core Concepts of PostgreSQL Full-Text Search

PostgreSQL Full-Text Search relies on two specialized data types:

tsvector
A normalized and indexed representation of a document. Text is tokenized, lowercased, stripped of stop words, and stemmed according to a language dictionary.
tsquery
A structured search query containing terms and logical operators such as AND (&), OR (|), NOT (!), and prefix matching (:*).

Example Transformation

The text:

PostgreSQL is a powerful database

may be transformed into a tsvector similar to:

'databas':4 'power':3 'postgresql':1

A user query such as:

powerful database

becomes a tsquery equivalent to:

'power' & 'databas'

The @@ Operator

The @@ operator is the central operator of PostgreSQL Full-Text Search:

tsvector @@ tsquery

It evaluates whether a document matches a full-text query and returns a boolean result.

Conceptually, it can be read as:

"Does this indexed document match this search query?"

The @@ operator:

  • Executes the full-text match
  • Enables the use of GIN or GiST indexes
  • Is required for ranking and highlighting
  • Has no equivalent in standard SQL or JPQL

Without @@, PostgreSQL Full-Text Search cannot function.

Indexing with GIN

For performance, PostgreSQL Full-Text Search relies on inverted indexes, most commonly GIN (Generalized Inverted Index).

A recommended approach is to store a dedicated tsvector column and index it:

ALTER TABLE article
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (
  setweight(to_tsvector('portuguese', coalesce(title,'')), 'A') ||
  setweight(to_tsvector('portuguese', coalesce(body,'')), 'B')
) STORED;

CREATE INDEX idx_article_fts
ON article USING GIN (fts);

This approach avoids triggers and guarantees consistency.

Ranking and Highlighting

PostgreSQL provides ranking functions such as ts_rank and ts_rank_cd to score documents by relevance.

Highlighting is handled by ts_headline, which extracts and emphasizes relevant fragments from the original text.

Using PostgreSQL Full-Text Search from Java (Spring Data JPA)

Spring Data JPA does not support PostgreSQL Full-Text Search operators directly in JPQL. The recommended approach is to use native SQL queries while keeping entity management and CRUD operations within JPA.

Entity Mapping

@Entity
@Table(name = "article")
public class Article {

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  private String title;

  @Column(columnDefinition = "text")
  private String body;

  @Column(name = "fts", insertable = false, updatable = false)
  private String fts;
}

Repository with Full-Text Search

public interface ArticleRepository extends JpaRepository<Article, Long> {

  @Query(value = """
    SELECT a.*
    FROM article a
    WHERE a.fts @@ plainto_tsquery('portuguese', :query)
    ORDER BY ts_rank(a.fts, plainto_tsquery('portuguese', :query)) DESC
    """, nativeQuery = true)
  List<Article> search(@Param("query") String query);
}

Using PostgreSQL Full-Text Search from Python (Peewee)

Peewee is a lightweight ORM that allows direct access to PostgreSQL operators and functions, making it well-suited for Full-Text Search when used with explicit SQL expressions.

Model Definition

from peewee import *
from playhouse.postgres_ext import PostgresqlExtDatabase

db = PostgresqlExtDatabase("example_db")

class BaseModel(Model):
    class Meta:
        database = db

class Article(BaseModel):
    title = TextField()
    body = TextField()
    fts = TextField(null=True)

Full-Text Search Query

from peewee import fn, SQL

query_text = "postgres database"

query = (
    Article
    .select(
        Article,
        fn.ts_rank(
            Article.fts,
            fn.plainto_tsquery('portuguese', query_text)
        ).alias('rank')
    )
    .where(
        Article.fts.op('@@')(
            fn.plainto_tsquery('portuguese', query_text)
        )
    )
    .order_by(SQL('rank').desc())
)

for article in query:
    print(article.title, article.rank)

ORM vs Search Engines

PostgreSQL Full-Text Search is sufficient for most applications, including:

  • Blog and article search
  • Documentation systems
  • Administrative and internal tools
  • Moderate autocomplete requirements

Dedicated search engines such as Elasticsearch or OpenSearch are recommended only when requirements include advanced linguistic analysis, distributed indexing, or large-scale search analytics.

Best Practices

  • Always use a dedicated tsvector column
  • Always create a GIN index on the tsvector
  • Prefer plainto_tsquery or websearch_to_tsquery for user input
  • Use to_tsquery only when the query syntax is controlled
  • Keep Full-Text Search logic inside PostgreSQL
  • Use ORM abstractions for CRUD and native SQL for search

Conclusion

PostgreSQL Full-Text Search is a mature, efficient, and production-ready solution that integrates naturally with both Java and Python ecosystems.

By starting with SQL fundamentals and layering ORM integration on top, applications can achieve high-performance, semantically rich search without the operational complexity of external search engines.

Web-Style Search with websearch_to_tsquery

PostgreSQL provides websearch_to_tsquery, a function designed to accept search strings written in a format similar to modern web search engines (such as Google).

Unlike to_tsquery, which requires explicit operators, and plainto_tsquery, which always combines terms with AND, websearch_to_tsquery supports a more expressive and user-friendly syntax.

Supported Syntax

websearch_to_tsquery understands:

  • Space-separated terms (implicit AND)
  • OR using the keyword OR
  • Quoted phrases
  • Unary minus for negation

Examples

Basic search (implicit AND):

SELECT *
FROM article
WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres database');

OR operator:

WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres OR mysql');

Quoted phrase:

WHERE fts @@ websearch_to_tsquery('portuguese', '"full text search"');

Negation:

WHERE fts @@ websearch_to_tsquery('portuguese', 'postgres -mysql');

Why websearch_to_tsquery Matters

This function is particularly well-suited for direct user input because:

  • It avoids syntax errors common with to_tsquery
  • It feels natural to end users
  • It reduces the need for custom query parsing in the application layer

For most user-facing search boxes, websearch_to_tsquery is the recommended choice.

FTS vs pg_trgm

PostgreSQL offers two powerful but fundamentally different approaches to text search: Full-Text Search (FTS) and trigram similarity via the pg_trgm extension.

Understanding their differences is essential for choosing the correct tool.

pg_trgm Overview

The pg_trgm extension indexes text by breaking it into overlapping sequences of three characters (trigrams). It enables fast similarity searches and partial matching.

To enable it:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

A typical trigram index:

CREATE INDEX idx_article_trgm
ON article
USING GIN (body gin_trgm_ops);

Example query:

SELECT *
FROM article
WHERE body ILIKE '%postgre%';

Key Differences

Feature PostgreSQL FTS (@@) pg_trgm
Linguistic analysis Yes (stemming, stop words) No
Word-based search Yes No
Partial word matching Limited (:*) Yes
Typo tolerance Low High
Ranking by relevance Yes (ts_rank) Limited (similarity)
Language awareness Yes No
Index size Moderate Larger
Best use case Semantic text search Fuzzy / partial matching

= When to Use Full-Text Search

Use PostgreSQL Full-Text Search when:

  • Searching long documents or articles
  • Language semantics matter
  • You need relevance ranking
  • Queries are word-based
  • You want deterministic, explainable results

= When to Use pg_trgm

Use pg_trgm when:

  • Implementing autocomplete
  • Supporting typo-tolerant search
  • Matching substrings or partial words
  • Searching identifiers, codes, or names

Combining FTS and pg_trgm

In many real-world systems, the best solution is to combine both approaches.

Example:

  • Use FTS (@@) for main content search
  • Use pg_trgm for autocomplete or fallback fuzzy matching
  • Merge or rank results at the application layer

This hybrid approach often delivers search quality comparable to dedicated search engines while keeping all logic inside PostgreSQL.