Class: Ast::Merge::ContentMatchRefiner

Inherits:

Object
MatchRefinerBase
Ast::Merge::ContentMatchRefiner

Defined in: lib/ast/merge/content_match_refiner.rb

Overview

Match refiner for text content-based fuzzy matching.

This refiner uses Levenshtein distance to pair nodes that have similar
but not identical text content. It’s useful for matching nodes where
the content has been slightly modified (typos, rewording, etc.).

Unlike signature-based matching, which requires exact content hashes,
this refiner allows fuzzy matching based on text similarity. This is
particularly useful for:

Paragraphs with minor edits
Headings with slight rewording
Comments with updated text
Any text-based node type
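
As an illustration of the similarity measure described above, here is a minimal sketch of a normalized Levenshtein similarity. The helper names levenshtein_distance and text_similarity are hypothetical and are not taken from the library, whose internal implementation is not shown on this page.

```ruby
# Hypothetical helper (not part of Ast::Merge): dynamic-programming
# Levenshtein distance that keeps only the previous row in memory.
def levenshtein_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |a_char, i|
    curr = [i]
    b.each_char.with_index(1) do |b_char, j|
      cost = a_char == b_char ? 0 : 1
      # Cheapest of deletion, insertion, and substitution.
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: scale the distance into a similarity in 0.0..1.0,
# where 1.0 means the strings are identical.
def text_similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein_distance(a, b).to_f / longest
end
```

Under this normalization, a one-character typo in a long paragraph yields a similarity close to 1.0, while two unrelated strings of equal length score near 0.0.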

Examples:

Basic usage

refiner = ContentMatchRefiner.new(threshold: 0.7)
matches = refiner.call(template_nodes, dest_nodes)

With specific node types

# Only match paragraphs and headings
refiner = ContentMatchRefiner.new(
  threshold: 0.6,
  node_types: [:paragraph, :heading]
)

With custom content extractor

refiner = ContentMatchRefiner.new(
  threshold: 0.7,
  content_extractor: ->(node) { node.text_content.downcase.strip }
)

Combined with other refiners

merger = SmartMerger.new(
  template,
  destination,
  match_refiner: [
    ContentMatchRefiner.new(threshold: 0.7, node_types: [:paragraph]),
    TableMatchRefiner.new(threshold: 0.5)
  ]
)

Constant Summary

Default weights for content similarity scoring:

DEFAULT_WEIGHTS = {
  content: 0.7,   # Text content similarity (Levenshtein)
  length: 0.15,   # Length similarity
  position: 0.15, # Position similarity in document
}.freeze
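
One plausible reading of these weights is a weighted sum of three component scores, each in the range 0.0..1.0. The sketch below assumes that reading. The constant SKETCH_WEIGHTS and the method combined_score are hypothetical names; the library's actual scoring method is not shown on this page.

```ruby
# Local copy of the documented defaults, renamed to make clear that this
# is an illustration and not the library constant itself.
SKETCH_WEIGHTS = { content: 0.7, length: 0.15, position: 0.15 }.freeze

# Hypothetical scoring function: assumes each component score is already
# normalized to 0.0..1.0 and that the final score is a plain weighted sum.
def combined_score(content:, length:, position:, weights: SKETCH_WEIGHTS)
  weights[:content] * content +
    weights[:length] * length +
    weights[:position] * position
end

score = combined_score(content: 0.8, length: 0.9, position: 1.0)
puts score.round(3)
```

With these inputs the sum is about 0.845, which would clear a threshold of 0.7. Because content carries most of the weight, a pair with very different text would score low even if length and position agree perfectly.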

Constants inherited from MatchRefinerBase

MatchRefinerBase::DEFAULT_THRESHOLD

Instance Attribute Summary

#content_extractor ⇒ Proc? readonly
Custom content extraction function.
#weights ⇒ Hash readonly
Scoring weights.

Attributes inherited from MatchRefinerBase

#node_types, #threshold

Instance Method Summary

#call(template_nodes, dest_nodes, context = {}) ⇒ Array<MatchResult>
Find matches between unmatched nodes based on content similarity.
#initialize(threshold: DEFAULT_THRESHOLD, node_types: [], weights: {}, content_extractor: nil, **options) ⇒ ContentMatchRefiner constructor
Initialize a content match refiner.

Methods inherited from MatchRefinerBase

#handles_type?

Constructor Details

#initialize(threshold: DEFAULT_THRESHOLD, node_types: [], weights: {}, content_extractor: nil, **options) ⇒ ContentMatchRefiner

Initialize a content match refiner.

Parameters:

threshold (Float) (defaults to: DEFAULT_THRESHOLD) —
Minimum score to accept a match (default: 0.5)
node_types (Array<Symbol>) (defaults to: []) —
Node types to process (empty = all)
weights (Hash) (defaults to: {}) —
Custom scoring weights
content_extractor (Proc, nil) (defaults to: nil) —
Custom function to extract text from nodes.
It should accept a node and return a String.
options (Hash) —
Additional options for forward compatibility

# File 'lib/ast/merge/content_match_refiner.rb', line 69

def initialize(
  threshold: DEFAULT_THRESHOLD,
  node_types: [],
  weights: {},
  content_extractor: nil,
  **options
)
  super(threshold: threshold, node_types: node_types, **options)
  @weights = DEFAULT_WEIGHTS.merge(weights)
  @content_extractor = content_extractor
end
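
Because the constructor merges the custom weights into DEFAULT_WEIGHTS with Hash#merge, a partial override replaces only the keys that are supplied. The sketch below mirrors that merge on a local copy of the defaults; ASSUMED_DEFAULTS and effective_weights are illustrative names, not library API.

```ruby
# Local copy of the documented default weights (illustrative only).
ASSUMED_DEFAULTS = { content: 0.7, length: 0.15, position: 0.15 }.freeze

# Mirrors the constructor's `DEFAULT_WEIGHTS.merge(weights)` step.
# Hash#merge returns a new hash, so the frozen defaults stay untouched.
def effective_weights(overrides = {})
  ASSUMED_DEFAULTS.merge(overrides)
end

weights = effective_weights({ content: 0.9 })
puts weights[:content]   # overridden value
puts weights[:length]    # inherited default
puts weights.values.sum  # no longer 1.0 after a partial override
```

Note that a partial override changes the total: here the weights add up to roughly 1.2. Whether the library renormalizes the weights is not shown on this page, so passing all three keys with values that sum to 1.0 is the cautious choice.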

Instance Attribute Details

#content_extractor ⇒ Proc? (readonly)

Returns the custom content extraction function.

Returns:

(Proc, nil) —
Custom content extraction function



# File 'lib/ast/merge/content_match_refiner.rb', line 59

def content_extractor
  @content_extractor
end

#weights ⇒ Hash (readonly)

Returns the scoring weights.

Returns:

(Hash) —
Scoring weights



# File 'lib/ast/merge/content_match_refiner.rb', line 56

def weights
  @weights
end

Instance Method Details

#call(template_nodes, dest_nodes, context = {}) ⇒ Array<MatchResult>

Find matches between unmatched nodes based on content similarity.

Parameters:

template_nodes (Array) —
Unmatched nodes from template
dest_nodes (Array) —
Unmatched nodes from destination
context (Hash) (defaults to: {}) —
Additional context (may contain :template_analysis, :dest_analysis)

Returns:

(Array<MatchResult>) —
Array of content-based matches

# File 'lib/ast/merge/content_match_refiner.rb', line 87

def call(template_nodes, dest_nodes, context = {})
  template_filtered = filter_nodes(template_nodes)
  dest_filtered = filter_nodes(dest_nodes)

  return [] if template_filtered.empty? || dest_filtered.empty?

  # Build position information for scoring
  total_template = template_filtered.size
  total_dest = dest_filtered.size

  greedy_match(template_filtered, dest_filtered) do |t_node, d_node|
    t_idx = template_filtered.index(t_node) || 0
    d_idx = dest_filtered.index(d_node) || 0

    compute_content_similarity(
      t_node,
      d_node,
      t_idx,
      d_idx,
      total_template,
      total_dest,
    )
  end
end
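
The greedy_match helper used above is inherited from MatchRefinerBase and its body is not shown on this page. As a rough sketch of what a greedy pairing strategy can look like, the following standalone method scores every candidate pair with the given block, then accepts pairs from best to worst while skipping nodes that are already taken. The names greedy_pairs and word_overlap are hypothetical, and the word-overlap scorer merely stands in for the content similarity score.

```ruby
# Hypothetical stand-in for a greedy matcher: the block returns a score
# for each (left, right) pair, and pairs below the threshold are dropped.
def greedy_pairs(left, right, threshold: 0.5)
  candidates = []
  left.each_with_index do |l, i|
    right.each_with_index do |r, j|
      score = yield(l, r)
      candidates << [score, i, j] if score >= threshold
    end
  end

  used_left = {}
  used_right = {}
  # Highest score first; ties are broken by original position.
  sorted = candidates.sort_by { |score, i, j| [-score, i, j] }
  sorted.each_with_object([]) do |(score, i, j), pairs|
    next if used_left[i] || used_right[j]

    used_left[i] = used_right[j] = true
    pairs << [left[i], right[j], score]
  end
end

# Toy scorer for the demo: Jaccard overlap of lowercase words.
def word_overlap(a, b)
  a_words = a.downcase.split
  b_words = b.downcase.split
  union = (a_words | b_words).size
  union.zero? ? 0.0 : (a_words & b_words).size.to_f / union
end

template = ["Install the gem", "Run the tests"]
dest = ["Run all the tests", "Install the gem now"]
greedy_pairs(template, dest) { |t, d| word_overlap(t, d) }.each do |t, d, score|
  puts "#{t.inspect} <-> #{d.inspect} (#{score})"
end
```

A greedy strategy of this kind is simple and usually adequate for small node lists, but it does not guarantee the globally best assignment, since an early high-scoring pair can block two slightly lower-scoring pairs that would have been better overall.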
