Class: Ast::Merge::ContentMatchRefiner
- Inherits:
- MatchRefinerBase
- Inheritance chain: Object > MatchRefinerBase > Ast::Merge::ContentMatchRefiner
- Defined in:
- lib/ast/merge/content_match_refiner.rb
Overview
Match refiner for text content-based fuzzy matching.
This refiner uses Levenshtein distance to pair nodes that have similar
but not identical text content. It’s useful for matching nodes where
the content has been slightly modified (typos, rewording, etc.).
Unlike signature-based matching, which requires exact content hashes,
this refiner allows fuzzy matching based on text similarity. This is
particularly useful for:
- Paragraphs with minor edits
- Headings with slight rewording
- Comments with updated text
- Any text-based node type
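To make the idea of Levenshtein-based similarity concrete, here is a minimal, self-contained sketch. The helper names `levenshtein` and `text_similarity` are hypothetical, and dividing by the longer string's length is an assumption; the source does not show how the refiner normalizes the distance.

```ruby
# Classic dynamic-programming Levenshtein distance, keeping only two rows.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Assumed normalization: 1.0 for identical strings, 0.0 when every
# character position has to change.
def text_similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

levenshtein("Instalation guide", "Installation guide") # one inserted "l"
text_similarity("Instalation guide", "Installation guide")
```

A heading with a one-letter typo fix scores close to 1.0 under this scheme, while unrelated headings of similar length score near 0.0.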
Constant Summary
- DEFAULT_WEIGHTS =
Default weights for content similarity scoring

    {
      content: 0.7,    # Text content similarity (Levenshtein)
      length: 0.15,    # Length similarity
      position: 0.15,  # Position similarity in document
    }.freeze
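As an illustration of how three weights like these could combine into a single score, the sketch below computes a weighted sum of per-component similarities. The helper `weighted_score` is hypothetical; the source does not show the body of `compute_content_similarity`, so treat the formula as an assumption.

```ruby
# Local copy of the documented defaults so the sketch runs on its own.
DEFAULT_WEIGHTS = { content: 0.7, length: 0.15, position: 0.15 }.freeze

# Hypothetical helper: each component score is expected in 0.0..1.0,
# and a missing component counts as 0.0.
def weighted_score(components, weights = DEFAULT_WEIGHTS)
  weights.sum { |key, weight| weight * components.fetch(key, 0.0) }
end

# Text is 90% similar, lengths are equal, positions are half the document apart.
score = weighted_score({ content: 0.9, length: 1.0, position: 0.5 })
# 0.7 * 0.9 + 0.15 * 1.0 + 0.15 * 0.5
```

Because content carries 70% of the weight, two nodes with near-identical text can still match even when they sit far apart in their documents.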
Constants inherited from MatchRefinerBase
MatchRefinerBase::DEFAULT_THRESHOLD
Instance Attribute Summary
- #content_extractor ⇒ Proc? (readonly)
  Custom content extraction function.
- #weights ⇒ Hash (readonly)
  Scoring weights.
Attributes inherited from MatchRefinerBase
Instance Method Summary
- #call(template_nodes, dest_nodes, context = {}) ⇒ Array<MatchResult>
  Find matches between unmatched nodes based on content similarity.
- #initialize(threshold: DEFAULT_THRESHOLD, node_types: [], weights: {}, content_extractor: nil, **options) ⇒ ContentMatchRefiner (constructor)
  Initialize a content match refiner.
Methods inherited from MatchRefinerBase
Constructor Details
#initialize(threshold: DEFAULT_THRESHOLD, node_types: [], weights: {}, content_extractor: nil, **options) ⇒ ContentMatchRefiner
Initialize a content match refiner.
    # File 'lib/ast/merge/content_match_refiner.rb', line 69

    def initialize(
      threshold: DEFAULT_THRESHOLD,
      node_types: [],
      weights: {},
      content_extractor: nil,
      **
    )
      super(threshold: threshold, node_types: node_types, **)
      @weights = DEFAULT_WEIGHTS.merge(weights)
      @content_extractor = content_extractor
    end
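The constructor combines caller-supplied weights with the defaults through `Hash#merge`, so a partial override replaces only the keys given and nothing is rescaled. The sketch below reproduces that merge on a local copy of the defaults. The `normalize_weights` helper is hypothetical and not part of the library; whether weights that sum to more than 1.0 cause problems depends on how the base class applies the threshold, which is not shown here.

```ruby
defaults = { content: 0.7, length: 0.15, position: 0.15 }.freeze

# Same Hash#merge call the constructor uses: supplied keys win,
# all other keys keep their default values.
custom = defaults.merge(content: 0.9)
custom.values.sum # roughly 1.2, no longer 1.0

# Hypothetical helper: rescale so the weights sum to 1.0 again.
def normalize_weights(weights)
  total = weights.values.sum.to_f
  weights.transform_values { |w| w / total }
end

normalized = normalize_weights(custom)
```

Passing a full set of three weights avoids the question altogether.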
Instance Attribute Details
#content_extractor ⇒ Proc? (readonly)
Returns the custom content extraction function, or nil if none was given.
    # File 'lib/ast/merge/content_match_refiner.rb', line 59

    def content_extractor
      @content_extractor
    end
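A custom extractor might look like the following sketch. It assumes the Proc receives a single node and returns the String to compare; the source only documents the attribute as `Proc?`, so that calling convention is an assumption. `Node` is a stand-in type for illustration only.

```ruby
# Stand-in node type; real nodes come from whichever parser is in use.
Node = Struct.new(:type, :text)

# Hypothetical extractor: trims, collapses whitespace, and ignores case,
# so purely cosmetic differences do not lower the similarity score.
normalizing_extractor = lambda do |node|
  node.text.to_s.strip.gsub(/\s+/, " ").downcase
end

extracted = normalizing_extractor.call(Node.new(:paragraph, "  Hello   World\n"))
# => "hello world"
```

Under the same assumption, such a lambda would be passed as `content_extractor:` when constructing the refiner.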
#weights ⇒ Hash (readonly)
Returns the scoring weights (DEFAULT_WEIGHTS merged with any overrides passed to the constructor).
    # File 'lib/ast/merge/content_match_refiner.rb', line 56

    def weights
      @weights
    end
Instance Method Details
#call(template_nodes, dest_nodes, context = {}) ⇒ Array<MatchResult>
Find matches between unmatched nodes based on content similarity.
    # File 'lib/ast/merge/content_match_refiner.rb', line 87

    def call(template_nodes, dest_nodes, context = {})
      template_filtered = filter_nodes(template_nodes)
      dest_filtered = filter_nodes(dest_nodes)

      return [] if template_filtered.empty? || dest_filtered.empty?

      # Build position information for scoring
      total_template = template_filtered.size
      total_dest = dest_filtered.size

      greedy_match(template_filtered, dest_filtered) do |t_node, d_node|
        t_idx = template_filtered.index(t_node) || 0
        d_idx = dest_filtered.index(d_node) || 0

        compute_content_similarity(
          t_node,
          d_node,
          t_idx,
          d_idx,
          total_template,
          total_dest,
        )
      end
    end
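`#call` delegates pairing to the inherited `greedy_match`, whose implementation is not shown on this page. The sketch below shows one common way such a greedy strategy can work: score every pair, then accept pairs from the highest score down, skipping any node that is already paired. The name `greedy_pairs`, the 0.5 threshold, the result hashes, and the toy `word_overlap` scorer are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the inherited greedy_match.
# The block receives (template_node, dest_node) and returns a score.
def greedy_pairs(template_nodes, dest_nodes, threshold: 0.5)
  candidates = []
  template_nodes.each_with_index do |t, ti|
    dest_nodes.each_with_index do |d, di|
      score = yield(t, d)
      candidates << [score, ti, di] if score >= threshold
    end
  end

  used_t = {}
  used_d = {}
  # Best-scoring pairs first; each node can be claimed only once.
  candidates.sort_by { |score, _, _| -score }.each_with_object([]) do |(score, ti, di), matches|
    next if used_t[ti] || used_d[di]

    used_t[ti] = used_d[di] = true
    matches << { template: template_nodes[ti], dest: dest_nodes[di], score: score }
  end
end

# Toy scorer for the demo: Jaccard overlap of the two word sets.
word_overlap = lambda do |a, b|
  wa = a.downcase.split
  wb = b.downcase.split
  (wa & wb).size.to_f / (wa | wb).size
end

template = ["Install the gem", "Run the test suite"]
dest     = ["Run the full test suite", "Install this gem"]
matches  = greedy_pairs(template, dest) { |t, d| word_overlap.call(t, d) }
```

Greedy selection is simple and fast, but it does not guarantee the pairing with the best total score; an assignment algorithm would be needed for that.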