Class: Ast::Merge::Detector::FencedCodeBlock

Inherits:
Base
  • Object
Defined in:
lib/ast/merge/detector/fenced_code_block.rb

Overview

Detects fenced code blocks with a specific language identifier.

This detector finds Markdown-style fenced code blocks (using ``` or ~~~)
that have a specific language identifier. It can be configured for any
language: ruby, json, yaml, mermaid, etc.

When to Use This Detector

Use FencedCodeBlock when:

  • Working with raw Markdown text without parsing to AST
  • Quick extraction from strings without parser dependencies
  • Custom text processing requiring line-level precision
  • Operating on source text directly (e.g., linters, formatters)

Do NOT use FencedCodeBlock when:

  • Working with parsed Markdown AST (use native code block nodes instead)
  • Integrating with markdown-merge’s CodeBlockMerger (it uses native nodes)
  • Using tree_haver’s unified Markdown backend API

Examples:

Detecting Ruby code blocks

detector = FencedCodeBlock.new("ruby", aliases: ["rb"])
regions = detector.detect_all(markdown_source)

Using factory methods

detector = FencedCodeBlock.ruby
detector = FencedCodeBlock.yaml
detector = FencedCodeBlock.json

Methods inherited from Base

#name, #strip_delimiters?

Constructor Details

#initialize(language, aliases: []) ⇒ FencedCodeBlock

Creates a new detector for the specified language.

Parameters:

  • language (String, Symbol)

    The language identifier (e.g., “ruby”, “json”)

  • aliases (Array<String, Symbol>) (defaults to: [])

    Alternative identifiers (e.g., ["rb"] for ruby)



# File 'lib/ast/merge/detector/fenced_code_block.rb', line 47

def initialize(language, aliases: [])
  super()
  @language = language.to_s.downcase
  @aliases = aliases.map { |a| a.to_s.downcase }
  @all_identifiers = [@language] + @aliases
end

Instance Attribute Details

#aliases ⇒ Array<String> (readonly)

Returns the alternative language identifiers.

Returns:

  • (Array<String>)

    Alternative language identifiers



# File 'lib/ast/merge/detector/fenced_code_block.rb', line 41

def aliases
  @aliases
end

#language ⇒ String (readonly)

Returns the primary language identifier.

Returns:

  • (String)

    The primary language identifier



# File 'lib/ast/merge/detector/fenced_code_block.rb', line 38

def language
  @language
end

Class Method Details

.bash ⇒ FencedCodeBlock

Creates a detector for Bash/Shell code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 196

def bash
  new("bash", aliases: ["sh", "shell", "zsh"])
end

.css ⇒ FencedCodeBlock

Creates a detector for CSS code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 214

def css
  new("css")
end

.html ⇒ FencedCodeBlock

Creates a detector for HTML code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 208

def html
  new("html")
end

.javascript ⇒ FencedCodeBlock

Creates a detector for JavaScript code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 178

def javascript
  new("javascript", aliases: ["js"])
end

.json ⇒ FencedCodeBlock

Creates a detector for JSON code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 154

def json
  new("json")
end

.markdown ⇒ FencedCodeBlock

Creates a detector for Markdown code blocks (nested markdown).

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 220

def markdown
  new("markdown", aliases: ["md"])
end

.mermaid ⇒ FencedCodeBlock

Creates a detector for Mermaid diagram blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 172

def mermaid
  new("mermaid")
end

.python ⇒ FencedCodeBlock

Creates a detector for Python code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 190

def python
  new("python", aliases: ["py"])
end

.ruby ⇒ FencedCodeBlock

Creates a detector for Ruby code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 148

def ruby
  new("ruby", aliases: ["rb"])
end

.sql ⇒ FencedCodeBlock

Creates a detector for SQL code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 202

def sql
  new("sql")
end

.toml ⇒ FencedCodeBlock

Creates a detector for TOML code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 166

def toml
  new("toml")
end

.typescript ⇒ FencedCodeBlock

Creates a detector for TypeScript code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 184

def typescript
  new("typescript", aliases: ["ts"])
end

.yaml ⇒ FencedCodeBlock

Creates a detector for YAML code blocks.

Returns:

  • (FencedCodeBlock)

# File 'lib/ast/merge/detector/fenced_code_block.rb', line 160

def yaml
  new("yaml", aliases: ["yml"])
end

Instance Method Details

#detect_all(source) ⇒ Array<Region>

Detects all fenced code blocks tagged with the configured language or one of its aliases. Unclosed blocks produce no region.

Parameters:

  • source (String)

    The full document content

Returns:

  • (Array<Region>)

    All detected code blocks, sorted by start_line



# File 'lib/ast/merge/detector/fenced_code_block.rb', line 71

def detect_all(source)
  return [] if source.nil? || source.empty?

  regions = []
  lines = source.lines
  in_block = false
  start_line = nil
  content_lines = []
  current_language = nil
  fence_char = nil
  fence_length = nil
  indent = ""

  lines.each_with_index do |line, idx|
    line_num = idx + 1

    if !in_block
      # Match opening fence: ```lang or ~~~lang (optionally indented)
      match = line.match(/^(\s*)(`{3,}|~{3,})(\w*)\s*$/)
      if match
        indent = match[1] || ""
        fence = match[2]
        lang = match[3].downcase

        if @all_identifiers.include?(lang)
          in_block = true
          start_line = line_num
          content_lines = []
          current_language = lang
          fence_char = fence[0]
          fence_length = fence.length
        end
      end
    elsif line.match?(/^#{Regexp.escape(indent)}#{Regexp.escape(fence_char)}{#{fence_length},}\s*$/)
      # Match closing fence (must use same char, same indent, and at least same length)
      opening_fence = "#{fence_char * fence_length}#{current_language}"
      closing_fence = fence_char * fence_length

      regions << build_region(
        type: region_type,
        content: content_lines.join,
        start_line: start_line,
        end_line: line_num,
        delimiters: [opening_fence, closing_fence],
        metadata: {language: current_language, indent: indent.empty? ? nil : indent},
      )
      in_block = false
      start_line = nil
      content_lines = []
      current_language = nil
      fence_char = nil
      fence_length = nil
      indent = ""
    else
      # Accumulate content lines (strip the indent if present)
      content_lines << if indent.empty?
        line
      else
        # Strip the common indent from content lines
        line.sub(/^#{Regexp.escape(indent)}/, "")
      end
    end
  end

  # Note: Unclosed blocks are ignored (no region created)
  regions
end
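
The fence rules in the listing above can be tried in isolation. The sketch below is not part of the library: OPENING_FENCE copies the opening-fence pattern from detect_all, while opening_fence_info and closing_fence? are hypothetical helper names introduced only for illustration. Tilde fences are used in the sample input; backtick fences follow the same rules.

```ruby
# Pattern copied from detect_all: optional indent, a fence of 3+ backticks
# or tildes, an optional word-character language tag, trailing whitespace.
OPENING_FENCE = /^(\s*)(`{3,}|~{3,})(\w*)\s*$/

# Hypothetical helper: returns [indent, fence, language] or nil.
def opening_fence_info(line)
  match = line.match(OPENING_FENCE)
  return nil unless match

  [match[1], match[2], match[3].downcase]
end

# Hypothetical helper mirroring the closing-fence check in detect_all:
# same indent, same fence character, at least the opening fence length.
def closing_fence?(line, indent:, fence_char:, fence_length:)
  line.match?(/^#{Regexp.escape(indent)}#{Regexp.escape(fence_char)}{#{fence_length},}\s*$/)
end

p opening_fence_info("~~~ruby\n")            # => ["", "~~~", "ruby"]
p opening_fence_info("  ~~~~RB  \n")         # => ["  ", "~~~~", "rb"]
p opening_fence_info("~~~ruby title=demo\n") # => nil (extra text after the tag)
p opening_fence_info("~~~c++\n")             # => nil ("+" is not a word character)

# A block opened with four tildes is not closed by three.
p closing_fence?("~~~\n", indent: "", fence_char: "~", fence_length: 4)   # => false
p closing_fence?("~~~~~\n", indent: "", fence_char: "~", fence_length: 4) # => true
```

Two consequences follow from the pattern as written: an opening fence with anything after the language tag (attributes, a file name) is skipped, and so is a tag containing characters outside \w, such as c++ or objective-c.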

#inspect ⇒ String

Returns a description of this detector.

Returns:

  • (String)

    A description of this detector



# File 'lib/ast/merge/detector/fenced_code_block.rb', line 140

def inspect
  aliases_str = @aliases.empty? ? "" : " aliases=#{@aliases.inspect}"
  "#<#{self.class.name} language=#{@language}#{aliases_str}>"
end

#matches_language?(lang) ⇒ Boolean

Checks whether a language identifier matches this detector's primary language or one of its aliases. The comparison is case-insensitive.

Parameters:

  • lang (String)

    The language identifier to check

Returns:

  • (Boolean)

    true if the language matches



# File 'lib/ast/merge/detector/fenced_code_block.rb', line 63

def matches_language?(lang)
  @all_identifiers.include?(lang.to_s.downcase)
end
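
To see how the normalization plays out without loading the gem, the sketch below uses a stand-in class. SketchDetector is a hypothetical name; it copies only the constructor and matches_language? bodies shown on this page and omits the Base superclass.

```ruby
# Stand-in that copies only the constructor and matches_language? bodies
# from the listings on this page; it is not the library class.
class SketchDetector
  attr_reader :language, :aliases

  def initialize(language, aliases: [])
    @language = language.to_s.downcase
    @aliases = aliases.map { |a| a.to_s.downcase }
    @all_identifiers = [@language] + @aliases
  end

  def matches_language?(lang)
    @all_identifiers.include?(lang.to_s.downcase)
  end
end

detector = SketchDetector.new(:Ruby, aliases: [:RB])

p detector.language                    # => "ruby"
p detector.aliases                     # => ["rb"]
p detector.matches_language?("RUBY")   # => true
p detector.matches_language?(:rb)      # => true
p detector.matches_language?("rubyx")  # => false (exact match only)
```

Because both sides pass through to_s.downcase, Symbols and mixed-case Strings behave the same at construction time and at query time.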

#region_type ⇒ Symbol

Returns the region type (e.g., :ruby_code_block).

Returns:

  • (Symbol)

    The region type (e.g., :ruby_code_block)



# File 'lib/ast/merge/detector/fenced_code_block.rb', line 55

def region_type
  :"#{@language}_code_block"
end
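
The region type is derived from the primary language only. A minimal sketch of that expression follows; sketch_region_type is a hypothetical helper, not a library method, and it takes the language as an argument where the library reads @language.

```ruby
# Hypothetical helper reproducing the region_type expression for a given
# primary language (already downcased by the constructor in the library).
def sketch_region_type(language)
  :"#{language.to_s.downcase}_code_block"
end

p sketch_region_type("ruby")  # => :ruby_code_block
p sketch_region_type("YAML")  # => :yaml_code_block
```

Going by the detect_all listing on this page, a block opened with an alias such as rb would still get the type :ruby_code_block, while the matched tag itself is recorded under metadata[:language].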