Commit 31cf4af

Add tcllib scraper
Tcllib is effectively Tcl's extended standard library, so it is worth including alongside Tcl. The one known issue with this docset is that the `html` module page is broken: the upstream page embeds unencoded HTML tags in its text, which causes the page to parse incorrectly.
1 parent 8b3f552 commit 31cf4af

7 files changed: +107 -0 lines changed

lib/docs/filters/tcllib/clean_html.rb

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+module Docs
+  class Tcllib
+    class CleanHtmlFilter < Filter
+      def call
+        css("hr").remove()
+        xpath("./div/text()").remove() # Navigation text content e.g. [ | | | ]
+        css("div.markdown > a").remove() # Navigation links
+
+
+        # Fix up ToC links
+        css('a[name]').each do |node|
+          node.parent['id'] = node['name']
+          node.before(node.children).remove unless node['href']
+        end
+
+        # Relies on the above ToC fixup
+        keywords = at_css('#keywords')
+        if !keywords.nil?
+          keywords.next_sibling.remove()
+          keywords.remove()
+          css('a[href="#keywords"]').remove()
+        end
+
+        # Downrank headings for styling
+        css('h2').each do |node|
+          node.name = 'h3'
+        end
+        css('h1').each do |node|
+          node.name = 'h2'
+        end
+
+        doc
+      end
+    end
+  end
+end
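
The order of the two heading passes in `clean_html.rb` matters: `h2` is renamed before `h1`, so a freshly demoted `h1` is not demoted a second time. The following is a minimal sketch of that ordering, using plain arrays of tag names instead of Nokogiri nodes. `downrank` and `downrank_reversed` are hypothetical helpers for illustration and are not part of the commit.

```ruby
# Toy model: a page is just a list of tag names (no Nokogiri involved).

# Same order as the filter: demote h2 first, then h1.
def downrank(tags)
  tags = tags.map { |tag| tag == 'h2' ? 'h3' : tag }
  tags.map { |tag| tag == 'h1' ? 'h2' : tag }
end

# Reversed order, for contrast: h1 becomes h2 and is then demoted again.
def downrank_reversed(tags)
  tags = tags.map { |tag| tag == 'h1' ? 'h2' : tag }
  tags.map { |tag| tag == 'h2' ? 'h3' : tag }
end

page = %w[h1 p h2 p h2]
p downrank(page)          # => ["h2", "p", "h3", "p", "h3"]
p downrank_reversed(page) # => ["h3", "p", "h3", "p", "h3"]
```

With the reversed order, the original heading hierarchy collapses into a single level, which is presumably why the filter handles `h2` first.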

lib/docs/filters/tcllib/entries.rb

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+module Docs
+  class Tcllib
+    class EntriesFilter < Docs::EntriesFilter
+      def get_name
+        # The first word after the `NAME` heading
+        name = at_css('h1 + p')
+        return name.content.strip.split[0]
+      end
+
+      def get_type
+        # The types are the categories as indicated on each page (and on the
+        # root page, toc0.md)
+        category = at_css('a[name="category"]')
+        if !category.nil?
+          return category.parent.next.next.content
+        end
+        return 'Unfiled'
+      end
+    end
+  end
+end
+
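
`get_name` keeps only the first whitespace-delimited word of the paragraph that follows the first `h1`. A minimal sketch of that string step on plain text is shown here, with a `nil` guard that the commit itself does not have. `first_word` is a hypothetical helper, and the sample text is made up in the style of a Tcllib `NAME` line.

```ruby
# Hypothetical helper: first whitespace-separated word of a text, or nil.
def first_word(text)
  return nil if text.nil?

  # String#split with no arguments splits on runs of whitespace.
  text.strip.split.first
end

sample = "  struct::list - Procedures for manipulating lists\n"
p first_word(sample) # => "struct::list"
p first_word("   ")  # => nil
p first_word(nil)    # => nil
```

In the filter as committed, a page without an `h1 + p` match would make `at_css` return `nil`, and the chained `.content` call would then raise `NoMethodError`. Whether any scraped page actually hits that case is not stated in the commit.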

lib/docs/filters/tcllib/nop.rb

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
module Docs
2+
class Tcllib
3+
class NopFilter < Filter
4+
def call
5+
doc
6+
end
7+
end
8+
end
9+
end
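
The no-op filter exists so that the scraper can swap it in for `apply_base_url` without disturbing the rest of the pipeline. The sketch below is a toy model of a named filter list with an in-place `replace`. `ToyFilterStack` and the two placeholder filter names are assumptions for illustration; this is not DevDocs's real filter stack.

```ruby
# Toy stand-in for a named filter pipeline; not DevDocs's real implementation.
class ToyFilterStack
  attr_reader :names

  def initialize(*names)
    @names = names
  end

  # Swap one filter for another, keeping its position in the pipeline.
  def replace(old_name, new_name)
    index = @names.index(old_name)
    raise ArgumentError, "no filter named #{old_name}" if index.nil?

    @names[index] = new_name
    self
  end
end

# 'some_earlier_filter' and 'some_later_filter' are placeholders.
stack = ToyFilterStack.new('some_earlier_filter', 'apply_base_url', 'some_later_filter')
stack.replace('apply_base_url', 'tcllib/nop')
p stack.names # => ["some_earlier_filter", "tcllib/nop", "some_later_filter"]
```

Replacing rather than deleting keeps every other filter at the same position, so nothing that depends on pipeline order needs to change.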

lib/docs/scrapers/tcllib.rb

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
module Docs
2+
class Tcllib < UrlScraper
3+
self.name = 'Tcllib'
4+
self.type = 'simple'
5+
self.slug = 'tcllib'
6+
self.release = '2.0'
7+
self.base_url = 'https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/'
8+
self.root_path = 'toc0.md'
9+
self.links = {
10+
home: 'https://core.tcl-lang.org/tcllib/doc/trunk/embedded/index.md',
11+
code: 'https://sourceforge.net/projects/tcllib/files/tcllib/'
12+
}
13+
14+
html_filters.push 'tcllib/entries', 'tcllib/clean_html', 'title'
15+
# The docs have incorrect <base> elements, so we should just skip that
16+
html_filters.replace('apply_base_url', 'tcllib/nop')
17+
18+
options[:root_title] = 'Tcllib Documentation'
19+
options[:container] = '.content'
20+
options[:skip] = [
21+
# Full of broken links, path improperly duplicates "tcllib" segment
22+
'tcllib/toc.md',
23+
# The other ones aren't terribly useful
24+
'toc.md', 'toc1.md', 'toc2.md',
25+
# Keyword index
26+
'index.md'
27+
]
28+
29+
options[:attribution] = <<-HTML
30+
Licensed under the <a href="https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tcllib/files/devdoc/tcllib_license.md">BSD license</a>
31+
HTML
32+
33+
34+
def get_latest_version(opts)
35+
doc = fetch_doc('https://core.tcl-lang.org/tcllib/doc/trunk/embedded/index.md', opts)
36+
doc.at_css('strong').content.scan(/([0-9.]+)/)[0][0]
37+
end
38+
end
39+
end
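
`get_latest_version` takes the first run of digits and dots from the text of the first `strong` element. The sketch below isolates that regex step on plain strings and compares it with a stricter pattern. The stricter variant, both helper names, and both sample strings are assumptions for illustration; the real text of the index page is not shown in the commit.

```ruby
# The commit's extraction: first run of digits and dots in the text.
def first_version_loose(text)
  text.scan(/([0-9.]+)/)[0][0]
end

# Stricter variant (an assumption, not in the commit): must start with a digit.
def first_version_strict(text)
  text[/\d+(?:\.\d+)*/]
end

heading = 'Tcl Library Source Code 2.0' # made-up sample text
p first_version_loose(heading)  # => "2.0"
p first_version_strict(heading) # => "2.0"

tricky = 'Docs. Version 2.0' # made-up text with a stray period first
p first_version_loose(tricky)   # => "."
p first_version_strict(tricky)  # => "2.0"
```

The loose pattern is enough as long as no period precedes the version number in the matched element. If the text contains no digits or dots at all, the loose form raises `NoMethodError`, while the strict form returns `nil`.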

public/icons/docs/tcllib/16.png

387 Bytes

public/icons/docs/tcllib/16@2x.png

853 Bytes

public/icons/docs/tcllib/SOURCE

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+https://commons.wikimedia.org/wiki/File:Tcl.svg
