coderay.rb 9.7KB

# = CodeRay Library
#
# $Id: coderay.rb 227 2007-04-24 12:26:18Z murphy $
#
# CodeRay is a Ruby library for syntax highlighting.
#
# I try to make CodeRay easy to use and intuitive, but at the same time fully featured, complete,
# fast and efficient.
#
# See README.
#
# It consists mainly of
# * the main engine: CodeRay (Scanners::Scanner, Tokens/TokenStream, Encoders::Encoder), PluginHost
# * the scanners in CodeRay::Scanners
# * the encoders in CodeRay::Encoders
#
# Here's a fancy graphic to light up this gray docu:
#
# http://rd.cYcnus.de/coderay/scheme.png
#
# == Documentation
#
# See CodeRay, Encoders, Scanners, Tokens.
#
# == Usage
#
# Remember you need RubyGems to use CodeRay, unless you have it in your load path. Run Ruby with
# the -rubygems option if required.
#
# === Highlight Ruby code in a string as html
#
#   require 'coderay'
#   print CodeRay.scan('puts "Hello, world!"', :ruby).html
#
#   # prints something like this:
#   puts <span class="s">&quot;Hello, world!&quot;</span>
#
#
# === Highlight C code from a file in a html div
#
#   require 'coderay'
#   print CodeRay.scan(File.read('ruby.h'), :c).div
#   print CodeRay.scan_file('ruby.h').html.div
#
# You can include this div in your page. The used CSS styles can be printed with
#
#   % coderay_stylesheet
#
# === Highlight without typing too much
#
# If you are one of the hasty (or lazy, or extremely curious) people, just run this file:
#
#   % ruby -rubygems /path/to/coderay/coderay.rb > example.html
#
# and look at the file it created in your browser.
#
# = CodeRay Module
#
# The CodeRay module provides convenience methods for the engine.
#
# * The +lang+ and +format+ arguments select Scanner and Encoder to use. These are
#   simply lower-case symbols, like <tt>:python</tt> or <tt>:html</tt>.
# * All methods take an optional hash as last parameter, +options+, that is sent to
#   the Encoder / Scanner.
# * Input and language are always sorted in this order: +code+, +lang+.
#   (This is in alphabetical order, if you need a mnemonic ;)
#
# You should be able to highlight everything you want just using these methods;
# so there is no need to dive into CodeRay's deep class hierarchy.
#
# The examples in the demo directory demonstrate common cases using this interface.
#
# = Basic Access Ways
#
# Read this to get a general view of what CodeRay provides.
#
# == Scanning
#
# Scanning means analysing an input string, splitting it up into Tokens.
# Each Token knows what type it is: string, comment, class name, etc.
#
# Each +lang+ (language) has its own Scanner; for example, <tt>:ruby</tt> code is
# handled by CodeRay::Scanners::Ruby.
#
# CodeRay.scan:: Scan a string in a given language into Tokens.
#                This is the most common method to use.
# CodeRay.scan_file:: Scan a file and guess the language using FileType.
#
# The Tokens object you get from these methods can encode itself; see Tokens.
#
# == Encoding
#
# Encoding means compiling Tokens into an output. This can be colored HTML or
# LaTeX, a textual statistic or just the number of non-whitespace tokens.
#
# Each Encoder provides output in a specific +format+, so you select Encoders via
# formats like <tt>:html</tt> or <tt>:statistic</tt>.
#
# CodeRay.encode:: Scan and encode a string in a given language.
# CodeRay.encode_tokens:: Encode the given tokens.
# CodeRay.encode_file:: Scan a file, guess the language using FileType and encode it.
#
# == Streaming
#
# Streaming saves RAM by running Scanner and Encoder in some sort of
# pipe mode; see TokenStream.
#
# CodeRay.scan_stream:: Scan in stream mode.
#
# == All-in-One Encoding
#
# CodeRay.encode:: Highlight a string with a given input and output format.
#
# == Instantiating
#
# You can use an Encoder instance to highlight multiple inputs. This way, the setup
# for this Encoder must only be done once.
#
# CodeRay.encoder:: Create an Encoder instance with format and options.
# CodeRay.scanner:: Create a Scanner instance for lang, with '' as default code.
#
# To make use of CodeRay.scanner, use CodeRay::Scanner::code=.
#
# The scanning methods provide more flexibility; we recommend using these.
#
# == Reusing Scanners and Encoders
#
# If you want to re-use scanners and encoders (because that is faster), see
# CodeRay::Duo for the most convenient (and recommended) interface.
module CodeRay

  # Version: Major.Minor.Teeny[.Revision]
  # Major: 0 for pre-release
  # Minor: odd for beta, even for stable
  # Teeny: development state
  # Revision: Subversion Revision number (generated on rake)
  VERSION = '0.7.6'

  require 'coderay/tokens'
  require 'coderay/scanner'
  require 'coderay/encoder'
  require 'coderay/duo'
  require 'coderay/style'

  class << self

    # Scans the given +code+ (a String) with the Scanner for +lang+.
    #
    # This is a simple way to use CodeRay. Example:
    #   require 'coderay'
    #   page = CodeRay.scan("puts 'Hello, world!'", :ruby).html
    #
    # See also demo/demo_simple.
    def scan code, lang, options = {}, &block
      scanner = Scanners[lang].new code, options, &block
      scanner.tokenize
    end

    # Scans +filename+ (a path to a code file) with the Scanner for +lang+.
    #
    # If +lang+ is :auto or omitted, the CodeRay::FileType module is used to
    # determine it. If it cannot find out what type it is, it uses
    # CodeRay::Scanners::Plaintext.
    #
    # Calls CodeRay.scan.
    #
    # Example:
    #   require 'coderay'
    #   page = CodeRay.scan_file('some_c_code.c').html
    def scan_file filename, lang = :auto, options = {}, &block
      file = IO.read filename
      if lang == :auto
        require 'coderay/helpers/file_type'
        lang = FileType.fetch filename, :plaintext, true
      end
      # Pass the caller's options through; writing "options = {}" here
      # would reset them to an empty hash.
      scan file, lang, options, &block
    end

    # Scans +code+ (a String) with the Scanner for +lang+ in stream mode,
    # by setting the <tt>:stream</tt> option.
    #
    # Calls scan.
    #
    # See CodeRay.scan.
    def scan_stream code, lang, options = {}, &block
      options[:stream] = true
      scan code, lang, options, &block
    end

    # Encode a string in Streaming mode.
    #
    # This starts scanning +code+ with the Scanner for +lang+
    # while encoding the output with the Encoder for +format+.
    # +options+ will be passed to the Encoder.
    #
    # See CodeRay::Encoder.encode_stream
    def encode_stream code, lang, format, options = {}
      encoder(format, options).encode_stream code, lang, options
    end

    # Encode a string.
    #
    # This scans +code+ with the Scanner for +lang+ and then
    # encodes it with the Encoder for +format+.
    # +options+ will be passed to the Encoder.
    #
    # See CodeRay::Encoder.encode
    def encode code, lang, format, options = {}
      encoder(format, options).encode code, lang, options
    end

    # Highlight a string into a HTML <div>.
    #
    # CSS styles use classes, so you have to include a stylesheet
    # in your output.
    #
    # See encode.
    def highlight code, lang, options = { :css => :class }, format = :div
      encode code, lang, format, options
    end

    # Encode pre-scanned Tokens.
    # Use this together with CodeRay.scan:
    #
    #   require 'coderay'
    #
    #   # Highlight a short Ruby code example in a HTML span
    #   tokens = CodeRay.scan '1 + 2', :ruby
    #   puts CodeRay.encode_tokens(tokens, :span)
    #
    def encode_tokens tokens, format, options = {}
      encoder(format, options).encode_tokens tokens, options
    end

    # Encodes +filename+ (a path to a code file) with the Scanner for +lang+.
    #
    # See CodeRay.scan_file.
    # Notice that the second argument is the output +format+, not the input language.
    #
    # Example:
    #   require 'coderay'
    #   page = CodeRay.encode_file 'some_c_code.c', :html
    def encode_file filename, format, options = {}
      tokens = scan_file filename, :auto, get_scanner_options(options)
      encode_tokens tokens, format, options
    end

    # Highlight a file into a HTML <div>.
    #
    # CSS styles use classes, so you have to include a stylesheet
    # in your output.
    #
    # See encode.
    def highlight_file filename, options = { :css => :class }, format = :div
      encode_file filename, format, options
    end

    # Finds the Encoder class for +format+ and creates an instance, passing
    # +options+ to it.
    #
    # Example:
    #   require 'coderay'
    #
    #   stats = CodeRay.encoder(:statistic)
    #   stats.encode("puts 17 + 4\n", :ruby)
    #
    #   puts '%d out of %d tokens have the kind :integer.' % [
    #     stats.type_stats[:integer].count,
    #     stats.real_token_count
    #   ]
    #   #-> 2 out of 4 tokens have the kind :integer.
    def encoder format, options = {}
      Encoders[format].new options
    end

    # Finds the Scanner class for +lang+ and creates an instance, passing
    # +options+ to it.
    #
    # See Scanner.new.
    def scanner lang, options = {}
      Scanners[lang].new '', options
    end

    # Extract the options for the scanner from the +options+ hash.
    #
    # Returns an empty Hash if <tt>:scanner_options</tt> is not set.
    #
    # This is used if a method like CodeRay.encode has to provide options
    # for Encoder _and_ scanner.
    def get_scanner_options options
      options.fetch :scanner_options, {}
    end

  end

  # This Exception is raised when you try to stream with something that is not
  # capable of streaming.
  class NotStreamableError < Exception
    def initialize obj
      @obj = obj
    end

    def to_s
      '%s is not Streamable!' % @obj.class
    end
  end

  # A dummy module that is included by subclasses of CodeRay::Scanner and CodeRay::Encoder
  # to show that they are able to handle streams.
  module Streamable
  end

end

# Run a test script.
if $0 == __FILE__
  $stderr.print 'Press key to print demo.'; gets
  code = File.read(__FILE__)[/module CodeRay.*/m]
  print CodeRay.scan(code, :ruby).html
end
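
The `:scanner_options` convention documented for `get_scanner_options` can be tried in isolation. The following is a minimal sketch that does not load CodeRay: `OptionRoutingSketch` and `route` are hypothetical names introduced here for illustration, and only the `fetch` call mirrors the source above. It shows how an `encode_file`-style method could split one options hash between scanner and encoder.

```ruby
# Hypothetical stand-ins; they mimic the shape of CodeRay.encode_file
# but do not call the real library.
module OptionRoutingSketch
  module_function

  # Same logic as CodeRay.get_scanner_options: return the nested hash,
  # or an empty one when :scanner_options is not set.
  def get_scanner_options(options)
    options.fetch(:scanner_options, {})
  end

  # Returns the two option sets that would be handed to the scanner
  # and to the encoder. The encoder receives the full hash unchanged.
  def route(options)
    { :scanner => get_scanner_options(options), :encoder => options }
  end
end

options = { :css => :class, :scanner_options => { :stream => true } }
routed  = OptionRoutingSketch.route(options)

puts routed[:scanner][:stream]                            # prints "true"
puts routed[:encoder][:css]                               # prints "class"
puts OptionRoutingSketch.get_scanner_options({}).empty?   # prints "true"
```

Nesting the scanner settings under one key lets a single `options` argument serve both stages without the encoder needing to know which keys belong to the scanner.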