aboutsummaryrefslogtreecommitdiffstats
path: root/src/documentation/xdocs/plan/POI10Vision.xml
blob: c6be23a06db2d1964a210bf64c0a34a831a35706 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "document-v11.dtd">

<document>
 <header>
  <title>POI 1.0 Vision Document</title>
  <authors>
   <person name="Andrew C. Oliver" email="acoliver@apache.org"/>
   <person name="Marcus W. Johnson" email="mjohnson@apache.org"/>   
   </authors>
 </header>

 <body>

 <s1 title="Preface"> 
        <p> 
		(21-Jan-02) While this document is just full of useful project 
		introductory information and I do suggest those interested in getting
		involved in the project read it, it is woefully out of date.
	</p>
	<p>
		We deliberately allowed this document to run out of date because it 
		is a good reflection of what the original vision was for POI 1.0.
		You'll note that some of the terminology is not used in quite the same 
		way any longer.  I've made some minor corrections where reading this
		confused me.  An example: in some places this document may refer to
		POI API instead of POIFS API.  When this vision was written we had 
		an incomplete understanding of the project.  
	</p>
	<p>
		Lastly, the scope of the project expanded dramatically near the end 
		of the 1.0 cycle.  Our vision at the time was to focus merely on the
		Excel port (having no idea how the project would grow or be received)
		and provide the OLE 2 Compound Document port for others to port later
		formats.  We now plan to spearhead these ports under the umbrella of 
		the POI project.  So, you've been warned.  Read on, but just realize
		that we had a fuzzy view of things to come, and hindsight is 20-20.
	</p>
	<p>
		If I recall major holes were: a complete understanding of the format 
		of OLE 2 Compound Document format, Excel file format, and exactly how
		Cocoon 2 Serializers worked.  (that just about covers the whole range
		huh?)
	</p>
 </s1>

 <s1 title="1. Introduction">
  <s2 title="1.1 Purpose of this document">
	<p>
		The purpose of this document is to
		collect, analyze and define high-level requirements, user needs and
		features of the HSSF Serializer for Cocoon 2 and related libraries.
		The HSSF Serializer is a java class supporting the Serializer
		interface from the Cocoon 2 project and outputting in a compatible
		format of that used by the spreadsheet program Microsoft Excel '97.
		The HSSF Serializer will be responsible for converting XML
		spreadsheet-like documents into Excel-compatible XLS spreadsheets.
	</p>
  </s2>
  

  <s2 title="1.2 Project Overview">
	<p>
		Many web apps today hit a brick wall
		when it comes to the user request that they be able to easily
		manipulate their reports and data extracts in the popular Microsoft
		Excel spreadsheet format. This often causes inferior technologies to be
		chosen for the project simply because they easily support this
		format. This project seeks to extend existing XML, Java and Apache
		Cocoon 2 project technologies by: 	  
	</p>
	
		<ul>
			<li>
				providing an extensible library
				(POIFS) which reads/writes in a compatable format to OLE 2 Compound
				Document Format (aka Structured Storage Format) for easy
			  	implementation of other document types;
			</li>
			<li>
				providing a library (HSSF) for
				manipulating spreadsheet data and outputting it in a compatible
				format to Microsoft Excel XLS format;
			</li>
			<li>
				and providing a Cocoon 2
				Serializer (HSSFSerializer) for serializing XML documents as
				Excel-compatible spreadsheets.
			</li>
		</ul>
	
  </s2>
 </s1>
 <s1 title="2. User Description">
   <s2 title="2.1 User/Market Demographics">
	<p>
		There are a number of enthusiastic
		users of XML, UNIX and Java technology. Secondly, the Microsoft
		solution for outputting Office Document formats often involves
		actually manipulating the software as an OLE Server. This method
		provides extremely low performance, extremely high overhead and is
		only capable of handling one document at a time.
	</p>
	<ol>
		<li>
		Our intended audience for the HSSF
		Serializer portion of this project are developers writing reports or
		data extracts in XML format. 
		</li>
		<li>
		Our intended audience for the HSSF
		library portion of this project is ourselves as we are developing
		the Serializer and anyone who needs to write to Excel spreadsheets
		in a non-XML Java environment or who has specific needs not
		addressed by the Serializer.
		</li>
		<li>
		Our intended audience for the
		&quot;POIFS&quot; OLE 2 Compound Document format reader/writer is
		ourselves as we are writing the HSSF library and secondly, anyone
		wishing to provide other libraries for reading/writing OLE 2
		Compound Document Format in Java.
		</li>
	</ol>
   </s2>
   <s2 title="2.2. User environment">
	<p>
		The users of this software shall be
		developers in a Java environment on any Operating System or power
		users who are capable of XML document generation/deployment.		
	</p>
   </s2>
   <s2 title="2.3. Key User Needs">
	<p>
		The OLE 2 Compound Document format is
		undocumented for all practical purposes and cryptic for all
		impractical purposes. Developer needs in this area include
		documentation and an easy to use library for reading and writing in
		this format without requiring the developer to have intimate
		knowledge of the format. 		
	</p>
	<p>
		There is currently no good way to write
		to Microsoft Excel documents from Java or from a non-Microsoft
		Windows based platform for that matter. Developers need an easy to
		use library that supports a reasonable feature set and allows
		seperation of data from formatting/stylistic concerns.
	</p>
	<p>
		There is currently no good way to
		transform XML data to Microsoft Excel. Apache's Cocoon 2 project
		supplies a complete framework for XML, but nothing for outputting in
		Excel's XLS format. Developers and power users alike need a simple
		method to output XML documents to Excel through server-side
		processing.
	</p>
	

   </s2>
   <s2 title="2.4. Alternatives and Competition">
	<p>
        Originally there weren't any decent <link href="../hssf/alternatives.html">alternatives</link> for reading or writing
        to Excel.  This has changed somewhat.
	</p>
   </s2>
 </s1>
 <s1 title="3. Project Overview">
   <s2 title="3.1. Project Perspective">
	<p>
		The produced code shall be licensed by
		the Apache License as used by the Cocoon 2 project and maintained on
		a project page until such time as the Cocoon 2 developers accept it
		as a donation (at which time the copyright will be turned over to
		them).	
	</p>
   </s2>
   <s2 title="3.2. Project Position Statement">
	<p>
		For developers on a Java and/or XML
		environment this project will provide all the tools necessary for
		outputting XML data in the Microsoft Excel format. This project seeks
		to make the use of Microsoft Windows based servers unnecessary for
		file format considerations and to fully document the OLE 2 Compound
		Document format. The project aims not only to provide the tools for
		serializing XML to Excel's file format and the tools for writing to
		that file format from Java, but also to provide the tools for later
		projects to convert other OLE 2 Compound Document formats to pure
		Java APIs.
	</p>
   </s2>
   <s2 title="3.3. Summary of Capabilities">
	<p>
		HSSF Serializer for Apache Cocoon 2
	</p>
	<table>
		<tr>
			<td>
				Benefit
			</td>
			<td>
				Supporting Features
			</td>
		</tr>
		<tr>
			<td>
				Standard XML tag language for sheet data
			</td>
			<td>
				Serializer will transform documents utilizing a defined tag
				language
			</td>
		</tr>
		<tr>
			<td>
				Utilize XML to output in Excel
			</td>
			<td>
				Serializer will output in Excel
			</td>
		</tr>
		<tr>
			<td>
				Java API to output in Excel on any platform
			</td>
			<td>
				The project will develop an API that outputs in Excel using
				pure Java.
			</td>
		</tr>
		<tr>
			<td>
				Make it easy for developers to port other OLE 2 Compound
				Document-based formats to Java.
			</td>
			<td>
				The POIFS library will contain both a high-level abstraction
				along with low-level constructs. The project will fully document
				the OLE 2 Compound Document Format.
			</td>
		</tr>
	</table>   
   </s2>
   <s2 title="3.4. Assumptions and Dependencies">
	<ul>
		<li>
			The HSSF Serializer will run on
			any Java 2 supporting platform with Apache Cocoon 2 installed along
			with the HSSF and POIFS APIs.
		</li>
		<li>
			The HSSF API requires a Java 2
			implementation and the POI API.
		</li>
		<li>
			The POIFS API requires a Java 2
			implementation.
		</li>
	</ul>	
   </s2>
 </s1>
 <s1 title="4. Project Features">
	<p> 
		The POIFS API will include:
	</p>
	<ul>
		<li>
			Low level structures representing
			the structures in a POI filesystems.
		</li>
		<li>
			A low-level API for
			creating/manipulating POI filesystems.
		</li>
		<li>
			A set of high level interfaces
			abstracting the user from the POI filesystem constructs and
			representing it as a standard filesystem (Files, directories, etc)
		</li>
	</ul>	
	<p>
		The HSSF API will include:
	</p>
	<ul>
		<li>
			Low level structures representing
			the structures in an Excel file.
		</li>
		<li>
			A low-level API for creating and
			manipulating Excel files and writing them into POI filesystems.
		</li>
		<li>
			A high level model and style
			interface for manipulating spreadsheet data without knowing anything
			about the Excel format itself.
		</li>
	</ul>	
   <s2 title="4.1 POI Filesystem API">
	<p>
		The POI Filesystem API includes:
	</p>
	<ul>
		<li>An implementation of Big Blocks</li>
		<li>An implementation of Small Blocks</li>
		<li>An implementation of Header Blocks</li>
		<li>An implementation of Block Allocation Tables</li>
		<li>An implementation of Property Sets</li>
		<li>An implementation of the POI 
		filesystem including functions to get and set the above constructs;
		compound functions for reading/writing files/directories.
		</li>
		<li>An abstraction of the POI
		filesystem providing interfaces representing Files, Directories,
		FileSystems in normal terminology and encapulating the above
		constructs.
		</li>
		<li>Full documentation of the POI file
		format.
		</li>
		<li>Full documentation of the APIs and
		interfaces provided through Javadoc, user documentation (aimed at
		developers using the APIs)
		</li>
		<li>Examples aimed at teaching the
		user to write code using POI. (titled: recipes for POI)
		</li>
		<li>Performance specifications.
		(Example POI filesystems rated by some measure of complexity along
		with system specifications and execution times for given operations)
		</li>
	</ul>
   </s2>
   <s2 title="4.2 HSSF API">
	<p>
		The HSSF API includes:
	</p>
	<ul>
		<li>An implementation of Record
		(binary 2 byte type followed by 2 byte size (n) followed by n bytes)</li>
		<li>Implementations of many standard
		record types mapping the data bytes to fields along with methods to
		reserialize those fields</li>
		<li>An implementation of the HSSF File
		including functions to get/set the above constructs, create a blank
		file with the minimum required record types and mappings between
		getting/setting data and style in a workbook to the creation of
		record types, and read HSSF files.</li>
		<li>An abstraction of the HSSF file
		format providing interfaces representing the HSSF File, HSSF
		Workbook, HSSF Sheet, HSSF Column, HSSF Formulas in a manner
		seperating the data from the styling and encapsulating the above
		constructs.</li>
		<li>Full documentation of the HSSF
		file format (which will be a subset of the Excel '97 File format).
		This must be done with care for legal reasons.</li>
		<li>Full documentation of the APIs and
		interfaces provided through Javadoc, user documentation (aimed at
		developers using the apis).</li>
		<li>Examples aimed at teaching
		developers to use the APIs. 
		</li>
		<li>Performance specifications.
		(Example files rated by some measure of complexity along with system
		specifications and execution times for given operations - possibly
		the same files used for POI's tests)</li>
	</ul>
   </s2>
   <s2 title="4.3 HSSF Serializer">
	<p>
		The HSSF Serializer subproject:
	</p>
	<ul>
		<li>A class supporting the Cocoon 2
		Serializer Interface.</li>
		<li>An interface between the SAX
		events and the HSSF APIs.</li>
		<li>A specified tag language for using
		with the Serializer.</li>
		<li>Documentation on the tag language
		for the HSSF Serializer</li>
		<li>Normal javadocs.</li>
		<li>Example XML files</li>
		<li>Performance specifications.
		(Example XML docs and stylesheets rated by some measure of
		complexity along with system specifications and execution times)</li>
	</ul>	
   </s2>
 </s1>
 <s1 title="5. Other Product Requirements">
   <s2 title="5.1. Applicable Standards">
	<p>
		All Java code will be 100% pure Java.
	</p>
   </s2>
   <s2 title="5.2. System Requirements">
	<p>
		The minimum system requirements for POIFS are:
	</p>
	<ul>
		<li>64 Mbytes memory</li>
		<li>Java 2 environment</li>
		<li>Pentium or better processor (or equivalent on other platforms)</li>
	</ul>	
	<p>
		The minimum system requirements for HSSF are:
	</p>
	<ul>
		<li>64 Mbytes memory</li>
		<li>Java 2 environment</li>
		<li>Pentium or better processor (or equivalent on other platforms)</li>
		<li>POIFS API</li>
	</ul>	
	<p>
		The minimum system requirements for the HSSF Serializer are:
	</p>
	<ul>
		<li>64 Mbytes memory</li>
		<li>Java 2 environment</li>
		<li>Pentium or better processor (or equivalent on other platforms)</li>
		<li>Cocoon 2</li>
		<li>HSSF API</li>
		<li>POI API</li>
	</ul>	
   </s2>
   <s2 title="5.3. Performance Requirements">
	<p>
		All components must perform well enough
		to be practical for use in a webserver environment (especially
		Cocoon2/Tomcat/Apache combo)
	</p>
   </s2>
   <s2 title="5.4. Environmental Requirements">
	<p>
		The software will run primarily in
		developer environments. We should make some allowances for
		not-highly-technical users to write XML documents for the HSSF
		Serializer. All other components will assume intermediate Java 2
		knowledge. No XML knowledge will be required except for using the
		HSSF Serializer. As much documentation as is practical shall be
		required for all components as XML is relatively new, and the
		concepts introduced for writing spreadsheets and to POI filesystems
		will be brand new to Java and many Java developers.
	</p>
   </s2>
 </s1>
 <s1 title="6. Documentation Requirements">
   <s2 title="6.1 POI Filesystem">
	<p>
		The filesystem as read and written by
		POI shall be fully documented and explained so that the average Java
		developer can understand it.
	</p>
   </s2>
   <s2 title="6.2. POI API">
	<p>
		The POI API will be fully documented
		through Javadoc. A walkthrough of using the high level POI API shall
		be provided. No documentation outside of the Javadoc shall be
		provided for the low-level POI APIs.	
	</p>
   </s2>
   <s2 title="6.3. HSSF File Format">
	<p>
		The HSSF File Format as implemented by
		the HSSF API will be fully documented. No documentation will be
		provided for features that are not supported by HSSF API that are
		supported by the Excel 97 File Format. Care will be taken not to
		infringe on any &quot;legal stuff&quot;.	
	</p>
   </s2>
   <s2 title="6.4. HSSF API">
	<p>
		The HSSF API will be documented by
		javadoc. A walkthrough of using the high level HSSF API shall be
		provided. No documentation outside of the Javadoc shall be provided
		for the low level HSSF APIs.	
	</p>
   </s2>

   <s2 title="6.5. HSSF Serializer">
	<p>
		The HSSF Serializer will be documented
		by javadoc. 	
	</p>
   </s2>

   <s2 title="6.6 HSSF Serializer Tag language">
	<p>
		The XML tag language along with
		function and usage shall be fully documented. Examples will be
		provided as well.
	</p>
   </s2>
 </s1>
 <s1 title="7. Terminology">
   <s2 title="7.1 Filesystem">
	<p>
		filesystem shall refer only to the POI formatted archive.
	</p>
   </s2>
   <s2 title="7.2 File">
	<p>
		file shall refer to the embedded data stream within a 
		POI filesystem. This will be the actual embedded document. 
	</p>
   </s2>
 </s1>
</body>
</document>