HSSF是POI对Excel-97(-2007)文件操作的纯Java实现。XSSF是POI对Excel 2007 OOXML(.xlsx)文件操作的纯Java实现。从POI 3.8版本开始,提供了一种基于XSSF的低内存占用的API(SXSSF)。
POI提供两种读写API模型:事件模型以及用户模型。事件模型是基于流的方式实现,使用sax(simple api for XML)模型进行内容解析,对CPU以及内存的损耗小,但没法进行写操作。用户模型基于内存树的方式实现,使用DOM进行excel的解析,对CPU以及内存的损耗大,但能够以面向对象的方式进行操作,可读可写。文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
Event API (HSSF Only)文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
它可以以比较小的内存来读一个xls文件,为了使用这个API,你需要:文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
(1)创立一个org.apache.poi.hssf.eventmodel.HSSFRequest的实例文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
(2)实现org.apache.poi.hssf.eventmodel.HSSFListener接口的类。文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
(3)HSSFRequest.addListener(yourlistener,recordid)注册(2)中实现的类,recordid应当是org.apache.poi.hssf.record的属性。或者也可认使用HSSFRequest.addListerForAllRecords(yourlistener)。文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
(4)创立一个org.apache.poi.poifs.filesystem.FileSystem实例, 并把XLS文件通过输入流方式输入。文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
(5)可以把(4)中创立的实例通过HSSFEventFactory.processWorkbookEvents(request,Filesystem)办法或HSSFEventFactory.processEvents(request,inputstream)与绑定。文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
(6)这样listener就会调用processRecord(Record)办法,直到整个文件读完。文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
/**文章源自微观生活(93wg.com)微观生活-https://93wg.com/17023.html
* This example shows how to use the event API for reading a file.
*/
public class EventExample
implements HSSFListener
{
private SSTRecord sstrec;
/**
* This method listens for incoming records and handles them as required.
* @param record The record that was found while reading.
*/
public void processRecord(Record record)
{
switch (record.getSid())
{
// the BOFRecord can represent either the beginning of a sheet or the workbook
case BOFRecord.sid:
BOFRecord bof = (BOFRecord) record;
if (bof.getType() == bof.TYPE_WORKBOOK)
{
System.out.println(\"Encountered workbook\");
// assigned to the class level member
} else if (bof.getType() == bof.TYPE_WORKSHEET)
{
System.out.println(\"Encountered sheet reference\");
}
break;
case BoundSheetRecord.sid:
BoundSheetRecord bsr = (BoundSheetRecord) record;
System.out.println(\"New sheet named: \" + bsr.getSheetname());
break;
case RowRecord.sid:
RowRecord rowrec = (RowRecord) record;
System.out.println(\"Row found, first column at \"
+ rowrec.getFirstCol() + \" last column at \" + rowrec.getLastCol());
break;
case NumberRecord.sid:
NumberRecord numrec = (NumberRecord) record;
System.out.println(\"Cell found with value \" + numrec.getValue()
+ \" at row \" + numrec.getRow() + \" and column \" + numrec.getColumn());
break;
// SSTRecords store a array of unique strings used in Excel.
case SSTRecord.sid:
sstrec = (SSTRecord) record;
for (int k = 0; k < sstrec.getNumUniqueStrings(); k++)
{
System.out.println(\"String table value \" + k + \" = \" + sstrec.getString(k));
}
break;
case LabelSSTRecord.sid:
LabelSSTRecord lrec = (LabelSSTRecord) record;
System.out.println(\"String cell found with value \"
+ sstrec.getString(lrec.getSSTIndex()));
break;
}
}
/**
* Read an excel file and spit out what we find.
*
* @param args Expect one argument that is the file to read.
* @throws IOException When there is an error processing the file.
*/
public static void main(String[] args) throws IOException
{
// create a new file input stream with the input file specified
// at the co妹妹and line
FileInputStream fin = new FileInputStream(args[0]);
// create a new org.apache.poi.poifs.filesystem.Filesystem
POIFSFileSystem poifs = new POIFSFileSystem(fin);
// get the Workbook (excel part) stream in a InputStream
InputStream din = poifs.createDocumentInputStream(\"Workbook\");
// construct out HSSFRequest object
HSSFRequest req = new HSSFRequest();
// lazy listen for ALL records with the listener shown above
req.addListenerForAllRecords(new EventExample());
// create our event factory
HSSFEventFactory factory = new HSSFEventFactory();
// process our events based on the document input stream
factory.processEvents(req, din);
// once all the events are processed close our file input stream
fin.close();
// and our document input stream (don\'t want to leak these!)
din.close();
System.out.println(\"done.\");
}
}
Event API(XSSF以及SAX)
使用较少的内存读取XLSX文件。
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class ExampleEventUserModel {
public void processOneSheet(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
// To look up the Sheet Name / Sheet Order / rID,
// you need to process the core Workbook stream.
// Normally it\'s of the form rId
InputStream sheet2 = r.getSheet(\"rId2\");
InputSource sheetSource = new InputSource(sheet2);
parser.parse(sheetSource);
sheet2.close();
}
public void processAllSheets(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
Iterator<inputstream> sheets = r.getSheetsData();
while(sheets.hasNext()) {
System.out.println(\"Processing new sheet:\\n\");
InputStream sheet = sheets.next();
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
System.out.println(\"\");
}
}
public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
XMLReader parser =
XMLReaderFactory.createXMLReader(
\"org.apache.xerces.parsers.SAXParser\"
);
ContentHandler handler = new SheetHandler(sst);
parser.setContentHandler(handler);
return parser;
}
/**
* See org.xml.sax.helpers.DefaultHandler javadocs
*/
private static class SheetHandler extends DefaultHandler {
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private SheetHandler(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
if(name.equals(\"c\")) {
// Print the cell reference
System.out.print(attributes.getValue(\"r\") + \" - \");
// Figure out if the value is an index in the SST
String cellType = attributes.getValue(\"t\");
if(cellType != null && cellType.equals(\"s\")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// Clear contents cache
lastContents = \"\";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
// Process the last contents as required.
// Do now, as characters() may be called more than once
if(nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
nextIsString = false;
}
// v => contents of a cell
// Output after we\'ve seen the string contents
if(name.equals(\"v\")) {
System.out.println(lastContents);
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
lastContents += new String(ch, start, length);
}
}
public static void main(String[] args) throws Exception {
ExampleEventUserModel example = new ExampleEventUserModel();
example.processOneSheet(args[0]);
example.processAllSheets(args[0]);
}
}</inputstream>
SXSSF
SXSSF位于org.apache.poi.xssf.streaming包下,用于spreadsheets比较大时,api以及XSSF兼容。SXSSF通过一个滑动窗口来限制走访Row的数量从而到达低内存占用,不在窗口的rows不可再走访。
可以在创立SXSSFWorkbook(int windowSize)指定窗口大小,或者针对sheet设置窗口大小
?
1
SXSSFSheet
当通过createRow()创立一个新Row时,窗口中的行数已经超过了固定的大小,索引最小的会被flush。
默许的窗口大小是100,当窗口大小是-1时,表示窗口无穷大。SXSSF会生成临时文件,所以必需通过dispose办法进行删除了。
SXSSFWorkbook 默许使用inline strings而不是一个同享的string table,这样就不用在内存中保留数据,但这样可能会使发生的文档与客户端不兼容。如果开启String table可能会占用大量内存,这类掂量需要用户抉择。
此外是不是会损耗内存取决于你要使用的特性,比如:merged region , hyperlinks, co妹妹ents等。
import junit.framework.Assert;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
public static void main(String[] args) throws Throwable {
SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
Sheet sh = wb.createSheet();
for(int rownum = 0; rownum < 1000; rownum++){
Row row = sh.createRow(rownum);
for(int cellnum = 0; cellnum < 10; cellnum++){
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
}
// Rows with rownum < 900 are flushed and not accessible
for(int rownum = 0; rownum < 900; rownum++){
Assert.assertNull(sh.getRow(rownum));
}
// ther last 100 rows are still in memory
for(int rownum = 900; rownum < 1000; rownum++){
Assert.assertNotNull(sh.getRow(rownum));
}
FileOutputStream out = new FileOutputStream(\"/temp/sxssf.xlsx\");
wb.write(out);
out.close();
// dispose of temporary files backing this workbook on disk
wb.dispose();
}
SXSSF会把sheet数据放在临时(一个sheet一个临时文件),所以临时文件可能很大,例如一个20MB的csv数据,xml临时文件可能多在1G,所以如果临时文件的大小是一个问题,可使SXSSF使用gzip压缩。
SXSSFWorkbook wb = new SXSSFWorkbook();
wb.setCompressTempFiles(true); // temp files will be gzipped
以上就是微观生活(93wg.com)关于“java poi 操作excel实例教程”的详细内容,希望对大家有所帮助!
评论