Java解析Excel文件(.xlsx和.xls格式均适用)

简介:

Excel文件的解析与读取普通的文本文件不同,就像xml文件的解析,最常用的Java4J方式也是jar包支持的,
根据apache官方文档:http://poi.apache.org/spreadsheet/index.html,“Also please be aware that as the new XSSF supported Excel 2007 OOXML (.xlsx) files are XML based, the memory footprint for processing them is higher than for the older HSSF supported (.xls) binary files”,很显然,
.xls格式的excel文件需要HSSF支持,需要相应的poi.jar,.xlsx格式的excel文件需要XSSF支持,需要poi-ooxml.jar,
然而,该jar包还依赖其他jar包:poi-ooxml-schemas和xml,但是,在用普通web项目上,我添加了这两个jar包还是出错=.=
所以最后决定创建maven web工程来写,只需要添加了poi和poi-ooxml的依赖即可,maven的jar管理功能真的很惊艳~开心~

    <!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.15</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.15</version>
    </dependency>

代码实例:

1.index.jsp

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
    <title>解析excel文件</title>
</head>
<body>
    Hello 2018~
</body>
</html>

2.readExcelServlet

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

@WebServlet(name = "readExcelServlet", urlPatterns = "/readExcelServlet")
public class readExcelServlet extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        request.setCharacterEncoding("UTF-8");

        //excel文件路径
        String excelPath = "E:\\readExcelMaven\\test.xlsx";

        try {
            //String encoding = "GBK";
            File excel = new File(excelPath);
            if (excel.isFile() && excel.exists()) {   //判断文件是否存在

                String[] split = excel.getName().split("\\.");  //.是特殊字符,需要转义!!!!!
                Workbook wb;
                //根据文件后缀(xls/xlsx)进行判断
                if ( "xls".equals(split[1])){
                    FileInputStream fis = new FileInputStream(excel);   //文件流对象
                    wb = new HSSFWorkbook(fis);
                }else if ("xlsx".equals(split[1])){
                    wb = new XSSFWorkbook(excel);
                }else {
                    System.out.println("文件类型错误!");
                    return;
                }

                //开始解析
                Sheet sheet = wb.getSheetAt(0);     //读取sheet 0

                int firstRowIndex = sheet.getFirstRowNum()+1;   //第一行是列名,所以不读
                int lastRowIndex = sheet.getLastRowNum();
                System.out.println("firstRowIndex: "+firstRowIndex);
                System.out.println("lastRowIndex: "+lastRowIndex);

                for(int rIndex = firstRowIndex; rIndex <= lastRowIndex; rIndex++) {   //遍历行
                    System.out.println("rIndex: " + rIndex);
                    Row row = sheet.getRow(rIndex);
                    if (row != null) {
                        int firstCellIndex = row.getFirstCellNum();
                        int lastCellIndex = row.getLastCellNum();
                        for (int cIndex = firstCellIndex; cIndex < lastCellIndex; cIndex++) {   //遍历列
                            Cell cell = row.getCell(cIndex);
                            if (cell != null) {
                                System.out.println(cell.toString());
                            }
                        }
                    }
                }
            } else {
                System.out.println("找不到指定的文件");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        doPost(request, response);
    }
}

3.test.xlsx和test.xls

这里写图片描述

这就是一个Workbook,如果是.xls文件,就是HSSFWorkbook,如果是.xlsx文件,则是XSSFWorkbook

4.运行结果

访问:http://localhost:8080/readExcelServlet,可以看到控制台的打印结果:

firstRowIndex: 1
lastRowIndex: 5
rIndex: 1
周二
南京
我
学习
难过
rIndex: 2
周三
南京
我
学习
难过
rIndex: 3
周四
南京
我
学习
难过
rIndex: 4
周五
南京
我
学习
难过
rIndex: 5
周六
南京
我
玩
开心

5.总结

  • 1). 顺序:读取workbook——>读取工作表——>然后根据先行、后列的顺序,依次解析。
  • 2). 通过getFirstRowNum()getFirstCellNum()获取的初始行号和列号是物理值,默认从0开始,与表格中看到的相差1。
  • 3). “.”是特殊字符,用它进行字符串分割时需要转义~

这里写图片描述