Convert multi-page xls / xslx to csv using PHPExcel

In my projects, it was often necessary to collect data from different sources in the CSV format, and while I did not need to get data from several pages of xls tables, I just needed fgetcsv () / fputcsv (). But then the day came nevertheless when the task “to receive data from all pages of the document” was set for me. And, as usual, I began to look for a ready-made solution so as not to build my own “bicycle”. But, unfortunately, I did not find exactly what I needed: there was a similar solution that displayed a multi-page document on the screen, but another library was used, which, as I understood it, did not support the xslx format (Excel 2007 +). After looking a bit more for other options, I realized that it was bad and decided to deal with the library on my own. Having combined some tips on working with the PHPExcel library into one, I got the following script.

First, we need PHPExcel itself . Immediately, I note that the library is excellently installed through composer, however, the full version of the library is not explicitly indicated anywhere. As a selection method, I indicated version 1.8 with the addition of the “inaccuracy” attribute.

In my composer.json, which I added to the require-dev block: {}, I got the following record: At the moment version 1.8.1 is installed. Since the PHPExcel library inherits the SPL that PHP has had since version 5.3, I decided to use Iterators instead of standard traversal of the array of rows and cells of the document using foreach (). We connect the library, load the document and determine some initial data:

"require-dev": {
"phpoffice/phpexcel": "~1.8"
},






/** Include PHPExcel */
include_once '../Classes/PHPExcel.php';
$callStartTime = microtime(true);
$tmpFileName = microtime(true);
$format = 'Y-m-d';
// Load PHPExcel object
$objPHPExcel =  PHPExcel_IOFactory::load('multipage.xls');

Next, we get an Iterator of pages, to crawl which we do not need to know their number:

$sheetsIterator = $objPHPExcel->getWorksheetIterator();

An iterator is bypassed with the help of such a simple construction:

while( $sheetsIterator->valid()) {
    $pageNumber = $sheetsIterator->key();
    $pageContent = $sheetsIterator->current();
    $sheetsIterator->next();

Likewise, rows and cells of a document were received and bypassed. How beautiful it is to get data from a cell, I found here on Habr, in the article Universal reading of cells in PHPExcel . I will not detail all the checks who have a desire - you can read in this article.

Writing to a CSV file, I do the same through this library in this way.

// Create new object to write converted data and separate documents sheets
    $csvPagePhpExcel = new PHPExcel();
    // HERE Add Data to Object 
    // Creating CSV writer Object and save data to file
    $objWriter = PHPExcel_IOFactory::createWriter($csvPagePhpExcel, 'CSV');
    $objWriter->save($currentTmpFileName);

I will show the filling of the object for writing to the file below in the full version of the script. The only thing I can add: if you need to write dates again in xls / xlsx documents and specify explicit formatting , then when preparing the PHPExcel object, you can use the following construction:

            if ($isDate) {
                $csvPagePhpExcel->getActiveSheet()->getStyle($cellIterator->key().$rowIterator->key())->getNumberFormat()->setFormatCode(PHPExcel_Style_NumberFormat::FORMAT_DATE_YYYYMMDD2);
                $csvPagePhpExcel->getActiveSheet()->setCellValue($cellIterator->key().$rowIterator->key(), $cellValue);
            } else {
                $csvPagePhpExcel->getActiveSheet()->setCellValue($cellIterator->key().$rowIterator->key(), $cellValue);
            }

Where the format is set using the constant PHPExcel_Style_NumberFormat :: FORMAT_DATE_YYYYMMDD2 . In this case, this is the yyyy-mm-dd format, which can be used immediately in the MySql query.

By the way, here are all available library constants:

/* Pre-defined formats */
    const FORMAT_GENERAL                 = 'General';
    const FORMAT_TEXT                    = '@';
    const FORMAT_NUMBER                  = '0';
    const FORMAT_NUMBER_00               = '0.00';
    const FORMAT_NUMBER_COMMA_SEPARATED1 = '#,##0.00';
    const FORMAT_NUMBER_COMMA_SEPARATED2 = '#,##0.00_-';
    const FORMAT_PERCENTAGE              = '0%';
    const FORMAT_PERCENTAGE_00           = '0.00%';
    const FORMAT_DATE_YYYYMMDD2          = 'yyyy-mm-dd';
    const FORMAT_DATE_YYYYMMDD           = 'yy-mm-dd';
    const FORMAT_DATE_DDMMYYYY           = 'dd/mm/yy';
    const FORMAT_DATE_DMYSLASH           = 'd/m/y';
    const FORMAT_DATE_DMYMINUS           = 'd-m-y';
    const FORMAT_DATE_DMMINUS            = 'd-m';
    const FORMAT_DATE_MYMINUS            = 'm-y';
    const FORMAT_DATE_XLSX14             = 'mm-dd-yy';
    const FORMAT_DATE_XLSX15             = 'd-mmm-yy';
    const FORMAT_DATE_XLSX16             = 'd-mmm';
    const FORMAT_DATE_XLSX17             = 'mmm-yy';
    const FORMAT_DATE_XLSX22             = 'm/d/yy h:mm';
    const FORMAT_DATE_DATETIME           = 'd/m/y h:mm';
    const FORMAT_DATE_TIME1              = 'h:mm AM/PM';
    const FORMAT_DATE_TIME2              = 'h:mm:ss AM/PM';
    const FORMAT_DATE_TIME3              = 'h:mm';
    const FORMAT_DATE_TIME4              = 'h:mm:ss';
    const FORMAT_DATE_TIME5              = 'mm:ss';
    const FORMAT_DATE_TIME6              = 'h:mm:ss';
    const FORMAT_DATE_TIME7              = 'i:s.S';
    const FORMAT_DATE_TIME8              = 'h:mm:ss;@';
    const FORMAT_DATE_YYYYMMDDSLASH      = 'yy/mm/dd;@';
    const FORMAT_CURRENCY_USD_SIMPLE     = '"$"#,##0.00_-';
    const FORMAT_CURRENCY_USD            = '$#,##0_-';
    const FORMAT_CURRENCY_EUR_SIMPLE     = '[$EUR ]#,##0.00_-';

As a result, I got a script that writes each page of the document to a separate csv file, while it receives the correct cell values ​​regardless of the availability of external data and formats the date in a suitable format.

Here it is in full:

getWorksheetIterator();
$tmpFileName = microtime(true);
// Date format ready to import in SQL database
$format = 'Y-m-d';
while( $sheetsIterator->valid()) {
    $currentTmpFileName = "/tmp/{$tmpFileName}_sheet_{$sheetsIterator->key()}.csv";
    echo $sheetsIterator->key() . '
'; // Get current sheet rows $rowIterator = $sheetsIterator->current()->getRowIterator(); // Create new object to write converted data and separate documents sheets $csvPagePhpExcel = new PHPExcel(); while ($rowIterator->valid()) { // Get Cells from current Rows $cellIterator = $rowIterator->current()->getCellIterator(); echo '
' . $rowIterator->key() .'-'; while ($cellIterator->valid()) { $cellValue = $cellIterator->current()->getCalculatedValue(); //check is date if(PHPExcel_Shared_Date::isDateTime($cellIterator->current())) { $cellValue = date($format, PHPExcel_Shared_Date::ExcelToPHP($cellValue)); } //for incorrect formulas take old value if((substr($cellValue,0,1) === '=' ) && (strlen($cellValue) > 1)){ $cellValue = $cellIterator->current()->getOldCalculatedValue(); } $currentCellNum = PHPExcel_Cell::columnIndexFromString($cellIterator->key()); echo $cellIterator->key() . '(' . $currentCellNum . ') => ' . $cellValue; $csvPagePhpExcel->getActiveSheet()->setCellValue($cellIterator->key().$rowIterator->key(), $cellValue); $cellIterator->next(); } $rowIterator->next(); } // Creating CSV writer Object and save data to file $objWriter = PHPExcel_IOFactory::createWriter($csvPagePhpExcel, 'CSV'); $objWriter->save($currentTmpFileName); // clearing trash $csvPagePhpExcel->__destruct(); unset($csvPagePhpExcel); $objWriter = ''; unset($objWriter); $sheetsIterator->next(); } $callEndTime = microtime(true); $callTime = $callEndTime - $callStartTime; echo $callTime;

At the testing stage, all values ​​are displayed on the screen, then anyone who does not need to naturally can remove the output.

Criticism, additions and corrections are welcome. Thank you all, and I will be very happy if my article helps someone and reduces the few hours of work.

Also popular now: