Ok, so I ran a few tests, and to be honest, 300,000+ records is going to be hard to test from my end. I set up a very simple program using a counter and a DOU loop to step through the code, so there was no 'extra' processing going on. All 4 fields used were static, and a header line with a graphic on it was included.
I should note that all of these tests were done interactively and would have different results if done in batch.
My first test was 50 loops. The .xlsx file was created and emailed in less than 30 seconds. Worked great.
The next test was 500 loops; less than a minute later I had my spreadsheet.
5,000 loops had a short pause, but I still had the spreadsheet in less than 2 minutes.
I cranked the loop count up to 50,000 and the run time immediately jumped from 2 minutes to 16 minutes.
No memory error, but a pause that most of my users would hang me over (lol).
From there, I turned it up to 100k loops. At 20 minutes I had to stop the test because overall system performance was starting to be affected.
Now I'm curious whether I can get it done at all. Since I have to spend some time logged in over the weekend, I'm going to see if I can't push it up to 300k records. (I expect it's going to take longer than your hour at the rate I was going earlier.)
I'll update more on Monday if I manage to figure out how to counteract your error, Clarky. Sorry it's taken as long as it has.
No apology necessary. You are helping me way more than I would ever expect anyone to.
Just an FYI, my spreadsheet has 12 columns, but is pretty basic beyond that. My program errors out at around 189,488 rows. It really bogs the system down towards the end; at one point, my job was showing over 100% of the system (not good).
I did a couple more tests over the weekend (and Monday) and I couldn't get mine to blow up. It was very slow finishing, but I did manage to get it to finish, though only up to 100k rows. Without seeing code, I'm at a loss for suggestions. I can keep testing until I do manage to blow my code up, but I'm not sure how that would help. The performance at 100k+ records is so abysmal that I'd be tempted to go a different route: dump out a .csv and import that into Excel. Or, can you break on something and create multiple sheets? That may improve your performance quite a bit and prevent the bomb.
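For anyone weighing the .csv fallback mentioned above: it sidesteps POI entirely, so even hundreds of thousands of rows finish in seconds. A minimal sketch in plain Java; the file name and column layout are placeholders, not anything from the actual program:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

public class CsvDump {
    // Quote a field only when it contains a comma, quote, or newline
    // (RFC 4180 style), doubling any embedded quotes.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of("report.csv");  // placeholder file name
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(out))) {
            w.println("ID,NAME,QTY,PRICE");        // header row
            for (int i = 1; i <= 100_000; i++) {   // 100k rows, no POI overhead
                w.println(i + "," + quote("Item " + i) + ",1,9.99");
            }
        }
    }
}
```

The trade-off is losing the header graphic and any cell formatting; Excel opens the .csv but it's just raw data.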
how many columns are you using, ARobb? from my experience POI is VERY BAD when many columns are specified (which is the opposite of what i'd have thought; i would have expected the number of rows to be the bigger performance killer). i have no idea why the additional columns generate so much extra overhead, but they do.
@Clarky: are you cleaning up objects after they are no longer needed in the process? perhaps that might be some of the problem you're experiencing. (i'm no java expert by any means, but those java objects can eat up memory and resources.) what i normally do is, once the row is written to the spreadsheet, get rid of the "cell" objects (not the formatting/style objects, since those can be reused) and start the next row, etc.
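For what it's worth, if you can get onto a reasonably recent POI release, the streaming writer (SXSSFWorkbook) does essentially what Tom describes automatically: it keeps only a sliding window of rows in memory and flushes older rows to a temp file. A hedged sketch; the row/column counts and file name are just for illustration, and this assumes you can call POI's Java API directly rather than through a wrapper:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingWrite {
    public static void main(String[] args) throws IOException {
        // Keep at most 100 rows in memory; earlier rows are flushed to disk,
        // so memory use stays flat no matter how many rows are written.
        try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) {
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < 300_000; r++) {
                Row row = sheet.createRow(r);
                for (int c = 0; c < 12; c++) {
                    row.createCell(c).setCellValue("r" + r + "c" + c);
                }
            }
            try (FileOutputStream out = new FileOutputStream("big.xlsx")) {
                wb.write(out);
            }
            wb.dispose(); // delete the temp files backing the flushed rows
        }
    }
}
```

The catch is that flushed rows can't be revisited, so anything that needs to touch earlier rows (auto-sizing columns from data, for example) has to be handled differently.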
I'm not anti-social, I just don't like people -Tommy Holden
4 static columns; I figured that would be about the minimum for a workable spreadsheet. I've noticed that as well, but the databases are so small here that I normally don't have to worry about performance too much.
Andy
yeah, i generally try to use the fewest columns possible when generating these to send to users. if one requires more than 6 columns i usually set it up to run at off-peak hours due to the resources required (well, unless it's only going to be a few rows, like fewer than 100).
As Tom noted, 12 columns is going to be brutal on performance, and may actually be why my tests didn't blow up, now that I think about it. You've got 3 times as many columns as I was trying. *runs off to try a new round of testing*
Well, now I'm totally stumped. I expanded my spreadsheet to have 12 columns, like yours. Ran with 100 loops initially and all was fine. A bit slow, but a good run regardless. That was with ss_begin_object_group(10000);
I increased the loop value to 1000 and decreased the ss_begin_object_group() to 100. Again slow, but it did finish. After that successful completion, I decreased the value on ss_begin_object_group() to 10, and even at 10 I've had successful runs at 1k and 5k loops. Have you checked to make sure you're using the most recent release of the POI .jar files (and all that good jazz)?
I have a question. I am using the POI Java classes and have more than 100 columns in my resulting Excel. All was well until I had it all formatted, with five rows of headings and my data coming through properly. Then I started getting the following message the first time I open my results: Microsoft Excel: "We found a problem with some content in 'SSR.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes." When I select 'Yes', all seems to be correct and complete. How can I get rid of the message, so that when I deploy the change everyone doesn't think it is an untrustworthy Excel file? Thanks in advance for any help, Linda
A cursory Google search suggests it might be in the formatting of one of the numeric columns, but honestly I can't be sure. There's not a lot to go on other than a LOT of columns.
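Another possible culprit for that "We found a problem with some content" repair prompt (an assumption on my part, not confirmed from Linda's code) is feeding cell values that contain control characters illegal in the underlying XML, e.g. junk picked up from packed or hex data. A small, hedged helper that strips such characters before the value goes to setCellValue; the class and method names here are made up for illustration:

```java
public class CellSanitizer {
    // Drop control characters that are invalid in XML 1.0 and therefore
    // corrupt an .xlsx: everything below 0x20 except tab, LF, and CR.
    static String sanitize(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            if (ch >= 0x20 || ch == '\t' || ch == '\n' || ch == '\r') {
                sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

If the repair prompt disappears once every string passes through something like this, that pins the problem on the data rather than on POI or the column formatting.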