[Ala-portal] Problems loading huge DwC Archive
melecoq
melecoq at gbif.fr
Wed Jun 1 05:05:19 CEST 2016
Dear all,
I'm still stuck with my dataset of more than 20 million occurrences.
I understood that the issue was due to the large size of the zip file:
it's too big for the ZipFile Java API.
I used a little trick and was able to create the data resource: I
integrated the DwC Archive with occurrence and verbatim files containing
just 15 occurrences, then replaced those files with the real ones, and
it seems to work.
Now the problem is that when I try to load the zip file into Cassandra
using the biocache load function, I get a Java out-of-memory (heap)
error, because the code downloads, unzips, and reads the file entirely
in RAM. Unfortunately, 4 GB (zip file) and 23 GB (unzipped) is too big
for that.
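For what it's worth, a streaming read with java.util.zip.ZipInputStream would keep only a small buffer in memory regardless of the archive size. This is just a rough sketch to illustrate what I mean, not the actual biocache code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StreamingUnzip {

    /**
     * Copies one named entry from a zipped stream to `out`.
     * Only an 8 KB buffer is held in memory, no matter how large
     * the archive is. Returns the number of bytes copied, or -1
     * if the entry was not found.
     */
    static long extractEntry(InputStream zipped, String wanted, OutputStream out)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            byte[] buf = new byte[8192];
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.getName().equals(wanted)) {
                    continue;
                }
                long total = 0;
                int n;
                while ((n = zin.read(buf)) > 0) {
                    out.write(buf, 0, n);
                    total += n;
                }
                return total;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny zip in memory just to demonstrate the streaming read.
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(zipBytes)) {
            zout.putNextEntry(new ZipEntry("occurrence.txt"));
            zout.write("id\t1\n".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = extractEntry(new ByteArrayInputStream(zipBytes.toByteArray()),
                              "occurrence.txt", out);
        System.out.println(n + " bytes extracted");
    }
}
```

Something along those lines could process each entry of the 4 GB archive sequentially instead of materialising the whole 23 GB of uncompressed data in the heap.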
Do you know another way to do this? Can I unzip the file first and run
the loading afterwards? Or can I "manually" load the data into
Cassandra?
Thanks in advance for your help.
Cheers,
Marie