[Ala-portal] Problems loading huge DwC Archive

melecoq melecoq at gbif.fr
Wed Jun 1 05:05:19 CEST 2016


 Dear all,
 I'm still stuck with my dataset of more than 20 million occurrences.

 I understood that the issue was due to the large size of the zip file:
 it's too big for the ZipFile Java API.
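 In case it helps pinpoint the problem: if the limitation is in
 java.util.zip.ZipFile, which needs random access to the archive's
 central directory, streaming the entries with java.util.zip.ZipInputStream
 may still work, since it reads the entry headers sequentially. A rough
 sketch, not the actual biocache code:

    import java.io.FileInputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    public class ListDwcaEntries {
        public static void main(String[] args) throws Exception {
            // Stream the archive rather than opening it with ZipFile,
            // so the central directory is never loaded as a whole.
            try (ZipInputStream zis =
                     new ZipInputStream(new FileInputStream(args[0]))) {
                ZipEntry entry;
                while ((entry = zis.getNextEntry()) != null) {
                    System.out.println(entry.getName());
                    zis.closeEntry();
                }
            }
        }
    }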
 I found a little trick that let me create the data resource: I
 registered the DwC Archive with occurrence and verbatim files containing
 just 15 occurrences, then replaced those files with the real ones, and
 it seems to work.

 Now, the problem is that when I try to load the zip file into Cassandra
 using the load function from the biocache, I get a Java out-of-memory
 heap error, because the code downloads, unzips and reads the file in
 RAM. Unfortunately, 4 GB (zip file) and 23 GB (unzipped) is too big for
 that.
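 To frame the question: the extraction itself can run in constant memory
 by streaming each entry to disk through a small buffer, along these
 lines (the output path is made up, and I skip any zip-slip check since
 the archive is my own):

    import java.io.*;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    public class ExtractDwca {
        public static void main(String[] args) throws IOException {
            File outDir = new File("/data/dwca-extracted"); // hypothetical
            byte[] buf = new byte[8192]; // only this buffer lives in RAM
            try (ZipInputStream zis =
                     new ZipInputStream(new FileInputStream(args[0]))) {
                ZipEntry entry;
                while ((entry = zis.getNextEntry()) != null) {
                    File out = new File(outDir, entry.getName());
                    if (entry.isDirectory()) { out.mkdirs(); continue; }
                    out.getParentFile().mkdirs();
                    try (OutputStream os = new FileOutputStream(out)) {
                        int n;
                        while ((n = zis.read(buf)) != -1) {
                            os.write(buf, 0, n);
                        }
                    }
                    zis.closeEntry();
                }
            }
        }
    }

 So a 4 GB archive expanding to 23 GB only needs disk space, not heap.
 What I don't know is whether the biocache load step can then be pointed
 at the extracted directory instead of the zip.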

 Do you know if there is another way to do it? Can I unzip the file
 myself and run the loading afterwards? Or can I "manually" load the
 data into Cassandra?

 Thanks in advance for your help.
 Cheers,
 Marie
