faq

Differences

This shows you the differences between two versions of the page.

faq [2018/12/19 11:14] flack
faq [2018/12/21 15:00] (current) flack [Can I get a Mcule database SDF file in smaller chunks?]
Line 33:
 If you have access to a Unix-based system with awk, you can use the commands below to split large, gzipped SDF files into smaller chunks.
  
-To split an sdf.gz file into multiple uncompressed chunks, use a command like this:
+To split an sdf.gz file into multiple **uncompressed chunks**, use a command like this:
 <code>
 gzip -dc your.sdf.gz | awk -v name=<chunk_name> -v ext=sdf -v size=<size> 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 > file1}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close(file1);file1=file2}}}}'
Line 40:
 Just replace your.sdf.gz with your filename, <chunk_name> with the prefix for the output file names, and <size> with the intended chunk size (the number of records, each terminated by a $$$$ line, per chunk).
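To see how the placeholders behave before running the command on a multi-gigabyte file, it can help to try it on a toy input first. The sketch below is only an illustration: the file name demo.sdf.gz, the prefix demo__ and the five dummy records are made up for this example, and the records are not chemically meaningful. It runs the uncompressed-chunk command with size=2 and then counts the $$$$ delimiters in each output file.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Build a toy gzipped SDF: 5 dummy records, each terminated by a $$$$ line.
# (demo.sdf.gz and the record contents are placeholders for illustration.)
for i in 1 2 3 4 5; do
  printf 'mol%s\n  toy record\n\nM  END\n$$$$\n' "$i"
done | gzip > demo.sdf.gz

# Same command as above, with name=demo__ and size=2 (2 records per chunk).
gzip -dc demo.sdf.gz | awk -v name=demo__ -v ext=sdf -v size=2 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 > file1}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close(file1);file1=file2}}}}'

# Count the records in each chunk (-x: whole-line match, -F: fixed string).
grep -c -x -F '$$$$' demo__*.sdf
# demo__0000000000.sdf:2
# demo__0000000001.sdf:2
# demo__0000000002.sdf:1
```

The chunk files are numbered from 0 with ten zero-padded digits, and the last chunk holds whatever remains when the record count is not a multiple of the chunk size.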
  
-For example to split the Mcule Purchasable (Full) sdf.gz file into 1M uncompressed chunks use:
+For example to split the Mcule Purchasable (Full) sdf.gz file into 1M **uncompressed chunks** use:
 <code>
 gzip -dc mcule_purchasable_full_180817.sdf.gz | awk -v name=mcule_purchasable_full_180817__ -v ext=sdf -v size=1000000 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 > file1}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close(file1);file1=file2}}}}'
Line 46:
  
  
-To split an sdf.gz file into multiple gzip compressed chunks, use a command like this:
+To split an sdf.gz file into multiple **gzip compressed chunks**, use a command like this:
 <code>
 gzip -dc your.sdf.gz | awk -v name=<chunk_name> -v ext=sdf.gz -v size=<size> 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 | "gzip -9 > "file1""}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close("gzip -9 > "file1"");file1=file2}}}}'
Line 53:
 Just replace your.sdf.gz with your filename, <chunk_name> with the prefix for the output file names, and <size> with the intended chunk size (the number of records, each terminated by a $$$$ line, per chunk).
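The compressed variant relies on awk treating the string passed to print | and the string passed to close() as the same command, so the two must match character for character. The sketch below is one possible way to make that explicit, not the command from this page: it keeps the command string in a variable (cmd, next_cmd and n are names introduced here), and it uses a made-up toy input (demo.sdf.gz, prefix demo__) so it can be run safely.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Toy gzipped SDF with 5 dummy records (placeholder data for illustration).
for i in 1 2 3 4 5; do
  printf 'mol%s\n  toy record\n\nM  END\n$$$$\n' "$i"
done | gzip > demo.sdf.gz

# Split into gzip-compressed chunks of 2 records each.
# cmd holds the full "gzip -9 > <file>" string, so print and close()
# are guaranteed to refer to the same pipe.
gzip -dc demo.sdf.gz | awk -v name=demo__ -v ext=sdf.gz -v size=2 '
  NR == 1 { cmd = sprintf("gzip -9 > %s%0.10d.%s", name, 0, ext) }
  { print | cmd }                      # send the current line to the open chunk
  $0 == "$$$$" {                       # end of a record
    next_cmd = sprintf("gzip -9 > %s%0.10d.%s", name, int(++n / size), ext)
    if (next_cmd != cmd) { close(cmd); cmd = next_cmd }
  }'

# Check archive integrity, then count the records in each compressed chunk.
gzip -t demo__*.sdf.gz
for f in demo__*.sdf.gz; do
  printf '%s: %s\n' "$f" "$(gzip -dc "$f" | grep -c -x -F '$$$$')"
done
# demo__0000000000.sdf.gz: 2
# demo__0000000001.sdf.gz: 2
# demo__0000000002.sdf.gz: 1
```

Closing each pipe before opening the next keeps only one gzip process running at a time; the final chunk is flushed when awk exits.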
  
-For example to split the Mcule Purchasable (Full) sdf.gz file into 1M gzip compressed chunks use:
+For example to split the Mcule Purchasable (Full) sdf.gz file into 1M **gzip compressed chunks** use:
 <code>
 gzip -dc mcule_purchasable_full_180817.sdf.gz | awk -v name=mcule_purchasable_full_180817__ -v ext=sdf.gz -v size=1000000 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 | "gzip -9 > "file1""}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close("gzip -9 > "file1"");file1=file2}}}}'
faq.txt · Last modified: 2018/12/21 15:00 by flack