faq

Differences

This shows you the differences between two versions of the page.

faq [2018/12/19 01:40] flack
faq [2022/06/29 06:38] flack
Line 1: Line 1:
 ====== Frequently asked questions (FAQ) ======
 +
 +====How can I get access to a Linux system if my operating system is Windows?====
 +
 +If you are on Windows 10, we suggest installing WSL to get access to a Linux distribution (for example, Ubuntu). The WSL installation instructions are here:
 +
 +[[https://docs.microsoft.com/en-us/windows/wsl/install-win10]]
  
 ====Can I get a Mcule database SMILES file in smaller chunks?====
  
-If you have access to a unix based system and the split command you can use the below commands to split large files into smaller chunks.
+If you have access to a Linux system and the split command you can use the below commands to split large files into smaller chunks.
  
 To split a smi.gz / smiles.gz file into multiple **uncompressed chunks** use a command like this:
Line 18: Line 24:
 To split a smi.gz / smiles.gz file into multiple **gzip compressed chunks** use a command like this:
 <code>
-gzip -dc your.smi.gz | split --verbose --lines=<size> --numeric-suffixes --suffix-length=<suffix_length> --additional-suffix='.smi' --filter='gzip -9> $FILE' - your__
+gzip -dc your.smi.gz | split --verbose --lines=<size> --numeric-suffixes --suffix-length=<suffix_length> --additional-suffix='.smi.gz' --filter='gzip -9> $FILE' - your__
 </code>
  
 For example to split the Mcule Purchasable (Full) smi.gz file into 1M **gzip compressed chunks** use:
 <code>
-gzip -dc mcule_purchasable_full_180817.smi.gz | split --verbose --lines=1000000 --numeric-suffixes --suffix-length=10 --additional-suffix='.smi' --filter='gzip -9> $FILE' - mcule_purchasable_full_180817__
+gzip -dc mcule_purchasable_full_180817.smi.gz | split --verbose --lines=1000000 --numeric-suffixes --suffix-length=10 --additional-suffix='.smi.gz' --filter='gzip -9> $FILE' - mcule_purchasable_full_180817__
 </code>
  
 If you have pigz installed on your system you can replace gzip with pigz in the commands above to speed up the process, especially when you want compressed chunks. You can typically install it with <code>sudo apt install pigz</code> or a similar command.
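
As a rough sketch of the pigz substitution described above, the script below uses pigz when it is installed and falls back to gzip otherwise, then checks that the chunks together hold as many lines as the input. The COMPRESSOR variable, the demo.smi.gz file and the demo__ prefix are illustrative assumptions rather than Mcule names, and the input is a tiny generated stand-in, not a real database file.

```shell
# Pick pigz when it is installed, otherwise fall back to gzip.
if command -v pigz >/dev/null 2>&1; then
    COMPRESSOR=pigz
else
    COMPRESSOR=gzip
fi
export COMPRESSOR   # the shell started by split --filter reads it from the environment

# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Tiny stand-in input: 10 lines, each a SMILES string plus a made-up ID.
i=1
while [ "$i" -le 10 ]; do
    echo "CCO demo-$i"
    i=$((i + 1))
done | "$COMPRESSOR" -c > demo.smi.gz

# Split into compressed chunks of 4 lines each (demo__000.smi.gz, ...).
"$COMPRESSOR" -dc demo.smi.gz |
    split --lines=4 --numeric-suffixes --suffix-length=3 \
          --additional-suffix='.smi.gz' \
          --filter='"$COMPRESSOR" -9 > "$FILE"' - demo__

# The chunks together should hold as many lines as the original file.
original_lines=$("$COMPRESSOR" -dc demo.smi.gz | wc -l)
chunk_lines=$(cat demo__*.smi.gz | "$COMPRESSOR" -dc | wc -l)
echo "original=$original_lines chunks=$chunk_lines"
```

For a real file, the same two counts can be compared after replacing demo.smi.gz and the demo__ prefix with your own names.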
 +
 +If you are on Windows 10, we suggest installing WSL to get access to a Linux distribution (for example, Ubuntu). The WSL installation instructions are here:
 +
 +[[https://docs.microsoft.com/en-us/windows/wsl/install-win10]]
  
  
 ====Can I get a Mcule database SDF file in smaller chunks?====
  
-If you have access to a unix based system and awk you can use the below commands to split large, gzipped SDF files into smaller chunks.
+If you have access to a Linux system and awk you can use the below commands to split large, gzipped SDF files into smaller chunks.
  
-To split an sdf.gz file into multiple uncompressed chunks, use a command like this:
+To split an sdf.gz file into multiple **uncompressed chunks**, use a command like this:
 <code> <code>
 gzip -dc your.sdf.gz | awk -v name=<chunk_name> -v ext=sdf -v size=<size> 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 > file1}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close(file1);file1=file2}}}}'
Line 40: Line 50:
 Just replace your.sdf.gz with your filename, <chunk_name> with the name of the files you want and <size> with the intended chunk size.
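
The one-liner is dense, so here is a minimal sketch of what appears to be the same logic, laid out over several lines with comments and run on a tiny generated input. The demo.sdf.gz file, the demo__ prefix, the chunk size of 2 and the variable names inside the awk program are assumptions made for illustration only.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in input: 5 made-up records, each ended by the SDF delimiter line.
for i in 1 2 3 4 5; do
    printf 'record-%s\n$$$$\n' "$i"
done | gzip -c > demo.sdf.gz

# Split into uncompressed chunks of 2 records each.
gzip -dc demo.sdf.gz | awk -v name=demo__ -v ext=sdf -v size=2 '
    # Name the first chunk before any line is written.
    NR == 1 { file = sprintf("%s%0.10d.%s", name, 0, ext) }

    # Every input line goes to the current chunk.
    { print > file }

    # A delimiter line ends a record: count it, then switch to the
    # next chunk once the current one holds "size" records.
    $0 == "$$$$" {
        next_file = sprintf("%s%0.10d.%s", name, int(++records / size), ext)
        if (next_file != file) {
            close(file)
            file = next_file
        }
    }
'

# Show how many records ended up in each chunk.
grep -cxF '$$$$' demo__*.sdf
```

Closing each finished chunk keeps the number of open files at one, which matters when a large input produces many chunks.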
  
-For example to split the Mcule Purchasable (Full) sdf.gz file into 1M uncompressed chunks use:
+For example to split the Mcule Purchasable (Full) sdf.gz file into 1M **uncompressed chunks** use:
 <code> <code>
 gzip -dc mcule_purchasable_full_180817.sdf.gz | awk -v name=mcule_purchasable_full_180817__ -v ext=sdf -v size=1000000 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 > file1}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close(file1);file1=file2}}}}'
Line 46: Line 56:
  
  
-To split an sdf.gz file into multiple gzip compressed chunks, use a command like this:
+To split an sdf.gz file into multiple **gzip compressed chunks**, use a command like this:
 <code> <code>
 gzip -dc your.sdf.gz | awk -v name=<chunk_name> -v ext=sdf.gz -v size=<size> 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 | "gzip -9 > "file1""}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close("gzip -9 > "file1"");file1=file2}}}}'
Line 53: Line 63:
 Just replace your.sdf.gz with your filename, <chunk_name> with the name of the files you want and <size> with the intended chunk size.
  
-For example to split the Mcule Purchasable (Full) sdf.gz file into 1M gzip compressed chunks use:
+For example to split the Mcule Purchasable (Full) sdf.gz file into 1M **gzip compressed chunks** use:
 <code> <code>
 gzip -dc mcule_purchasable_full_180817.sdf.gz | awk -v name=mcule_purchasable_full_180817__ -v ext=sdf.gz -v size=1000000 'BEGIN{size=size}(NR==1){file1=sprintf("%s%0.10d.%s",name,counter,ext)}{print $0 | "gzip -9 > "file1""}{if($0=="$$$$"){file2=sprintf("%s%0.10d.%s",name,int(++counter/size),ext);{if(file1!=file2){close("gzip -9 > "file1"");file1=file2}}}}'
Line 61: Line 71:
  
 Please note that the process can take a while.
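
Because a long-running split can be interrupted, it may be worth checking the compressed chunks once it finishes. The sketch below does this on a tiny generated input: it tests each chunk with gzip -t and compares the number of records before and after. The demo names, the chunk size of 2 and the count_records helper are hypothetical. The awk program is a rearranged variant of the command above that keeps the pipe command in a variable, not the original one-liner.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in input: 5 made-up records, each ended by the SDF delimiter line.
for i in 1 2 3 4 5; do
    printf 'record-%s\n$$$$\n' "$i"
done | gzip -c > demo.sdf.gz

# Split into gzip-compressed chunks of 2 records each.
gzip -dc demo.sdf.gz | awk -v name=demo__ -v ext=sdf.gz -v size=2 '
    # Build the pipe command for the first chunk.
    NR == 1 { cmd = sprintf("gzip -9 > %s%0.10d.%s", name, 0, ext) }

    # Every input line is piped into the current gzip process.
    { print | cmd }

    # After each record, switch to a new gzip process if the chunk is full.
    $0 == "$$$$" {
        next_cmd = sprintf("gzip -9 > %s%0.10d.%s", name, int(++records / size), ext)
        if (next_cmd != cmd) { close(cmd); cmd = next_cmd }
    }
'

# Check 1: every chunk must pass the gzip integrity test.
bad_chunks=0
for chunk in demo__*.sdf.gz; do
    gzip -t "$chunk" || bad_chunks=$((bad_chunks + 1))
done

# Check 2: the chunks must contain as many records as the input.
count_records() { gzip -dc "$@" | grep -cxF '$$$$'; }
original_records=$(count_records demo.sdf.gz)
chunk_records=$(count_records demo__*.sdf.gz)

echo "bad_chunks=$bad_chunks original=$original_records chunks=$chunk_records"
```

If either check fails on real data, rerunning the split is usually simpler than trying to repair individual chunks.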
 +
 +If you are on Windows 10, we suggest installing WSL to get access to a Linux distribution (for example, Ubuntu). The WSL installation instructions are here:
 +
 +[[https://docs.microsoft.com/en-us/windows/wsl/install-win10]]
  
  
faq.txt · Last modified: 2024/04/09 08:33 by rkiss