Virtuoso Server installation, setup, data upload, and querying on Windows

Virtuoso is an innovative enterprise-grade multi-model data server for agile enterprises & individuals. It delivers an unrivaled platform-agnostic solution for data management, access, and integration. The unique hybrid server architecture of Virtuoso enables it to offer traditionally distinct server functionality within a single product offering that covers the following areas:
•    Relational Data Management
•    RDF Data Management
•    XML Data Management
•    Free Text Content Management & Full Text Indexing
•    Document Web Server
•    Linked Data Server
•    Web Application Server
•    Web Services Deployment (SOAP or REST)
*Information taken from http://virtuoso.openlinksw.com/
We chose Virtuoso for our SPARQL endpoints because it offers the best performance for large data sets. Please see the following article for details: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.161.7773&rep=rep1&type=pdf
a.    Virtuoso server download
The OpenLink Virtuoso server can be downloaded from the following URL: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSDownload. Please make sure that you download the right server for your operating system and architecture (32 bit or 64 bit).
b.    Virtuoso startup
Virtuoso is a portable server; therefore, it does not require any installation. You only need to extract the zip file into some directory (D:\ahmadchan in my case). The extraction creates a folder named virtuoso-opensource containing all of the server files. Within virtuoso-opensource, go to the database directory and copy the virtuoso.ini file into the bin directory.
Now go to D:\ahmadchan\virtuoso-opensource\bin on the command line and run the following command:
virtuoso-t -f
The Virtuoso server will start as shown in Figure 3. A SPARQL endpoint will be available at http://localhost:8890/sparql or http://your.system.ip.address:8890/sparql, as shown in Figure 4. You can run SPARQL queries directly in this public interface, and you can also get results through API calls; details of the API calls can be found on the Virtuoso web page. The HTTP port of the server is 8890, which you can change in the virtuoso.ini file if required. You can also put the above command in a .bat file, say start.bat, within the bin directory. Then just go to the bin directory and double-click start.bat, and the Virtuoso server will start without using the command line.
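As a rough illustration of such an API call, the Python sketch below sends a query to the endpoint over HTTP and reads back JSON. It assumes the default endpoint http://localhost:8890/sparql from above and requests results with format=application/sparql-results+json; the function names are only illustrative and are not part of Virtuoso.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the text above; adjust host and port to match virtuoso.ini.
ENDPOINT = "http://localhost:8890/sparql"


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return endpoint + "?" + params


def extract_rows(result):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query to a running server and return its rows."""
    with urllib.request.urlopen(build_query_url(query, endpoint)) as response:
        return extract_rows(json.load(response))
```

With the server running, run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5") should return up to five rows, each a dictionary with the keys s, p and o.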
c.    Loading data into the Virtuoso SPARQL endpoint
Various methods for inserting RDF data into the Virtuoso server are given at http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFInsert. In this manual we explain two of these methods.
i.    RDF upload using Virtuoso Conductor
This method is useful for uploading small RDF files (e.g. 100 or 200 MB files). Only one file can be uploaded at a time.
Please follow these steps.
1.    Go to http://localhost:8890/ and click on Conductor on the left side.
2.    Type dba as both the login account and the password.
3.    Click on the Linked Data tab and then on Quad Store Upload.
4.    Select your RDF file (only one at a time), give a proper graph name, and click Upload.
5.    If there is no syntax error, the RDF file will be added to the Virtuoso server, where you can query it using the public interface at http://localhost:8890/sparql.
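One quick way to check that an upload arrived is to count the triples in the graph you named in step 4. The sketch below only builds the query text, which can be pasted into the form at http://localhost:8890/sparql. The graph name http://example.org/my-graph is a made-up placeholder, and the helper name is hypothetical.

```python
def count_triples_query(graph_iri):
    """Build a SPARQL query that counts all triples in one named graph."""
    # Reject characters that are not allowed inside <...> in SPARQL,
    # so the graph name cannot break out of the IRI brackets.
    forbidden = set('<>"{}|^`\\ ')
    if any(ch in forbidden for ch in graph_iri):
        raise ValueError("not a usable graph IRI: %r" % graph_iri)
    return (
        "SELECT (COUNT(*) AS ?triples) "
        "FROM <%s> "
        "WHERE { ?s ?p ?o }" % graph_iri
    )


# Placeholder graph name; replace it with the one given during upload.
print(count_triples_query("http://example.org/my-graph"))
```

A count of zero usually means the graph name in the query does not match the one entered during upload.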
ii.    Using Bulk Load
Bulk load is useful for quickly uploading a large number of RDF files. Since our Linked TCGA data contains a very large number of files, this method is highly recommended for uploading the tumor data.
Prerequisites
•    If your Virtuoso release is prior to the commercial 06.02.3129 or open source 6.1.3 releases, then the Virtuoso Bulk Loader functions need to be loaded manually.
•    The directory containing the data set files must be included in the DirsAllowed parameter defined in the Virtuoso INI file (D:\ahmadchan\virtuoso-opensource\bin\virtuoso.ini), after which the Virtuoso server must be restarted. In our case, the RDF files are stored in the directory D:\ahmadchan\Clinical Data and the DirsAllowed parameter contains the default values given below.
DirsAllowed    = ., ../vad,
We need to append the D:\ahmadchan\Clinical Data directory to the allowed list. The DirsAllowed parameter then becomes
DirsAllowed    = ., ../vad, D:\ahmadchan\Clinical Data
*Please note the backslashes in the path on Windows systems.
•    The Virtuoso server should be configured to use sufficient memory and other system resources, as detailed in the Virtuoso RDF Performance Tuning Guide, or the load may take an unacceptably long time. In particular, the NumberOfBuffers and MaxDirtyBuffers parameters of the Virtuoso INI file should be set according to the system specification. In our case we are using a system with 4 GB of RAM, so we need to uncomment the two lines under the 4 GB comment in the excerpt of D:\ahmadchan\virtuoso-opensource\bin\virtuoso.ini given below.
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;; Uncomment next two lines if there is 2 GB system memory free
;       NumberOfBuffers          = 170000
;       MaxDirtyBuffers          = 130000
;; Uncomment next two lines if there is 4 GB system memory free
;       NumberOfBuffers          = 340000
;       MaxDirtyBuffers          = 250000
;; Uncomment next two lines if there is 8 GB system memory free
;       NumberOfBuffers          = 680000
;       MaxDirtyBuffers          = 500000
;; Uncomment next two lines if there is 16 GB system memory free
;       NumberOfBuffers          = 1360000
;       MaxDirtyBuffers          = 1000000
;; Uncomment next two lines if there is 32 GB system memory free
;       NumberOfBuffers          = 2720000
;       MaxDirtyBuffers          = 2000000
;; Uncomment next two lines if there is 48 GB system memory free
;       NumberOfBuffers          = 4000000
;       MaxDirtyBuffers          = 3000000
;; Uncomment next two lines if there is 64 GB system memory free
;       NumberOfBuffers          = 5450000
;       MaxDirtyBuffers          = 4000000
;; Note the default settings will take very little memory
;; but will not result in very good performance
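The two INI changes above can be sketched in Python as follows. This is only an illustration: the buffer table is copied from the commented lines of virtuoso.ini shown above, the helper names are hypothetical, and the second helper works on the text of the INI file rather than editing the file on disk.

```python
import re

# (free memory in GB, NumberOfBuffers, MaxDirtyBuffers), copied from the
# commented-out lines of virtuoso.ini shown above.
BUFFER_TABLE = [
    (2, 170000, 130000),
    (4, 340000, 250000),
    (8, 680000, 500000),
    (16, 1360000, 1000000),
    (32, 2720000, 2000000),
    (48, 4000000, 3000000),
    (64, 5450000, 4000000),
]


def buffers_for(free_gb):
    """Pick the largest table row that fits into the free memory."""
    chosen = None
    for gb, number_of_buffers, max_dirty_buffers in BUFFER_TABLE:
        if gb <= free_gb:
            chosen = (number_of_buffers, max_dirty_buffers)
    if chosen is None:
        raise ValueError("the table has no row below 2 GB of free memory")
    return chosen


def add_allowed_dir(ini_text, directory):
    """Append a directory to the DirsAllowed line unless it is already listed."""
    def extend(match):
        entries = [e.strip() for e in match.group(2).split(",") if e.strip()]
        if directory not in entries:
            entries.append(directory)
        return match.group(1) + ", ".join(entries)

    # Only the first DirsAllowed line is touched; the rest of the text is kept.
    pattern = r"(?m)^([ \t]*DirsAllowed[ \t]*=[ \t]*)([^\r\n]*)"
    return re.sub(pattern, extend, ini_text, count=1)
```

For the 4 GB machine used here, buffers_for(4) gives (340000, 250000), the same values as the two lines to uncomment. Remember that the server must be restarted after either change.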
Bulk loading process
Once the INI file has been configured properly, follow these steps for the upload.
1.    Go to the bin folder and double-click isql (D:\ahmadchan\virtuoso-opensource\bin\isql).
2.    Run the following command to clear any previous load list of files:
SQL> delete from DB.DBA.load_list;
3.    Enter the following command, providing the appropriate input values:
SQL> ld_dir('<source-filename-or-directory>', '<file name pattern>', '<graph iri>');
Please note the forward slashes in the path and the straight single quotes. In our case the parameters are given below.
SQL> ld_dir('D:/ahmadchan/Clinical Data', '*.nt', 'http://cbakerlab.unbsj.ca');
4.    Next, enter the following command to list the files registered for loading:
SQL> select * from DB.DBA.load_list;
5.    Finally, enter the following command to start the bulk load and wait for completion:
SQL> rdf_loader_run();
6.    After a successful upload, do not forget to run the shutdown command; otherwise the files will not be completely uploaded.
SQL> shutdown;
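To avoid quoting and slash mistakes when typing these commands, the statements can be generated from the Windows path. The Python sketch below is a convenience helper with a hypothetical name, not part of Virtuoso; it emits the same statements as steps 2 to 6, in the same order.

```python
def bulk_load_script(directory, pattern, graph_iri):
    """Return the isql statements for one bulk-load run, one per line."""
    def quote(value):
        # SQL string literal with straight single quotes; an embedded
        # single quote is doubled (standard SQL escaping, assumed here).
        return "'" + value.replace("'", "''") + "'"

    # ld_dir() is given forward slashes, even for a Windows path.
    source = directory.replace("\\", "/")
    return "\n".join([
        "delete from DB.DBA.load_list;",
        "ld_dir(%s, %s, %s);" % (quote(source), quote(pattern), quote(graph_iri)),
        "select * from DB.DBA.load_list;",
        "rdf_loader_run();",
        "shutdown;",
    ])


print(bulk_load_script("D:\\ahmadchan\\Clinical Data", "*.nt",
                       "http://cbakerlab.unbsj.ca"))
```

The printed lines can be pasted into the isql prompt one after another.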

Source: Virtuoso Manual
