Building a SADI service in Python

Source: https://code.google.com/p/sadi/source/browse/wiki/BuildingServicesInPython.wiki?spec=svn2357&r=2357

Building SADI services in Python is straightforward. They are also easy to publish as WSGI applications or to incorporate into other frameworks such as Pylons and TurboGears. We will create a Python version of the service described in What is a SADI service?.

Requirements

SADI requires Python 2.6 or greater, and can be installed using easy_install:

easy_install sadi

It can also be installed using pip:

pip install sadi

Python SADI services can be added to Pylons or TurboGears applications as Controllers or can be run by themselves. SADI Services can also be used as WSGI applications.

Defining input and output OWL classes

Your service’s input and output OWL classes describe its interface to the world. The property restrictions on the input class define the properties your service needs to operate and the property restrictions on the output class define the properties your service will attach.

As described in What is a SADI service?, your input and output classes must each be identified by a URL that resolves to the class definition. Before you can proceed, you must have created the ontology describing your input and output and hosted it such that this is the case. Ontology design is beyond the scope of this document, but see DefiningInputAndOutputOWLClasses for some SADI-specific tips.

For our example, our input class will be [http://sadiframework.org/examples/hello.owl#NamedIndividual]. The class definition is reproduced below:

This class specifies a single property restriction: that there is at least one value of the [http://xmlns.com/foaf/0.1/name] property. As suggested in DefiningInputAndOutputOWLClasses, this is a necessary and sufficient condition for class membership (indicated by the `owl:equivalentClass` construct), allowing any individual with a `foaf:name` to be dynamically identified as a `NamedIndividual`.

For our example, our output class will be [http://sadiframework.org/examples/hello.owl#GreetedIndividual]. The class definition is reproduced below:

This class also specifies a single property restriction, indicating that the service will attach a value of the [http://sadiframework.org/examples/hello.owl#greeting] property that is an `xsd:string`.

Note that, as mentioned in DefiningInputAndOutputOWLClasses, the [http://sadiframework.org/examples/hello.owl hello.owl] ontology completely specifies the properties it uses: it defines the `greeting` property and imports the `foaf:name` property.

Creating the Python SADI service

A Python SADI service is simple and takes only a few lines to define in a Python file. Create a file called example.py to edit. First, we need to import the SADI and RDFlib modules:

import sadi
from rdflib import *

Next, we need to define the namespaces we will use in our code:

hello=Namespace("http://sadiframework.org/examples/hello.owl#")
foaf=Namespace("http://xmlns.com/foaf/0.1/")

We also define our class. We need to set some service metadata (including organization), input and output classes, and the service body, called process():

class ExampleService(sadi.Service):
    label = "Hello, world"
    serviceDescriptionText = 'A simple "Hello, World" service that reads a name and attaches a greeting.'
    comment = 'A simple "Hello, World" service that reads a name and attaches a greeting.'
    serviceNameText = "Hello, world (python)"
    name = "example"

    def getOrganization(self):
        result = self.Organization()
        result.add(RDFS.label, Literal("Example Organization"))
        result.add(sadi.mygrid.authoritative, Literal(False))
        result.add(sadi.dc.creator, URIRef('mailto:john.smith@example.com'))
        return result

    def getInputClass(self):
        return hello.NamedIndividual

    def getOutputClass(self):
        return hello.GreetedIndividual

    def process(self, input, output):
        pass

Finally, in order to execute the service by itself, instantiate it and add a main block:

resource = ExampleService()

if __name__ == "__main__":
    sadi.serve(resource, port=9090)

This lets you run your service from the command line, which can help with testing.

Adding business logic

Your service is now ready for some business logic. SADI in Python uses the [https://rdflib.readthedocs.org/en/latest/apidocs/rdflib.html#module-rdflib.resource resource] module from [https://rdflib.readthedocs.org/en/latest/ rdflib], which acts very much like the Resource objects in Jena. Read the [http://rdflib.readthedocs.org/en/latest/apidocs/rdflib.html rdflib documentation] for further information about how to work with it. Business logic goes in the process() function. Turning a `NamedIndividual` into a `GreetedIndividual` is simple:

    def process(self, input, output):
        output.set(hello.greeting, Literal("Hello, " + input.value(foaf.name).value))

That’s all we need to implement our SADI service!

Running your service

In order to test your service, it must first be running. If you are working from the command line, execute the following command:

$ python example.py

Note: while your service is running, you will not be able to execute further commands in that terminal window.

Testing your service

In order to test your service, you need an input RDF document and the corresponding expected output. For our example, we will use the following RDF input, available at [http://sadiframework.org/test/hello-input.rdf].


"Guy Incognito";
a .

Save this input to a file called “exampleInput.ttl”. Here is the corresponding expected output:

@prefix ns1: .
@prefix rdf: .
@prefix rdfs: .
@prefix xml: .
@prefix xsd: .

a ns1:GreetedIndividual ;
ns1:greeting "Hello, Guy Incognito" .

If you are working from the command line, execute the following curl command to test the service (remember that the SADI service is running in your original terminal, so you will have to open a new terminal and change to the directory containing exampleInput.ttl):

$ curl -s -H Content-Type:text/turtle -H Accept:text/turtle -X POST --data-binary @exampleInput.ttl http://localhost:9090/

_Note: if you are copying-and-pasting the above command, be sure that it all appears on one line._
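
The same request can also be scripted. The following is a minimal sketch rather than part of the SADI distribution: it assumes Python 3 and its standard `urllib.request` module (on Python 2 the corresponding module is `urllib2`), a service listening on port 9090 as configured earlier, and an `exampleInput.ttl` file in the current directory. The helper names `build_request` and `call_service` are made up for illustration.

```python
import urllib.request

# Port taken from the sadi.serve(resource, port=9090) call in example.py.
SERVICE_URL = "http://localhost:9090/"


def build_request(turtle_bytes, url=SERVICE_URL):
    """Build the same POST that the curl command sends (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=turtle_bytes,
        headers={"Content-Type": "text/turtle", "Accept": "text/turtle"},
        method="POST",
    )


def call_service(path="exampleInput.ttl", url=SERVICE_URL):
    """POST the input file to the service and return the response body as text."""
    with open(path, "rb") as handle:
        request = build_request(handle.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service running in another terminal, `print(call_service())` should print the Turtle document returned by the service, which can then be compared with the expected output.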

Writing Asynchronous SADI Services

There are two ways to write an asynchronous SADI service: the easy way or the hard way. The easy way provides an execution thread for each instance that is submitted. It does nothing fancy with thread pooling (yet), but provides a simple interface that looks just like a synchronous SADI service. The hard way provides a different interface, but provides more control over how the service is executed, allowing implementers to write the request to secondary submission services (like a message queueing system or workflow submission system) and check for status when the user asks for it.

Asynchronous SADI Services the Easy Way

Instead of implementing the `process()` function, implement the `async_process()` function:

    def async_process(self, input, output):
        output.set(hello.greeting, Literal("Hello, " + input.value(foaf.name).value))

That’s it. You now have an asynchronous SADI service. Do not override the `process()` function as well, as that will short-circuit the asynchronous functionality.

Asynchronous SADI Services the Hard Way

To do more advanced handling of asynchronous requests, you will need to write two methods: `defer()` and `result()`:

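    def defer(self, input, task):
        # Hand the request off to your own back end (placeholder function).
        submit_to_external_processor(input, task)

    def result(self, task):
        try:
            # Placeholder lookup: returns a Graph, or None if not finished.
            graph = check_external_processor(task)
        except Exception:
            # The task is unknown to the back end.
            raise sadi.HTTPError('404 Not Found')
        if graph is None:
            # The task exists but has not finished yet.
            raise sadi.IncompleteError()
        return graph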

`defer()` accepts a task and passes it on to whatever processing mechanism you use. The input data is available, as before, via the input resource, while a URI for the task (the actual URL that the client will access the results from) should be used to identify the task in the future. When a user asks for the status of a task, they will use that URI to identify it. `result()` is called whenever the user asks for the result of processing. If the task is complete, return an RDFlib Graph object that contains the result. If it is not, raise a `sadi.IncompleteError`. If the task is unheard of, then raise a 404 Not Found error using `sadi.HTTPError`.
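
To make the contract concrete, here is one possible in-memory stand-in for the two placeholder functions, `submit_to_external_processor` and `check_external_processor`. It is only a sketch under stated assumptions: it uses a plain dictionary and a background thread from the standard library instead of a real queueing system, it stores strings where a real service would store an RDFlib Graph, and the optional `work` parameter is an illustrative addition.

```python
import threading

_tasks = {}               # task URI -> result, or None while still running
_lock = threading.Lock()


def submit_to_external_processor(input, task, work=None):
    """Register the task, then compute its result on a background thread."""
    # Placeholder computation; a real service would build an RDF graph here.
    work = work or (lambda value: "Hello, " + value)
    with _lock:
        _tasks[task] = None  # known, but not finished yet

    def run():
        result = work(input)
        with _lock:
            _tasks[task] = result

    worker = threading.Thread(target=run)
    worker.start()
    return worker


def check_external_processor(task):
    """Return the result, or None if still running; raise KeyError if unknown."""
    with _lock:
        return _tasks[task]
```

The three outcomes line up with the three cases described for `result()`: a returned value means the task is complete, `None` means it is still running, and an exception means the task URI was never submitted.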

SADI Attachment Support

SADI in Python supports the inclusion of attachments using the [http://tools.ietf.org/html/rfc2387 multipart/related Content Type]. To submit data with attachments, the Content-Type header must be set to “multipart/related” and include a boundary parameter. The first unnamed part that has a parseable RDF content type will be treated as the input graph. All named attachments that have a content disposition that contains a valid URI will be available for access via the `sadi.Service.get()` function.

Services that call `self.get("http://example.com/#", input)` or pass a `rdflib.URIRef` object will get back a [http://werkzeug.pocoo.org/docs/wrappers/#werkzeug.wrappers.Response werkzeug.wrappers.Response] object containing the requested content from the attachment. If there is no attachment for the requested URI, the URI will be treated as a URL and a download attempt will be made using [http://docs.python.org/2/library/urllib2.html urllib2].

*Technical Note:* the input object must be passed to `sadi.Service.get()` in order to provide access to its underlying graph, which, in our implementation, includes references to all named attachments. This is to keep SADI service implementations re-entrant, so that more than one request can be processed concurrently.
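
On the client side, a multipart/related request body might be assembled as in the sketch below. This is an illustration under assumptions, not the SADI client API: the function name `build_multipart_related` is made up, and placing the attachment URI in the `filename` parameter of Content-Disposition is a guess at what "a content disposition that contains a valid URI" means, so check the SADI source before relying on it. The `type` parameter is included because RFC 2387 requires it.

```python
import uuid


def build_multipart_related(turtle_text, attachment_uri, attachment_bytes,
                            boundary=None):
    """Return (content_type_header, body_bytes) for a two-part request."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    # ASSUMPTION: the URI is carried in the filename parameter.
    disposition = 'Content-Disposition: attachment; filename="%s"' % attachment_uri
    parts = [
        delimiter,
        b"Content-Type: text/turtle",   # first, unnamed part: the input graph
        b"",
        turtle_text.encode("utf-8"),
        delimiter,
        b"Content-Type: application/octet-stream",
        disposition.encode("ascii"),    # named attachment
        b"",
        attachment_bytes,
        delimiter + b"--",              # closing delimiter
        b"",
    ]
    content_type = 'multipart/related; type="text/turtle"; boundary="%s"' % boundary
    return content_type, b"\r\n".join(parts)
```

The returned header value goes into the request's Content-Type header and the bytes into the request body.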

Deploying and registering your service

The SADI service can be deployed to Apache with mod_wsgi by following [https://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide these instructions], or can be incorporated into other WSGI applications such as [http://docs.pylonsproject.org/projects/pylons-webframework/en/latest/wsgi_support.html Pylons], [http://turbogears.org/2.1/docs/main/WSGIAppControllers.html Turbogears], or [http://ckan.org CKAN].
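
By default, mod_wsgi looks for a module-level callable named `application` in the script file it loads. The sketch below shows the shape of such a file. The function body is only a stand-in so that the example runs without the sadi package; in a real deployment the file would presumably contain `application = ExampleService()` instead, since SADI services can be used as WSGI applications. The `smoke_test` helper is a hypothetical convenience for checking any WSGI callable before deploying it.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Stand-in WSGI app; replace with `application = ExampleService()`."""
    body = b"SADI service placeholder\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def smoke_test(app):
    """Call a WSGI app once with a synthetic GET request; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```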

Once your service has been deployed, you can register it by visiting [http://sadiframework.org/registry/] and submitting the URL of your service in the form on that page. If you wish to unregister your service later, simply undeploy it and resubmit the now invalid URL.

Integrating your service into TurboGears 2.3

[http://turbogears.org/ TurboGears] has some capabilities that allow for things like URL routing that make traditional WSGI applications unsuitable as controllers. Integrating a SADI service therefore requires a little bit of shim code to work. The main issue is that the request body seems to become unavailable by the time the service is actually called. We need to intercept that body before it goes away. To start, add the following line to the `__call__` method of the BaseController class in your project:

class BaseController(TGController):
    def __call__(self, environ, start_response):
        environ['request_body'] = start_response.request.body

The next step is to define a `wsgi_wrap` function that shims the service:

from tg import request, response

def wsgi_wrap(fn):
    '''Decorate a WSGI application so that it can work within a TurboGears controller.'''
    def call(self):
        tglocals = request.environ['tg.locals']
        def start_response(status, headers, exc_info=None):
            response.status = status
            response.headers.update(headers)
            if exc_info:
                response.headerlist = exc_info
        # Restore the request body captured in BaseController.__call__.
        tglocals.request.body = request.environ['request_body']
        return fn(self, tglocals.request.environ, start_response)
    return call

Finally, to add a SADI service to a TurboGears controller, instantiate it privately and call it via a controller method. If we could decorate callable classes, this would be even simpler, but this will work:

class MyController(BaseController):
    _example = ExampleService()

    @expose()
    @wsgi_wrap
    def example(self, environ, start_response):
        return self._example(environ, start_response)

Integrating your service into Flask

[http://flask.pocoo.org/ Flask] is a very simple Python microframework that punches well above its weight. It is much simpler to integrate SADI services into Flask, as it supports easy integration with all WSGI services. If you have implemented the example in the tutorial and instantiated it at `resource` and have instantiated a Flask application at `app`, you can route the path `/example` to the service like this:

@app.route("/example", methods=['POST', 'GET'])
def example():
    return resource

32 Points on How to Write a Research Paper

Common review criteria for a research paper
1. Significance:
How important is the contribution to advancing CS knowledge?
2. Originality:
How novel/clever is the idea/approach/solution?
3. Technical Soundness:
Is the proposed technique correct? Does it solve the problem?
4. Clarity/presentation:
Does it clearly describe the proposed idea/approach/solution so that the work can be reproduced/replicated by others?
5. Related work:
Are existing techniques sufficiently discussed and contrasted?
________________________________________
Ingredients/Components of a research paper:
6. “WHY”/motivation:
o Why is the problem important?
o Why are existing techniques not sufficient?
o Why is the proposed approach going to solve the problem?
7. “WHAT”/problem statement:
o What is the *specific* problem the paper is trying to solve?
8. “HOW”/approach:
o How is the problem solved?
o What is the proposed algorithm that solves the problem?
9. Evaluation:
o Empirical and/or theoretical
o Empirical: criteria, data, procedures, results, analysis
o Theoretical: criteria, analysis
10. Conclusion:
o Summary of findings
o Limitations and possible improvements
________________________________________
Ingredients of “HOW”/approach
11. Motivation of the solution/algorithm: why does it solve the problem?
12. Pseudo-code
13. Explanation of pseudo-code
14. Example
________________________________________
Ingredients of evaluation
15. Empirical
o criteria: what metric is used to measure success, what better means?
o data: description of data used, how they are obtained, …
o procedures: how the experiments are conducted
o results: measurements obtained from the experiments
o analysis: interpret the results, make comparison, draw conclusion, discuss lessons learned,…
16. Theoretical
o criteria: what metric is used to measure success, what better means?
o proof: when worst/best case occurs, how to generate the first equation, define variables, mathematical derivation
o analysis: interpret the results, make comparison, draw conclusion, discuss lessons learned,…
________________________________________
Related Work
17. discuss related techniques
18. cite the sources in the sentence when the techniques are first discussed
19. discuss their limitation/weakness
________________________________________
References
20. different types of sources: conference paper, journal paper, book chapter, pages in a book, technical report.
21. different styles
________________________________________
Introduction
22. motivation: general (why the problem is important), specific (why existing techniques are not sufficient)
23. problem statement: goal (think about what questions you are trying to answer)
24. overview of proposed approach
25. contributions (key findings) briefly
26. organization of the paper
________________________________________
Conclusion
27. summary of findings
28. limitations and possible improvements
________________________________________
Abstract (one paragraph)
29. motivation
30. problem statement
31. proposed approach
32. contributions

(Thanks Dr. Philip K. Chan)

How to upgrade R in ubuntu?

Follow the instructions from here

sudo gedit /etc/apt/sources.list

This will open up your sources.list file in gedit, where you can add the following line.

deb http://cran.cnr.berkeley.edu/bin/linux/ubuntu/ version/

Replace version/ with whatever version of Ubuntu you are using (e.g., precise/, oneiric/, and so on). If you’re getting a “Malformed line” error, check to see if you have a space between /ubuntu/ and version/.

Fetch the secure APT key with gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 or gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-key E084DAB9.

Feed it to apt-key with gpg -a --export E084DAB9 | sudo apt-key add -

Update your sources and upgrade your installation with sudo apt-get update && sudo apt-get upgrade.

Since R is already installed, you should be able to upgrade it with this method.

Note that you don’t have to use the Berkeley mirror. A list of all mirrors is available here: http://cran.r-project.org/mirrors.html

How to get Tess4j running on Ubuntu 10.04

In order to get a working Tess4j on an Ubuntu 10.04 machine, the following steps might help:

First, install Ghostscript with apt-get to get the PDF to image conversion functionality:

sudo apt-get install ghostscript

For building from source in the next steps, some additional packages are required. If they are not present already, install them with:

sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev

Because the package shipped with Ubuntu 10.04 is too old, get Leptonica sources (current version, at least 1.67), build them and install:

wget http://www.leptonica.com/source/leptonica-1.69.tar.gz
tar xzf leptonica-1.69.tar.gz
cd leptonica-1.69
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

There is a package for Tesseract-OCR too, but it does not (at the time of this writing) contain the shared object library equivalent to the DLL provided by Tess4j. So in the last step, get the sources from the subversion repository, patch them with the C-API patch, build them and install:

svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
wget -O 001-tesseract-capi.patch "https://groups.google.com/group/tesseract-dev/attach/e59d99d242b3b275/001-tesseract-capi.patch?part=4&authuser=0"
cd tesseract-ocr
patch -p0 < ../001-tesseract-capi.patch
./autogen.sh
./configure
make
sudo make install
sudo make install-langs
sudo ldconfig
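
Once the build finishes, one quick way to confirm that the shared libraries are visible to the dynamic linker is Python's standard `ctypes.util.find_library`. This is a sketch and not part of the original instructions: the helper name `find_ocr_libraries` is made up, and the library names `lept` and `tesseract` assume the default names installed by the Leptonica and Tesseract builds above.

```python
import ctypes.util


def find_ocr_libraries(names=("lept", "tesseract")):
    """Map each library name to what the system linker reports, or None."""
    return {name: ctypes.util.find_library(name) for name in names}
```

Running `print(find_ocr_libraries())` after `sudo ldconfig` shows which of the two libraries were found. A value of `None` means that library is not visible to the linker, in which case Tess4j is unlikely to load it either.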

Source: http://left4dev.blogspot.ca/2012/09/how-to-get-tess4j-running-on-ubuntu-1004.html