Local Search

The Local Search component aims at providing a simplified abstraction layer for accessing NEPOMUK's RDFRepository and possibly other components which may be involved in the search process.

This abstraction layer enables the NEPOMUK community to build arbitrarily complex extensions to the query processing workflow without changing the corresponding API. The advantage is that other components that rely on this API remain completely untouched.

Overview

Features

In contrast to querying the RDF repository directly using SPARQL, Local Search generally makes no assumption about the query language used for search. Local Search is not meant to replace or duplicate any existing component, but builds upon them. It is not intended to support every feature of the underlying components; rather, it provides a facility for everyday queries. Local Search should be regarded as the NEPOMUK counterpart to an interpreter of the queries one would type into a (semantically enhanced) Google(TM) search box. The core features of Local Search are:

  • Support for arbitrary query languages (language can be specified as a parameter to the query).
  • Support for full-text search (via LuceneSail) and ranking (via Metadata Ranker).
  • Support for search shortcuts of more complex SPARQL queries (e.g., my recent documents). This is to some degree comparable to stored procedures from the database world.

Local Search integrates into the NEPOMUK environment as follows. Besides the current, direct way of accessing the data in the RDF repository, the user, any NEPOMUK component or an outside application may search for it via Local Search. In turn, Local Search uses the RDF repository and possibly other components, such as the Metadata Ranker, to retrieve information matching a query.

API

Local Search's API was designed with simplicity in mind. In particular, to reduce the web service footprint, API methods use String parameters and return values whenever possible. Later, the API will also provide methods with complex parameters, which may increase performance when called from within one Java virtual machine.

The API essentially needs only one method: query(). It takes a query expressed as a string and a query language identifier (also a string) that uniquely specifies the meaning of the query string. The returned result is always a table of values, including a header line describing the column names. The caller is in charge of interpreting the results.

In addition to that, a Query object model supports more complex queries, which may not even be serialized as a string.

Besides the actual query, the API also provides means for specifying arbitrary query options, such as sorting parameters, limiting filters, namespace parameters etc. To allow arbitrary extensions, these options may be specified as triples (in N-Triples string representation), separate from the query. All parameter triples share a common subject, localsearch:Query. Options are then added as predicate-object pairs to this subject. Simple options may use a literal as the object; more complex options may be added by referencing other resources.
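The option encoding described above can be sketched as follows. This is only an illustration, not the component's actual implementation: the namespace URI behind the localsearch: prefix and the option names maxResults and orderBy are hypothetical, since this document does not define them. The sketch shows one literal-valued option and one option that references another resource, both attached to the common subject.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryOptionTriples {
    // Hypothetical namespace: the real URI behind "localsearch:" is not given in this document.
    static final String NS = "http://example.org/localsearch#";
    // All option triples share this common subject (localsearch:Query).
    static final String SUBJECT = "<" + NS + "Query>";

    private final List<String> lines = new ArrayList<>();

    /** Adds a simple option whose object is a plain literal. */
    public QueryOptionTriples literal(String option, String value) {
        lines.add(SUBJECT + " <" + NS + option + "> \"" + escape(value) + "\" .");
        return this;
    }

    /** Adds a more complex option whose object references another resource. */
    public QueryOptionTriples resource(String option, String uri) {
        lines.add(SUBJECT + " <" + NS + option + "> <" + uri + "> .");
        return this;
    }

    /** Escapes backslashes and double quotes only; enough for this sketch. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Returns one N-Triples statement per line. */
    public String toNTriples() {
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String options = new QueryOptionTriples()
            .literal("maxResults", "5")   // hypothetical option name
            .resource("orderBy", "http://lucene.apache.org/ontology/score")   // hypothetical option name
            .toNTriples();
        System.out.println(options);
    }
}
```

Because every option is just another predicate-object pair on the same subject, new options can be introduced without changing any method signature.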

Additional Query Languages

In addition to the native query languages of the RDF repository (SPARQL, SeRQL), other query languages may be useful in the context of Desktop Search. Currently, three additional query languages are considered: Fulltext query, N-Triples QBE and Search shortcuts.

Fulltext Query

The first additional query language which is considered in Local Search besides the already existing ones is the language used in search engines' search boxes. With queries like Belfast Meeting one can simply search for documents containing the given words.

This is the language that regular web users have become accustomed to through web search engines like Google and Yahoo, and it has also been adopted by most desktop search engines. Hence, it is also supported in Local Search. Here, the entered keywords are matched against the full text of the available document resources.

More elaborate queries can be expressed using wildcards (word truncation) and boolean operators, e.g. (Belf* OR Berlin) Meeting. The syntax is identical to that of Lucene's QueryParser; the search itself is performed by LuceneSail from within the RDF store.

N-Triples QBE

This query language is based on a Query-by-Example (QBE) N-Triples notation. It demonstrates what "Google-ish" semantic queries may look like in Local Search. Assume you have the following triples in the RDF repository:

a:TheQuickBrownFox b:jumpsOver c:TheLazyDog .
c:TheLazyDog foaf:name "Lazy Dog" .

To query these triples, one only needs to partially specify triples in the query. For instance, to ask "who jumped over the lazy dog?", the query would simply be b:jumpsOver c:TheLazyDog . When only two of the three parts of a triple are specified, they are automatically treated as a predicate-object pair; thus, the resources to be retrieved are the subjects that match these constraints. Any other pairing of triple parts may be specified by using a wildcard or a variable, such as a:TheQuickBrownFox b:jumpsOver ?X . ("over whom does the quick brown fox jump?").
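To make the completion rule concrete, here is a minimal sketch of how a partial triple might be expanded into a SPARQL query. It is an assumption about one possible translation, not Local Search's actual implementation: the class QbeToSparql, the variable name ?subject and the choice of SPARQL as the target are all hypothetical. Prefix declarations are left out, on the assumption that namespaces are supplied separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QbeToSparql {
    // A token is either a double-quoted literal (which may contain spaces)
    // or a run of non-whitespace characters.
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|\\S+");

    /** Splits a QBE pattern into its parts, dropping the trailing "." terminator. */
    static List<String> tokenize(String pattern) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(pattern);
        while (m.find()) {
            String token = m.group();
            if (!token.equals(".")) {
                parts.add(token);
            }
        }
        return parts;
    }

    /** Expands a two- or three-part pattern into a SPARQL query string. */
    static String toSparql(String pattern) {
        List<String> parts = tokenize(pattern);
        if (parts.size() == 2) {
            // Two parts are read as predicate and object; the subject is what we look for.
            parts.add(0, "?subject");
        }
        if (parts.size() != 3) {
            throw new IllegalArgumentException("Expected 2 or 3 triple parts: " + pattern);
        }
        // Collect the distinct variables in order of appearance.
        List<String> vars = new ArrayList<>();
        for (String part : parts) {
            if (part.startsWith("?") && !vars.contains(part)) {
                vars.add(part);
            }
        }
        String body = "{ " + String.join(" ", parts) + " . }";
        // A fully specified triple becomes a yes/no question.
        return vars.isEmpty() ? "ASK " + body
                              : "SELECT " + String.join(" ", vars) + " WHERE " + body;
    }

    public static void main(String[] args) {
        System.out.println(toSparql("b:jumpsOver c:TheLazyDog ."));
        System.out.println(toSparql("a:TheQuickBrownFox b:jumpsOver ?X ."));
    }
}
```

With the two example queries from the text, the sketch yields SELECT ?subject WHERE { ?subject b:jumpsOver c:TheLazyDog . } and SELECT ?X WHERE { a:TheQuickBrownFox b:jumpsOver ?X . } respectively.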

Search shortcuts

Search shortcuts are arbitrary strings that are mapped to valid queries understood by the RDF repository. Currently, this feature is experimental, but there are plans to make mappings manageable, so that they can be set and updated dynamically at runtime. Prominent shortcuts could be my recent documents or, eventually, repeat query. The latter would require session handling, however, which is not yet integrated. In a further step, search shortcuts may be extended by parameters (e.g., show last 3 documents), resembling stored procedures from the database world ("PL/SQL for the Semantic Web"). It has not yet been decided to what extent this should be supported within Local Search.
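The shortcut idea amounts to a lookup table from strings to queries. The following sketch illustrates it under stated assumptions: the class ShortcutRegistry, the whitespace and case normalization, and the SPARQL text (including its example.org property) are hypothetical placeholders, not the mappings shipped with Local Search.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ShortcutRegistry {
    // Thread-safe map so that mappings could be set and updated at runtime.
    private final Map<String, String> mappings = new ConcurrentHashMap<>();

    /** Trims, collapses inner whitespace and lower-cases the shortcut (an assumption of this sketch). */
    private static String normalize(String shortcut) {
        return shortcut.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Registers a shortcut or replaces an existing mapping. */
    public void put(String shortcut, String query) {
        mappings.put(normalize(shortcut), query);
    }

    /** Returns the query mapped to the shortcut, if any. */
    public Optional<String> resolve(String shortcut) {
        return Optional.ofNullable(mappings.get(normalize(shortcut)));
    }

    public static void main(String[] args) {
        ShortcutRegistry registry = new ShortcutRegistry();
        // Placeholder query: the property URI is invented for illustration.
        registry.put("my recent documents",
            "SELECT ?doc WHERE { ?doc <http://example.org/lastModified> ?t . } ORDER BY DESC(?t)");
        System.out.println(registry.resolve("My Recent Documents").orElse("(no mapping)"));
        System.out.println(registry.resolve("repeat query").orElse("(no mapping)"));
    }
}
```

Parameterized shortcuts such as show last 3 documents would need pattern matching on top of this plain lookup, which is the open design question mentioned above.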

Authors/Developers

Christian Kohlschütter, L3S. Development, Documentation.

Installation Instructions

LocalSearch consists of two packages, the API (package org.semanticdesktop.services.localsearch within the org.semanticdesktop.services plugin) and the implementation (org.semanticdesktop.nepomuk.comp.localsearch).

LocalSearch currently depends solely on the NEPOMUK Middleware (all necessary plugins are included there).

Using the component

Javadoc Documentation is available here. Users of LocalSearch should only use classes within the org.semanticdesktop.services.localsearch.* package hierarchy. The classes under org.semanticdesktop.nepomuk.comp.localsearch.* are implementation-specific and intended only for developers interested in tweaking LocalSearch's internals.

A demo package on how to include LocalSearch into your projects is available at org.semanticdesktop.nepomuk.comp.localsearch.demo. You should also have a look at the PSEW Searchinterface, which makes heavy use of LocalSearch.

The WSDL service description can be found here.

Examples

To demonstrate the features of LocalSearch, here are a few examples:

Retrieving the LocalSearch instance

ServiceReference sr = context.getServiceReference(LocalSearch.class.getName());
LocalSearch ls = (LocalSearch) context.getService(sr);

Setting default query options

QueryOptions qo = new QueryOptions();
qo.setRepositoryId("main"); // Use the "main" repository.

Free-text query, results sorted by descending score

qo.setOrderCriterion("http://lucene.apache.org/ontology/score", QueryOptions.SORT_ORDER_DESCENDING);
QueryResultTable qrt = ls.query("hello world", LocalSearch.QUERY_LANGUAGE_LUCENESAIL, qo);

N-Triples query with additional namespace

qo.setNamespace("foaf", "http://xmlns.com/foaf/0.1/");
QueryResultTable qrt = ls.query("foaf:name \"John Doe\"", LocalSearch.QUERY_LANGUAGE_NTRIPLES, qo);

N-Triples query with additional namespace, expressed by a TriplesQuery

qo.setNamespace("foaf", "http://xmlns.com/foaf/0.1/");

TriplesQuery tq = new TriplesQuery();
tq.addTriple(new Triple(VariableTriplePart.SUBJECT,
    URITriplePart.newPrefixURI("foaf:name"),
    LiteralTriplePart.stringLiteral("John Doe")));

QueryResultTable qrt = ls.query(tq, qo);

Search shortcut with a maximum of 5 results

qo.setMaxResults(5);
QueryResultTable qrt = ls.query("my recent documents", LocalSearch.QUERY_LANGUAGE_SHORTCUTS, qo);

SPARQL query (options given in the SPARQL query override the QueryOptions parameters)

qo.setMaxResults(5); // Overridden by the LIMIT 100 clause in the query below.
QueryResultTable qrt = ls.query("SELECT ?o WHERE { ?s ?p ?o . } LIMIT 100", LocalSearch.QUERY_LANGUAGE_SPARQL, qo);

Developing the component

Full Javadoc Documentation is available here.

SVN browsing

https://dev.nepomuk.semanticdesktop.org/repos/trunk/java/org.semanticdesktop.nepomuk.comp.localsearch/

https://dev.nepomuk.semanticdesktop.org/repos/trunk/java/org.semanticdesktop.services/src/org/semanticdesktop/services/localsearch/