An Incremental and Distributed Inference Method for Large-Scale Ontologies Based on MapReduce Paradigm
With the coming deluge of semantic data, the fast growth of ontology bases poses significant challenges for efficient and scalable reasoning. Traditional centralized reasoning methods cannot process large ontologies, so distributed reasoning methods are required to improve the scalability and performance of inference. This paper proposes an incremental and distributed inference method for large-scale ontologies using MapReduce, which achieves high-performance reasoning and runtime searching, especially for incremental knowledge bases. By constructing a transfer inference forest and effective assertional triples, storage is largely reduced and the reasoning process is simplified and accelerated. Finally, a prototype system is implemented on the Hadoop framework, and the experimental results validate the usability and effectiveness of the proposed approach.
Existing distributed reasoning methods focus on computing the RDF closure, which takes much time (usually several hours or even days for large datasets) and space (the resulting ontology is generally larger than the original data).
Moreover, each time new RDF data arrive, the entire dataset must be re-reasoned to compute the new RDF closure. This happens at every update, which is too time-consuming in practice.
WebPIE distinguishes newly arrived RDF triples from old ones but fails to consider the relations between them, which produces many duplicated triples during reasoning and thereby degrades its performance.
Because a centralized architecture executed on a single machine or local server cannot cope with large datasets, distributed reasoning approaches executed on multiple computing nodes have emerged to improve the scalability and speed of inference.
DISADVANTAGES OF EXISTING SYSTEM:
The RDF closure is ordinarily larger than the original RDF data.
Storing the RDF closure therefore takes considerable space, and querying it takes nontrivial time.
As the data volume grows and the ontology base is updated, these methods must recompute the entire RDF closure every time new data arrive, which takes much time (usually several hours or even days for large datasets) and space (the resulting ontology is generally larger than the original data).
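The size and recomputation costs listed above can be made concrete with a small sketch. It is not taken from the paper: it assumes the closure is restricted to the transitivity of rdfs:subClassOf (RDFS rule rdfs11) and uses a naive fixpoint loop. The class name ClosureGrowth and the helpers chain and closure are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: naive closure of subClassOf edges, not the paper's algorithm.
class ClosureGrowth {

    /** Computes the transitive closure of subClassOf edges by fixpoint iteration. */
    public static Set<String> closure(List<String[]> edges) {
        Set<String> closed = new HashSet<>();
        for (String[] e : edges) {
            closed.add(e[0] + " " + e[1]);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a copy so that new pairs can be added to 'closed'.
            List<String> snapshot = new ArrayList<>(closed);
            for (String ab : snapshot) {
                String[] x = ab.split(" ");
                for (String cd : snapshot) {
                    String[] y = cd.split(" ");
                    // (A subClassOf B) and (B subClassOf C) imply (A subClassOf C)
                    if (x[1].equals(y[0]) && closed.add(x[0] + " " + y[1])) {
                        changed = true;
                    }
                }
            }
        }
        return closed;
    }

    /** Builds a chain C0 subClassOf C1 ... subClassOf Cn with n asserted edges. */
    public static List<String[]> chain(int n) {
        List<String[]> edges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            edges.add(new String[] {"C" + i, "C" + (i + 1)});
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String[]> edges = chain(4);
        System.out.println("asserted: " + edges.size());          // asserted: 4
        System.out.println("closure:  " + closure(edges).size()); // closure:  10
    }
}
```

For a chain of 4 asserted edges the closure holds 10 pairs, and appending a single edge (chain(5)) raises it to 15. In this naive scheme one new triple forces the whole fixpoint to be recomputed, which is the behaviour the list above criticizes.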
We propose an incremental and distributed inference method (IDIM) for large-scale RDF datasets via MapReduce. The choice of MapReduce is motivated by the fact that it can limit data exchange and alleviate load balancing problems by dynamically scheduling jobs on computing nodes.
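To show how a reasoning rule maps onto the MapReduce model, the following is a minimal sketch in plain Java that simulates one map/shuffle/reduce pass for RDFS rule rdfs9 (x rdf:type C and C rdfs:subClassOf D imply x rdf:type D). It deliberately avoids the Hadoop API, and it is an assumption about how such a job could be laid out, not the job design used by IDIM. The class Rdfs9MapReduceSketch, its methods, and the ex: identifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: an in-memory simulation of one MapReduce pass for rule rdfs9.
class Rdfs9MapReduceSketch {

    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    /** Map phase: key every triple by the class on which the two premises join. */
    static void map(String[] t, Map<String, List<String>> shuffle) {
        if (TYPE.equals(t[1])) {
            emit(shuffle, t[2], "I:" + t[0]);   // t[0] is an instance of class t[2]
        } else if (SUBCLASS.equals(t[1])) {
            emit(shuffle, t[0], "S:" + t[2]);   // t[2] is a superclass of class t[0]
        }
    }

    static void emit(Map<String, List<String>> shuffle, String key, String value) {
        List<String> values = shuffle.get(key);
        if (values == null) {
            values = new ArrayList<>();
            shuffle.put(key, values);
        }
        values.add(value);
    }

    /** Reduce phase: pair each instance of the key class with each direct superclass. */
    static void reduce(List<String> values, Set<String> out) {
        List<String> instances = new ArrayList<>();
        List<String> supers = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("I:")) {
                instances.add(v.substring(2));
            } else {
                supers.add(v.substring(2));
            }
        }
        for (String x : instances) {
            for (String d : supers) {
                out.add(x + " " + TYPE + " " + d);
            }
        }
    }

    /** Runs one map/shuffle/reduce pass and returns the derived triples. */
    public static Set<String> runOnce(List<String[]> triples) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String[] t : triples) {
            map(t, shuffle);
        }
        Set<String> derived = new TreeSet<>();
        for (List<String> values : shuffle.values()) {
            reduce(values, derived);
        }
        return derived;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"ex:alice", TYPE, "ex:Student"});
        triples.add(new String[] {"ex:Student", SUBCLASS, "ex:Person"});
        triples.add(new String[] {"ex:Person", SUBCLASS, "ex:Agent"});
        System.out.println(runOnce(triples)); // [ex:alice rdf:type ex:Person]
    }
}
```

A single pass only derives one-step consequences: reaching ex:alice rdf:type ex:Agent would need a second pass over the input plus the derived triple. Repeating passes until nothing new appears is what makes full closure computation expensive on large datasets.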
In order to store the incremental RDF triples more efficiently, we present two novel concepts, i.e., transfer inference forest (TIF) and effective assertional triples (EAT). Their use can largely reduce the storage and simplify the reasoning process.
With TIF/EAT, the RDF closure need not be computed or stored, and reasoning time decreases so significantly that a user’s online query can be answered promptly; to the best of our knowledge, this is more efficient than existing methods. More importantly, updating TIF/EAT requires only minimal computation because the relationship between new triples and existing ones is fully exploited, which is not found in the existing literature.
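The idea of answering queries at runtime without a stored closure can be illustrated with a deliberately simplified structure. The text here does not give the actual construction of TIF or EAT, so the sketch below is an assumption: it keeps only direct subClassOf edges as a forest (at most one parent per class) plus the asserted rdf:type triples, and resolves a type query by walking up the forest. The class ForestQuerySketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: a simplified stand-in, not the paper's TIF/EAT definition.
class ForestQuerySketch {

    // child class -> direct parent class (assumed: at most one parent per class)
    private final Map<String, String> parent = new HashMap<>();
    // instance -> directly asserted classes
    private final Map<String, Set<String>> asserted = new HashMap<>();

    /** Records a direct subClassOf edge; nothing is re-derived. */
    public void addSubClassOf(String child, String parentClass) {
        parent.put(child, parentClass);
    }

    /** Records an asserted rdf:type triple; nothing is re-derived. */
    public void addType(String instance, String cls) {
        Set<String> classes = asserted.get(instance);
        if (classes == null) {
            classes = new HashSet<>();
            asserted.put(instance, classes);
        }
        classes.add(cls);
    }

    /** Answers "instance rdf:type target?" at query time by walking up the forest. */
    public boolean hasType(String instance, String target) {
        Set<String> classes = asserted.get(instance);
        if (classes == null) {
            return false;
        }
        for (String start : classes) {
            Set<String> seen = new HashSet<>(); // guards against accidental cycles
            String c = start;
            while (c != null && seen.add(c)) {
                if (c.equals(target)) {
                    return true;
                }
                c = parent.get(c);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ForestQuerySketch kb = new ForestQuerySketch();
        kb.addSubClassOf("ex:Student", "ex:Person");
        kb.addType("ex:alice", "ex:Student");
        System.out.println(kb.hasType("ex:alice", "ex:Person")); // true
        System.out.println(kb.hasType("ex:alice", "ex:Agent"));  // false
        kb.addSubClassOf("ex:Person", "ex:Agent");               // incremental update
        System.out.println(kb.hasType("ex:alice", "ex:Agent"));  // true
    }
}
```

In this sketch an update is a single map insertion and the inferred triple is never materialized, at the price of a short walk per query. That trade-off is the general one the paragraph describes; the real TIF/EAT structures may differ substantially.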
ADVANTAGES OF PROPOSED SYSTEM:
Linear scalability, automatic failover support, and convenient backup of MapReduce jobs
Distributed data on the web make it difficult to acquire appropriate triples for appropriate inferences
It leverages both old and new data to minimize updating time and reduce reasoning time on big RDF datasets.
It speeds up updating with newly arrived data and meets end users' requirements for online queries.
HARDWARE REQUIREMENTS:
System : Pentium IV 2.4 GHz.
Hard Disk : 40 GB.
Floppy Drive : 1.44 MB.
Monitor : 15 VGA Colour.
RAM : 512 MB.
SOFTWARE REQUIREMENTS:
Operating system : Windows 7/Ubuntu.
Coding Language : Java 1.7, Hadoop 0.8.1
IDE : Eclipse
Database : MySQL
Bo Liu, Member, IEEE, Keman Huang, Jianqiang Li, and MengChu Zhou, Fellow, IEEE, “An Incremental and Distributed Inference Method for Large-Scale Ontologies Based on MapReduce Paradigm”, IEEE TRANSACTIONS ON CYBERNETICS, VOL. 45, NO. 1, JANUARY 2015.