Semantic Infrastructure
Semantic Ontology Server – This component provides ontology, RDF, and other semantic services. It encompasses a triple-store and several services that help manage, query, and analyze the stored information. Specifically, the services are:
Vocabulary Registry – A registry of vocabularies that can be used for data validation, inference, entailment, specification of visual representation, and more.
Link Analysis / Graph API – Provides implementations of SNA (social network analysis) algorithms that help uncover hidden relationships, identify patterns, and reveal network structure as reflected in the data.
RDF Store Federation – Allows multiple data sources (triple-stores, or external applications whose data is dynamically converted to RDF) to be accessed as a single, virtual store.
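The federation idea described above can be illustrated with a minimal sketch. It assumes a hypothetical `match(s, p, o)` triple-pattern interface and an in-memory stand-in for a real triple-store; none of these names come from the Citer platform itself, and a real federation layer would also need query planning and remote access.

```python
from typing import Iterable, Iterator, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]


class TripleSource(Protocol):
    """Hypothetical interface: anything that answers a triple-pattern query."""

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterable[Triple]:
        ...


class InMemoryStore:
    """Stand-in for a real triple-store or an RDF-converting adapter."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._triples = set(triples)

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        # None acts as a wildcard for that position of the pattern.
        for t in self._triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t


class FederatedStore:
    """Presents several sources as one virtual store."""

    def __init__(self, *sources: TripleSource) -> None:
        self._sources = sources

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        seen = set()
        for source in self._sources:
            for t in source.match(s, p, o):
                if t not in seen:  # the same triple may exist in several sources
                    seen.add(t)
                    yield t


# Usage with made-up example data.
hr = InMemoryStore([("ex:alice", "ex:worksFor", "ex:acme")])
crm = InMemoryStore([("ex:acme", "ex:locatedIn", "ex:paris"),
                     ("ex:alice", "ex:worksFor", "ex:acme")])
store = FederatedStore(hr, crm)
employers = sorted(store.match(p="ex:worksFor"))
```

The caller queries `store` exactly as it would query a single source, which is the point of a virtual store; duplicates across sources are collapsed here for simplicity.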
Citer Toolkit – Provides implementations of infrastructure capabilities for the entire platform. It contains a myriad of algorithm implementations and data structures. Its notable sub-components are:
Clustering Engine - Capable of separating a large number of documents into smaller, sometimes hierarchical, groups of similar documents. This is helpful for automatically creating a navigation mechanism for a large search result set.
Natural Language Processing (NLP) - This module provides language identification, tokenization, and morphological analysis, which are the basis for metadata extraction and text analytics.
Entity Extraction - This module identifies named entities within text (people, organizations, dates, etc.).
Profiling Engine - The Profiler is a scalable, high-performance text-profiling tool capable of matching thousands of documents a minute against a large set of profiles. This is useful for document routing, filtering, and real-time alerting.
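As an illustration of matching documents against a large set of profiles, the sketch below assumes a profile is simply a set of keywords that must all appear in a document. The `ProfileMatcher` class and this profile format are hypothetical and are not the Profiler's actual design. An inverted index from term to profile keeps the per-document work proportional to the profiles that share at least one term with the document.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class ProfileMatcher:
    """Matches documents against keyword profiles (all terms must be present)."""

    def __init__(self, profiles: Dict[str, Set[str]]) -> None:
        self._profiles = {name: {t.lower() for t in terms}
                          for name, terms in profiles.items()}
        # Inverted index: term -> names of the profiles that mention it.
        self._index: Dict[str, Set[str]] = defaultdict(set)
        for name, terms in self._profiles.items():
            for term in terms:
                self._index[term].add(name)

    def match(self, text: str) -> List[str]:
        tokens = set(re.findall(r"\w+", text.lower()))
        # Only profiles sharing at least one token with the document are candidates.
        candidates: Set[str] = set()
        for token in tokens:
            candidates |= self._index.get(token, set())
        # A profile matches when every one of its terms occurs in the document.
        return sorted(n for n in candidates if self._profiles[n] <= tokens)


# Usage with made-up profiles.
matcher = ProfileMatcher({
    "energy-alerts": {"oil", "pipeline"},
    "finance": {"merger"},
})
hits = matcher.match("New oil pipeline approved after merger talks")
misses = matcher.match("Oil prices rise")
```

The names returned by `match` could then drive routing or alerting, for example by looking up the users subscribed to each matching profile.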
Citer Knowledge Services
Enterprise Integration Facilities - A group of mechanisms that allow Citer solutions to integrate into an enterprise environment.
Security Authentication and Authorization: Built around Java™ JAAS, the platform can integrate with enterprise Certification Authorities, login mechanisms, etc. to provide user identification, authentication, and information-level access lists.
Task Scheduler - Orchestrates and automates Citer Tasks.
Event Notification Mechanism: Allows registration of synchronous and asynchronous events to be fired to an ESB (Enterprise Service Bus) as a means of application integration.
Subscriptions and Alerting - Allows delivery of Citer notifications to users via various transports, such as e-mail or SMS. This helps send alerts on incoming data to users who may be interested in that information based on their profile.
Database and Enterprise Application RDFizers: A set of interfaces and translation mechanisms that allow information from different databases and applications to be exposed as RDF, thus allowing seamless information integration.
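To make the RDFizer idea concrete, here is a minimal sketch of one possible row-to-triple mapping over a relational table, using Python's built-in `sqlite3` module. The `rdfize_table` function, the `http://example.org/` namespace, and the mapping convention (one subject per row, one predicate per column) are assumptions for illustration, not the platform's actual translation mechanism.

```python
import sqlite3
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/"  # hypothetical namespace for generated URIs


def rdfize_table(conn: sqlite3.Connection, table: str, key: str) -> Iterator[Triple]:
    """Yield one triple per non-null cell: (row URI, column URI, literal value).

    The table name is interpolated into the SQL text, so it must come from
    trusted configuration, never from user input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}{table}#{column}", str(value))


# Usage with a made-up in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', NULL)")
triples = list(rdfize_table(conn, "person", "id"))
```

Null cells produce no triple, which matches the usual RDF practice of omitting unknown values rather than encoding them.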
Intelligence Services - A set of tools and services that facilitate gathering, analysis, retrieval and distribution of information.
Gatherer (Focused Crawler) - A unique, high-performance internet crawler that chooses the link-traversal order using the semantics encapsulated in the ontology and a textual analysis of web pages.
Among other capabilities, the Gatherer is capable of:
- Identifying the Main-Body-of-Text within web-pages
- Emulating cookie-based sessions
- Accessing HTTP, HTTPS, NNTP, FTP, POP3, File-System, and Exchange-Stores, as well as many other information sources (Protocol Handling).
- Identifying near-duplicate documents
- Metadata-Extraction: A set of extractors (summarizers) that can extract syntactic (Author, Title, Date, etc.) as well as semantic metadata (related entities, document feature-set, tags) from documents
- Document Summarization: Produces an ontology-inclined, readable, document summary for every document
- Document Format Handling: Allows text extraction from various file types, such as MS-Word, PDF, RTF, HTML, XML, Text, MS-Excel, MS-PowerPoint, OpenDoc, etc.
- Semi-Automatic-Classification: Allows creation of named nested-queries (Topics) for automatically classifying documents into a pre-defined topic-tree
- Content Filtering: Allows filtering out irrelevant information based on ontological disambiguation and profiling
- Reporting: Allows creation of automatic reports based on pre-defined templates
- Content Indexing and Search: Citer Platform provides first-class enterprise search capabilities, using a scalable, built-in search-engine
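The list above mentions identifying near-duplicate documents without saying how it is done. One common technique, shown here purely as an illustrative sketch and not as the Gatherer's actual method, compares documents by the Jaccard similarity of their word shingles. The function names, the shingle size of 3, and the 0.5 threshold are all assumptions.

```python
import re
from typing import Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles (overlapping word windows) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle containing all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |A & B| / |A | B|, taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.5) -> bool:
    """Treat two documents as near-duplicates when shingle overlap is high."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold


# Usage with made-up documents.
a = "The quick brown fox jumps over the lazy dog near the river bank"
b = "The quick brown fox jumps over the lazy dog near the river shore"
c = "Completely unrelated text about semantic web technologies"
same = is_near_duplicate(a, b)
different = is_near_duplicate(a, c)
```

Comparing every pair of documents this way does not scale to a large crawl; at that size, sketching schemes such as MinHash are typically used to approximate the same similarity.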