5 Common Techniques Used in Text Analysis Tools

According to a study by the International Data Group (IDG), unstructured data is growing at an alarming rate of 62% per year. The same study also suggests that by 2022, close to 93% of all data in the digital...

Why Enterprises Need a Semantic Search Platform

The biggest challenge enterprises face when using a keyword-based search platform is that a major portion of organizational data is unstructured. According to the Market Pulse Survey by...

Limitations of Search Databases in Medical Literature

Given the exponential growth in medical literature, finding relevant information quickly is critical. Researchers, with more content and less time to analyse it, need systems that are intelligent enough to...

An Intuitive Way to Search with Natural Language Processing (NLP)

We know that computers understand programming languages, but what about making them understand human language, the language that you and I speak? Natural Language Processing (NLP) is a field of study that makes this...

Working with Enterprise Search Relevancy Challenges

When enterprise search is built from scratch, evaluating search quality remains a key challenge for the organizations implementing it. It can feel like working in the dark. Such implementations demand enormous effort and time. The chart below demonstrates a typical...

SOLR Security with ManifoldCF

This article explains how to implement SOLR “document-level security” using the Manifold Connector Framework. ManifoldCF is an open-source framework for pulling content out of a repository and sending it on to targets such as SOLR via a plug-in style and connector-based...

Building Docker image with Solr

There are two ways to build a Docker image:

1. Running an image, modifying it, and committing it. This requires access to the live container.
2. Using a Dockerfile to build it.

Getting Started with Docker

According to Docker's website, "Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications."

Using Solr and TikaOCR to search text inside an image

Tesseract is probably the most accurate open-source OCR engine available, and with Apache Tika 1.7 you can now use the awesome Tesseract OCR parser within Tika!

Ontologies vs. Taxonomies vs. Thesauri, and Their Place on the Semantic Web

Ontology:

An ontology formally defines a common set of terms used to describe and represent a domain. An ontology is domain-specific: it describes and represents one area of knowledge. It contains...
