Blog 4: Elasticsearch Query Deep Dive: Text, Keyword, and Vector Searches with Python

Kshitij Kutumbe
3 min read · Dec 7, 2023

Welcome to the fourth installment of “Mastering Elasticsearch with Python: A Beginner’s Guide.” This post is dedicated to an in-depth exploration of Elasticsearch’s versatile querying capabilities. We will delve into text search, keyword search, and vector search, covering their underlying concepts, practical applications, and Python code examples.

Part 1: Text Search in Depth

Understanding Text Search

  • Fundamentals: Text search in Elasticsearch finds documents whose text fields contain the terms you search for. Analyzers process the text when a document is indexed and, for queries such as match, again when the query text is searched.

The Role of Analyzers

  • What Are Analyzers?: Analyzers convert text into tokens (terms), which are then indexed. An analyzer consists of zero or more character filters, exactly one tokenizer, and zero or more token filters.
  • Examples: the built-in standard analyzer, or a custom analyzer that you define in the index settings.
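
To make the tokenizer-plus-filters idea concrete, here is a rough pure-Python approximation of an analyzer that splits on word characters, lowercases, and strips accents. It is only a sketch: the function names `fold_to_ascii` and `analyze` are made up for illustration, and Elasticsearch's real standard tokenizer and asciifolding filter handle many more cases than this.

```python
import re
import unicodedata
from typing import List


def fold_to_ascii(text: str) -> str:
    """Strip diacritics, roughly like the `asciifolding` token filter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[str]:
    """Approximate analyzer: tokenize, then lowercase and fold each token."""
    tokens = re.findall(r"\w+", text)  # crude stand-in for the tokenizer
    return [fold_to_ascii(token.lower()) for token in tokens]


print(analyze("Élasticsearch Guide: Café-style Search"))
# ['elasticsearch', 'guide', 'cafe', 'style', 'search']
```

To see the tokens a real analyzer produces, Elasticsearch's `_analyze` API is the authoritative check.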

Mapping for Text Search

  • Creating an Index with Custom Analyzer:
# `es` is assumed to be an Elasticsearch client created as in the earlier posts.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Custom analyzer: standard tokenizer, then lowercase and accent folding
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "my_custom_analyzer"
            }
        }
    }
}

es.indices.create(index='my_text_index', body=settings)

Performing a Text Search

  • Search Query Using Match:
query = {
    "query": {
        "match": {
            "content": "Elasticsearch guide"
        }
    }
}
response = es.search(index="my_text_index", body=query)
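
By default, a match query analyzes the query text and returns documents that contain any of the resulting terms, so the query above matches documents containing "elasticsearch" or "guide". The sketch below builds the longer form of the match query with an explicit `operator`; the helper name `build_match_query` is hypothetical and exists only for this example.

```python
def build_match_query(field: str, text: str, operator: str = "or") -> dict:
    """Build a match query body with an explicit boolean operator.

    "or"  -> a document matches if it contains any analyzed term (the default).
    "and" -> a document matches only if it contains every analyzed term.
    """
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {
        "query": {
            "match": {
                field: {"query": text, "operator": operator}
            }
        }
    }


strict_query = build_match_query("content", "Elasticsearch guide", operator="and")
print(strict_query)
# With a client in scope, it would be sent the same way as above:
# response = es.search(index="my_text_index", body=strict_query)
```

Using "and" trades recall for precision: fewer documents match, but each one contains every term.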

Part 2: Keyword Search and Concepts

Understanding Keyword Search

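  • Essentials: Keyword search is designed for exact matches on values such as tags or categories. Unlike text fields, keyword fields are not analyzed, so the stored value is indexed as a single term and a query must match it exactly, including case.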

Mapping for Keyword Fields

  • Index Creation with Keyword Fields:
mapping = {
    "mappings": {
        "properties": {
            "tag": {
                "type": "keyword"
            }
        }
    }
}
es.indices.create(index='my_keyword_index', body=mapping)

Executing a Keyword Search

  • Term Query for Exact Match:
query = {
    "query": {
        "term": {
            "tag": "python"
        }
    }
}
response = es.search(index="my_keyword_index", body=query)
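
A term query does not analyze its input, which is why it pairs well with keyword fields and often surprises people on text fields. The toy model below illustrates the difference; it is not how Elasticsearch stores data internally, and the function names are invented for this sketch.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Toy analyzer: split on word characters and lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_matches_keyword_field(stored_value: str, term: str) -> bool:
    """Keyword field: the whole value is one term, compared exactly."""
    return term == stored_value


def term_matches_text_field(stored_text: str, term: str) -> bool:
    """Text field: the term is compared against the analyzed tokens."""
    return term in analyze(stored_text)


print(term_matches_keyword_field("python", "python"))    # True
print(term_matches_keyword_field("Python", "python"))    # False: case differs
print(term_matches_text_field("Python Guide", "Python"))  # False: token is "python"
print(term_matches_text_field("Python Guide", "python"))  # True
```

In practice this means storing tags in a consistent case, or using a match query when the field is of type text.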

Part 3: Vector Storage and Search

Introduction to Vector Search

  • Context: Vector search in Elasticsearch lets you search dense_vector fields. These fields typically hold embeddings produced by machine learning models, and documents are ranked by how similar their vector is to a query vector.

Mapping for Vector Fields

  • Creating an Index for Vectors:
vector_mapping = {
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "dense_vector",
                "dims": 128  # Dimension of the vector
            }
        }
    }
}
es.indices.create(index='my_vector_index', body=vector_mapping)

Performing Vector Search

  • Indexing a Document with a Vector:
doc = {
    "my_vector": [0.5, 0.8, 1.0, ...]  # 128-dimensional vector
}
es.index(index='my_vector_index', id=1, document=doc)
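
A dense_vector field rejects documents whose vector length differs from the `dims` value in the mapping, so it can help to check vectors before indexing them. The following is a minimal sketch of such a check; `check_vector` is a hypothetical helper, not part of the Elasticsearch client.

```python
import math
from typing import List, Sequence


def check_vector(vector: Sequence[float], dims: int = 128) -> List[float]:
    """Return the vector as a list of floats, or raise if it cannot be indexed."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    values = [float(v) for v in vector]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinite values")
    return values


good = check_vector([0.1] * 128)
print(len(good))  # 128

try:
    check_vector([0.5, 0.8, 1.0])  # too short for a 128-dimensional field
except ValueError as err:
    print(err)  # expected 128 dimensions, got 3
```

Failing early in Python gives a clearer error than a rejected index request, especially when vectors come from a batch job.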

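  • kNN Search on Vectors (exact scoring with script_score):
knn_query = {
    "query": {
        "script_score": {
            # Score every document; narrow this query to score fewer documents
            "query": {"match_all": {}},
            "script": {
                # + 1.0 keeps scores non-negative, since cosine similarity ranges from -1 to 1
                "source": "cosineSimilarity(params.query_vector, 'my_vector') + 1.0",
                "params": {"query_vector": [0.3, 0.6, 0.9, ...]}  # Sample query vector
            }
        }
    }
}
response = es.search(index='my_vector_index', body=knn_query)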
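
To show what the script computes, here is a small pure-Python sketch of the same score, cosine similarity plus 1.0, applied to a handful of in-memory vectors. The helper names and the sample two-dimensional vectors are made up for illustration; Elasticsearch performs this scoring on the server.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def script_score(query_vector: Sequence[float], doc_vector: Sequence[float]) -> float:
    """Mirror of the script: shift the result from [-1, 1] to [0, 2]."""
    return cosine_similarity(query_vector, doc_vector) + 1.0


def rank(query_vector: Sequence[float],
         docs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every document and sort by descending score (exact, brute force)."""
    scored = [(doc_id, script_score(query_vector, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {"same": [1.0, 0.0], "orthogonal": [0.0, 1.0], "opposite": [-1.0, 0.0]}
for doc_id, score in rank([1.0, 0.0], docs):
    print(doc_id, round(score, 3))
```

Because every document is scored, the cost grows linearly with the number of documents that the inner query matches.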

Conclusion

This post walked through Elasticsearch’s text, keyword, and vector search capabilities. For each one, we covered the mapping it needs, the query that uses it, and how to run both from Python.

Coming Up Next: In our next blog, we’ll dive into Elasticsearch’s aggregation framework and explore how it can be utilized for advanced data analysis and insights.

https://kshitijkutumbe.medium.com/blog-5-exploring-elasticsearch-aggregations-for-advanced-data-analysis-with-python-b585f2dcfe89

Key Takeaways:

  • Text searches are powerful for full-text queries, relying heavily on analyzers for tokenization.
  • Keyword searches are ideal for exact matches, such as tags or statuses.
  • Vector searches rank documents by similarity between dense vectors, which makes them useful for similarity and relevance scoring in machine learning applications.

Experiment with these search types in your projects and share your experiences or queries in the comments. Stay tuned for more advanced Elasticsearch features!

Written by Kshitij Kutumbe

Data Scientist | NLP | GenAI | RAG | AI agents | Knowledge Graph | Neo4j kshitijkutumbe@gmail.com www.linkedin.com/in/kshitijkutumbe/
