Blog 3: Mastering Elasticsearch Indexing and Mapping with Python

Kshitij Kutumbe
3 min read · Dec 6, 2023


Welcome to the third installment of our series, “Mastering Elasticsearch with Python: A Beginner’s Guide.” In this post, we dig into Elasticsearch indexing and mapping: creating indices with custom mappings, understanding field types, configuring text analysis, and indexing documents in bulk with Python.

Part 1: Introduction to Elasticsearch Indexing

Understanding an Index in Elasticsearch

  • Definition: An index in Elasticsearch is a collection of documents with similar characteristics. It is a core Elasticsearch concept, loosely analogous to a table in a relational database.
  • Purpose: Indices provide an organized structure to store and retrieve data efficiently.

Index Creation Basics

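  • Creating an Index: An index holds documents and is defined by two things: settings (for example, the number of shards and replicas, and analysis configuration) and mappings (the field definitions covered in Part 2). By default, indexing a document into an index that does not exist yet makes Elasticsearch create it automatically with dynamically inferred mappings; creating the index explicitly gives you control over both.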
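To show the two halves of an index definition side by side, here is a minimal sketch that assembles the settings and mappings as plain Python dictionaries. The helper `build_index_definition` is hypothetical (not part of any library), and the shard/replica values are illustrative only; no cluster is needed to run it.

```python
def build_index_definition(shards=1, replicas=1, properties=None):
    """Hypothetical helper: return the settings and mappings for a new index."""
    settings = {
        # Static: chosen at creation (changing it later needs the split/shrink APIs).
        "number_of_shards": shards,
        # Dynamic: can be updated on a live index.
        "number_of_replicas": replicas,
    }
    # Mappings describe the fields; an empty "properties" leaves them to dynamic mapping.
    mappings = {"properties": properties or {}}
    return {"settings": settings, "mappings": mappings}


definition = build_index_definition(
    shards=1,
    replicas=0,
    properties={"title": {"type": "text"}},
)
print(definition)
```

With elasticsearch-py 8.x, which accepts `settings` and `mappings` as keyword arguments, the result could be passed as `es.indices.create(index="my_articles", **definition)`.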

Part 2: Detailed Mapping and Analysis

Mapping in Depth

  • What is Mapping?: Mapping is the process of defining how a document and its fields are stored and indexed. Think of it as defining the schema for a table in a SQL database.

Field Types in Mapping

  • Text vs. Keyword:
    • text fields are analyzed and tokenized, ideal for full-text search.
    • keyword fields are not analyzed; the whole value is indexed as a single term, which makes them suited to exact matches, aggregations, and sorting.
  • Other Common Field Types:
    • date: Dates and date-times.
    • long, integer: Numeric data types.
    • boolean: true or false values.
    • nested: Arrays of objects that must be queried independently, so that fields from different objects in the same array are not mixed together.
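To make the text vs. keyword difference concrete, here is a rough, cluster-free simulation of what ends up in the index for each field type. `approx_text_terms` is only an approximation of the standard analyzer (the real one follows Unicode segmentation rules), and both helper names are hypothetical.

```python
import re


def approx_text_terms(value):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def keyword_terms(value):
    """A keyword field (without a normalizer) indexes the whole value as one term."""
    return [value]


title = "Elasticsearch Basics: Getting Started"
text_terms = approx_text_terms(title)
kw_terms = keyword_terms(title)

print(text_terms)  # ['elasticsearch', 'basics', 'getting', 'started']
print(kw_terms)    # ['Elasticsearch Basics: Getting Started']

# Looking up the single term "basics" hits the text field but not the keyword field.
print("basics" in text_terms, "basics" in kw_terms)  # True False
```

This is why a search for one word matches a `text` field, while a `keyword` field only matches the exact original value.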

Custom Mapping Creation

  • Example: Creating an index with a specific mapping.
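from elasticsearch import Elasticsearch

# Assumes elasticsearch-py 8.x and a local, unsecured cluster; adjust URL/auth as needed.
es = Elasticsearch("http://localhost:9200")

mappings = {
    "properties": {
        "title": {"type": "text"},
        "published_on": {"type": "date"},
        "content": {"type": "text"},
        "tags": {"type": "keyword"},
    }
}

# ignore_status=400 tolerates the "index already exists" error (and any other 400).
es.options(ignore_status=400).indices.create(index="my_articles", mappings=mappings)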

Analysis: The Heart of Full-Text Search

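  • Understanding Analysis: Analysis is the process that converts text into the tokens (terms) stored in an inverted index for searching. An analyzer runs up to three stages in order: character filters (which tidy the raw text), a tokenizer (which splits it into tokens), and token filters (which modify, add, or remove tokens).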
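The idea of an inverted index can be sketched in a few lines of pure Python: analyze each document into terms, then map every term to the IDs of the documents containing it. This is a simplified model, not Elasticsearch's implementation; the analysis step is a crude lowercase-and-split, the helper names are hypothetical, and the AND-only `search` is a simplification (a default `match` query combines terms with OR and ranks results by relevance).

```python
import re
from collections import defaultdict


def analyze(text):
    # Simplified analysis: lowercase word tokens (not Elasticsearch's exact rules).
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    postings = [set(index.get(term, [])) for term in analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "Learning Elasticsearch",
    2: "Elasticsearch mapping and analysis",
    3: "Learning Python",
}
inverted = build_inverted_index(docs)
print(inverted["elasticsearch"])                   # [1, 2]
print(search(inverted, "learning ELASTICSEARCH"))  # [1]
```

Because the query goes through the same analysis as the documents, "ELASTICSEARCH" still matches the lowercased term.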

Tokenizers and Analyzers

  • Tokenizers: A tokenizer splits text into individual terms, or tokens. Every analyzer has exactly one.
  • Analyzers: An analyzer combines one tokenizer with optional character filters and token filters. Elasticsearch offers built-in analyzers such as standard, simple, whitespace, and more.
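The "one tokenizer plus token filters" structure can be modeled as function composition. The sketch below builds rough stand-ins for two built-in analyzers: `whitespace` (splits on whitespace, no lowercasing) and `simple` (splits on any non-letter character, then lowercases). These are approximations written for illustration, not ports of Elasticsearch code; character filters are omitted for brevity, and all function names are hypothetical.

```python
import re


def whitespace_tokenizer(text):
    return text.split()


def letter_tokenizer(text):
    # Split on anything that is not a letter ([^\W\d_] matches Unicode letters only).
    return re.findall(r"[^\W\d_]+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def make_analyzer(tokenizer, token_filters=()):
    """An analyzer = one tokenizer followed by zero or more token filters."""
    def analyzer(text):
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyzer


# Roughly like the built-in "whitespace" analyzer: no lowercasing.
whitespace = make_analyzer(whitespace_tokenizer)
# Roughly like the built-in "simple" analyzer: letters only, then lowercase.
simple = make_analyzer(letter_tokenizer, [lowercase_filter])

sample = "Elasticsearch 8.11 is FAST-ish"
print(whitespace(sample))  # ['Elasticsearch', '8.11', 'is', 'FAST-ish']
print(simple(sample))      # ['elasticsearch', 'is', 'fast', 'ish']
```

The same input produces very different terms, which is why the analyzer choice directly decides which searches will match.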

Creating Custom Analyzers

  • Example: Creating an index with a custom analyzer.
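from elasticsearch import Elasticsearch

# Assumes elasticsearch-py 8.x and a local, unsecured cluster; adjust URL/auth as needed.
es = Elasticsearch("http://localhost:9200")

settings = {
    "analysis": {
        "analyzer": {
            "my_custom_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                # Token filters run in order: lowercase first, then strip accents.
                "filter": ["lowercase", "asciifolding"],
            }
        }
    }
}

mappings = {
    "properties": {
        "content": {
            "type": "text",
            # Used at index time and, unless a search_analyzer is set, at search time.
            "analyzer": "my_custom_analyzer",
        }
    }
}

es.options(ignore_status=400).indices.create(
    index="custom_analysis", settings=settings, mappings=mappings
)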
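To get a feel for what `my_custom_analyzer` does without a cluster, here is an approximate Python model of its pipeline: standard tokenizer, then `lowercase`, then `asciifolding`. The tokenizer is approximated with a word-character regex, and accent folding is approximated with Unicode NFKD decomposition; both are assumptions made for illustration, and the helper names are hypothetical.

```python
import re
import unicodedata


def approx_standard_tokenizer(text):
    # Rough approximation: runs of word characters. The real standard tokenizer
    # follows Unicode text segmentation rules (UAX #29).
    return re.findall(r"\w+", text)


def lowercase(tokens):
    return [t.lower() for t in tokens]


def ascii_fold(tokens):
    # Approximates asciifolding: decompose characters and drop combining marks,
    # so "é" becomes "e". The real filter uses its own mapping table and also
    # folds characters NFKD does not decompose (for example "ø" or "ß").
    folded = []
    for t in tokens:
        decomposed = unicodedata.normalize("NFKD", t)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return folded


def my_custom_analyzer(text):
    # Same order as the index settings: tokenizer, lowercase, asciifolding.
    return ascii_fold(lowercase(approx_standard_tokenizer(text)))


print(my_custom_analyzer("Café Déjà Vu"))  # ['cafe', 'deja', 'vu']
```

On a running cluster, the real output can be checked with `es.indices.analyze(index="custom_analysis", analyzer="my_custom_analyzer", text="Café Déjà Vu")`.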

Part 3: Advanced Indexing Techniques

Bulk Indexing for Efficiency

  • Why Bulk Indexing?: Bulk indexing is a method to index multiple documents in a single request, which is more efficient than individual indexing, especially for large datasets.
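Under the hood, a bulk request is one HTTP request to the `_bulk` endpoint whose body is newline-delimited JSON: for each document, an action line followed by the document source. The sketch below builds such a body in plain Python purely to show the format; `to_bulk_ndjson` is a hypothetical helper, and in practice `elasticsearch.helpers.bulk` handles this serialization for you.

```python
import json


def to_bulk_ndjson(index, docs):
    """Serialize (doc_id, source) pairs into a newline-delimited _bulk body."""
    lines = []
    for doc_id, source in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Document line: the source to store.
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_ndjson("my_articles", [
    ("1", {"title": "Elasticsearch Basics"}),
    ("2", {"title": "Mapping in Depth"}),
])
print(body)
```

Two documents become four lines in a single request, instead of two separate round trips.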

Implementing Bulk Indexing in Python

  • Code Example:
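from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Assumes elasticsearch-py 8.x and a local, unsecured cluster; adjust URL/auth as needed.
es = Elasticsearch("http://localhost:9200")

documents = [
    {
        "_index": "my_articles",
        "_id": 1,
        "_source": {"title": "Elasticsearch Basics", "published_on": "2023-03-01", "content": "Learning Elasticsearch", "tags": ["search", "database"]},
    },
    # Add more documents here
]

# bulk() returns (number of successful actions, list of errors)
success, errors = bulk(es, documents)
print(f"Successfully indexed {success} documents")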
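For larger datasets, the actions do not have to sit in one big list: `helpers.bulk` accepts any iterable and sends it in chunks (its `chunk_size` defaults to 500 actions). The sketch below uses a generator to turn plain records into bulk actions lazily; `generate_actions` and the record shape with an `id` key are hypothetical choices for illustration.

```python
def generate_actions(index, articles):
    """Yield one bulk action per article (hypothetical helper for helpers.bulk)."""
    for article in articles:
        yield {
            "_index": index,
            "_id": article["id"],
            # Everything except the id becomes the document body.
            "_source": {k: v for k, v in article.items() if k != "id"},
        }


articles = [
    {"id": 1, "title": "Elasticsearch Basics", "tags": ["search"]},
    {"id": 2, "title": "Mapping in Depth", "tags": ["mapping"]},
]
actions = list(generate_actions("my_articles", articles))
print(actions[0])
```

Against a cluster, you would pass the generator directly, e.g. `bulk(es, generate_actions("my_articles", articles))`, so records can be streamed from a file or database without loading them all into memory.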

Retrieving and Updating Mapping

  • Get Current Mapping:
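from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local, unsecured cluster
current_mapping = es.indices.get_mapping(index="my_articles")
print(current_mapping["my_articles"]["mappings"])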
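The mapping response is a nested dictionary keyed by index name. As a sketch, the hypothetical helper below walks that structure and lists each field with its type, using dotted paths for sub-fields. It runs on a hand-written sample shaped like a get_mapping response, so no cluster is needed; plain object fields (which appear without an explicit `type` in get_mapping output) are descended into but not listed themselves.

```python
def list_field_types(mapping_response, index):
    """Flatten a get_mapping response into {"dotted.field.path": type}."""
    def walk(properties, prefix=""):
        for name, spec in properties.items():
            path = f"{prefix}{name}"
            if "type" in spec:
                yield path, spec["type"]
            # Object and nested fields keep their sub-fields under "properties".
            if "properties" in spec:
                yield from walk(spec["properties"], prefix=f"{path}.")

    props = mapping_response[index]["mappings"].get("properties", {})
    return dict(walk(props))


sample_response = {
    "my_articles": {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "tags": {"type": "keyword"},
                "author": {"properties": {"name": {"type": "text"}}},
            }
        }
    }
}
print(list_field_types(sample_response, "my_articles"))
# {'title': 'text', 'tags': 'keyword', 'author.name': 'text'}
```

With the 8.x client, `list_field_types(current_mapping.body, "my_articles")` should work on a live response, since `.body` exposes the plain dictionary.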

  • Updating Mapping: You can add new fields to an existing mapping, but you cannot change the type of an existing field in place; that requires creating a new index with the new mapping and reindexing the data into it.
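Before changing a mapping, it can help to know whether the change is purely additive (new fields, which `es.indices.put_mapping(index="my_articles", properties={...})` can apply in place) or changes an existing field's type (which calls for a new index, e.g. a hypothetical `my_articles_v2`, filled with `es.reindex(source={"index": "my_articles"}, dest={"index": "my_articles_v2"})`). The hypothetical pre-flight check below compares two `properties` dictionaries. It only compares types; other parameters, such as a text field's analyzer, also cannot be changed in place, and this sketch does not check them.

```python
def find_type_conflicts(current_props, proposed_props, prefix=""):
    """Return dotted paths whose type differs between two "properties" dicts."""
    conflicts = []
    for name, new_spec in proposed_props.items():
        old_spec = current_props.get(name)
        if old_spec is None:
            continue  # brand-new field: an additive change
        path = f"{prefix}{name}"
        # Plain object fields have no explicit "type", so treat them as "object".
        if old_spec.get("type", "object") != new_spec.get("type", "object"):
            conflicts.append(path)
        elif "properties" in new_spec:
            conflicts += find_type_conflicts(
                old_spec.get("properties", {}), new_spec["properties"], prefix=f"{path}."
            )
    return conflicts


current = {"title": {"type": "text"}, "published_on": {"type": "date"}}
proposed = {
    "title": {"type": "text"},
    "published_on": {"type": "keyword"},  # type change: needs a new index + reindex
    "views": {"type": "integer"},         # new field: can be added in place
}
print(find_type_conflicts(current, proposed))  # ['published_on']
```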

Part 4: Conclusion

In this post, we covered how indices, mappings, field types, and analyzers fit together, and how to index documents in bulk from Python. Understanding these concepts is crucial for making effective use of Elasticsearch’s search capabilities.

Up Next: In our next blog, we’ll delve into constructing complex search queries in Elasticsearch and how to harness its full-text search capabilities using Python.

https://kshitijkutumbe.medium.com/blog-4-elasticsearch-query-deep-dive-text-keyword-and-vector-searches-with-python-e42586003abd

Key Takeaways:

  • Indexing in Elasticsearch is a multifaceted process, crucial for organizing and searching data.
  • Mapping defines how data is stored and indexed, with various field types and custom analyzers enhancing search capabilities.
  • Bulk operations are efficient for large-scale indexing tasks.

Practice these concepts and feel free to share your experiences or queries in the comments below. Happy indexing!

Written by Kshitij Kutumbe

Data Scientist | NLP | GenAI | RAG | AI agents | Knowledge Graph | Neo4j kshitijkutumbe@gmail.com www.linkedin.com/in/kshitijkutumbe/
