AI Data Engineer, Hyderabad
AALUCKS Talent Pro
Full-time
Hyderabad, Telangana, India
INR 1,500,000 - 3,200,000/year
Position: AI Data Engineer, Hyderabad
Department: Information Technology | Role: Full-time | Experience: 5 to 8 Years | Number of Positions: 1 | Location: Hyderabad
Skillset:
AI Modules, Vector Databases, HNSW Indexes, Metadata Filtering, Scalar Quantization, Python, Apache Spark, Flink, Kafka, LlamaIndex, LangChain, Data Version Control, Airflow, Excellent English communication skills
Job Description:
Position Overview:
We are seeking a hands-on AI Data Engineer to build the high-performance data infrastructure that powers autonomous AI agents. You won't just be moving data from A to B; you will be architecting Dynamic Context Windows, managing Real-time Semantic Indexes, and building Self-Cleaning Data Pipelines that feed our "Super Employee" agents.
Key Responsibilities:
• Vector & Graph ETL: Design and maintain pipelines that transform unstructured data (PDFs, emails, logs, chats) into optimized embeddings for Vector Databases (Pinecone, Weaviate, Milvus).
• Semantic Data Modeling: Engineer data structures that optimize for Retrieval-Augmented Generation (RAG), ensuring agents find the "needle in the haystack" in milliseconds.
• Knowledge Graph Construction: Build and scale Knowledge Graphs (Neo4j) to represent complex relationships in our trading and support data that standard vector search misses; see the graph sketch after this list.
• Automated Data Labeling & Synthetic Data: Implement pipelines using LLMs to auto-label datasets or generate synthetic edge cases for agent training and evaluation.
• Stream Processing for Agents: Build real-time data "listeners" (Kafka/Flink) that feed live context to agents, allowing them to react to market or support events as they happen; see the consumer sketch after this list.
• Data Reliability & "Drift" Detection: Build monitoring for "Embedding Drift", identifying when the statistical distribution of your data changes and the agent's "knowledge" becomes stale; see the drift sketch after this list.
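To give candidates a feel for the Vector & Graph ETL work, here is a minimal ingestion sketch: chunk raw text, embed it, and upsert the vectors into Qdrant (one of the vector databases named in the qualifications below) with metadata for later filtering. The model choice, collection name, endpoint, and fixed-size chunking are illustrative assumptions, not our production configuration.

```python
# Minimal vector-ETL sketch: chunk, embed, upsert with metadata.
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # 384-dim embeddings
client = QdrantClient(url="http://localhost:6333")     # placeholder endpoint

client.create_collection(
    collection_name="agent_context",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

def ingest(doc_id: str, text: str, source: str, chunk_size: int = 512) -> None:
    # Naive fixed-size chunking; smarter strategies appear further below.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    vectors = model.encode(chunks)
    client.upsert(
        collection_name="agent_context",
        points=[
            PointStruct(
                # Deterministic UUIDs so re-ingesting a document overwrites
                # its old chunks instead of duplicating them.
                id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}:{n}")),
                vector=vec.tolist(),
                payload={"doc_id": doc_id, "chunk": n, "source": source},
            )
            for n, vec in enumerate(vectors)
        ],
    )

ingest("ticket-001", "Customer reports a fill-price mismatch on order 9001...", "support")
```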
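For the Knowledge Graph bullet, a sketch with the official neo4j Python driver: it records an explicit relationship (a trader executing an order) of the kind flat vector search cannot represent. The URI, credentials, node labels, and relationship type are placeholders.

```python
# Knowledge-graph sketch: MERGE nodes and an explicit relationship.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def link_trade(trader_id: str, order_id: str) -> None:
    with driver.session() as session:
        session.run(
            "MERGE (t:Trader {id: $trader_id}) "
            "MERGE (o:Order {id: $order_id}) "
            "MERGE (t)-[:EXECUTED]->(o)",
            trader_id=trader_id,
            order_id=order_id,
        )

link_trade("T-42", "ORD-9001")
driver.close()
```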
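The stream-processing bullet, sketched with kafka-python: consume live events and push them into the agent's working context. The topic name, bootstrap servers, and push_context() helper are hypothetical stand-ins for our actual stack.

```python
# Streaming-context sketch: a Kafka "listener" feeding live events to agents.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "market-events",                       # assumed topic name
    bootstrap_servers=["localhost:9092"],  # placeholder brokers
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="latest",            # agents only care about live events
)

def push_context(event: dict) -> None:
    # Placeholder: in production this would update the agent's
    # real-time semantic index / dynamic context window.
    print(f"context update: {event}")

for message in consumer:
    push_context(message.value)
```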
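And the drift-detection bullet as a sketch: flag when the centroid of newly ingested embeddings moves away from a frozen baseline. The threshold and the simulated shift are assumptions; a real monitor would be calibrated against labelled regressions.

```python
# Embedding-drift sketch: compare recent vs. baseline embedding centroids.
import numpy as np

def centroid_drift(baseline: np.ndarray, recent: np.ndarray) -> float:
    """Cosine distance between the baseline and recent embedding centroids."""
    a, b = baseline.mean(axis=0), recent.mean(axis=0)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - cos

DRIFT_THRESHOLD = 0.05  # assumed; tune per collection

baseline = np.random.default_rng(0).normal(size=(10_000, 384))
recent = baseline[:512] + 0.3  # simulated distribution shift in new data

if centroid_drift(baseline, recent) > DRIFT_THRESHOLD:
    print("Embedding drift detected: re-embed or refresh the index.")
```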
Qualifications:
• Vector Database Mastery: Expert-level configuration of HNSW indexes, scalar quantization, and metadata filtering strategies within Pinecone, Milvus, or Qdrant; see the tuning sketch after this list.
• Advanced Python & Rust: Proficiency in Python for AI logic and Rust (or C++) for high-performance data processing and custom embedding functions.
• Big Data Ecosystem: Hands-on experience with Apache Spark, Flink, and Kafka in a high-throughput environment (Trading/FinTech preferred).
• LLM Data Tooling: Deep experience with Unstructured.io, LlamaIndex, or LangChain for document parsing and chunking strategy optimization; see the splitter sketch after this list.
• MLOps & DataOps: Mastery of DVC (Data Version Control) and Airflow/Prefect for managing complex, non-linear AI data workflows; see the DAG sketch after this list.
• Embedding Models: Understanding of how to fine-tune embedding models (e.g., BGE, Cohere, or OpenAI) to better represent domain-specific (Trading) terminology.
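What "expert-level configuration" means in practice, sketched against Qdrant: explicit HNSW parameters, int8 scalar quantization, and a metadata filter applied at query time. The parameter values are illustrative starting points, not recommendations.

```python
# Index-tuning sketch: HNSW config, scalar quantization, metadata filter.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, HnswConfigDiff, MatchValue,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType, VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="trading_docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=32, ef_construct=256),  # recall vs. latency
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8, always_ram=True)
    ),
)

hits = client.search(
    collection_name="trading_docs",
    query_vector=[0.1] * 384,  # stand-in for a real query embedding
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="support"))]
    ),
    limit=5,
)
```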
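A splitter sketch for the LLM Data Tooling bullet, using LangChain's text splitters: overlap keeps context intact across chunk boundaries before embedding. The sizes and the input file are assumed; the right values depend on the embedding model's token budget.

```python
# Chunking sketch: structure-aware recursive splitting with overlap.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,      # characters per chunk (tune per model)
    chunk_overlap=64,    # shared boundary so no sentence is orphaned
    separators=["\n\n", "\n", ". ", " "],  # split on structure first
)

text = open("support_log.txt").read()  # placeholder input document
chunks = splitter.split_text(text)
print(f"{len(chunks)} chunks ready for embedding")
```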
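And a DAG sketch for the MLOps & DataOps bullet, in Airflow's TaskFlow style: parse once, fan out to embedding and auto-labeling in parallel, then join on a validation step. The DAG id, schedule, and task bodies are placeholders; the point is the non-linear dependency shape.

```python
# Non-linear workflow sketch: parse -> (embed || autolabel) -> validate.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def ai_ingestion():
    @task
    def parse() -> list[str]:
        return ["chunk-a", "chunk-b"]  # placeholder parsed chunks

    @task
    def embed(chunks: list[str]) -> int:
        return len(chunks)  # placeholder: write vectors to the index

    @task
    def autolabel(chunks: list[str]) -> int:
        return len(chunks)  # placeholder: LLM-assisted labeling

    @task
    def validate(n_embedded: int, n_labeled: int) -> None:
        assert n_embedded == n_labeled  # placeholder consistency check

    chunks = parse()
    validate(embed(chunks), autolabel(chunks))

ai_ingestion()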
Additional qualifications:
• Chunking Strategy Architect: You don't just "split text." You implement Semantic Chunking and Parent-Child retrieval strategies to maximize LLM context relevance; see the retrieval sketch after this list.
• Cold/Warm/Hot Storage Strategy: Managing cost and latency by tiering data between Vector DBs (Hot), SQL/NoSQL (Warm), and S3/Data Lakes (Cold).
• Privacy & Redaction Pipelines: Building automated PII (Personally Identifiable Information) redaction into the ingestion layer so agents never "see" or "leak" sensitive user data; see the redaction sketch after this list.
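Parent-Child retrieval, sketched in plain Python: match the query against small child chunks for precision, then hand the LLM the larger parent chunk for context. The keyword-overlap scorer is a stand-in for embedding similarity, and the sizes are assumed; a real system stores the children in a vector DB with a parent_id in the payload.

```python
# Parent-child retrieval sketch: precise match on children, full parent returned.
PARENT_SIZE, CHILD_SIZE = 2048, 256  # assumed sizes, tune per model

def build_index(text: str):
    parents, children = [], []
    for parent_id, i in enumerate(range(0, len(text), PARENT_SIZE)):
        parent = text[i:i + PARENT_SIZE]
        parents.append(parent)
        children += [
            {"parent_id": parent_id, "text": parent[j:j + CHILD_SIZE]}
            for j in range(0, len(parent), CHILD_SIZE)
        ]
    return parents, children

def retrieve(query: str, parents: list, children: list) -> str:
    # Keyword-overlap scorer as a stand-in for embedding similarity.
    def score(child: dict) -> int:
        return len(set(query.lower().split()) & set(child["text"].lower().split()))
    best = max(children, key=score)
    return parents[best["parent_id"]]  # return the full parent for context
```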
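And an ingestion-layer redaction sketch: mask obvious identifiers before anything reaches the embedding or indexing stage. The regexes cover only emails and Indian-format phone numbers; a production pipeline would add NER-based detection on top.

```python
# PII-redaction sketch: mask identifiers before embedding/indexing.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"(?:\+91[\s-]?)?\d{10}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at priya@example.com or +91 9876543210."))
# -> "Reach me at [EMAIL] or [PHONE]."
```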
Additional Information:
Why Join us?
• Opportunity to lead transformative initiatives, modernizing legacy systems and shaping the future of trading technology.
• Work with cutting-edge technologies in a dynamic, fast-paced environment.
• Competitive compensation, professional growth opportunities, and the chance to work with industry-leading experts.
• Interview process: 2 to 3 technical rounds
• Work mode: Hybrid (3 days per week from the office)
Required Qualification:
Bachelor of Engineering / Bachelor of Technology (B.E./B.Tech.) in IT/CS/E&CE, or MCA
This role is with a highly advanced FinTech MNC.