A J A Y
About Skills Projects Experience Contact Presentations

Hello, I'm

Ajay Parameswaran Sindhu

Data Scientist with 3 years of hands-on experience in modern data architecture, Lakehouse/Databricks, ETL migration, and production-grade deep learning (NLP/CV). From PySpark pipelines to LLM-powered agents — I build systems that deliver measurable impact.

DATA • AI • WEB • DESIGN
PySpark • Delta Lake • Databricks • Apache Airflow • PyTorch • TensorFlow • React & TypeScript • LangChain & RAG • FastAPI • Docker & AWS

01. About Me

Data Scientist & Engineer based in Potsdam, Germany — building scalable data platforms, production ML systems, and modern web applications.

Experience

3 years at Envestnet Inc. and Cognizant — data pipeline modernization, Lakehouse architecture, anomaly detection, and full-stack development.

Education

M.Sc. Data Science — University of Europe for Applied Sciences, Potsdam.
B.Tech EEE — Govt. Engineering College, Barton Hill, India.

Focus

PySpark & Delta Lake pipelines, PyTorch/TensorFlow for NLP & CV, RAG systems with LangChain, and React/Next.js frontends.


02. Skills & Technologies

Languages & Core

Python • SQL • PySpark • Bash/Shell • TypeScript • Pandas

Data Engineering

Databricks • Delta Lake • Apache Spark • Airflow • dbt • Kafka

AI / ML & MLOps

PyTorch • TensorFlow • scikit-learn • XGBoost • MLflow • ONNX Runtime

GenAI & NLP

LangChain • OpenAI / GPT-4 • RAG • ChromaDB • Hugging Face • BERT

Frontend & Backend

React • TypeScript • FastAPI • Flask • PostgreSQL • Tailwind CSS

DevOps & Cloud

Docker • AWS (S3, EKS, RDS) • Kubernetes • GitLab CI/CD • Terraform • Prometheus

03. Featured Projects

NLP / Deep Learning

Incident Classification with BERT

Replaced TF-IDF baseline with fine-tuned BERT for mixed stack traces and free text, improving F1 from 0.78 to 0.92. Deployed on AWS EKS with p95 latency under 100ms for live Jira ticket processing.

PyTorch • Hugging Face • MLflow • AWS EKS

Data Engineering

Real-Time Streaming Pipeline

End-to-end streaming pipeline for financial transaction events with Kafka, Structured Streaming, and Delta Lake sink. Exactly-once semantics with under 30s end-to-end latency at 10k+ events/min.

Kafka • PySpark Streaming • Delta Lake • Airflow

Computer Vision

Visual Inspection — ResNet50

Fine-tuned ResNet50 for surface defect classification (scratches, cracks) with 96.5% validation accuracy. Achieved 30 FPS on CPU-limited edge devices via INT8 quantization and ONNX Runtime.

TensorFlow • ONNX Runtime • tf.data • AWS S3

GenAI / RAG

Financial RAG Assistant

RAG system for 5,000+ pages of financial documents with hybrid retrieval (vector + BM25), metadata filtering, and conversation memory. Reduced hallucination rate by 30% on compliance queries.

LangChain • GPT-4 • ChromaDB • FastAPI

ML / NLP

Document Classification — BERT

Fine-tuned BERT-Base for financial document classification across 8 categories with Macro-F1 of 0.91. Optimized to 45ms p95 latency via ONNX INT8 quantization for CPU deployment.

PyTorch • Hugging Face • ONNX • MLflow

Data Quality

DQ & Lineage Framework

Configurable data quality framework with Great Expectations for automated validation across Bronze/Silver/Gold layers. Lineage tracking via Airflow XComs and dbt manifest parsing, visualized in Streamlit.

Great Expectations • dbt • Airflow • Streamlit

04. Experience

03/2025 — 02/2026

M.Sc. Data Science (Grade 1.6)

University of Europe for Applied Sciences, Potsdam

Master's thesis: predictive maintenance on heavy-duty truck sensor data. Built a hybrid ensemble (TFT + XGBoost) that outperformed RNN baselines by 18% F1 on 1.1M measurements. Derived EUR 450K annual savings potential via asymmetric loss optimization. Used SHAP to identify fault drivers interpretably.

04/2023 — 02/2025

Associate Software Engineer

Envestnet Inc.

Refactored legacy stored procedures into PySpark jobs (10M+ rows), speeding up runtimes by 60%. Built Delta Lake pipelines (Bronze/Silver/Gold) for 500k+ daily financial records. Reduced pipeline downtime by 75% via Airflow orchestration. Developed LSTM autoencoder anomaly detection that reduced server downtime by 15%. Deployed inference services with Docker/TF Serving at p99 < 50ms.

07/2022 — 03/2023

Data Engineer (Internship)

Cognizant Technology Solutions

Migrated 10-year-old on-prem Hadoop pipelines to a cloud Lakehouse architecture (TB-scale), reducing data availability latency by 40%. Cut cloud compute costs by 35% for credit risk scoring pipelines via execution plan optimization. Implemented RBAC and column-level PII masking for zero-trust compliance.

2018 — 2022

B.Tech Electrical & Electronic Engineering

Govt. Engineering College, Barton Hill, India

Capstone: Autonomous vehicle with CNNs and sensor fusion — achieved 30 FPS inference on CPU, reduced steering MAE by 15% vs PilotNet, 95% collision avoidance over 200 test runs. CGPA: 8.0/10.

05. Get In Touch

I'm always open to new opportunities — whether it's data engineering, ML projects, agentic AI systems, or creative frontend work. Drop me a message and let's build something great.
