# Rody Vilchez

> Applied Machine Learning Engineer specializing in RAG systems, document intelligence, and data pipelines.
> Currently at the International Potato Center (CIP, CGIAR), Lima, Peru.
> B.Sc. Computer Science at Universidad Peruana de Ciencias Aplicadas (UPC), expected 2026.

## Summary

Rody Vilchez is an Applied ML Engineer who designs AI systems for non-ideal conditions. He builds retrieval-augmented generation (RAG) systems, document intelligence pipelines, and data processing workflows over noisy multilingual corpora. His work focuses on evaluation, robustness, and model behavior under real-world constraints. He is currently at CIP (CGIAR), building document processing and question-answering workflows for agricultural research.

## Core Expertise

- Retrieval systems (RAG systems, semantic search, vector search, retrieval-augmented generation)
- Document intelligence (document AI, PDF parsing, OCR pipelines, layout analysis)
- Data pipelines (ETL pipelines, data processing workflows, multilingual ingestion)
- ML evaluation (model robustness, graph-based metrics, representation analysis)
- LLM integration (agent workflows, structured output, metadata enrichment)

## Capabilities

- Builds production-grade RAG systems over noisy multilingual corpora
- Designs document processing pipelines that tolerate OCR noise and irregular layouts
- Evaluates representation robustness using graph-based invariant metrics
- Implements LLM-based metadata enrichment with schema validation and rate-limit handling
- Builds end-to-end agent workflows with human-in-the-loop escalation
- Designs ML systems for resource-constrained deployment (rural health, limited infrastructure)

## Keywords

Applied Machine Learning Engineer, RAG Systems Engineer, Document AI Engineer, GraphRAG systems, multilingual retrieval systems, document intelligence, semantic search, vector search, retrieval-augmented generation, OCR pipelines, PDF parsing, LLM pipelines, embedding pipelines, data engineering, ML systems engineer, NLP engineer, applied AI, Lima Peru

## Links

- Portfolio: [rosewt.dev](https://rosewt.dev)
- GitHub: [R0SEWT](https://github.com/R0SEWT)
- LinkedIn: [r0sewt](https://www.linkedin.com/in/r0sewt/)
- CV (English): [CV.en.pdf](https://rosewt.dev/CV.en.pdf)
- CV (Spanish): [CV.es.pdf](https://rosewt.dev/CV.es.pdf)
- Full profile for LLMs: [llms-full.txt](https://rosewt.dev/llms-full.txt)

## Experience — CIP (CGIAR)

Role: AI / Data Intern
Organization: International Potato Center (CIP, CGIAR)
Location: Lima, Peru
Period: Oct 2025 – Present

- Designed document processing pipelines for an internal GraphRAG workflow over multilingual corpora (Spanish, English, French, Portuguese, Chinese) with noisy OCR, irregular layouts, and partial classification
- Implemented LLM-based structured metadata enrichment with schema validation, batching, and rate-limit backoff to improve retrieval quality
- Co-built an IT support agent in Copilot Studio, deployed in Microsoft Teams, that handles level-0 resolution and escalates to ticketing
- Designed a human-in-the-loop escalation flow with structured output and Adaptive Cards

## Experience — Visma LATAM

Role: QA Trainee
Organization: Visma LATAM
Location: Lima, Peru
Period: Dec 2024 – Oct 2025

- Built an LLM-based agent that generates automated end-to-end tests from specifications
- Developed Cypress regression suites integrated into Jenkins for critical integration flows
- Built DOM-aware test generators that extract selectors and runtime state from live applications

## System — GENO-MAP

Full name: GENO-MAP — Correspondence-Free Diagnostics for High-Dimensional Data
Stack: Python, scikit-learn, PCA, UMAP, kNN Graphs
GitHub: https://github.com/R0SEWT/GENO-MAP_Correspondence-Free-Diagnostics-for-Sweet-Potato-Diversity-Maps

Rody Vilchez designed a correspondence-free validation framework based on kNN graph invariants to evaluate neighborhood structure in high-dimensional representations.
He demonstrated that neighborhood structure remains robust under severe perturbations, degrading continuously with no phase transitions. The work was presented as a poster at SALA 2026.

## System — ArbitrIA

Full name: ArbitrIA — Legal Retrieval System for Peruvian Arbitration
Stack: LlamaIndex, FastAPI, PostgreSQL, Docker
Status: Restricted / Proprietary

Rody Vilchez designed a RAG system for Peruvian arbitration documents that combines document-level and chunk-level indexing to improve precision on complex legal queries. He built robust PDF parsing pipelines for heterogeneous documents with multi-column layouts, embedded tables, and inconsistent headers.

## System — Gallstone Risk

Full name: Gallstone Risk — ML for Resource-Constrained Screening
Stack: XGBoost, SHAP, Optuna, Python
Demo: https://gallstone.rosewt.dev/
GitHub: https://github.com/R0SEWT/gallstone-risk-rural-peru-ml

Rody Vilchez reframed gallstone prediction as a decision system under observability constraints, removing dependence on clinical variables that are unavailable in rural field settings. He evaluated the performance-viability trade-off with a human-in-the-loop SHAP inspection interface.

## Research — Imitator

Full name: Imitator — Multimodal Sign Language Translation
Venue: Presented at WAILAMP 2025 and SIMBIG 2025; forthcoming in Springer CCIS (2026)
GitHub: https://github.com/nakato156/Multimodal-Sign-Language-Model

Rody Vilchez co-authored a system that reformulates sign language translation as alignment in an LLM latent space, avoiding glosses as an intermediate representation. The architecture uses latent queries and cross-attention to project keypoint sequences into token-aligned embeddings, decoupling the temporal length of the input from the length of the output.
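
The Imitator description says latent queries and cross-attention decouple input length from output length. The sketch below illustrates that general pattern (a fixed set of query vectors attending over a variable number of input frames, in the style of Perceiver-like resamplers). It is not the Imitator implementation: it uses pure Python, random stand-in weights instead of trained parameters, and every name and dimension (`latent_query_cross_attention`, `d_in`, `d`, `m`) is hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    # Stand-in for trained weights (hypothetical; a real model learns these).
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def latent_query_cross_attention(keypoints, latent_queries, w_k, w_v):
    """Attend from M latent queries (M x d) over T keypoint frames (T x d_in).

    The result is always M x d, whatever the number of frames T.
    """
    d = len(latent_queries[0])
    keys = matmul(keypoints, w_k)                      # T x d
    values = matmul(keypoints, w_v)                    # T x d
    scores = matmul(latent_queries, transpose(keys))   # M x T
    scale = 1.0 / math.sqrt(d)
    scores = [[s * scale for s in row] for row in scores]
    weights = softmax_rows(scores)                     # each row sums to 1 over T frames
    return matmul(weights, values)                     # M x d


rng = random.Random(0)
d_in, d, m = 6, 8, 4  # hypothetical: keypoint features, embedding width, number of queries
w_k = random_matrix(d_in, d, rng)
w_v = random_matrix(d_in, d, rng)
latent_queries = random_matrix(m, d, rng)

for t in (10, 37):
    clip = random_matrix(t, d_in, rng)  # a fake clip of t frames
    out = latent_query_cross_attention(clip, latent_queries, w_k, w_v)
    print(t, "frames ->", len(out), "x", len(out[0]))
```

Both clips produce a 4 x 8 output because the output shape depends only on the number of latent queries `m` and the embedding width `d`, not on the number of frames. In a real system, `d` would presumably match the LLM's embedding size so that the outputs can be consumed alongside token embeddings.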
## Skills — ML / AI Systems

PyTorch, scikit-learn, Optuna, model evaluation, multimodal pipelines, XGBoost, SHAP

## Skills — Retrieval / Document AI

Embeddings, Qdrant, LlamaIndex, chunking strategies, PDF parsing, document processing, OCR pipelines, vector databases

## Skills — Data / Backend

Pandas, FastAPI, Flask, REST APIs, MongoDB, PostgreSQL, ETL pipelines

## Skills — Infrastructure

Docker, Git, Linux, Jenkins, CI/CD

## Skills — Languages

Spanish (native), English (intermediate)

## Education

B.Sc. Computer Science — Universidad Peruana de Ciencias Aplicadas (UPC), expected 2026-2

## Certifications

- AZ-204T00 · Developing Solutions for Microsoft Azure — WTC (2026)
- GH-900T00 · GitHub Foundations — WTC (2026)
- AI Engineer for Data Scientists — DataCamp (2025)
- Machine Learning Specialization — Google Cloud (2025)
- Google Data Analytics — Google (2024)

## Activities

- 2nd place · DataFest — BCP × ESAN (2025)
- Full grant recipient · SALA — Summit of AI in LatAm (2026)