Large language models (LLMs) can identify microplastics in environmental samples by analyzing infrared (IR) spectra, a task that typically relies on manual interpretation, according to a study. The study, led by Zijiang Yang and Hisayuki Arakawa at the Tokyo University of Marine Science and Technology, is the first to apply natural language-based AI models to microplastic identification.
IR spectroscopy, particularly Fourier transform infrared (FTIR) spectroscopy using the attenuated total reflection (ATR) method, is widely used to classify polymers based on spectral peaks. However, the interpretation process remains labor-intensive and prone to subjective judgment. The team proposed a novel workflow that reformats spectral data into natural language prompts for LLMs, allowing AI models to classify polymer types directly.
Three LLMs were evaluated: DeepSeek-R1-Distill-Llama-8B (a locally run reasoning model), GPT-4o, and GPT-4o-mini. The models were prompted with normalized peak intensity data from the IR spectra of 435 microplastic particles collected from Japanese coastal waters, to be compared against reference library spectra of six common polymers: polyethylene (PE), polypropylene (PP), polyethylene terephthalate (PET), polystyrene (PS), polyamide (PA), and polyvinyl chloride (PVC).
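As a rough illustration of how spectral data might be turned into a natural language prompt, the Python sketch below normalizes a spectrum, keeps local maxima above a threshold, and writes the surviving peaks into a text prompt alongside reference peaks. It is not the authors' code: the function names, the `threshold` parameter, the prompt wording, and all numeric values are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (wavenumber in cm^-1, normalized intensity)


def normalize(absorbance: List[float]) -> List[float]:
    """Min-max scale a spectrum to the range [0, 1]."""
    lo, hi = min(absorbance), max(absorbance)
    if hi == lo:
        raise ValueError("flat spectrum cannot be normalized")
    return [(a - lo) / (hi - lo) for a in absorbance]


def find_peaks(wavenumbers: List[float], intensities: List[float],
               threshold: float) -> List[Peak]:
    """Return local maxima whose normalized intensity is at least `threshold`."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= threshold and y > intensities[i - 1] and y >= intensities[i + 1]:
            peaks.append((wavenumbers[i], round(y, 2)))
    return peaks


def build_prompt(sample: List[Peak], references: Dict[str, List[Peak]]) -> str:
    """Format sample and reference peaks as a plain-text question (hypothetical wording)."""
    def fmt(peaks: List[Peak]) -> str:
        return ", ".join(f"{wn:.0f} cm^-1 (intensity {y:.2f})" for wn, y in peaks)

    lines = ["Sample peaks: " + fmt(sample), "Reference peaks:"]
    lines += [f"- {name}: {fmt(peaks)}" for name, peaks in references.items()]
    lines.append("Which reference polymer best matches the sample? Answer with one name.")
    return "\n".join(lines)


# Toy spectrum with made-up values (not data from the study).
wavenumbers = [700.0, 720.0, 740.0, 1460.0, 1470.0, 1480.0, 2840.0, 2850.0, 2860.0]
absorbance = [0.0, 0.2, 0.0, 0.1, 0.4, 0.1, 0.2, 1.0, 0.2]

sample_peaks = find_peaks(wavenumbers, normalize(absorbance), threshold=0.3)

# Illustrative placeholder reference peaks, not real library spectra.
references = {"PE": [(1470.0, 0.4), (2850.0, 1.0)],
              "PS": [(700.0, 1.0), (1450.0, 0.5)]}

prompt = build_prompt(sample_peaks, references)
print(prompt)
```

In this toy case, raising the threshold drops the weak peak at 720 cm^-1 from the prompt, so the model sees fewer, stronger features. The resulting string could then be sent to whichever model is being evaluated.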
Performance varied markedly. DeepSeek-R1-Distill-Llama-8B achieved the highest accuracy (>0.93 across all polymer types) when run with a higher peak threshold (ys = 0.3), which filtered out noise and let the model focus on key spectral features. GPT-4o performed comparably well (≥0.86 accuracy), offering a viable cloud-based alternative when local execution isn’t possible. GPT-4o-mini, in contrast, showed markedly lower accuracy, and the authors did not recommend it.
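The summary does not say how accuracy per polymer type was calculated. One common reading is one-vs-rest accuracy: for each polymer, the share of particles for which the model's "is it this polymer or not" decision was correct. The sketch below shows that calculation under this assumption; the function name and the toy labels are hypothetical and are not results from the study.

```python
from typing import Dict, List


def per_class_accuracy(true_labels: List[str], predicted_labels: List[str],
                       classes: List[str]) -> Dict[str, float]:
    """One-vs-rest accuracy for each class (an assumed metric definition)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    result = {}
    for c in classes:
        # A particle counts as correct for class c when the truth and the
        # prediction agree on whether it belongs to c.
        correct = sum((t == c) == (p == c)
                      for t, p in zip(true_labels, predicted_labels))
        result[c] = correct / n
    return result


# Toy labels, made up for illustration only.
truth = ["PE", "PE", "PP", "PS"]
predicted = ["PE", "PP", "PP", "PS"]

scores = per_class_accuracy(truth, predicted, ["PE", "PP", "PS"])
print(scores)
```

Under this definition a single misclassification lowers the score of two classes at once: the true class and the wrongly predicted one.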
The study also highlighted behavioral differences in model reasoning. DeepSeek-R1-Distill-Llama-8B followed a step-by-step elimination strategy, closely mimicking expert logic. GPT-4o showed greater response variability, sometimes skipping detailed comparisons. Errors across all models often stemmed from over-reliance on isolated peak matches or misinterpretation of spectral noise.
However, the authors caution against overreliance on LLMs. “Applying LLMs to microplastic identification remains relatively complex at the current stage,” they note, adding that improvements in prompt design and model tuning are needed. They also suggest integrating IR analysis with complementary methods, such as NMR or pyrolysis-GC–MS, for enhanced accuracy.
The authors have made their workflow publicly available via GitHub.