A public sample generated with AI Thesis Writer.
This thesis systematically catalogs adversarial prompting techniques used to bypass safety filters in commercial LLMs. Using a red-teaming test suite of more than 200 prompts across six attack categories, the study evaluates the robustness of current alignment methods and proposes a defense scoring framework for model developers.