# RAG - LangChain and ATT&CK Groups
---
* Collaborators:
    * Roberto Rodriguez (@Cyb3rWard0g)
* References:
    * https://python.langchain.com/en/latest/modules/indexes/getting_started.html

## Import Modules

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
import openai
import os
from dotenv import load_dotenv
import tqdm as notebook_tqdm

## Define Initial Variables

In [2]:
# __file__ is not defined inside a notebook, so resolve paths from the working directory
current_directory = os.getcwd()
chroma_db = os.path.join(current_directory, "../source-knowledge/chroma_db")

## Load Vector DB

In [3]:
import chromadb

persistent_client = chromadb.PersistentClient(path=chroma_db)

# Define embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-mpnet-base-v2")

db = Chroma(
    client=persistent_client,
    collection_name="groups_collection",
    embedding_function=embedding_function,
)
db.get()


{'ids': ['50b4e4a7-6c5f-11ee-b0d1-6479f0659de9',
  '50b4e4a8-6c5f-11ee-94d5-6479f0659de9',
  '50b4e4a9-6c5f-11ee-87b4-6479f0659de9',
  '50b4e4aa-6c5f-11ee-a51c-6479f0659de9',
  '50b4e4ab-6c5f-11ee-978c-6479f0659de9',
  '50b4e4ac-6c5f-11ee-93b6-6479f0659de9',
  '50b4e4ad-6c5f-11ee-84a2-6479f0659de9',
  '50b4e4ae-6c5f-11ee-8b81-6479f0659de9',
  '50b4e4af-6c5f-11ee-9060-6479f0659de9',
  '50b4e4b0-6c5f-11ee-a63c-6479f0659de9',
  '50b4e4b1-6c5f-11ee-96ae-6479f0659de9',
  '50b4e4b2-6c5f-11ee-91a2-6479f0659de9',
  '50b4e4b3-6c5f-11ee-9dae-6479f0659de9',
  '50b4e4b4-6c5f-11ee-a0c7-6479f0659de9',
  '50b4e4b5-6c5f-11ee-bab4-6479f0659de9',
  '50b4e4b6-6c5f-11ee-9fa5-6479f0659de9',
  '50b4e4b7-6c5f-11ee-aeca-6479f0659de9',
  '50b4e4b8-6c5f-11ee-99b2-6479f0659de9',
  '50b4e4b9-6c5f-11ee-acc7-6479f0659de9',
  '50b4e4ba-6c5f-11ee-9ae1-6479f0659de9',
  '50b4e4bb-6c5f-11ee-95bd-6479f0659de9',
  '50b4e4bc-6c5f-11ee-a00b-6479f0659de9',
  '50b4e4bd-6c5f-11ee-acf4-6479f0659de9',
  '50b4e4be-6c5f-11ee-8a73-

## Query ATT&CK Groups Knowledge Base

### Get OAI Key

In [4]:
# Get your key: https://platform.openai.com/account/api-keys
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

### Define Vector Store Retriever
The retriever interface is a generic interface that makes it easy to combine documents with language models. It exposes a `get_relevant_documents` method that takes a query string and returns a list of documents. Here, `search_kwargs={"k": 5}` limits each query to the five most similar chunks in the vector store.

In [5]:
retriever = db.as_retriever(search_kwargs={"k": 5})
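
To make the retriever contract concrete, here is a minimal, dependency-free sketch of an object with the same `get_relevant_documents(query)` shape. Everything in it is an assumption made for illustration: `ToyRetriever`, the simplified `Document` dataclass, and the bag-of-words cosine scoring are hypothetical stand-ins, not LangChain's or Chroma's implementation, which rank chunks with dense sentence embeddings.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical, simplified stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def _bag_of_words(text):
    # Toy "embedding": counts of lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRetriever:
    """Illustrative retriever: query string in, top-k documents out."""

    def __init__(self, documents, k=5):
        self.documents = documents
        self.k = k

    def get_relevant_documents(self, query):
        query_vec = _bag_of_words(query)
        ranked = sorted(
            self.documents,
            key=lambda doc: _cosine(query_vec, _bag_of_words(doc.page_content)),
            reverse=True,
        )
        return ranked[: self.k]


# Sentences taken from the knowledge base output shown in this notebook.
toy_docs = [
    Document("Lazarus Group has compromised servers to stage malicious tools."),
    Document("Lazarus Group has used social media platforms, including LinkedIn and Twitter, to send spearphishing messages."),
    Document("Lazarus Group has created new Twitter accounts to conduct social engineering against potential victims."),
]
toy_retriever = ToyRetriever(toy_docs, k=2)
toy_results = toy_retriever.get_relevant_documents("spearphishing messages over social media")
for doc in toy_results:
    print(doc.page_content)
```

The point of the sketch is the interface rather than the scoring: any object that maps a query string to a ranked, size-limited list of documents can be passed to the question-answering step in the same way.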

### Get Relevant Documents

#### Threat Actors Texting

In [6]:
query = """
What threat actors sent text messages to their targets?
"""

In [7]:
print("[+] Getting relevant documents for query..")
relevant_docs = retriever.get_relevant_documents(query)
relevant_docs

[+] Getting relevant documents for query..


[Document(page_content='Lazarus Group has created new Twitter accounts to conduct social engineering against potential victims.(Citation: Google TAG Lazarus Jan 2021)|\n|mitre-attack|enterprise-attack,ics-attack|Linux,macOS,Windows|T1566.003|Spearphishing via Service|\n\nLazarus Group has used social media platforms, including LinkedIn and Twitter, to send spearphishing messages.(Citation: Google TAG Lazarus Jan 2021)|\n|mitre-attack|enterprise-attack,ics-attack|PRE|T1584.004|Server|\n\nLazarus Group has compromised servers to stage malicious tools.(Citation: Kaspersky ThreatNeedle Feb 2021)|\n|mitre-attack|enterprise-attack,ics-attack|PRE|T1591|Gather Victim Org Information|\n\nLazarus Group has studied publicly available information about a targeted organization to tailor spearphishing efforts against specific departments and/or individuals.(Citation: Kaspersky ThreatNeedle Feb 2021)|\n|mitre-attack|enterprise-attack,ics-attack|PRE|T1585.002|Email Accounts|\n\nLazarus Group has creat

#### Question Answering

In [8]:
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

In [9]:
# chain_type="stuff" places all retrieved documents into a single prompt for the LLM
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
chain.run(input_documents=relevant_docs, question=query)

'\nNone of the threat actors mentioned in the context sent text messages to their targets.'
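
The `stuff` chain type answers from the retrieved documents only, which is why the answer above can only reflect what the retriever returned. A rough sketch of the idea follows. The function name `build_stuff_prompt` and the prompt wording are hypothetical and are not LangChain's actual template; the sketch only shows how every retrieved chunk ends up in one context block ahead of the question.

```python
def build_stuff_prompt(doc_texts, question):
    """Join every retrieved text into one context block, then append the question."""
    context = "\n\n".join(text.strip() for text in doc_texts)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Example with two sentences from the retrieved output shown earlier.
sample_prompt = build_stuff_prompt(
    [
        "Lazarus Group has compromised servers to stage malicious tools.",
        "Lazarus Group has used social media platforms, including LinkedIn and Twitter, to send spearphishing messages.",
    ],
    "What threat actors sent text messages to their targets?",
)
print(sample_prompt)
```

Because all chunks share one prompt, this approach is simple but bounded by the model's context window, so the value of `k` in the retriever directly controls how much text the model sees.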

#### Prompt Engineering

In [10]:
query = """
What threat actors sent text messages to their targets
over social media accounts?
"""

print("[+] Getting relevant documents for query..")
relevant_docs = retriever.get_relevant_documents(query)
print("[+] Asking LLM..")
chain.run(input_documents=relevant_docs, question=query)

[+] Getting relevant documents for query..
[+] Asking LLM..


' Lazarus Group and CURIUM have both used social media accounts to send malicious files to their targets.'

#### Phishing Techniques Used by Threat Actors

In [11]:
query = "What are some phishing techniques used by threat actors?"

In [12]:
print("[+] Getting relevant documents for query..")
relevant_docs = retriever.get_relevant_documents(query)
relevant_docs

[+] Getting relevant documents for query..


[Document(page_content='Lazarus Group has created new Twitter accounts to conduct social engineering against potential victims.(Citation: Google TAG Lazarus Jan 2021)|\n|mitre-attack|enterprise-attack,ics-attack|Linux,macOS,Windows|T1566.003|Spearphishing via Service|\n\nLazarus Group has used social media platforms, including LinkedIn and Twitter, to send spearphishing messages.(Citation: Google TAG Lazarus Jan 2021)|\n|mitre-attack|enterprise-attack,ics-attack|PRE|T1584.004|Server|\n\nLazarus Group has compromised servers to stage malicious tools.(Citation: Kaspersky ThreatNeedle Feb 2021)|\n|mitre-attack|enterprise-attack,ics-attack|PRE|T1591|Gather Victim Org Information|\n\nLazarus Group has studied publicly available information about a targeted organization to tailor spearphishing efforts against specific departments and/or individuals.(Citation: Kaspersky ThreatNeedle Feb 2021)|\n|mitre-attack|enterprise-attack,ics-attack|PRE|T1585.002|Email Accounts|\n\nLazarus Group has creat

#### Question Answering

In [13]:
chain.run(input_documents=relevant_docs, question=query)

' Lazarus Group has used social media platforms, including LinkedIn and Twitter, to send spearphishing messages. HEXANE has targeted executives, human resources staff, and IT personnel for spearphishing. TA505 has used spearphishing emails with malicious attachments to initially compromise victims. FIN6 has used fake job advertisements sent via LinkedIn to spearphish targets.'