Data Analysis with Pandas - Transforming Data
Author: Jose Rodriguez (@Cyb3rPandah)
Project: Infosec Jupyter Book
Public Organization: Open Threat Research
License: Creative Commons Attribution-ShareAlike 4.0 International
Reference: https://mordordatasets.com/notebooks/small/windows/05_defense_evasion/SDWIN-190403133337.html
Representing the JSON File as a Dataframe Using Pandas
Representing the JSON file as a Pandas Dataframe can involve commands such as wget and unzip. This is explained in more detail in the notebook titled Representing the JSON File as a Dataframe Using Pandas, which is located in the same folder as this notebook. To keep this notebook simple, the JSON files required for this workshop have already been downloaded and decompressed. They are stored in the sets_datos folder.
! wget https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/windows/lateral_movement/host/empire_shell_dcerpc_smb_service_dll_hijack.zip -O sets_datos/empire_shell_dcerpc_smb_service_dll_hijack.zip
! unzip -o sets_datos/empire_shell_dcerpc_smb_service_dll_hijack.zip -d sets_datos/
dllhijack_json = 'sets_datos/empire_shell_dcerpc_smb_service_dll_hijack_2020-09-21232839.json'
a) Importing the Pandas Library
import pandas as pd
b) Reading the JSON File
We will use the pandas.read_json function. The lines = True argument tells it to parse the file as one JSON object per line.
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html
df = pd.read_json(path_or_buf = dllhijack_json, lines = True)
df.head()
| | Keywords | SeverityValue | TargetObject | EventTypeOrignal | EventID | ProviderGuid | ExecutionProcessID | host | Channel | UserID | ... | KeyType | ClientProcessId | AlgorithmName | ReturnCode | KeyName | KeyFilePath | MiniportNameLen | MiniportName | param4 | param3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223372036854775808 | 2 | HKU\.DEFAULT\Software\Microsoft\Office\16.0\Co... | INFO | 13 | {5770385F-C22A-43E0-BF4C-06F5698FFBD9} | 3172 | wec.internal.cloudapp.net | Microsoft-Windows-Sysmon/Operational | S-1-5-18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | -9223372036854775808 | 2 | NaN | NaN | 10 | {5770385F-C22A-43E0-BF4C-06F5698FFBD9} | 3392 | wec.internal.cloudapp.net | Microsoft-Windows-Sysmon/Operational | S-1-5-18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | -9223372036854775808 | 2 | NaN | NaN | 10 | {5770385F-C22A-43E0-BF4C-06F5698FFBD9} | 3392 | wec.internal.cloudapp.net | Microsoft-Windows-Sysmon/Operational | S-1-5-18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | -9214364837600034816 | 2 | NaN | NaN | 5158 | {54849625-5478-4994-A5BA-3E3B0328C30D} | 4 | wec.internal.cloudapp.net | security | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | -9214364837600034816 | 2 | NaN | NaN | 5156 | {54849625-5478-4994-A5BA-3E3B0328C30D} | 4 | wec.internal.cloudapp.net | security | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 206 columns
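To see what lines = True does without touching the workshop files, here is a minimal sketch that reads two made-up events from an in-memory buffer. The field names imitate the dataset, but the values and the names sample_jsonl and sample_df are hypothetical and exist only for illustration.

```python
import io
import pandas as pd

# Two hypothetical events in JSON Lines format: one JSON object per line.
# The second event lacks "Image" and the first lacks "DestPort".
sample_jsonl = io.StringIO(
    '{"EventID": 1, "Channel": "Microsoft-Windows-Sysmon/Operational", "Image": "sc.exe"}\n'
    '{"EventID": 5156, "Channel": "security", "DestPort": 445}\n'
)

# lines=True parses each line as a separate record (one row per event)
sample_df = pd.read_json(sample_jsonl, lines=True)

# The columns are the union of the keys seen across all events
print(sample_df.shape)

# Fields missing from an event are filled with NaN
print(sample_df["Image"].isna().tolist())
print(sample_df["DestPort"].isna().tolist())
```

Because every event type contributes its own fields, a real log file produces a wide and sparse Dataframe. This is consistent with the 206 columns and the many NaN values shown above.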
c) Getting to Know the Columns or Attributes of the Dataframe
We will use the pandas.DataFrame.info method.
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html
df.info(verbose = True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6349 entries, 0 to 6348
Data columns (total 206 columns):
# Column Dtype
--- ------ -----
0 Keywords int64
1 SeverityValue int64
2 TargetObject object
3 EventTypeOrignal object
4 EventID int64
5 ProviderGuid object
6 ExecutionProcessID int64
7 host object
8 Channel object
9 UserID object
10 AccountType object
11 ThreadID int64
12 ProcessGuid object
13 Details object
14 EventReceivedTime object
15 Opcode object
16 EventTime object
17 @timestamp object
18 SourceModuleType object
19 port int64
20 AccountName object
21 RecordNumber int64
22 Task int64
23 Domain object
24 @version int64
25 OpcodeValue float64
26 SourceModuleName object
27 Severity object
28 SourceName object
29 Version float64
30 Image object
31 Category object
32 UtcTime object
33 Hostname object
34 RuleName object
35 tags object
36 SourceImage object
37 SourceProcessGUID object
38 TargetImage object
39 GrantedAccess object
40 EventType object
41 SourceProcessId object
42 SourceThreadId float64
43 TargetProcessGUID object
44 TargetProcessId object
45 CallTrace object
46 Application object
47 ProcessId object
48 Message object
49 FilterRTID float64
50 LayerRTID float64
51 Protocol object
52 SourcePort float64
53 LayerName object
54 SourceAddress object
55 RemoteUserID object
56 Direction object
57 DestPort float64
58 DestAddress object
59 RemoteMachineID object
60 ActivityID object
61 Payload object
62 ERROR_EVT_UNRESOLVED float64
63 ContextInfo object
64 ImageLoaded object
65 Signed object
66 SignatureStatus object
67 Hashes object
68 Description object
69 Company object
70 FileVersion object
71 Signature object
72 Product object
73 OriginalFileName object
74 SubjectDomainName object
75 SubjectUserSid object
76 SubjectLogonId object
77 TaskContentNew object
78 SubjectUserName object
79 TaskName object
80 ProcessName object
81 Status object
82 RuleAttr object
83 RuleId object
84 ChangeType object
85 FilterKey object
86 FilterType object
87 FilterName object
88 Weight float64
89 UserName object
90 LayerId float64
91 Action object
92 CalloutKey object
93 CalloutName object
94 FilterId float64
95 UserSid object
96 ProviderName object
97 LayerKey object
98 ProviderKey object
99 Conditions object
100 PrivilegeList object
101 TargetLogonId object
102 LogonType float64
103 VirtualAccount object
104 LogonGuid object
105 AuthenticationPackageName object
106 IpAddress object
107 TransmittedServices object
108 LmPackageName object
109 ImpersonationLevel object
110 ElevatedToken object
111 WorkstationName object
112 TargetOutboundUserName object
113 TargetOutboundDomainName object
114 LogonProcessName object
115 KeyLength float64
116 TargetLinkedLogonId object
117 RestrictedAdminMode object
118 TargetUserName object
119 IpPort object
120 TargetUserSid object
121 TargetDomainName object
122 EventIdx float64
123 GroupMembership object
124 EventCountTotal float64
125 TargetFilename object
126 CreationUtcTime object
127 SourceHandleId object
128 TargetHandleId object
129 ObjectServer object
130 HandleId object
131 TransactionId object
132 AccessMask object
133 ObjectName object
134 ObjectType object
135 AccessReason object
136 AccessList object
137 RestrictedSidCount float64
138 ResourceAttributes object
139 EnabledPrivilegeList object
140 DisabledPrivilegeList object
141 ShareName object
142 ShareLocalPath object
143 RelativeTargetName object
144 SourcePortName object
145 DestinationPort float64
146 User object
147 SourceHostname object
148 DestinationIp object
149 SourceIp object
150 DestinationIsIpv6 object
151 Initiated object
152 SourceIsIpv6 object
153 DestinationPortName object
154 DestinationHostname object
155 ParentImage object
156 CommandLine object
157 CurrentDirectory object
158 IntegrityLevel object
159 TerminalSessionId float64
160 ParentProcessGuid object
161 ParentCommandLine object
162 ParentProcessId float64
163 LogonId object
164 Device object
165 NewSd object
166 OldSd object
167 MandatoryLabel object
168 ParentProcessName object
169 NewProcessName object
170 TokenElevationType object
171 NewProcessId object
172 PipeName object
173 Properties object
174 OperationType object
175 AdditionalInfo object
176 Path object
177 Priority float64
178 Service object
179 ServiceName object
180 TicketEncryptionType object
181 ServiceSid object
182 TicketOptions object
183 QueryResults object
184 QueryName object
185 QueryStatus float64
186 IsExecutable object
187 Archived object
188 param1 object
189 param2 object
190 MessageNumber float64
191 ScriptBlockText object
192 MessageTotal float64
193 ScriptBlockId object
194 Operation object
195 ClientCreationTime object
196 KeyType object
197 ClientProcessId float64
198 AlgorithmName object
199 ReturnCode object
200 KeyName object
201 KeyFilePath object
202 MiniportNameLen float64
203 MiniportName object
204 param4 object
205 param3 object
dtypes: float64(25), int64(9), object(172)
memory usage: 10.0+ MB
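The listing above shows column names and dtypes but no non-null counts, which pandas tends to omit for very wide Dataframes. The sketch below shows a few ways to recover that information on a small made-up Dataframe. The name small_df and its values are hypothetical, and the show_counts parameter assumes pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical miniature of the dataset: values are made up for illustration.
small_df = pd.DataFrame({
    "EventID": [1, 10, 5156],
    "Channel": [
        "Microsoft-Windows-Sysmon/Operational",
        "Microsoft-Windows-Sysmon/Operational",
        "security",
    ],
    "DestPort": [None, None, 445.0],
})

# Number of columns per dtype, comparable to the "dtypes:" summary line of info()
dtype_counts = small_df.dtypes.value_counts()

# Non-null values per column; low counts flag fields that only some event types fill in
non_null_counts = small_df.notna().sum()

print(dtype_counts)
print(non_null_counts)

# show_counts=True asks info() to include a Non-Null Count column
# (assumption: pandas 1.2 or later, where this parameter is available)
small_df.info(verbose=True, show_counts=True)
```

On the workshop Dataframe, the same calls would be made on df instead of small_df.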
Filtering Security Events: Sysmon 1 (Process Creation)
We will reuse the code from the previous notebook with one small modification: instead of matching the full channel name, we will search for the word sysmon, ignoring case.
References:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.startswith.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.endswith.html
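(
    # Keep only the columns of interest
    df[['@timestamp','Image','CommandLine']]
    # Keep process creation events (EventID 1) whose channel mentions "sysmon"
    [(df['EventID'] == 1) & (df['Channel'].str.contains('sysmon',case = False, na = False, regex = False))]
    # Show the first five matches
    .head(5)
)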
| | @timestamp | Image | CommandLine |
---|---|---|---|
661 | 2020-09-22T03:29:33.845Z | C:\Windows\System32\svchost.exe | C:\windows\system32\svchost.exe -k appmodel -p... |
1034 | 2020-09-22T03:30:11.221Z | C:\Program Files (x86)\Microsoft Office\root\O... | "C:\Program Files (x86)\Microsoft Office\root\... |
1181 | 2020-09-22T03:30:11.292Z | C:\Program Files (x86)\Microsoft Office\root\O... | "C:\Program Files (x86)\Microsoft Office\Root\... |
5019 | 2020-09-22T03:31:11.219Z | C:\Windows\System32\sc.exe | "C:\windows\system32\sc.exe" \\WORKSTATION6 st... |
5315 | 2020-09-22T03:31:41.475Z | C:\Windows\System32\sc.exe | "C:\windows\system32\sc.exe" \\WORKSTATION6 qu... |
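The three string methods referenced above differ only in where they look for the text. The following sketch compares them on a handful of channel names. The Series channels and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical channel values, including a missing one
channels = pd.Series([
    "Microsoft-Windows-Sysmon/Operational",
    "security",
    None,
    "Windows PowerShell",
])

# Substring match anywhere in the value; case=False ignores case,
# regex=False treats the pattern as literal text,
# na=False counts missing values as non-matches
contains_sysmon = channels.str.contains("sysmon", case=False, na=False, regex=False)

# Match only at the beginning or only at the end of the value (case-sensitive)
starts_with_ms = channels.str.startswith("Microsoft", na=False)
ends_with_operational = channels.str.endswith("/Operational", na=False)

print(contains_sysmon.tolist())
print(starts_with_ms.tolist())
print(ends_with_operational.tolist())

# Each result is a boolean mask that can be used directly for filtering
print(channels[contains_sysmon].tolist())
```

Passing na = False matters for columns with missing values: it keeps the result strictly boolean, so it can be combined with other conditions using & as in the filter above.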
Calculating the Length of the CommandLine
We will use the assign method to add a new column to our dataframe. This new column holds the length, in characters, of the command line that the process used.
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html
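(
    # Keep only the columns of interest
    df[['@timestamp','Image','CommandLine']]
    # Keep process creation events (EventID 1) whose channel mentions "sysmon"
    [(df['EventID'] == 1) & (df['Channel'].str.contains('sysmon',case = False, na = False, regex = False))]
    # Add the command line length; pandas aligns the values to the filtered rows by index
    .assign(Command_Length = df['CommandLine'].str.len())
)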
| | @timestamp | Image | CommandLine | Command_Length |
---|---|---|---|---|
661 | 2020-09-22T03:29:33.845Z | C:\Windows\System32\svchost.exe | C:\windows\system32\svchost.exe -k appmodel -p... | 56.0 |
1034 | 2020-09-22T03:30:11.221Z | C:\Program Files (x86)\Microsoft Office\root\O... | "C:\Program Files (x86)\Microsoft Office\root\... | 69.0 |
1181 | 2020-09-22T03:30:11.292Z | C:\Program Files (x86)\Microsoft Office\root\O... | "C:\Program Files (x86)\Microsoft Office\Root\... | 80.0 |
5019 | 2020-09-22T03:31:11.219Z | C:\Windows\System32\sc.exe | "C:\windows\system32\sc.exe" \\WORKSTATION6 st... | 55.0 |
5315 | 2020-09-22T03:31:41.475Z | C:\Windows\System32\sc.exe | "C:\windows\system32\sc.exe" \\WORKSTATION6 qu... | 56.0 |
5552 | 2020-09-22T03:32:02.675Z | C:\Windows\System32\svchost.exe | C:\windows\system32\svchost.exe -k netsvcs -p ... | 55.0 |
5673 | 2020-09-22T03:32:02.741Z | C:\Windows\System32\sc.exe | "C:\windows\system32\sc.exe" \\WORKSTATION6 st... | 56.0 |
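The cell above computes the length over the whole dataframe and relies on index alignment to attach it to the filtered rows. As an alternative sketch (not the notebook's own approach), assign also accepts a function that receives the already filtered Dataframe, which makes it easy to chain a sort afterwards. The events Dataframe and its values below are hypothetical.

```python
import pandas as pd

# Hypothetical events: only EventID 1 rows carry a command line
events = pd.DataFrame({
    "EventID": [1, 13, 1],
    "CommandLine": ["sc.exe query", None, "svchost.exe -k netsvcs -p"],
})

process_events = (
    # Keep process creation events only
    events[events["EventID"] == 1]
    # The lambda receives the filtered Dataframe, so the length is
    # computed only for the rows that survived the filter
    .assign(Command_Length=lambda d: d["CommandLine"].str.len())
    # Longest command lines first, a common starting point when hunting
    .sort_values("Command_Length", ascending=False)
)

print(process_events)
```

Since the filtered rows in this sketch have no missing command lines, the lengths come out as integers. The 56.0 style values in the output above are probably floats because the length was computed over the full column, which contains missing values.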