Data Analysis with Pandas - Correlating Data
Author: Jose Rodriguez (@Cyb3rPandah)
Project: Infosec Jupyter Book
Public Organization: Open Threat Research
License: Creative Commons Attribution-ShareAlike 4.0 International
Reference: https://mordordatasets.com/notebooks/small/windows/08_lateral_movement/SDWIN-190518210652.html
Representing the JSON File as a Dataframe Using Pandas
Representing the JSON file as a Pandas Dataframe can involve commands such as wget and unzip. This is explained in more detail in the notebook titled Representing the JSON File as a Dataframe Using Pandas, which is located in the same folder as this notebook. To keep this notebook simple, the JSON files required for this workshop have already been downloaded and decompressed. They are in the sets_datos folder.
psexec_json = 'sets_datos/empire_psexec_dcerpc_tcp_svcctl_2020-09-20121608.json'
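For readers who do not have wget and unzip available, the decompression step can be approximated in pure Python. This is only a sketch: the function name extract_dataset and the idea of passing the archive path as an argument are assumptions for illustration, and the download step is left out because the archive location is covered in the other notebook.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir="sets_datos"):
    """Extract every member of zip_path into dest_dir.

    Returns the paths of the extracted .json files.
    extract_dataset is a hypothetical helper, not part of the workshop.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Keep only the JSON members, since those are what pandas will read
    return [dest / name for name in names if name.endswith(".json")]
```

The returned paths could then be assigned to a variable such as psexec_json and passed to pandas.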
a) Importing the Pandas library
import pandas as pd
b) Reading the JSON File
We will use the pandas.read_json method.
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html
df = pd.read_json(path_or_buf = psexec_json, lines = True)
df.head(5)
| | Keywords | SeverityValue | SourceImage | EventID | ProviderGuid | ExecutionProcessID | Channel | host | AccountType | UserID | ... | MessageNumber | ScriptBlockText | MessageTotal | ScriptBlockId | NewSd | OldSd | MiniportNameLen | MiniportName | param4 | param3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223372036854775808 | 2 | C:\windows\system32\svchost.exe | 10 | {5770385F-C22A-43E0-BF4C-06F5698FFBD9} | 9848 | Microsoft-Windows-Sysmon/Operational | wec.internal.cloudapp.net | User | S-1-5-18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | -9223372036854775808 | 2 | C:\windows\system32\svchost.exe | 10 | {5770385F-C22A-43E0-BF4C-06F5698FFBD9} | 9848 | Microsoft-Windows-Sysmon/Operational | wec.internal.cloudapp.net | User | S-1-5-18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | -9223372036854775808 | 2 | C:\windows\system32\svchost.exe | 10 | {5770385F-C22A-43E0-BF4C-06F5698FFBD9} | 9848 | Microsoft-Windows-Sysmon/Operational | wec.internal.cloudapp.net | User | S-1-5-18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | -9223372036854775808 | 2 | C:\windows\system32\svchost.exe | 10 | {5770385F-C22A-43E0-BF4C-06F5698FFBD9} | 9848 | Microsoft-Windows-Sysmon/Operational | wec.internal.cloudapp.net | User | S-1-5-18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | -9214364837600034816 | 2 | NaN | 5158 | {54849625-5478-4994-A5BA-3E3B0328C30D} | 4 | Security | wec.internal.cloudapp.net | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 186 columns
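With lines = True, pandas.read_json treats each line of the input as a separate JSON object and aligns the objects into rows. Keys that are missing from a record come back as missing values, which is why a frame built from many different event types contains so many NaN cells. A minimal sketch with two made-up records, loosely modelled on the rows above (the in-memory buffer and the variable names are assumptions for illustration):

```python
import io

import pandas as pd

# Two JSON objects, one per line, with partly different keys
sample_lines = (
    '{"EventID": 10, "Channel": "Microsoft-Windows-Sysmon/Operational", "AccountType": "User"}\n'
    '{"EventID": 5158, "Channel": "Security", "Application": "System"}\n'
)

# StringIO lets read_json consume the text as if it were a file
sample_df = pd.read_json(io.StringIO(sample_lines), lines=True)

print(sample_df)
print(sample_df.isna().sum())  # missing values per column
```

Each record contributes its own keys as columns, so the second row has no AccountType and the first has no Application.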
c) Getting to know the columns or attributes of the Dataframe
We will use the pandas.DataFrame.info method.
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html
df.info(verbose = True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4348 entries, 0 to 4347
Data columns (total 186 columns):
# Column Dtype
--- ------ -----
0 Keywords int64
1 SeverityValue int64
2 SourceImage object
3 EventID int64
4 ProviderGuid object
5 ExecutionProcessID int64
6 Channel object
7 host object
8 AccountType object
9 UserID object
10 SourceProcessGUID object
11 ThreadID int64
12 TargetImage object
13 GrantedAccess object
14 EventType object
15 Opcode object
16 EventTime object
17 EventReceivedTime object
18 @timestamp object
19 SourceModuleType object
20 port int64
21 AccountName object
22 RecordNumber int64
23 SourceProcessId object
24 SourceThreadId float64
25 Task int64
26 Domain object
27 @version int64
28 OpcodeValue float64
29 SourceModuleName object
30 TargetProcessGUID object
31 Severity object
32 SourceName object
33 Version float64
34 TargetProcessId object
35 Category object
36 CallTrace object
37 UtcTime object
38 Hostname object
39 RuleName object
40 tags object
41 Application object
42 ProcessId object
43 Message object
44 FilterRTID float64
45 LayerRTID float64
46 Protocol object
47 SourcePort float64
48 LayerName object
49 SourceAddress object
50 RemoteUserID object
51 Direction object
52 DestPort float64
53 DestAddress object
54 RemoteMachineID object
55 ProcessGuid object
56 Image object
57 SubjectDomainName object
58 SubjectUserSid object
59 SubjectLogonId object
60 ProcessName object
61 SubjectUserName object
62 Status object
63 ActivityID object
64 Payload object
65 ERROR_EVT_UNRESOLVED float64
66 ContextInfo object
67 TargetObject object
68 EventTypeOrignal object
69 Details object
70 PrivilegeList object
71 TargetLogonId object
72 LogonType float64
73 VirtualAccount object
74 LogonGuid object
75 AuthenticationPackageName object
76 IpAddress object
77 TransmittedServices object
78 LmPackageName object
79 ImpersonationLevel object
80 ElevatedToken object
81 WorkstationName object
82 TargetOutboundUserName object
83 TargetOutboundDomainName object
84 LogonProcessName object
85 KeyLength float64
86 TargetLinkedLogonId object
87 RestrictedAdminMode object
88 TargetUserName object
89 IpPort object
90 TargetUserSid object
91 TargetDomainName object
92 EventIdx float64
93 GroupMembership object
94 EventCountTotal float64
95 SourceHandleId object
96 TargetHandleId object
97 ObjectServer object
98 HandleId object
99 TransactionId object
100 AccessMask object
101 ObjectName object
102 ObjectType object
103 AccessReason object
104 AccessList object
105 RestrictedSidCount float64
106 ResourceAttributes object
107 Path object
108 TaskName object
109 Priority float64
110 MandatoryLabel object
111 ParentProcessName object
112 CommandLine object
113 NewProcessName object
114 TokenElevationType object
115 NewProcessId object
116 ParentImage object
117 User object
118 Hashes object
119 CurrentDirectory object
120 Description object
121 Company object
122 FileVersion object
123 IntegrityLevel object
124 TerminalSessionId float64
125 ParentProcessGuid object
126 ParentCommandLine object
127 ParentProcessId float64
128 LogonId object
129 Product object
130 OriginalFileName object
131 ImageLoaded object
132 Signed object
133 SignatureStatus object
134 Signature object
135 TargetFilename object
136 CreationUtcTime object
137 Service object
138 EnabledPrivilegeList object
139 DisabledPrivilegeList object
140 ShareName object
141 ShareLocalPath object
142 RelativeTargetName object
143 IsExecutable object
144 Archived object
145 Device object
146 ServiceName object
147 ServiceAccount object
148 ServiceType object
149 ServiceStartType float64
150 ServiceFileName object
151 TicketEncryptionType object
152 ServiceSid object
153 TicketOptions object
154 ImagePath object
155 StartType object
156 param1 object
157 param2 object
158 SourcePortName object
159 DestinationPort float64
160 SourceHostname object
161 DestinationIp object
162 SourceIp object
163 DestinationIsIpv6 object
164 Initiated object
165 SourceIsIpv6 object
166 DestinationPortName object
167 DestinationHostname object
168 QueryResults object
169 QueryName object
170 QueryStatus float64
171 AdditionalInfo2 object
172 Properties object
173 OperationType object
174 AdditionalInfo object
175 PipeName object
176 MessageNumber float64
177 ScriptBlockText object
178 MessageTotal float64
179 ScriptBlockId object
180 NewSd object
181 OldSd object
182 MiniportNameLen float64
183 MiniportName object
184 param4 object
185 param3 object
dtypes: float64(22), int64(9), object(155)
memory usage: 6.2+ MB
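With 186 columns, scanning the info output by eye is slow. One possible shortcut is to search the column names for a substring and to count non-null values per column. The sketch below uses a small made-up frame named demo; in the notebook the same calls would be applied to df.

```python
import pandas as pd

# Illustrative stand-in for df; the values are not taken from the dataset
demo = pd.DataFrame({
    "EventID": [4697, 4624, 10],
    "SubjectLogonId": ["0x3e7", "0x0", None],
    "TargetLogonId": [None, "0x1a2b", None],
    "ServiceName": ["Updater", None, None],
})

# Columns whose name contains "LogonId" (candidate keys for a later join)
logon_id_columns = demo.filter(like="LogonId").columns.tolist()

# Number of populated cells per column, most populated first
non_null_counts = demo.notna().sum().sort_values(ascending=False)

print(logon_id_columns)
print(non_null_counts)
```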
Filtering Columns or Attributes of our Dataframe
Selecting the columns '@timestamp', 'Hostname', 'Channel', 'ParentImage', 'Image', and 'EventID' by passing a list of column names.
df[['@timestamp','Hostname','Channel','ParentImage','Image','EventID']].head()
| | @timestamp | Hostname | Channel | ParentImage | Image | EventID |
---|---|---|---|---|---|---|
0 | 2020-09-20T16:16:09.362Z | WORKSTATION5.theshire.local | Microsoft-Windows-Sysmon/Operational | NaN | NaN | 10 |
1 | 2020-09-20T16:16:09.363Z | WORKSTATION5.theshire.local | Microsoft-Windows-Sysmon/Operational | NaN | NaN | 10 |
2 | 2020-09-20T16:16:09.363Z | WORKSTATION5.theshire.local | Microsoft-Windows-Sysmon/Operational | NaN | NaN | 10 |
3 | 2020-09-20T16:16:09.364Z | WORKSTATION5.theshire.local | Microsoft-Windows-Sysmon/Operational | NaN | NaN | 10 |
4 | 2020-09-20T16:16:09.365Z | MORDORDC.theshire.local | Security | NaN | NaN | 5158 |
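The first rows show NaN for ParentImage and Image because those event types do not carry these fields. Column selection can be combined with a row filter to keep only the events where a field is populated. A minimal sketch on a made-up frame (the third record is invented for illustration; in the notebook the same pattern would be applied to df):

```python
import pandas as pd

# Illustrative stand-in for df
demo = pd.DataFrame({
    "@timestamp": [
        "2020-09-20T16:16:09.362Z",
        "2020-09-20T16:16:09.365Z",
        "2020-09-20T16:16:10.000Z",
    ],
    "Channel": [
        "Microsoft-Windows-Sysmon/Operational",
        "Security",
        "Microsoft-Windows-Sysmon/Operational",
    ],
    "Image": [None, None, "C:\\Windows\\System32\\cmd.exe"],
    "EventID": [10, 5158, 1],
})

columns = ["@timestamp", "Channel", "Image", "EventID"]

# .loc takes a boolean row mask and a list of column names in one step
with_image = demo.loc[demo["Image"].notna(), columns]

print(with_image)
```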
Preparing Dataframes for a JOIN
a) WHICH EVENT COULD HELP US SEE NEW SERVICES INSTALLED ON A SYSTEM?
b) WHICH EVENT COULD HELP US IDENTIFY USERS AUTHENTICATING REMOTELY? HINT: LOGON TYPE 3 ;)
JOINing Newly Installed Services with Users Authenticating Remotely
HOW CAN WE JOIN THESE TWO EVENTS?
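One plausible answer, given the frame names used in the merge further down: Security event 4697 (a service was installed in the system) for a), and Security event 4624 (an account was successfully logged on) for b), where LogonType 3 marks a network logon. The sketch below shows one way the two frames might be built. The helper filter_security_events and the frame demo_df are hypothetical; in the notebook, df would be passed instead of demo_df. Dropping the all-empty columns is an assumption made here so that fields such as ServiceName and IpAddress exist in only one of the two frames and are not suffixed by a later merge.

```python
import pandas as pd


def filter_security_events(frame: pd.DataFrame, event_id: int) -> pd.DataFrame:
    """Rows from the Security channel with the given EventID.

    Columns that are entirely empty for those rows are dropped.
    Hypothetical helper, not part of the original notebook.
    """
    mask = (frame["Channel"] == "Security") & (frame["EventID"] == event_id)
    return frame[mask].dropna(axis="columns", how="all")


# Illustrative stand-in for df; the values are made up
demo_df = pd.DataFrame({
    "Channel": ["Security", "Security", "Microsoft-Windows-Sysmon/Operational"],
    "EventID": [4697, 4624, 10],
    "ServiceName": ["Updater", None, None],
    "SubjectLogonId": ["0x1a2b", "0x0", None],
    "TargetLogonId": [None, "0x1a2b", None],
    "LogonType": [None, 3.0, None],
    "IpAddress": [None, "10.0.0.5", None],
})

Security4697 = filter_security_events(demo_df, 4697)  # new services
Security4624 = filter_security_events(demo_df, 4624)  # successful logons

print(Security4697.columns.tolist())
print(Security4624.columns.tolist())
```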
(
pd.merge(Security4697, Security4624[Security4624['LogonType'] == 3],
left_on = 'SubjectLogonId', right_on = 'TargetLogonId', how = 'inner')
[['ServiceName', 'ServiceFileName','IpAddress']]
)
| | ServiceName | ServiceFileName | IpAddress |
---|---|---|---|
0 | Updater | %COMSPEC% /C start /b C:\Windows\System32\Wind... | 172.18.39.5 |
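The join works because the logon session that installed the service (SubjectLogonId in event 4697) is the same session that was created by the network logon (TargetLogonId in event 4624). An inner merge keeps only the rows where both sides share that identifier. A self-contained sketch with made-up frames (the names services and logons and all values are illustrative assumptions):

```python
import pandas as pd

# Stand-in for the service-installation events (4697)
services = pd.DataFrame({
    "SubjectLogonId": ["0x1a2b", "0x9999"],
    "ServiceName": ["Updater", "OtherSvc"],
})

# Stand-in for the successful-logon events (4624)
logons = pd.DataFrame({
    "TargetLogonId": ["0x1a2b", "0x3c4d"],
    "LogonType": [3, 2],
    "IpAddress": ["10.0.0.5", "-"],
})

# Keep only network logons before joining
network_logons = logons[logons["LogonType"] == 3]

# Inner join: a service row survives only if its logon session
# matches a network logon session
joined = pd.merge(
    services,
    network_logons,
    left_on="SubjectLogonId",
    right_on="TargetLogonId",
    how="inner",
)
result = joined[["ServiceName", "IpAddress"]]

print(result)
```

Only the Updater row remains, because its session identifier is the only one that also appears among the network logons.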