Creating a DataFrame from a Mordor Dataset
Author: Jose Rodriguez (@Cyb3rPandah)
Project: Infosec Jupyter Book
Public Organization: Open Threat Research
License: Creative Commons Attribution-ShareAlike 4.0 International
Reference: https://mordordatasets.com/notebooks/small/windows/02_execution/SDWIN-200806115603.html
Getting the JSON File
a) Downloading the Zip File
We will use the wget command with the -O option (output document file) to save the Zip file in the sets_datos folder.
! wget https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/windows/lateral_movement/host/covenant_psremoting_command.zip -O sets_datos/covenant_psremoting_command.zip
--2020-09-26 10:43:39-- https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/windows/lateral_movement/host/covenant_psremoting_command.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.200.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.200.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 359550 (351K) [application/zip]
Saving to: ‘sets_datos/covenant_psremoting_command.zip’
sets_datos/covenant 100%[===================>] 351.12K --.-KB/s in 0.1s
2020-09-26 10:43:40 (2.32 MB/s) - ‘sets_datos/covenant_psremoting_command.zip’ saved [359550/359550]
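The wget call above writes into sets_datos, so that folder must already exist. As a minimal sketch of the same download step in pure Python, the standard library can create the folder and fetch the file. The helper name download_file is hypothetical and is not part of the original notebook.

```python
import os
import urllib.request


def download_file(url, dest_path):
    """Download url to dest_path, creating the parent folder if needed."""
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # urlretrieve returns (local_filename, headers); we keep only the path.
    saved_path, _ = urllib.request.urlretrieve(url, dest_path)
    return saved_path


# Example call (requires network access), using the URL from this notebook:
# download_file(
#     "https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/"
#     "windows/lateral_movement/host/covenant_psremoting_command.zip",
#     "sets_datos/covenant_psremoting_command.zip",
# )
```

This keeps the notebook portable to environments where wget is not installed, at the cost of losing the progress output shown above.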
b) Extracting the JSON File
We will use the unzip command with the -o (overwrite) and -d (extract into a different directory) options to save the JSON file in the sets_datos folder.
! unzip -o sets_datos/covenant_psremoting_command.zip -d sets_datos/
Archive: sets_datos/covenant_psremoting_command.zip
inflating: sets_datos/covenant_psremoting_command_2020-08-06115603.json
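The extraction step can also be sketched with Python's built-in zipfile module, which is useful when the unzip command is not available. The helper name extract_archive is hypothetical and is not part of the original notebook.

```python
import zipfile


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return their names."""
    with zipfile.ZipFile(zip_path) as archive:
        # extractall overwrites files that already exist, similar to unzip -o.
        archive.extractall(dest_dir)
        return archive.namelist()


# Example call, using the paths from this notebook:
# extract_archive("sets_datos/covenant_psremoting_command.zip", "sets_datos/")
```

The returned list of member names can be used to locate the extracted JSON file without typing its name by hand.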
To simplify our code, we will store the path of the JSON file in a variable.
psremoting_json = 'sets_datos/covenant_psremoting_command_2020-08-06115603.json'
Representing the JSON File as a DataFrame Using Pandas
a) Importing the Pandas Library
import pandas as pd
b) Reading the JSON File
We will use the pandas.read_json function. Setting lines = True tells pandas to read the file as one JSON object per line.
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html
df = pd.read_json(path_or_buf = psremoting_json, lines = True)
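To illustrate what lines = True does, here is a small sketch that reads two made-up events from an in-memory buffer instead of the Mordor file. The sample records and the names sample and sample_df are hypothetical and only serve the illustration.

```python
import io

import pandas as pd

# Two hypothetical events in JSON Lines format: one JSON object per line.
sample = io.StringIO(
    '{"EventID": 1, "Channel": "Security"}\n'
    '{"EventID": 3, "Channel": "Sysmon", "PipeName": "lsass"}\n'
)

sample_df = pd.read_json(sample, lines=True)

# Each line becomes one row; a key missing from a record is filled with NaN.
print(sample_df.shape)
print(sample_df["PipeName"].isna().tolist())
```

The same behavior explains why a dataset that mixes several event types produces many columns that are mostly NaN.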
We will validate the type of the object created using the type function, which is built into Python.
Reference: https://docs.python.org/3/library/functions.html
type(df)
pandas.core.frame.DataFrame
Finally, we can view the generated DataFrame:
df
| | tags | @version | EventType | Version | ThreadID | EventTime | Task | AccountType | PipeName | Channel | ... | ContextInfo | Payload | AdditionalInfo | AdditionalInfo2 | OperationType | Properties | MiniportNameLen | MiniportName | param3 | param4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [mordorDataset] | 1 | ConnectPipe | 1.0 | 4208 | 2020-08-06 11:56:04 | 18 | User | \lsass | Microsoft-Windows-Sysmon/Operational | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | [mordorDataset] | 1 | INFO | 3.0 | 4208 | 2020-08-06 11:56:05 | 10 | User | NaN | Microsoft-Windows-Sysmon/Operational | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | [mordorDataset] | 1 | INFO | 3.0 | 4208 | 2020-08-06 11:56:05 | 10 | User | NaN | Microsoft-Windows-Sysmon/Operational | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | [mordorDataset] | 1 | AUDIT_SUCCESS | 1.0 | 1808 | 2020-08-06 11:56:06 | 12810 | NaN | NaN | Security | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | [mordorDataset] | 1 | AUDIT_SUCCESS | 1.0 | 1808 | 2020-08-06 11:56:06 | 12810 | NaN | NaN | Security | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4279 | [mordorDataset] | 1 | AUDIT_SUCCESS | 0.0 | 6244 | 2020-08-06 11:57:36 | 12810 | NaN | NaN | Security | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4280 | [mordorDataset] | 1 | AUDIT_SUCCESS | 0.0 | 6244 | 2020-08-06 11:57:36 | 12810 | NaN | NaN | Security | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4281 | [mordorDataset] | 1 | AUDIT_SUCCESS | 0.0 | 6244 | 2020-08-06 11:57:36 | 12810 | NaN | NaN | Security | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4282 | [mordorDataset] | 1 | AUDIT_SUCCESS | 0.0 | 6244 | 2020-08-06 11:57:36 | 12810 | NaN | NaN | Security | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4283 | [mordorDataset] | 1 | INFO | 5.0 | 4372 | 2020-08-06 11:57:36 | 3 | User | NaN | Microsoft-Windows-Sysmon/Operational | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4284 rows × 180 columns
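The notebook display replaces the middle columns with "..." when a DataFrame is this wide. As a rough sketch, the shape can be read directly from the object, and the column limit can be lifted temporarily with pandas.option_context. The frame wide_df below is a hypothetical stand-in for the Mordor DataFrame.

```python
import pandas as pd

# Hypothetical wide frame standing in for the 180-column Mordor DataFrame.
wide_df = pd.DataFrame(
    [[0] * 30 for _ in range(3)],
    columns=[f"col{i}" for i in range(30)],
)

# shape gives (rows, columns) without rendering the table.
n_rows, n_cols = wide_df.shape

# Temporarily remove the column limit so no column is hidden in the output.
with pd.option_context("display.max_columns", None):
    full_text = repr(wide_df.head(2))

print(n_rows, n_cols)
print(full_text)
```

Because option_context restores the previous setting on exit, the rest of the notebook keeps the default compact display.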
c) Exploring the Columns or Attributes of the DataFrame
We will use the pandas.DataFrame.info method.
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html
df.info(verbose = True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4284 entries, 0 to 4283
Data columns (total 180 columns):
# Column Dtype
--- ------ -----
0 tags object
1 @version int64
2 EventType object
3 Version float64
4 ThreadID int64
5 EventTime object
6 Task int64
7 AccountType object
8 PipeName object
9 Channel object
10 EventTypeOrignal object
11 SourceName object
12 Hostname object
13 @timestamp object
14 OpcodeValue float64
15 Message object
16 SourceModuleName object
17 ProcessId object
18 EventReceivedTime object
19 port int64
20 AccountName object
21 UtcTime object
22 Domain object
23 ExecutionProcessID int64
24 host object
25 Severity object
26 SeverityValue int64
27 EventID int64
28 UserID object
29 ProviderGuid object
30 RecordNumber int64
31 Image object
32 Keywords int64
33 ProcessGuid object
34 SourceModuleType object
35 RuleName object
36 TargetProcessGUID object
37 SourceProcessGUID object
38 CallTrace object
39 SourceImage object
40 GrantedAccess object
41 SourceProcessId object
42 SourceThreadId float64
43 TargetProcessId object
44 TargetImage object
45 DestAddress object
46 SourcePort float64
47 Category object
48 Opcode object
49 Application object
50 Direction object
51 RemoteMachineID object
52 DestPort float64
53 LayerRTID float64
54 LayerName object
55 SourceAddress object
56 FilterRTID float64
57 Protocol object
58 RemoteUserID object
59 TargetObject object
60 SubjectLogonId object
61 PrivilegeList object
62 SubjectDomainName object
63 SubjectUserSid object
64 SubjectUserName object
65 TargetDomainName object
66 ElevatedToken object
67 TargetOutboundDomainName object
68 IpPort object
69 ImpersonationLevel object
70 ProcessName object
71 KeyLength float64
72 TargetOutboundUserName object
73 RestrictedAdminMode object
74 LogonProcessName object
75 LogonGuid object
76 WorkstationName object
77 TargetLinkedLogonId object
78 TargetLogonId object
79 TargetUserSid object
80 TransmittedServices object
81 IpAddress object
82 LmPackageName object
83 TargetUserName object
84 AuthenticationPackageName object
85 VirtualAccount object
86 LogonType float64
87 GroupMembership object
88 EventIdx float64
89 EventCountTotal float64
90 DestinationPortName object
91 DestinationIp object
92 SourceIp object
93 Initiated object
94 DestinationHostname object
95 SourceHostname object
96 DestinationPort float64
97 SourceIsIpv6 object
98 User object
99 DestinationIsIpv6 object
100 SourcePortName object
101 ActivityID object
102 ObjectType object
103 AccessMask object
104 AccessList object
105 ShareName object
106 Device object
107 EnabledPrivilegeList object
108 DisabledPrivilegeList object
109 ShareLocalPath object
110 AccessReason object
111 RelativeTargetName object
112 Details object
113 OriginalFileName object
114 ParentCommandLine object
115 ParentProcessId float64
116 FileVersion object
117 Company object
118 CurrentDirectory object
119 IntegrityLevel object
120 CommandLine object
121 ParentImage object
122 LogonId object
123 Hashes object
124 Description object
125 TerminalSessionId float64
126 ParentProcessGuid object
127 Product object
128 Signed object
129 SignatureStatus object
130 Signature object
131 ImageLoaded object
132 ObjectName object
133 ResourceAttributes object
134 TransactionId object
135 RestrictedSidCount float64
136 ObjectServer object
137 HandleId object
138 SourceHandleId object
139 TargetHandleId object
140 NewProcessName object
141 NewProcessId object
142 TokenElevationType object
143 MandatoryLabel object
144 ParentProcessName object
145 Service object
146 NewSd object
147 OldSd object
148 TargetFilename object
149 CreationUtcTime object
150 Path object
151 Priority float64
152 TaskName object
153 Status object
154 IsExecutable object
155 Archived object
156 QueryResults object
157 QueryName object
158 QueryStatus float64
159 TargetServerName object
160 TargetLogonGuid object
161 TargetInfo object
162 TicketEncryptionType object
163 ServiceName object
164 ServiceSid object
165 PreAuthType float64
166 TargetSid object
167 TicketOptions object
168 param1 object
169 param2 object
170 ContextInfo object
171 Payload object
172 AdditionalInfo object
173 AdditionalInfo2 object
174 OperationType object
175 Properties object
176 MiniportNameLen float64
177 MiniportName object
178 param3 object
179 param4 object
dtypes: float64(19), int64(9), object(152)
memory usage: 5.9+ MB
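With 180 columns, most fields are only populated for some event types. One possible next step, sketched here on a tiny hypothetical frame rather than the real dataset, is to filter by Channel and drop the columns that are entirely empty for that subset. The values below are invented for illustration.

```python
import pandas as pd

# Hypothetical events; the values are illustrative, not taken from the dataset.
events = pd.DataFrame({
    "Channel": ["Security", "Microsoft-Windows-Sysmon/Operational", "Security"],
    "EventID": [4624, 17, 4624],
    "PipeName": [None, "\\lsass", None],
    "LogonType": [3.0, None, 3.0],
})

# Keep only Security events, then drop columns with no values in that subset.
security = events[events["Channel"] == "Security"]
security_compact = security.dropna(axis=1, how="all")

# count() reports the number of non-null values per column.
non_null = events.count()

print(list(security_compact.columns))
print(non_null.to_dict())
```

Applied to the full DataFrame, the same pattern would reduce each channel's view to the fields that channel actually uses.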