Creating a DataFrame from a Mordor Dataset

Getting the JSON file

a) Downloading the Zip file

We will use the wget command with the -O option (output document file) to save the Zip file in the sets_datos folder. The folder must already exist, because wget -O does not create missing directories.

! wget https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/windows/lateral_movement/host/covenant_psremoting_command.zip -O sets_datos/covenant_psremoting_command.zip
--2020-09-26 10:43:39--  https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/windows/lateral_movement/host/covenant_psremoting_command.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.200.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.200.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 359550 (351K) [application/zip]
Saving to: ‘sets_datos/covenant_psremoting_command.zip’

sets_datos/covenant 100%[===================>] 351.12K  --.-KB/s    in 0.1s    

2020-09-26 10:43:40 (2.32 MB/s) - ‘sets_datos/covenant_psremoting_command.zip’ saved [359550/359550]
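
If wget is not available, the same download can be sketched in Python with the standard library. This is a minimal sketch, not part of the original notebook: download_file is a hypothetical helper name, and unlike the wget call it also creates the destination folder when it is missing.

```python
import os
import urllib.request


def download_file(url, dest_path):
    """Download url to dest_path (hypothetical helper, not from the notebook)."""
    # Create the parent folder (for example sets_datos) if it does not exist yet.
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # urlretrieve writes the response body to dest_path, replacing any old copy.
    urllib.request.urlretrieve(url, dest_path)
    return dest_path
```

For this notebook, the call would take the URL shown in the wget command above as the first argument and 'sets_datos/covenant_psremoting_command.zip' as the second.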

b) Extracting the JSON file

We will use the unzip command with the -o (overwrite existing files) and -d (extraction directory) options to save the JSON file in the sets_datos folder.

! unzip -o sets_datos/covenant_psremoting_command.zip -d sets_datos/
Archive:  sets_datos/covenant_psremoting_command.zip
  inflating: sets_datos/covenant_psremoting_command_2020-08-06115603.json  
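
The extraction step can also be sketched with Python's zipfile module, which is useful where the unzip command is not installed. This is an illustrative alternative, not the notebook's method: extract_zip is a hypothetical helper name. Like unzip -o, extractall replaces files that already exist in the destination.

```python
import zipfile


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as archive:
        # extractall creates dest_dir if needed and overwrites existing files.
        archive.extractall(dest_dir)
        # Return the member names so the caller can locate the JSON file.
        return archive.namelist()
```

Calling extract_zip('sets_datos/covenant_psremoting_command.zip', 'sets_datos/') should return a list with the name of the JSON file shown in the unzip output above.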

To simplify our code, we will store the path of the JSON file in a variable.

psremoting_json = 'sets_datos/covenant_psremoting_command_2020-08-06115603.json'

Representing the JSON file as a DataFrame using Pandas

a) Importing the Pandas library

import pandas as pd

b) Reading the JSON file

We will use the pandas.read_json function. The lines = True argument tells Pandas to read the file as one JSON object per line.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html

df = pd.read_json(path_or_buf = psremoting_json, lines = True)
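
To see what lines = True does, here is a small sketch that reads two made-up records from memory instead of from the dataset file. The records and their values are illustrative assumptions (only the column names are borrowed from the dataset); the point is that each line becomes one row, and a field missing from a record is filled with NaN.

```python
import io

import pandas as pd

# Two illustrative JSON Lines records (made up, not taken from the dataset).
# Each line is a complete JSON object with its own set of fields.
sample = io.StringIO(
    '{"EventID": 3, "Channel": "Microsoft-Windows-Sysmon/Operational", "DestinationPort": 5985}\n'
    '{"EventID": 4624, "Channel": "Security", "LogonType": 3}\n'
)

# lines=True parses one JSON object per line and stacks them as rows.
sample_df = pd.read_json(sample, lines=True)

# The columns are the union of all fields; missing fields show up as NaN.
print(sample_df)
```

This is also why the real DataFrame contains so many NaN values: every event type contributes its own fields, and each row only fills the columns that its event defines.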

We will check the type of the object we created using the type function, which is built into Python.

Reference: https://docs.python.org/3/library/functions.html

type(df)
pandas.core.frame.DataFrame

Finally, we can display the resulting DataFrame:

df
tags @version EventType Version ThreadID EventTime Task AccountType PipeName Channel ... ContextInfo Payload AdditionalInfo AdditionalInfo2 OperationType Properties MiniportNameLen MiniportName param3 param4
0 [mordorDataset] 1 ConnectPipe 1.0 4208 2020-08-06 11:56:04 18 User \lsass Microsoft-Windows-Sysmon/Operational ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 [mordorDataset] 1 INFO 3.0 4208 2020-08-06 11:56:05 10 User NaN Microsoft-Windows-Sysmon/Operational ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 [mordorDataset] 1 INFO 3.0 4208 2020-08-06 11:56:05 10 User NaN Microsoft-Windows-Sysmon/Operational ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 [mordorDataset] 1 AUDIT_SUCCESS 1.0 1808 2020-08-06 11:56:06 12810 NaN NaN Security ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 [mordorDataset] 1 AUDIT_SUCCESS 1.0 1808 2020-08-06 11:56:06 12810 NaN NaN Security ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4279 [mordorDataset] 1 AUDIT_SUCCESS 0.0 6244 2020-08-06 11:57:36 12810 NaN NaN Security ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4280 [mordorDataset] 1 AUDIT_SUCCESS 0.0 6244 2020-08-06 11:57:36 12810 NaN NaN Security ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4281 [mordorDataset] 1 AUDIT_SUCCESS 0.0 6244 2020-08-06 11:57:36 12810 NaN NaN Security ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4282 [mordorDataset] 1 AUDIT_SUCCESS 0.0 6244 2020-08-06 11:57:36 12810 NaN NaN Security ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4283 [mordorDataset] 1 INFO 5.0 4372 2020-08-06 11:57:36 3 User NaN Microsoft-Windows-Sysmon/Operational ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

4284 rows × 180 columns

c) Exploring the columns or attributes of the DataFrame

We will use the pandas.DataFrame.info method.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html

df.info(verbose = True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4284 entries, 0 to 4283
Data columns (total 180 columns):
 #   Column                     Dtype  
---  ------                     -----  
 0   tags                       object 
 1   @version                   int64  
 2   EventType                  object 
 3   Version                    float64
 4   ThreadID                   int64  
 5   EventTime                  object 
 6   Task                       int64  
 7   AccountType                object 
 8   PipeName                   object 
 9   Channel                    object 
 10  EventTypeOrignal           object 
 11  SourceName                 object 
 12  Hostname                   object 
 13  @timestamp                 object 
 14  OpcodeValue                float64
 15  Message                    object 
 16  SourceModuleName           object 
 17  ProcessId                  object 
 18  EventReceivedTime          object 
 19  port                       int64  
 20  AccountName                object 
 21  UtcTime                    object 
 22  Domain                     object 
 23  ExecutionProcessID         int64  
 24  host                       object 
 25  Severity                   object 
 26  SeverityValue              int64  
 27  EventID                    int64  
 28  UserID                     object 
 29  ProviderGuid               object 
 30  RecordNumber               int64  
 31  Image                      object 
 32  Keywords                   int64  
 33  ProcessGuid                object 
 34  SourceModuleType           object 
 35  RuleName                   object 
 36  TargetProcessGUID          object 
 37  SourceProcessGUID          object 
 38  CallTrace                  object 
 39  SourceImage                object 
 40  GrantedAccess              object 
 41  SourceProcessId            object 
 42  SourceThreadId             float64
 43  TargetProcessId            object 
 44  TargetImage                object 
 45  DestAddress                object 
 46  SourcePort                 float64
 47  Category                   object 
 48  Opcode                     object 
 49  Application                object 
 50  Direction                  object 
 51  RemoteMachineID            object 
 52  DestPort                   float64
 53  LayerRTID                  float64
 54  LayerName                  object 
 55  SourceAddress              object 
 56  FilterRTID                 float64
 57  Protocol                   object 
 58  RemoteUserID               object 
 59  TargetObject               object 
 60  SubjectLogonId             object 
 61  PrivilegeList              object 
 62  SubjectDomainName          object 
 63  SubjectUserSid             object 
 64  SubjectUserName            object 
 65  TargetDomainName           object 
 66  ElevatedToken              object 
 67  TargetOutboundDomainName   object 
 68  IpPort                     object 
 69  ImpersonationLevel         object 
 70  ProcessName                object 
 71  KeyLength                  float64
 72  TargetOutboundUserName     object 
 73  RestrictedAdminMode        object 
 74  LogonProcessName           object 
 75  LogonGuid                  object 
 76  WorkstationName            object 
 77  TargetLinkedLogonId        object 
 78  TargetLogonId              object 
 79  TargetUserSid              object 
 80  TransmittedServices        object 
 81  IpAddress                  object 
 82  LmPackageName              object 
 83  TargetUserName             object 
 84  AuthenticationPackageName  object 
 85  VirtualAccount             object 
 86  LogonType                  float64
 87  GroupMembership            object 
 88  EventIdx                   float64
 89  EventCountTotal            float64
 90  DestinationPortName        object 
 91  DestinationIp              object 
 92  SourceIp                   object 
 93  Initiated                  object 
 94  DestinationHostname        object 
 95  SourceHostname             object 
 96  DestinationPort            float64
 97  SourceIsIpv6               object 
 98  User                       object 
 99  DestinationIsIpv6          object 
 100 SourcePortName             object 
 101 ActivityID                 object 
 102 ObjectType                 object 
 103 AccessMask                 object 
 104 AccessList                 object 
 105 ShareName                  object 
 106 Device                     object 
 107 EnabledPrivilegeList       object 
 108 DisabledPrivilegeList      object 
 109 ShareLocalPath             object 
 110 AccessReason               object 
 111 RelativeTargetName         object 
 112 Details                    object 
 113 OriginalFileName           object 
 114 ParentCommandLine          object 
 115 ParentProcessId            float64
 116 FileVersion                object 
 117 Company                    object 
 118 CurrentDirectory           object 
 119 IntegrityLevel             object 
 120 CommandLine                object 
 121 ParentImage                object 
 122 LogonId                    object 
 123 Hashes                     object 
 124 Description                object 
 125 TerminalSessionId          float64
 126 ParentProcessGuid          object 
 127 Product                    object 
 128 Signed                     object 
 129 SignatureStatus            object 
 130 Signature                  object 
 131 ImageLoaded                object 
 132 ObjectName                 object 
 133 ResourceAttributes         object 
 134 TransactionId              object 
 135 RestrictedSidCount         float64
 136 ObjectServer               object 
 137 HandleId                   object 
 138 SourceHandleId             object 
 139 TargetHandleId             object 
 140 NewProcessName             object 
 141 NewProcessId               object 
 142 TokenElevationType         object 
 143 MandatoryLabel             object 
 144 ParentProcessName          object 
 145 Service                    object 
 146 NewSd                      object 
 147 OldSd                      object 
 148 TargetFilename             object 
 149 CreationUtcTime            object 
 150 Path                       object 
 151 Priority                   float64
 152 TaskName                   object 
 153 Status                     object 
 154 IsExecutable               object 
 155 Archived                   object 
 156 QueryResults               object 
 157 QueryName                  object 
 158 QueryStatus                float64
 159 TargetServerName           object 
 160 TargetLogonGuid            object 
 161 TargetInfo                 object 
 162 TicketEncryptionType       object 
 163 ServiceName                object 
 164 ServiceSid                 object 
 165 PreAuthType                float64
 166 TargetSid                  object 
 167 TicketOptions              object 
 168 param1                     object 
 169 param2                     object 
 170 ContextInfo                object 
 171 Payload                    object 
 172 AdditionalInfo             object 
 173 AdditionalInfo2            object 
 174 OperationType              object 
 175 Properties                 object 
 176 MiniportNameLen            float64
 177 MiniportName               object 
 178 param3                     object 
 179 param4                     object 
dtypes: float64(19), int64(9), object(152)
memory usage: 5.9+ MB
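
With 180 columns, the info output is long. As a possible complement, the sketch below computes two compact summaries: the number of columns per dtype (the same tally info prints on its "dtypes:" line) and the number of non-null values per column. It runs on a toy DataFrame whose values are made up for illustration; in the notebook, the same expressions could be applied to df.

```python
import pandas as pd

# Hypothetical toy frame standing in for the notebook's df (values are made up).
toy_df = pd.DataFrame(
    {
        "EventID": [3, 4624, 4624],
        "Channel": ["Microsoft-Windows-Sysmon/Operational", "Security", "Security"],
        "LogonType": [None, 3.0, 3.0],
    }
)

# Number of columns per dtype, comparable to the "dtypes:" line of info().
dtype_counts = toy_df.dtypes.astype(str).value_counts()

# Number of non-null values in each column; sparse columns stand out quickly.
non_null_counts = toy_df.notna().sum()

print(dtype_counts)
print(non_null_counts.sort_values(ascending=False))
```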

Thank you very much! I hope this notebook has been useful for getting started with importing Mordor datasets.

There is still more to learn :D