Data Analysis with Pandas - Correlating Data

Representing the JSON File as a DataFrame using Pandas

Representing the JSON file as a Pandas DataFrame can involve commands such as wget and unzip. This is explained in more detail in the notebook titled "Representing the JSON File as a DataFrame using Pandas", which is located in the same folder as this notebook. To keep this notebook simple, the JSON files required for this workshop have already been downloaded and decompressed. They are located in the sets_datos folder.
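For readers who prefer to stay inside Python, the decompression step that unzip performs can be sketched with the standard-library zipfile module. This is only an illustration: the helper name extract_dataset and the archive name in the usage comment are hypothetical and are not part of the workshop material.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Hypothetical usage (the archive name is an assumption):
# extract_dataset("some_dataset.zip", "sets_datos")
```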

psexec_json = 'sets_datos/empire_psexec_dcerpc_tcp_svcctl_2020-09-20121608.json'

a) Importing the Pandas library

import pandas as pd

b) Reading the JSON File

We will use the pandas.read_json function. Passing lines = True tells it to read the file as one JSON object per line.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html

df = pd.read_json(path_or_buf = psexec_json, lines = True)
df.head(5)
Keywords SeverityValue SourceImage EventID ProviderGuid ExecutionProcessID Channel host AccountType UserID ... MessageNumber ScriptBlockText MessageTotal ScriptBlockId NewSd OldSd MiniportNameLen MiniportName param4 param3
0 -9223372036854775808 2 C:\windows\system32\svchost.exe 10 {5770385F-C22A-43E0-BF4C-06F5698FFBD9} 9848 Microsoft-Windows-Sysmon/Operational wec.internal.cloudapp.net User S-1-5-18 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 -9223372036854775808 2 C:\windows\system32\svchost.exe 10 {5770385F-C22A-43E0-BF4C-06F5698FFBD9} 9848 Microsoft-Windows-Sysmon/Operational wec.internal.cloudapp.net User S-1-5-18 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 -9223372036854775808 2 C:\windows\system32\svchost.exe 10 {5770385F-C22A-43E0-BF4C-06F5698FFBD9} 9848 Microsoft-Windows-Sysmon/Operational wec.internal.cloudapp.net User S-1-5-18 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 -9223372036854775808 2 C:\windows\system32\svchost.exe 10 {5770385F-C22A-43E0-BF4C-06F5698FFBD9} 9848 Microsoft-Windows-Sysmon/Operational wec.internal.cloudapp.net User S-1-5-18 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 -9214364837600034816 2 NaN 5158 {54849625-5478-4994-A5BA-3E3B0328C30D} 4 Security wec.internal.cloudapp.net NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 186 columns
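The many NaN values in the output above follow from how the file is read: with lines = True every line is parsed as its own JSON object, and any key a record lacks becomes NaN in that row. A minimal sketch with two made-up records (the field values are illustrative and are not taken from the dataset):

```python
import io

import pandas as pd

# Two JSON objects, one per line; only the second has an "Application" key.
sample = io.StringIO(
    '{"EventID": 10, "Channel": "Microsoft-Windows-Sysmon/Operational"}\n'
    '{"EventID": 5158, "Channel": "Security", "Application": "System"}\n'
)

sample_df = pd.read_json(sample, lines=True)

# The columns are the union of all keys; the missing value is filled with NaN.
print(sample_df)
```

This is why a dataset that mixes many event types ends up with a wide, sparse DataFrame: each event type contributes its own fields as columns.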

c) Getting to know the columns or attributes of the DataFrame

We will use the pandas.DataFrame.info method.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html

df.info(verbose = True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4348 entries, 0 to 4347
Data columns (total 186 columns):
 #   Column                     Dtype  
---  ------                     -----  
 0   Keywords                   int64  
 1   SeverityValue              int64  
 2   SourceImage                object 
 3   EventID                    int64  
 4   ProviderGuid               object 
 5   ExecutionProcessID         int64  
 6   Channel                    object 
 7   host                       object 
 8   AccountType                object 
 9   UserID                     object 
 10  SourceProcessGUID          object 
 11  ThreadID                   int64  
 12  TargetImage                object 
 13  GrantedAccess              object 
 14  EventType                  object 
 15  Opcode                     object 
 16  EventTime                  object 
 17  EventReceivedTime          object 
 18  @timestamp                 object 
 19  SourceModuleType           object 
 20  port                       int64  
 21  AccountName                object 
 22  RecordNumber               int64  
 23  SourceProcessId            object 
 24  SourceThreadId             float64
 25  Task                       int64  
 26  Domain                     object 
 27  @version                   int64  
 28  OpcodeValue                float64
 29  SourceModuleName           object 
 30  TargetProcessGUID          object 
 31  Severity                   object 
 32  SourceName                 object 
 33  Version                    float64
 34  TargetProcessId            object 
 35  Category                   object 
 36  CallTrace                  object 
 37  UtcTime                    object 
 38  Hostname                   object 
 39  RuleName                   object 
 40  tags                       object 
 41  Application                object 
 42  ProcessId                  object 
 43  Message                    object 
 44  FilterRTID                 float64
 45  LayerRTID                  float64
 46  Protocol                   object 
 47  SourcePort                 float64
 48  LayerName                  object 
 49  SourceAddress              object 
 50  RemoteUserID               object 
 51  Direction                  object 
 52  DestPort                   float64
 53  DestAddress                object 
 54  RemoteMachineID            object 
 55  ProcessGuid                object 
 56  Image                      object 
 57  SubjectDomainName          object 
 58  SubjectUserSid             object 
 59  SubjectLogonId             object 
 60  ProcessName                object 
 61  SubjectUserName            object 
 62  Status                     object 
 63  ActivityID                 object 
 64  Payload                    object 
 65  ERROR_EVT_UNRESOLVED       float64
 66  ContextInfo                object 
 67  TargetObject               object 
 68  EventTypeOrignal           object 
 69  Details                    object 
 70  PrivilegeList              object 
 71  TargetLogonId              object 
 72  LogonType                  float64
 73  VirtualAccount             object 
 74  LogonGuid                  object 
 75  AuthenticationPackageName  object 
 76  IpAddress                  object 
 77  TransmittedServices        object 
 78  LmPackageName              object 
 79  ImpersonationLevel         object 
 80  ElevatedToken              object 
 81  WorkstationName            object 
 82  TargetOutboundUserName     object 
 83  TargetOutboundDomainName   object 
 84  LogonProcessName           object 
 85  KeyLength                  float64
 86  TargetLinkedLogonId        object 
 87  RestrictedAdminMode        object 
 88  TargetUserName             object 
 89  IpPort                     object 
 90  TargetUserSid              object 
 91  TargetDomainName           object 
 92  EventIdx                   float64
 93  GroupMembership            object 
 94  EventCountTotal            float64
 95  SourceHandleId             object 
 96  TargetHandleId             object 
 97  ObjectServer               object 
 98  HandleId                   object 
 99  TransactionId              object 
 100 AccessMask                 object 
 101 ObjectName                 object 
 102 ObjectType                 object 
 103 AccessReason               object 
 104 AccessList                 object 
 105 RestrictedSidCount         float64
 106 ResourceAttributes         object 
 107 Path                       object 
 108 TaskName                   object 
 109 Priority                   float64
 110 MandatoryLabel             object 
 111 ParentProcessName          object 
 112 CommandLine                object 
 113 NewProcessName             object 
 114 TokenElevationType         object 
 115 NewProcessId               object 
 116 ParentImage                object 
 117 User                       object 
 118 Hashes                     object 
 119 CurrentDirectory           object 
 120 Description                object 
 121 Company                    object 
 122 FileVersion                object 
 123 IntegrityLevel             object 
 124 TerminalSessionId          float64
 125 ParentProcessGuid          object 
 126 ParentCommandLine          object 
 127 ParentProcessId            float64
 128 LogonId                    object 
 129 Product                    object 
 130 OriginalFileName           object 
 131 ImageLoaded                object 
 132 Signed                     object 
 133 SignatureStatus            object 
 134 Signature                  object 
 135 TargetFilename             object 
 136 CreationUtcTime            object 
 137 Service                    object 
 138 EnabledPrivilegeList       object 
 139 DisabledPrivilegeList      object 
 140 ShareName                  object 
 141 ShareLocalPath             object 
 142 RelativeTargetName         object 
 143 IsExecutable               object 
 144 Archived                   object 
 145 Device                     object 
 146 ServiceName                object 
 147 ServiceAccount             object 
 148 ServiceType                object 
 149 ServiceStartType           float64
 150 ServiceFileName            object 
 151 TicketEncryptionType       object 
 152 ServiceSid                 object 
 153 TicketOptions              object 
 154 ImagePath                  object 
 155 StartType                  object 
 156 param1                     object 
 157 param2                     object 
 158 SourcePortName             object 
 159 DestinationPort            float64
 160 SourceHostname             object 
 161 DestinationIp              object 
 162 SourceIp                   object 
 163 DestinationIsIpv6          object 
 164 Initiated                  object 
 165 SourceIsIpv6               object 
 166 DestinationPortName        object 
 167 DestinationHostname        object 
 168 QueryResults               object 
 169 QueryName                  object 
 170 QueryStatus                float64
 171 AdditionalInfo2            object 
 172 Properties                 object 
 173 OperationType              object 
 174 AdditionalInfo             object 
 175 PipeName                   object 
 176 MessageNumber              float64
 177 ScriptBlockText            object 
 178 MessageTotal               float64
 179 ScriptBlockId              object 
 180 NewSd                      object 
 181 OldSd                      object 
 182 MiniportNameLen            float64
 183 MiniportName               object 
 184 param4                     object 
 185 param3                     object 
dtypes: float64(22), int64(9), object(155)
memory usage: 6.2+ MB
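The listing above shows each column's dtype but not how many values it actually holds. One way to get that is DataFrame.notna().sum(). The sketch below uses a small hypothetical frame rather than the workshop data. It also shows why a numeric field such as LogonType appears as float64: a column with missing values cannot stay int64, so pandas stores it as float.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the workshop DataFrame.
events = pd.DataFrame({
    "EventID": [10, 4697, 4624],
    "Channel": ["Microsoft-Windows-Sysmon/Operational", "Security", "Security"],
    "ServiceName": [None, "Updater", None],
    "LogonType": [np.nan, np.nan, 3.0],
})

# Number of non-missing values per column, most populated first.
non_null = events.notna().sum().sort_values(ascending=False)
print(non_null)

# How many columns share each dtype (compare with the "dtypes:" summary line).
print(events.dtypes.value_counts())
```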

Filtering Columns or Attributes of our DataFrame

Selecting the columns '@timestamp', 'Hostname', 'Channel', 'ParentImage', 'Image' and 'EventID' by passing a list of column names.

df[['@timestamp','Hostname','Channel','ParentImage','Image','EventID']].head()
@timestamp Hostname Channel ParentImage Image EventID
0 2020-09-20T16:16:09.362Z WORKSTATION5.theshire.local Microsoft-Windows-Sysmon/Operational NaN NaN 10
1 2020-09-20T16:16:09.363Z WORKSTATION5.theshire.local Microsoft-Windows-Sysmon/Operational NaN NaN 10
2 2020-09-20T16:16:09.363Z WORKSTATION5.theshire.local Microsoft-Windows-Sysmon/Operational NaN NaN 10
3 2020-09-20T16:16:09.364Z WORKSTATION5.theshire.local Microsoft-Windows-Sysmon/Operational NaN NaN 10
4 2020-09-20T16:16:09.365Z MORDORDC.theshire.local Security NaN NaN 5158
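Indexing with a list of names returns a DataFrame whose columns follow the order of the list, while indexing with a single name returns a Series. A small sketch on a toy frame (the rows reuse values visible in the output above; the variable names are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "@timestamp": ["2020-09-20T16:16:09.362Z", "2020-09-20T16:16:09.365Z"],
    "Hostname": ["WORKSTATION5.theshire.local", "MORDORDC.theshire.local"],
    "EventID": [10, 5158],
})

subset = toy[["Hostname", "EventID"]]  # list of names -> DataFrame
single = toy["EventID"]                # single name   -> Series

print(type(subset).__name__, list(subset.columns))
print(type(single).__name__)
```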

Preparing DataFrames for a JOIN

a) WHICH EVENT COULD HELP US SEE NEW SERVICES INSTALLED ON A SYSTEM?

b) WHICH EVENT COULD HELP US IDENTIFY USERS AUTHENTICATING REMOTELY? HINT: LOGON TYPE 3 ;)
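The merge cell later in this notebook uses two DataFrames named Security4697 and Security4624 that are not defined in this section. Their names suggest Security channel events with EventID 4697 (a service was installed in the system) and EventID 4624 (an account was successfully logged on). The following is a sketch of how they could be derived under that reading; the helper select_security_events and the toy rows are hypothetical.

```python
import pandas as pd


def select_security_events(frame, event_id, columns):
    """Return the given columns for Security-channel rows with this EventID."""
    mask = (frame["Channel"].str.lower() == "security") & (frame["EventID"] == event_id)
    return frame.loc[mask, columns]


# Tiny stand-in for the workshop DataFrame (values are made up).
toy = pd.DataFrame({
    "Channel": ["Security", "Security", "Microsoft-Windows-Sysmon/Operational"],
    "EventID": [4697, 4624, 10],
    "ServiceName": ["Updater", None, None],
    "SubjectLogonId": ["0x1a2b", None, None],
    "TargetLogonId": [None, "0x1a2b", None],
    "LogonType": [None, 3.0, None],
    "IpAddress": [None, "172.18.39.5", None],
})

toy_4697 = select_security_events(toy, 4697, ["ServiceName", "SubjectLogonId"])
toy_4624 = select_security_events(toy, 4624, ["TargetLogonId", "LogonType", "IpAddress"])
```

On the workshop data, the equivalent calls would be Security4697 = select_security_events(df, 4697, ['ServiceName', 'ServiceFileName', 'SubjectLogonId']) and Security4624 = select_security_events(df, 4624, ['TargetLogonId', 'LogonType', 'IpAddress']). Keeping the two column lists disjoint avoids the _x and _y suffixes that pd.merge adds to overlapping column names.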

JOINing New Installed Services with users authenticating remotely

HOW CAN WE JOIN THESE TWO EVENTS?

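# Assumes Security4697 and Security4624 were already filtered from df:
# Security channel events with EventID 4697 and 4624, respectively.
(
    pd.merge(
        Security4697,
        Security4624[Security4624['LogonType'] == 3],  # network logons only
        left_on = 'SubjectLogonId', right_on = 'TargetLogonId', how = 'inner'
    )[['ServiceName', 'ServiceFileName', 'IpAddress']]
)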
ServiceName ServiceFileName IpAddress
0 Updater %COMSPEC% /C start /b C:\Windows\System32\Wind... 172.18.39.5
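The join works because the logon session ID recorded when the service was installed (SubjectLogonId) matches the session ID assigned at logon (TargetLogonId). The sketch below reproduces the idea on toy data; apart from the IP address shown in the output above, all values are made up. It contrasts how = 'inner', which keeps only services tied to a network logon, with how = 'left', which keeps every service and leaves NaN where no logon matches.

```python
import pandas as pd

# Hypothetical service-installation events.
services = pd.DataFrame({
    "ServiceName": ["Updater", "LocalSvc"],
    "SubjectLogonId": ["0x1a2b", "0x3e7"],
})

# Hypothetical logon events; only LogonType 3 is a network logon.
logons = pd.DataFrame({
    "TargetLogonId": ["0x1a2b", "0x9f9f", "0x3e7"],
    "LogonType": [3, 3, 5],
    "IpAddress": ["172.18.39.5", "10.0.0.7", "-"],
})
network_logons = logons[logons["LogonType"] == 3]

# Inner join: only services whose logon session was a network logon.
inner = pd.merge(services, network_logons,
                 left_on="SubjectLogonId", right_on="TargetLogonId", how="inner")

# Left join: every service, with NaN where no network logon matches.
left = pd.merge(services, network_logons,
                left_on="SubjectLogonId", right_on="TargetLogonId", how="left")

print(inner[["ServiceName", "IpAddress"]])
print(left[["ServiceName", "IpAddress"]])
```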

Thank you very much!! I hope this notebook has been useful for starting to explore some techniques for correlating data :D

There is still more to learn :D