I have a sourcetype in CSV format, and I'd like to extract fields from the multiline header that precedes the data in these incoming files. Each line in the header begins with `#`, and these lines are comma-separated. The header is followed by the actual data fields, which are semicolon-separated. I'm trying to figure out the best way to parse this data. The data looks like this:
#PlanID: '1.2.246.352.71.5.459020837699.2820.20131008220539', IrrSessionID: '1.2.246.352.82.6.5130518203565855886.177214397189262813860', FieldNum:1
#BeamSizeID:'4.0', Status (1, 0, 0), CumMU: 109720.889806
#Temp C/M(0.000000,0.000000), Pressure C/M(0.000000,0.000000)
#VALUES;;;
Field1;Field2;Field3;Field4;...;FieldN
Data1;Data2;Data4;Data4;...;DataN
I'd like to extract the following fields; here is what each would map to in this example:
PlanID = 1.2.246.352.71.5.459020837699.2820.20131008220539
IrrSessionID = 1.2.246.352.82.6.5130518203565855886.177214397189262813860
FieldNum = 1
BeamSizeID = 4.0
Status = (1, 0, 0)
CumMU = 109720.889806
Temp C/M = (0.000000,0.000000)
Pressure C/M = (0.000000,0.000000)
Followed by:
Field1, Field2,..., etc.
Previously I just had Splunk strip away all the lines beginning with # using:
FIELD_DELIMITER = ;
PREAMBLE_REGEX = ^\#
(this works well for all my other sourcetypes)
I have also used a Python script to append the necessary fields to the file, after which I can keep using the aforementioned settings. However, we do not want to run a separate process on our files before getting them into Splunk.
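For reference, the core of that script does something like the following (a simplified sketch, not the production version; the regexes are just my approximation of the header grammar shown above):

```python
import re

def parse_header(lines):
    """Extract key/value fields from the '#' header lines.

    Handles the two shapes in the header: parenthesised tuples like
    "Status (1, 0, 0)" or "Temp C/M(0.0,0.0)", and key:value pairs
    whose values may be single-quoted.
    """
    fields = {}
    for line in lines:
        if not line.startswith("#"):
            break  # header ends at the first data line
        body = line.lstrip("#")
        # Tuple-style entries: the key may contain spaces and slashes
        for key, val in re.findall(r"([\w /]+?)\s*(\([^)]*\))", body):
            fields[key.strip()] = val
        # Drop tuple entries so they don't confuse the key:value pass
        body = re.sub(r"[\w /]+?\s*\([^)]*\)", "", body)
        # key:value entries, value optionally wrapped in single quotes
        for key, val in re.findall(r"([\w ]+?)\s*:\s*'?([^',]+)'?", body):
            fields[key.strip()] = val.strip()
    return fields
```

Against the sample above this produces keys like `PlanID`, `Status`, and `Temp C/M`; the script then writes those back into the file so the semicolon-delimited extraction still works.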
So, what is the best way to get these fields in? Is there a way to adapt my Python script and run it within Splunk against the incoming data? Or should I use some extensive regex in `props.conf` and `transforms.conf` to achieve this?
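If it helps clarify the regex route, this is roughly the kind of search-time extraction I was picturing (an untested sketch; `[my_sourcetype]` is a placeholder, and I've replaced the `/` and spaces in the field names since Splunk field names can't contain them):

```
# props.conf -- sketch only, assuming the '#' header lines remain visible to search
[my_sourcetype]
EXTRACT-plan = ^#PlanID:\s*'(?<PlanID>[^']+)',\s*IrrSessionID:\s*'(?<IrrSessionID>[^']+)',\s*FieldNum:\s*(?<FieldNum>\d+)
EXTRACT-beam = ^#BeamSizeID:\s*'(?<BeamSizeID>[^']+)',\s*Status\s*(?<Status>\([^)]*\)),\s*CumMU:\s*(?<CumMU>[\d.]+)
EXTRACT-env  = ^#Temp C/M(?<Temp_C_M>\([^)]*\)),\s*Pressure C/M(?<Pressure_C_M>\([^)]*\))
```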
Many thanks!