Hello,
i recently discovered a little problem/challenge/opportunity(?) with an existing graph I wrote. We have some account data arriving in a flat file where the account numbers are shared potentially between multiple individuals. I thought the sortWithinGroups component would be ideal....I could group by account number and sort by last_name and first_name within the same account number, but I need to find a way to distinguish between the number of individuals within a group so sorted....Basically I need a counter of some kind added which will keep track of each individual within a given group for later reference and granularity...There isn't anything in the input data to identify the number of individuals in the same group in a given file....Sample input
Account#, Last name, First name, Order_amount, etc...
123, Smith, John, 1000,....
123, Smith, Mary, 1000,...
123, Jones, Davey, 50,...
456, Bird, Larry, 10,...
456, Rogers,Ginger, 2000,...
789, Crosby,Bing, 300,...
The files will potentially have 1000's of records
Needed output
123-1, Jones, Davey,50,...
123-2, Smith, John,1000,...
123-3, Smith, Mary, 1000,...
456-1, Bird, Larry,10
456-2, Rogers, Ginger, 2000,...
789-1, Crosby, Bing,300,...
All ideas are greatly appreciated,
Thanks
--Matt
i recently discovered a little problem/challenge/opportunity(?) with an existing graph I wrote. We have some account data arriving in a flat file where the account numbers are shared potentially between multiple individuals. I thought the sortWithinGroups component would be ideal....I could group by account number and sort by last_name and first_name within the same account number, but I need to find a way to distinguish between the number of individuals within a group so sorted....Basically I need a counter of some kind added which will keep track of each individual within a given group for later reference and granularity...There isn't anything in the input data to identify the number of individuals in the same group in a given file....Sample input
Account#, Last name, First name, Order_amount, etc...
123, Smith, John, 1000,....
123, Smith, Mary, 1000,...
123, Jones, Davey, 50,...
456, Bird, Larry, 10,...
456, Rogers,Ginger, 2000,...
789, Crosby,Bing, 300,...
The files will potentially have 1000's of records
Needed output
123-1, Jones, Davey,50,...
123-2, Smith, John,1000,...
123-3, Smith, Mary, 1000,...
456-1, Bird, Larry,10
456-2, Rogers, Ginger, 2000,...
789-1, Crosby, Bing,300,...
All ideas are greatly appreciated,
Thanks
--Matt
-
Hi Matt,
I think that your case is pretty specific so it is not necessary to make a new component for it. You can achieve the same result with a combination of ExtSort and Reformat. ExtSort is capable of sorting by more than one key and Reformat takes care of adding the unique account id. Sort key for ExtSort will be Account(a);Last_name(a);First_name(a) and Reformat will have following CTL code:
string account = '';
string last_name = '';
string first_name = '';
integer counter = 1;
boolean first_record = true;
function integer transform() {
if ($in.0.Account == null || $in.0.Last_name == null || $in.0.First_name == null || $in.0.Order_amount == null) {
raiseError('Wrong input file. Some value contains NULL.');
}
if (first_record) {
account = $in.0.Account;
last_name = $in.0.Last_name;
first_name = $in.0.First_name;
first_record = false;
}
if (account == $in.0.Account && last_name == $in.0.Last_name && first_name == $in.0.First_name) { // account number and person are the same as in the previous record
$out.0.Account = $in.0.Account + '-' + counter;
} else if (account == $in.0.Account && !(last_name == $in.0.Last_name && first_name == $in.0.First_name)) { // account number is the same but it is a different person
counter++;
$out.0.Account = $in.0.Account + '-' + counter;
} else { // different account number
counter = 1;
$out.0.Account = $in.0.Account + '-' + counter;
}
$out.0.Last_name = $in.0.Last_name;
$out.0.First_name = $in.0.First_name;
$out.0.Order_amount = $in.0.Order_amount;
account = $in.0.Account;
last_name = $in.0.Last_name;
first_name = $in.0.First_name;
return OK;
}
However, this is just an example working for data you provided. Real application should use some unique identifier, not the first_name and last_name combination.
Best regards, -
Thanks for the info sir!
--Matt -
Hi,
more generic solution may look like:
//#CTL2
string currentUuid;
input_csv lastRecord;
string[] checkedFields = ["Account", "Last_name", "First_name"];
function boolean equalRecords(input_csv a, input_csv b, string[] fields) {
foreach (string field : fields) {
if (isNull(a, field) || isNull(b, field)) {
return false;
}
if (compare(a, field, b, field)!=0) {
return false;
}
}
return true;
}
function integer transform() {
if (!equalRecords(lastRecord, $in.0, checkedFields)) {
currentUuid = randomUUID();
}
$out.0.* = $in.0.*;
$out.0.Account = currentUuid;
lastRecord = $in.0;
return OK;
}
Please sign in to leave a comment.
Comments 3