I am trying to use DBJoin instead of an incremental file in one of my graphs. In a DBInput component I select max(event_id) from the fact table (where I store the source event_id) and pass it through the metadata to the DBJoin component. The DBJoin selects rows from the source database using the condition > ?. This takes very long and uses a lot of memory, and the graph fails because it runs too long. This is my SQL in the DBJoin:
select
  coalesce(lower(trim("public"."ev_quicklook"."page")), 'no value') as page,
  'no value',
  'no value',
  coalesce(trim(upper("public"."ev_quicklook"."style_group")), 'no value') as style_group,
  'quicklook',
  "public"."evbase"."created",
  null,
  "public"."ev_quicklook"."event_id",
  trim("public"."evbase"."session_id") as session_id
from
  "public"."ev_quicklook",
  "public"."evbase"
where
  "public"."ev_quicklook"."event_id" > ? and
  "public"."ev_quicklook"."event_id" = "public"."evbase"."event_id"
What am I doing wrong?
Thanks for your help.
Kasturi
-
There are several potential reasons why your graph is slow:
- Missing or incorrect indexes on event_id on one of the tables (see the sketch after this list).
- Which part of the graph is slow: the reader, the joiner, or the output?
- What are the data volumes you are processing?
- DBJoin should not be memory intensive. What other operations are you doing in the graph?
- If the graph fails, what is the reason for failure? Can you send the log file?
- Can you post the graph so that we can see what you are actually doing?
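Regarding the first point, here is a minimal sketch of indexes that would typically help, assuming a PostgreSQL-compatible database (the index names are hypothetical):

  -- Hypothetical index names; assumes PostgreSQL syntax.
  -- An index on the filtered/joined column lets the "event_id" > ?
  -- predicate use a range scan instead of a full table scan.
  create index ix_ev_quicklook_event_id on "public"."ev_quicklook" (event_id);
  create index ix_evbase_event_id on "public"."evbase" (event_id);

Without indexes like these, every incremental run would scan both tables in full, which would match the slow, memory-heavy behavior you describe.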
-
Thank you for your reply. Here is the graph.
For the initial load, there were about 3 million rows.
The incremental load is not bad: I take max(event_id) from the fact table and fetch the records that are > max(event_id).
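In SQL terms, the two steps look roughly like this (the fact table name is a placeholder):

  -- Step 1 (DBInput): read the high-water mark from the fact table.
  -- "fact_events" is a placeholder; substitute the real fact table name.
  select max(event_id) from "public"."fact_events";
  -- Step 2 (DBJoin): the value above is bound to the ? parameter in the
  -- source query, so only rows with a higher event_id are fetched.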
Thanks,
Kasturi