The DBA-Village forum
Help needed to delete huge volume of data
Subject: Help needed to delete huge volume of data
Author: Venu Madhav, India
Date: Apr 17, 2022, 10:02, 36 days ago
OS info: HP-UX
Oracle info: 12
Message:
Hi Experts,
Please help me.
I have a table with 5 columns. Four of these columns hold a single fixed value across the entire table, and the remaining column has a different value in each of the 80 million rows.
Out of these 80 million rows, I want to keep only 8 million and delete the rest.
I was trying to do this by taking an export backup of the table with a query clause. Now I am not able to work out how to select those 8 million records, since the other 4 columns have a constant value across the entire table.
Table ABC - 80 million records
COL A - constant value for all 80 million records
COL B - constant value for all 80 million records
COL C - constant value for all 80 million records
COL D - constant value for all 80 million records
COL E - a different value in each of the 80 million records
Please help me select 8 million records from this table.
Thank you very much in advance.
Thanks,
Venu
Subject: Re: Help needed to delete huge volume of data
Author: Michel Cadot, France
Date: Apr 17, 2022, 14:26, 35 days ago
Message:
How do you determine those 8 million rows?
What is the criterion?
Regards
Michel
Subject: Re: Help needed to delete huge volume of data
Author: Venu Madhav, India
Date: Apr 17, 2022, 17:56, 35 days ago
Message:
Hi Michel,
Many thanks for your reply.
I don't have the selection criteria on the same table.
But I did some more coordination with the requesters, and they provided one more table which has the same column as our problematic table's column E.
I have tried to fetch the matching records from both tables based on column E and a date range which is in the other table.
The first (problematic) table has 80 million records and the second table has 50 million records. Due to the huge volume of records, the comparison query has been running for the last 6-7 hours with no output yet.
Any thoughts on this?
Is expdp possible with 2 tables in the query clause?
Thanks
Venu
Subject: Re: Help needed to delete huge volume of data
Author: Michel Cadot, France
Date: Apr 17, 2022, 18:35, 35 days ago
Message:
Still not clear how you retrieve 8 million rows from the first table using the second table of 50 million rows.
Post the query you execute.
Regards
Michel
Subject: Re: Help needed to delete huge volume of data
Author: Venu Madhav, India
Date: Apr 18, 2022, 12:35, 34 days ago
Message:
Hi Michel,
Thanks for your assistance.

Table 1 - DOC_CONTROL
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
MANDT                                     NOT NULL VARCHAR2(9)
RIGHT_ID                                  NOT NULL VARCHAR2(60)
OBJECT_GUID                               NOT NULL RAW(16)
PRC_TYPE_ID                               NOT NULL NUMBER(3)
TASK_ID                                   NOT NULL VARCHAR2(24)

Table 2 - DOC_CAPTURE
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
CLIENT                                    NOT NULL VARCHAR2(9)
GUID                                      NOT NULL RAW(16)
OBJECT_ID                                 NOT NULL VARCHAR2(30)
PROCESS_TYPE                              NOT NULL VARCHAR2(12)
POSTING_DATE                              NOT NULL VARCHAR2(24)
DESCRIPTION                               NOT NULL VARCHAR2(120)

The common field in both tables is DOC_CONTROL(OBJECT_GUID) and DOC_CAPTURE(GUID).
Number of records: DOC_CONTROL - 80M, DOC_CAPTURE - 50M.
I want to keep only 8M records in table DOC_CONTROL.
Since I am not able to find any selection criteria on DOC_CONTROL, I have involved the other table, DOC_CAPTURE.
Below is the query:

select /*+ parallel */ B.GUID, B.POSTING_DATE
from Doc_Control A, Doc_Capture B
where B.POSTING_DATE >= '20200101'
  and A.OBJECT_GUID = B.GUID
  and A.RIGHT_ID = 'ABC8923BL00U';

Since the volume of data is huge, the above query runs for a long time.
The goal is to keep only 8M records in table DOC_CONTROL based on OBJECT_GUID.
How can this be achieved?
Is this possible with expdp and impdp using a query clause on the 2 tables?
Please help.
Thanks
Venu
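[Editorial note] On the expdp question asked above: Data Pump's QUERY parameter accepts a WHERE predicate that may contain a subquery, and that subquery can reference a second table, provided the exporting user can read it. A minimal parameter-file sketch follows; the schema name, directory object, and file names are placeholders, not details from this thread:

```
# keep_rows.par -- illustrative only; adjust schema and directory names
DIRECTORY=DATA_PUMP_DIR
DUMPFILE=doc_control_keep.dmp
LOGFILE=doc_control_keep.log
TABLES=APP_SCHEMA.DOC_CONTROL
QUERY=APP_SCHEMA.DOC_CONTROL:"WHERE object_guid IN (SELECT guid FROM app_schema.doc_capture WHERE posting_date >= '20200101')"
```

Invoked as `expdp system parfile=keep_rows.par`, the dump would then contain only the matching DOC_CONTROL rows, which could be re-imported after truncating the table.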
Subject: Re: Help needed to delete huge volume of data
Author: Michel Cadot, France
Date: Apr 18, 2022, 17:34, 34 days ago
Message:
The number of rows is not so huge, I have tables on my laptop with more rows. :)
What is the execution plan of the query?
After the execution use
select * from table(dbms_xplan.display_cursor(format=>'allstats last +cost'));
If you don't want to execute the query and it is still in the SGA, use:
select * from table(dbms_xplan.display_cursor(sql_id=>'<the query sql id>',format=>'allstats last +cost'));
Regards
Michel
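[Editorial note] The second dbms_xplan call above needs the statement's sql_id. A hedged way to find it is to search V$SQL for the cursor while it is still cached; the LIKE filter below is just an example matching the query text posted earlier:

```sql
-- Locate the sql_id of the long-running join (filter text is illustrative)
SELECT sql_id, child_number, elapsed_time/1e6 AS elapsed_s, sql_text
FROM   v$sql
WHERE  sql_text LIKE 'select /*+ parallel */%Doc_Control%'
ORDER  BY last_active_time DESC;

-- Then display its cached plan with runtime statistics
SELECT *
FROM   table(dbms_xplan.display_cursor(sql_id => '&sql_id',
                                       format => 'allstats last +cost'));
```

Note that 'allstats last' only shows actual row counts when the statement ran with STATISTICS_LEVEL=ALL or the gather_plan_statistics hint.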
Subject: Re: Help needed to delete huge volume of data
Author: Venu Madhav, India
Date: Apr 18, 2022, 19:31, 34 days ago
Message:
Thanks Michel,
I will update you.
Thanks,
Venu
Subject: Re: Help needed to delete huge volume of data
Author: Bruno Vroman, Belgium
Date: Apr 19, 2022, 11:20, 34 days ago
Message:
Hello Venu,
my understanding (please correct the mistakes):
- you have a table doc_control with 80 million rows
- you want to keep 8 million rows of this table
- rows to keep: any of them, with the condition that doc_control.object_guid must exist in doc_capture.guid
(from the last point I guess that there are doc_control.object_guid values that have NO corresponding doc_capture.guid)
I tend to guess that the "foreign key" relationship between the 2 tables is not declared; this might be the source of issues (orphan rows...), so maybe you should review the data model.
Important: I assume that this is NOT a "production" database, that you want to decrease the data volume for some reason, and that deleting 90% of the table is not an issue.
The fact that 4 columns out of 5 have a constant value is not a problem (well, it shows that something must be wrong somewhere from a business point of view, but technically it is not a problem).
If we run:

SELECT co.*
FROM doc_control co
WHERE EXISTS ( SELECT NULL
               FROM doc_capture ca
               WHERE ca.guid = co.object_guid
             );

we would have all the rows of "co" having a correspondence in "ca".
If we want only 8 million rows, we can do:

SELECT co.*
FROM doc_control co
WHERE EXISTS ( SELECT NULL
               FROM doc_capture ca
               WHERE ca.guid = co.object_guid
             )
AND rownum <= 8E6;

But if the query implies scanning the 50 million rows of "ca" for each candidate row of "co", this would be prohibitive (especially if many rows of "co" have no corresponding row in "ca").
It is not clear whether the condition on "posting_date" is important or just a "nice to have", and whether the following query:

SELECT co.*
FROM doc_control co
WHERE EXISTS ( SELECT NULL
               FROM doc_capture ca
               WHERE ca.guid = co.object_guid
               AND ca.posting_date >= '20200101'
             )
AND rownum <= 8E6;

would retrieve 8 million rows or possibly fewer... (remark: dates stored as VARCHAR2(24)? Awful!)
To derive a better way I would like to know if some assumptions are correct...
a) Please tell us what indexes exist on both tables and whether they are UNIQUE.
b) For each table: is there a primary key? Is it declared in the database? On which column(s)?
c) Is there a foreign key joining the 2 tables? If yes, which one exactly? Is it declared in the database?
d) For "b)" and "c)": maybe the constraints are not declared in the database, but tell us if they make sense from a business point of view...
And, awaiting your replies, just an idea: maybe we don't really care about which values of "co" we keep, and we might truncate the table and then insert into "co" 8 million ca.guid values picked randomly for "co.object_guid", plus the constant values for the 4 other columns...
Best regards,
Bruno Vroman.
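[Editorial note] Bruno's closing idea (truncate, then repopulate with 8 million randomly picked ca.guid values plus the four constants) could be sketched as below; the constant values are the examples from this thread, not confirmed business data:

```sql
TRUNCATE TABLE doc_control;

-- Repopulate with 8M random guids from doc_capture plus the 4 constants
INSERT /*+ append */ INTO doc_control
  (mandt, right_id, object_guid, prc_type_id, task_id)
SELECT 'mandt', 'ABC8923BL00U', guid, 123, 'task_id'
FROM  (SELECT guid FROM doc_capture ORDER BY dbms_random.value)
WHERE rownum <= 8E6;
COMMIT;
```

Sorting 50 million rows by dbms_random.value is itself expensive; a SAMPLE clause on doc_capture would be a cheaper, if less exact, way to pick the rows.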
Subject: Re: Help needed to delete huge volume of data
Author: Venu Madhav, India
Date: Apr 20, 2022, 12:26, 32 days ago
Message:
Hi Bruno,
Thanks for the help and for the analysis of the different approaches.
The issue is that some business logic was triggered and caused a mishap in the environment. To correct the issue we are doing this exercise.
So ideally table DOC_CONTROL contains 0 records, but due to the issue this table now has 8M records,
and we don't have any selection criteria to identify the 8M correct records.
So another table, DOC_CAPTURE, came into the picture, with one field in common, which helps to identify the 8M records.
We have now successfully tested the method below in pre-prod:
1. Created a temp table and inserted the common 8M records into it.
2. Took an expdp backup of the DOC_CONTROL table with 80M records.
3. Truncated the DOC_CONTROL table.
4. Inserted the 8M records from the temp table into DOC_CONTROL.
5. Activated the business logic jobs.
Validation is now in progress.
Many thanks for your help.
Thanks,
Venu
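[Editorial note] Steps 1-4 of the tested method can be sketched in SQL; the temp-table name and the posting-date cutoff are assumptions taken from earlier messages, not confirmed details:

```sql
-- Step 1: materialize the 8M rows to keep (join per earlier messages)
CREATE TABLE doc_control_keep AS
  SELECT /*+ parallel */ co.*
  FROM   doc_control co
  WHERE  EXISTS (SELECT NULL
                 FROM   doc_capture ca
                 WHERE  ca.guid = co.object_guid
                 AND    ca.posting_date >= '20200101');

-- Step 2: back up the full 80M-row table with expdp (run outside SQL*Plus)

-- Step 3: remove all rows cheaply; TRUNCATE is DDL, not row-by-row delete
TRUNCATE TABLE doc_control;

-- Step 4: reload only the kept rows with a direct-path insert
INSERT /*+ append */ INTO doc_control SELECT * FROM doc_control_keep;
COMMIT;
```

TRUNCATE is not transactional and cannot be rolled back, which is why the expdp backup in step 2 matters before step 3 runs.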
Subject: Re: Help needed to delete huge volume of data
Author: Venu Madhav, India
Date: Apr 20, 2022, 12:27, 32 days ago
Message:
Correction:
So ideally table DOC_CONTROL contains 0 records, but due to the issue this table now has 80M records.
Subject: Re: Help needed to delete huge volume of data
Author: Michel Cadot, France
Date: Apr 20, 2022, 12:33, 32 days ago
Message:
The performance you can get depends on the indexes and constraints... hence the execution plan I asked you for, in order to determine what you have to do to improve it.
Data Pump and a plain SELECT take the same access path, so first improve the SELECT.
Regards
Michel
Subject: Re: Help needed to delete huge volume of data
Author: Bruno Vroman, Belgium
Date: Apr 20, 2022, 17:54, 32 days ago
Message:
Hi Venu,
thank you for the feedback (although I don't understand much of it).
It seems that you have fixed your issue and are now "happy", so all is fine.
But still:
- you might in the long term find some benefit in declaring constraints like the PK and FK at database level;
- you might review the design of doc_control: the fact that all rows have the same value for four columns seems to show that those columns are useless...
Maybe you might even drop the table and simply use a view:

DROP TABLE doc_control;
CREATE VIEW doc_control AS
SELECT 'mandt' mandt, 'ABC8923BL00U' right_id, ca.guid object_guid, 123 prc_type_id, 'task_id' task_id
FROM doc_capture ca;

(with the "correct" values for the constants; I've just picked examples)
Best regards,
Bruno.