No More Spool Space Teradata Query Solution-白红宇

No More Spool Space Teradata Query Solution

阅读量：4209 次

发布时间：2019-05-26

本文共 2831 字，大约阅读时间需要 9 分钟。

Have you ever come across a situation called NO MORE SPOOL SPACE? My friend does and hence I studied a bit about Teradata.

The query:

SELECT DISTINCT fieldname FROM tablename;

The error:

“NO MORE SPOOL SPACE”.

Correct, the problem wasn’t related to MySQL, that was for something called Teradata.

Initially, I could see that query should be using temporary space (from name SPOOL SPACE I guessed) to get distinct data – may be it’s a huge table or something.

So I think one has to look into temporary table space’s size – they call it SPOOL SPACE.

Next is Google’s turn to explain me more, I Googled with appropriate keywords and got following understandings:

Spool is temporary disk space used to hold intermediate rows during query processing, and to hold the rows in the answer set of a transaction.

Spool space is allocated to a user as a total amount available for that user, however this total amount is spread across all AMPS. This means if you’re allocated 200G of spool and you’re on a system with 24 AMPS, you’re allocated ~8.3G of spool on each AMP. If any AMP exceeds its allocated amount of spool during a transaction you submit, your query is cancelled with error: NO MORE SPOOL SPACE.

Common reasons for “No More Spool Space Error”:

1) All the rows in the query are being processed on a few amps creating a “hot amp” situation. In this case, just a few amps are racking up spool while the others sit comparatively idle. This is caused when the tables in the query are missing stats, have been improperly PI’d, or are otherwise “untuned”.

2) The query is excluding critical join criteria resulting in Cartesian products.

3) You just don’t have enough space to perform the query.

From other readings I suggested one query change:

SELECT fieldname FROM tablename GROUP BY fieldname;

The reason behind above query is:

Teradata uses two different methods to produce the result sets of above queries, However the performance varies dramatically depending on the data.

SELECT DISTINCT is designed for data sets that are nearly unique to begin with and works by sorting the entire intermediate spool file and discarding duplicate values.

SELECT/GROUP BY is designed for data sets that have relatively few unique values and works by performing an AMP local grouping operation and then merging the partial result sets for final processing.

Few other steps (even) I can suggest to solve such problem are:

– Explain to see execution plan to have an idea.

– Check table definition for indexes: Show table table-name;

– CREATE INDEX indexName (field-name) on table-name;

Finally, if nothing happens contact your DBA or the “right person”

I hope this will help someone, somewhere, someday.

If you know that I’ve misunderstood something please comment and give respective links – I should be correcting it asap.

转载地址：http://qfrli.baihongyu.com/

你可能感兴趣的文章