Apache Hive Changelog

New in version 0.11.0 (May 21st, 2013)

  • Sub-task:
  • optimize orderby followed by a groupby
  • TypeInfoFactory is not thread safe and is accessed by multiple threads
  • InspectorFactories contains static HashMaps which can cause infinite loop
  • disable TestBeeLineDriver
  • disable TestBeeLineDriver in ptest util
  • Integrate HCatalog site into Hive site
  • Adjust build.xml package command to move all hcat jars and binaries into build
  • Move HCatalog trunk code from trunk/hcatalog/historical to trunk/hcatalog
  • HCatalog branches need to move out of trunk/hcatalog/historical
  • HCat needs to get current Hive jars instead of pulling them from maven repo
  • Merge HCat NOTICE file with Hive NOTICE file
  • Clean up remaining items in hive/hcatalog/historical/trunk
  • Bug:
  • Hive server is SHUTTING DOWN when invalid queries are being executed.
  • If all of the parameters of distinct functions exist in group by columns, query fails at runtime
  • ObjectInspectorConverters cannot convert Void types to Array/Map/Struct types.
  • should throw "Ambiguous column reference key" Exception in particular join condition
  • Aggregations without grouping should return NULL when applied to partitioning column of a partitionless table
  • Invalid tag is used for MapJoinProcessor
  • Filters on outer join with mapjoin hint is not applied correctly
  • Hive CI failing due to script_broken_pipe1.q
  • Comment indenting is broken for "describe" in CLI
  • HBase Handler doesn't handle NULLs properly
  • Hive compile errors under Java 7 (JDBC 4.1)
  • change hive.auto.convert.join's default value to true
  • LOAD DATA INPATH fails if a hdfs file with same name is added to table
  • Mixing avro and snappy gives null values
  • semi-colon in comments in .q file does not work
  • Result of outer join is not valid
  • HIVE JDBC module won't compile under JDK1.7 as new methods were added in JDBC specification
  • user should not specify mapjoin to perform sort-merge bucketed join
  • Fix log4j configuration errors when running hive on hadoop23
  • PrimitiveObjectInspector doesn't handle timestamps properly
  • Merging join tree may reorder joins which could be invalid
  • Implement * or a.* for arguments to UDFs
  • Avro SerDe doesn't handle serializing Nullable types that require access to a Schema
  • release locks at the end of move tasks
  • NPE in union processing followed by lateral view followed by 2 group bys
  • When the Group by partition column type is TIMESTAMP or STRING and its format contains "HH:MM:SS", a URISyntaxException occurs
  • reflect udf cannot find method which has arguments of primitive types and String, Binary, Timestamp types mixed
  • script_pipe.q fails when using JDK7
  • RCFileWriter does not implement the right function to support Federation
  • HiveMetaStoreFsImpl is not compatible with hadoop viewfs
  • Allow URIs without port to be specified in metatool
  • External JAR files on HDFS can lead to race condition with hive.downloaded.resources.dir
  • enhanceModel.notRequired is incorrectly determined
  • Multiple insert overwrite into multiple tables query stores same results in all tables
  • Renaming table changes table location scheme/authority
  • Hive Query Explain Plan JSON not being created properly
  • Patch: Hive's ivy internal resolvers need to use sourceforge for sqlline
  • Hive won't compile with -Dhadoop.mr.rev=20S
  • make optimizing multi-group by configurable
  • Error in groupSetExpression rule in Hive grammar
  • PTest doesn't work due to hive snapshot version upgrade to 11
  • Driver.validateConfVariables() should perform more validations
  • Provide hive operation name for hookContext
  • JDBCStatsPublisher fails when ID length exceeds length of ID column
  • union_remove_9.q fails in trunk (hadoop 23)
  • TestNegativeMinimrCliDriver_mapreduce_stack_trace.q fails on hadoop-1
  • Enable adding hooks to hive meta store init
  • BucketizedHiveInputFormat should be automatically used with Bucketized Map Joins also
  • HIVE-3750 broke TestParse
  • Sort merge join should work if join cols are a prefix of sort columns for each partition
  • Unit test failures due to unspecified order of results in "show grant" command
  • Add MapJoinDesc.isBucketMapJoin() as part of explain plan
  • testCliDriver_sample_islocalmode_hook fails on hadoop-1
  • stats19.q is failing on trunk
  • Regression introduced from HIVE-3401
  • testCliDriver_repair fails on hadoop-1
  • Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
  • NPE in SELECT when WHERE-clause is an and/or/not operation involving null
  • testCliDriver_combine2 fails on hadoop-1
  • testCliDriver_loadpart_err fails on hadoop-1
  • testCliDriver_input39 fails on hadoop-1
  • explain dependency should show the dependencies hierarchically in presence of views
  • Ptest failing due to "Argument list too long" errors
  • Concurrency issue in RCFile: multiple threads can use the same decompressor
  • Adding the name space for the maven task for the maven-publish target.
  • Consider creating a literal like "D" or "BD" for representing Decimal type constants
  • bug if different serdes are used for different partitions
  • Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
  • insert overwrite fails with stored-as-dir in cluster
  • Hive CLI needs UNSET TBLPROPERTY command
  • Insert overwrite doesn't create a dir if the skewed column position doesn't match
  • adding .gitattributes file for normalizing line endings during cross platform development
  • hive cli null representation in output is inconsistent
  • ppd.remove.duplicatefilters removing filters too aggressively
  • Aliased column in where clause for multi-groupby single reducer cannot be resolved
  • hour() function returns 12 hour clock value when using timestamp datatype
  • Multi-groupby optimization fails when same distinct column is used twice or more
  • Normalize left over CRLF files
  • Upgrade hbase dependency to 0.94
  • testHBaseNegativeCliDriver_cascade_dbdrop fails on hadoop-1
  • MAP JOIN for VIEW throws NULL pointer exception error
  • lot of tests failing for hadoop 23
  • negative value for hive.stats.ndv.error should be disallowed
  • wrong mapside groupby if no partition is being selected
  • something wrong with the hive-default.xml
  • Partition pruning fails on = expression
  • create view statement's outputs contains the view and a temporary dir.
  • Wrong data due to HIVE-2820
  • table_access_keys_stats.q fails with hadoop 0.23
  • Possible deadlock in ZK lock manager
  • Union with map-only query on one side and two MR job query on the other produces wrong results
  • For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress
  • Normalize more CRLF line endings
  • Change test for HIVE-2332
  • recursive_dir.q fails on 0.23
  • join_filters_overlap.q fails on 0.23
  • join_nullsafe.q fails on 0.23
  • Potential overflow with new RCFileCat column sizes options
  • Add Oracle metastore upgrade script for 0.9 to 0.10.0
  • Hive release tarballs don't contain PostgreSQL metastore scripts
  • Skewed query fails if hdfs path has special characters
  • MiniMR test remains pending after test completion
  • avro_nullable_fields.q is failing in trunk
  • Hive 0.10 postgres schema script is broken
  • Cleanup after HIVE-3403
  • Maintain a clear separation between Windowing & PTF at the specification level.
  • Update new UDAFs introduced for Windowing to work with new Decimal Type
  • Fix select expr processing in PTF Operator
  • Update PTF invocation and windowing grammar
  • Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks
  • PostgreSQL upgrade scripts are not valid
  • Oracle metastore update script will fail when upgrading from 0.9.0 to 0.10.0
  • MySQL metastore upgrade script will end up with different schema than the full schema load
  • Hive client goes into infinite loop at 100% cpu
  • Incorrect status for AddPartition metastore event if RawStore commit fails
  • MapJoin failing with Distributed Cache error
  • PostgreSQL upgrade scripts are creating column with incorrect name
  • Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0
  • Thrift alter_table api doesn't validate column type
  • Bring parenthesis handling in windowing specification in compliance with sql standard
  • Hive Profiler dies with NPE
  • Name windowing functions consistently with the sql standard
  • NPE at runtime while selecting virtual column after joining three tables on different keys
  • Should be able to specify windowing spec without needing Between
  • Column Pruner for PTF Op
  • remove use of FunctionRegistry during PTF Op initialization
  • Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.
  • fix ptf negative tests
  • Support multiple partitionings in a single Query
  • Disallow partition/sort and distribute/order combinations in windowing and partitioning spec
  • Extend rcfilecat to support (un)compressed size and number of rows
  • Followup to HIVE-701: reduce ambiguity in grammar
  • Map-join outer join produces incorrect results.
  • Hive eclipse build path update for string template jar
  • Make partition by optional in over clause
  • alterPartition and alterPartitions methods in ObjectStore swallow exceptions
  • Delay the serialize-deserialize pair in CommonJoinTaskDispatcher
  • Altering a view partition fails with NPE
  • Add Lead & Lag UDAFs
  • allow expressions with over clause
  • Break up ptf tests in PTF, Windowing and Lead/Lag tests
  • PTF ColumnPruner doesn't account for Partition & Order expressions
  • Generated aliases for windowing expressions are broken
  • Use of hive.exec.script.allow.partial.consumption can produce partial results
  • Store complete names of tables in column access analyzer
  • Remove sprintf from PTFTranslator and use String.format()
  • decimal_3.q & decimal_serde.q fail on hadoop 2
  • problem in hive.map.groupby.sorted with distincts
  • ORC file doesn't properly interpret empty hive.io.file.readcolumn.ids
  • OrcInputFormat assumes Hive always calls createValue
  • Remove System.gc() call from the map-join local-task loop
  • Hive localtask does not buffer disk-writes or reads
  • Hive MapJoinOperator unnecessarily deserializes values for all join-keys
  • Update Hive 0.10.0 RELEASE_NOTES.txt
  • Allow over() clause to contain an order by with no partition by
  • Partition by column does not have to be in order by
  • Default value in lag is not handled correctly
  • Window range specification should be more flexible
  • ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
  • Queries fail if timestamp data not in expected format
  • remove support for lead/lag UDFs outside of UDAF args
  • Bring the Lead/Lag UDFs interface in line with Lead/Lag UDAFs
  • Fix eclipse template classpath to include new packages added by ORC file patch
  • ORC's union object inspector returns a type name that isn't parseable by TypeInfoUtils
  • MiniDFS shim does not work for hadoop 2
  • Specifying alias for windowing function
  • Remove inferring partition specification behavior
  • Incorrect column mappings with over clause
  • bug with hive.auto.convert.join.noconditionaltask with outer joins
  • Cleanup aisle "ivy"
  • wrong results in big outer joins with array of ints
  • HiveProfiler NPE with ScriptOperator
  • NPE reading column of empty string from ORC file
  • need to add protobuf classes to hive-exec.jar
  • RetryingHMSHandler doesn't retry in enough cases
  • Hive converts bucket map join to SMB join even when tables are not sorted
  • union_remove_*.q fail on hadoop 2
  • [REGRESSION] FsShell.close closes filesystem, removing temporary directories
  • Round UDF converts BigInts to double
  • ORC fails with files with different numbers of columns
  • NonBlockingOpDeDup does not merge SEL operators correctly
  • Filter getting dropped with PTFOperator
  • doAS does not work with HiveServer2 in non-kerberos mode with local job
  • Document HiveServer2 setup under the admin documentation on hive wiki
  • Document HiveServer2 JDBC and Beeline CLI in the user documentation
  • NPE in ReduceSinkDeDuplication
  • QL build-grammar target fails after HIVE-4148
  • TestJdbcDriver2.testDescribeTable failing consistently
  • ORC fails with String column that ends in lots of nulls
  • OVER clauses with ORDER BY not getting windowing set properly
  • describe table output always prints as if formatted keyword is specified
  • Bring windowing support in line with SQL Standard
  • reuse Partition objects in PTFOperator processing
  • Clientpositive test parenthesis_star_by is non-deterministic
  • Fix show_create_table_*.q test failures
  • explain dependency does not capture the input table
  • CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
  • hiveserver2 string representation of complex types are inconsistent with cli
  • Code cleanup : debug methods, having clause associated with Windowing
  • update show_functions.q.out for functions added for windowing
  • SEL operator created with missing columnExprMap for unions
  • union_remove_12, union_remove_13 are failing on hadoop2
  • union_remove_10 is failing on hadoop2 with assertion (root task with non-empty set of parents)
  • fix last_value UDAF behavior
  • fix handling of binary type in hiveserver2, jdbc driver
  • bug in hive.map.groupby.sorted in the presence of multiple input partitions
  • Limit precision of decimal type
  • partition wise metadata does not work for text files
  • Hive does not differentiate scheme and authority in file uris
  • TestRetryingHMSHandler is failing on trunk.
  • Add IntelliJ project files to .gitignore
  • HCatalog build fails when behind a firewall
  • hiveserver2 should support -hiveconf commandline parameter
  • ant thriftif fails on hcatalog
  • Fix how RowSchema and RowResolver are set on ReduceSinkOp that precedes PTFOp
  • empty java files in hcatalog
  • Newly added test TestCliDriver.hiveprofiler_union0 is failing on trunk
  • DOS line endings in auto_join26.q
  • enable doAs in unsecure mode for hive server2, when MR job runs locally
  • OperatorHooks hit performance even when not used
  • Revert changes checked-in as part of HIVE-1953
  • Consider extending max limit for precision to 38
  • sqlline dependency is not required
  • NPE in constant folding with decimal
  • orc*.q tests fail on hadoop 2
  • most windowing tests fail on hadoop 2
  • ctas test on hadoop 2 has outdated golden file
  • serde_regex test fails on hadoop 2
  • Selecting from a view, and another view that also selects from that view fails
  • NPE for query involving UNION ALL with nested JOIN and UNION ALL
  • Guava not getting included in build package
  • remove duplicate impersonation parameters for hiveserver2
  • Check for Map side processing in PTFOp is no longer valid
  • wrong result in left semi join
  • some issue with merging join trees
  • Hive Version returned by HiveDatabaseMetaData.getDatabaseProductVersion is incorrect
  • Counters hit performance even when not used
  • ant maven-build fails because hcatalog doesn't have a make-pom target
  • test leadlag.q fails
  • HS2 Resource leak: operation handles not cleaned when originating session is closed
  • TestHCatStorer.testStoreFuncAllSimpleTypes fails because of null case difference
  • PTFDesc tries to serialize transient fields like OIs, etc.
  • webhcat - support ${WEBHCAT_PREFIX}/conf/ as config directory
  • HCatalog unit tests stop after a failure
  • Improve memory usage by ORC dictionaries
  • hcatalog version numbers need to be updated
  • HCatalog build directories get included in tar file produced by "ant tar"
  • hcatalog jars not getting published to maven repo
  • ORC map columns get class cast exception in some context
  • TestBeeLineWithArgs.testPositiveScriptFile fails
  • HS2 holding too many file handles of hive_job_log_hive_*.txt files
  • Hive can't load transforms added using 'ADD FILE'
  • Fix eclipse project template
  • Improvement:
  • improve group by syntax
  • more query plan optimization rules
  • Hive should process comments in CliDriver
  • Upgrade antlr version to 3.4
  • Use name of original expression for name of CAST output
  • RegexSerDe should support other column types in addition to STRING
  • msck repair should find partitions already containing data files
  • Add environment context to metastore Thrift calls
  • Diversify grammar for split sampling
  • Avoid race conditions while downloading resources from non-local filesystem
  • Provide ALTER for partition changing bucket number
  • Allow CREATE TABLE LIKE command to take TBLPROPERTIES
  • Simple lock manager for dedicated hive server
  • hivetest.py: revision number and applied patch
  • Provide a way to use counters in Hive through UDF
  • sort-merge join does not work with sub-queries
  • Support altering partition column type in Hive
  • Add mapreduce workflow information to job configuration
  • Stop storing default ConfVars in temp file
  • HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
  • Minor fix for 'tableName' in Hive.g
  • de-emphasize mapjoin hint
  • Print number of fetched rows after query in CliDriver
  • Multi-insert involving bucketed/sorted table turns off merging on all outputs
  • Better error message if metalisteners or hookContext cannot be loaded/instantiated
  • Resolve TODO in TUGIBasedProcessor
  • object inspectors should be initialized based on partition metadata
  • UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
  • Create a new Optimized Row Columnar file format for Hive
  • Better align columns in DESCRIBE table_name output to make more human-readable
  • Replace hashmaps in JoinOperators to array
  • Support noscan operation for analyze command
  • Remove code for merging files via MR job
  • merge map-job followed by map-reduce job
  • support partial scan for analyze command - RCFile
  • Clean up/fix PartitionNameWhitelistPreEventListener
  • Correctly enforce the memory limit on the multi-table map-join
  • Add o.a.h.h.serde.Constants for backward compatibility
  • Create abstract classes for serializer and deserializer
  • Add ORC file to the grammar as a file format
  • Remove init(fname) from TestParse.vm for each test
  • Swap applying order of CP and PPD
  • Improve Error Logging in MetaStore
  • Add reflect UDF for member method invocation of column
  • ignore mapjoin hint
  • Modify PreDropPartitionEvent to pass Table parameter
  • Refactor code for finding windowing expressions
  • Expose metastore JMX metrics
  • Support avg(decimal)
  • Window handling dumps debug info on console, instead should use logger.
  • ORC runs out of heap when writing
  • Sort merge join does not work for outer joins for 7 inputs
  • sort merge join should work for outer joins for more than 8 inputs
  • optimize hive.enforce.bucketing and hive.enforce.sorting insert
  • Log logical plan tree for debugging
  • add hive.map.groupby.sorted.testmode
  • Remove unused builtins and pdk submodules
  • PTFDeserializer should reconstruct OIs based on InputOI passed to PTFOperator
  • Change default bigtable selection policy for sort-merge joins
  • New Feature:
  • Implement TRUNCATE
  • lots of reserved keywords in hive
  • Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
  • Infer bucketing/sorting properties
  • Adding the oracle nvl function to the UDF
  • Specify location of log4j configuration files via configuration properties
  • Add DECIMAL data type
  • Implement HiveServer2
  • Hive List Bucketing - DML support
  • HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
  • Add 'IGNORE PROTECTION' predicate for dropping partitions
  • When writing a Hive table to a file, users should be able to choose their own separator
  • Add Operator level Hooks
  • Support ALTER VIEW AS SELECT in Hive
  • Add a way to get the uncompressed/compressed sizes of columns from an RC File
  • getReducersBucketing in SemanticAnalyzer may return more than the max number of reducers
  • Allow updating bucketing/sorting metadata of a partition through the CLI
  • Hive Profiler
  • Allow Decimal type columns in Regex Serde
  • Ability to create and drop temporary partition function
  • Allow partition by/order by in partitioning spec in over clause and partition function
  • Implement decimal encoding for ORC
  • Testing with Hadoop 2.x causes test failure for ORC's TestFileDump
  • Expose ORC's FileDump as a service
  • Implement a memory manager for ORC
  • Task:
  • Unescape partition names returned by show partitions
  • Add check to determine whether partition can be dropped at Semantic Analysis time
  • ALTER TABLE ADD PARTS should check for valid partition spec and throw a SemanticException if part spec is not valid
  • Add input table name to MetaStoreEndFunctionContext for logging purposes
  • Track columns accessed in each table in a query
  • Split up tests in ptf_general_queries.q
  • Merge PTFDesc and PTFDef classes
  • Add apache headers in new files
  • Create hcatalog stub directory and add it to the build
  • Test:
  • add a way to run a small unit quickly
  • Remove redundant test codes
  • Make accept qfile argument for miniMR tests
  • TestMetaStoreAuthorization always uses the same port
  • Add more tests for windowing
  • add tests for distincts for hive.map.groupby.sorted
  • Update list bucketing test results
  • Wish:
  • Result of mapjoin_test_outer.q is not deterministic

New in version 0.10.0 (May 21st, 2013)

  • Sub-task:
  • Optimizer statistics on columns in tables and partitions
  • Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV)
  • Remove the duplicate JAR entries from the (“test.classpath”) to avoid command line exceeding char limit on windows
  • Windows: Fix the unit tests which contain “!” commands (Unix shell commands)
  • FileUtils.tar does not close input files
  • Fix “TestDosToUnix” unit tests on Windows by closing the leaking file handle in DosToUnix.java.
  • Fix the “TestHiveHistory”, “TestHiveConf”, & “TestExecDriver” unit tests on Windows by fixing the path related issues.
  • Handle “CRLF” line endings to avoid the extra spacing in generated test outputs in Windows. (Utilities.Java :: readColumn)
  • Remove the Unix specific absolute path of “Cat” utility in several .q files to make them run on Windows with CygWin in path.
  • PartitionPruner should log why it is not pushing the filter down to JDO
  • Bug:
  • cluster by multiple columns does not work if parenthesis is present
  • Nested UDAFs cause Hive Internal Error (NullPointerException)
  • DESCRIBE TABLE syntax doesn't support specifying a database qualified table name
  • mapjoin sometimes gives wrong results if there is a filter in the on condition
  • java.io.IOException: error=7, Argument list too long
  • Group by operator does not estimate size of Timestamp & Binary data correctly
  • LATERAL VIEW with EXPLODE produces ConcurrentModificationException
  • DROP DATABASE CASCADE does not drop non-native tables.
  • Nullpointer on registering udfs.
  • Hive Ivy dependencies on Hadoop should depend on jars directly, not tarballs
  • Make the header of RCFile unique
  • Upgrade Thrift dependency to 0.9.0
  • ability to select a view qualified by the database / schema name
  • Reduce Sink deduplication fails if the child reduce sink is followed by a join
  • Hive UDFs cannot emit binary constants
  • hive can't find hadoop executor scripts without HADOOP_HOME set
  • When integrating into MapReduce2, Hive is unable to handle corrupt rcfile archive
  • query_properties.q contains non-deterministic queries
  • NPE in "create index" without comment clause in external metastore
  • utc_from_timestamp and utc_to_timestamp return incorrect results.
  • Task log retrieval fails on Hadoop 0.23
  • TestNegativeCliDriver autolocal1.q fails on 0.23
  • Renaming external partition changes location
  • ant gen-test failed
  • Hive error when dropping a table with large number of partitions
  • Hive Dynamic Partition Insert - move task not considering 'hive.exec.max.dynamic.partitions' from CLI
  • race condition in DAG execute tasks for hive
  • analyze command throws NPE when table doesn't exist
  • Hive should expand nested structs when setting the table schema from thrift structs
  • substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException
  • Queries consisting of metadata-only queries always return an empty value
  • Hive JDBC doesn't support TIMESTAMP column
  • metastore delegation token is not getting used by hive commandline
  • GET_JSON_OBJECT fails on some valid JSON keys
  • Filter parsing does not recognize '!=' as operator and silently ignores invalid tokens
  • Fix maven-build Ant target
  • Fix test failure in TestNegativeCliDriver.dyn_part_max caused by HIVE-2918
  • Remove hadoop-source Ivy resolvers and Ant targets
  • Offline build is not working
  • Potential infinite loop / log spew in ZookeeperHiveLockManager
  • Memory leak in TUGIContainingTransport
  • TestCliDriver cannot be debugged with eclipse since hadoop_home is set incorrectly
  • Fix metastore test failures caused by HIVE-2757
  • Add JUnit to list of test dependencies managed by Ivy
  • Tests failing for me
  • Fix javadoc again
  • Update ShimLoader to work with Hadoop 2.x
  • escape more chars for script operator
  • hive docs target does not work
  • Modify clean target to remove ~/.ivy2/local/org.apache.hive ~/.ivy2/cache/org.apache.hive
  • Partition column values are not valid if any of virtual columns is selected
  • setup classpath for templates correctly for eclipse
  • TestHadoop20SAuthBridge always uses the same port
  • metastore.HiveMetaStore$HMSHandler should set the thread local raw store to null in shutdown()
  • hive.transform.escape.input breaks tab delimited data
  • revert HIVE-2703
  • Insert into table overwrites existing table if table name contains uppercase character
  • drop partition for non-string columns is failing
  • Drop partition problem
  • Filter on outer join condition removed while merging join tree
  • drop partition does not work for non-partition columns
  • Revert HIVE-2989
  • ROFL Moment. Numberator and denaminator typos
  • Oracle Metastore schema script doesn't include DDL for DN internal tables
  • make parallel tests work
  • Timestamp type values not having nano-second part breaks row
  • Hive tests should load Hive classes from build directory, not Ivy cache
  • Memory leak from large number of FileSystem instances in FileSystem.CACHE
  • Add HiveCLI that runs over JDBC
  • dropTable always executes hook.rollbackDropTable whether the drop succeeds or fails.
  • clear hive.metastore.partition.inherit.table.properties till HIVE-3109 is fixed
  • make copyLocal work for parallel tests
  • Hadoop20Shim.CombineFileRecordReader does not report progress within files
  • Error in Removing ProtectMode from a Table
  • sort_array doesn't work with LazyPrimitive
  • Generate & build the velocity based Hive tests on windows by fixing the path issues
  • Pass hconf values as XML instead of command line arguments to child JVM
  • use commons-compress instead of forking tar process
  • Drop table/index/database can result in orphaned locations
  • add an option in ptest to run on a single machine
  • Comment indenting is broken for "describe" in CLI
  • Bug in parallel test for singlehost flag
  • Dynamically generated partitions deleted by Block level merge
  • drop the temporary function at end of autogen_colalias.q
  • Fix non-deterministic testcases failures when running Hive0.9.0 on MapReduce2
  • Hive thrift code doesn't generate quality hashCode()
  • LazyBinaryObjectInspector.getPrimitiveJavaObject copies beyond length of underlying BytesWritable
  • Bucketed sort merge join doesn't work when multiple files exist for small alias
  • retry not honored in RetryingRawMetastore
  • Fix Eclipse classpath template broken in HIVE-3128
  • Drop partition throws NPE if table doesn't exist
  • Bucketed mapjoin on partitioned table which has no partition throws NPE
  • FileUtils.tar assumes wrong directory in some cases
  • JobDebugger should use RunningJob.getTrackingURL
  • Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
  • HiveConf.getPositionFromInternalName does not support more than single digit column numbers
  • NPE on a join query with authorization enabled
  • ColumnPruner is not working on LateralView
  • Make logging of plan progress in HadoopJobExecHelper configurable
  • Resource Leak: Fix the File handle leak in EximUtil.java
  • Fix non-deterministic results in newline.q and timestamp_lazy.q
  • Fix cascade_dbdrop.q when building hive on hadoop0.23
  • ignore white space between entries of hive/hbase table mapping
  • java primitive type for binary datatype should be byte[]
  • Sorted by order of table not respected
  • lack of semi-colon in .q file leads to missing the next statement
  • Upgrade guava to 11.0.2
  • Hive doesn't remove scratch directories while killing running MR job
  • Fix avro_joins.q testcase failure when building hive on hadoop0.23
  • alter the number of buckets for a non-empty partitioned table should not be allowed
  • bucketed mapjoin silently ignores mapjoin hint
  • HiveHistory.printRowCount() throws NPE
  • escaped columns in cluster/distribute/order/sort by are not working
  • expressions in cluster by are not working
  • Add avro jars into hive execution classpath
  • Fix autolocal1.q testcase failure when building hive on hadoop0.23 MR2
  • optimize union sub-queries
  • Table schema not being copied to Partitions with no columns
  • Convert runtime exceptions to semantic exceptions for missing partitions/tables in show/describe statements
  • bucket information should be used from the partition instead of the table
  • sort merge join may silently not work
  • fix fs resolvers
  • Load file into a table does not update table statistics
  • HIVE-3128 introduced bug causing dynamic partitioning to fail
  • Fix quote printing bug in mapreduce_stack_trace.q testcase failure when running hive on hadoop23
  • Race condition in query plan for merging at the end of a query
  • Fix error code inconsistency bug in mapreduce_stack_trace.q and mapreduce_stack_trace_turnoff.q when running hive on hadoop23
  • SMBJoin/BucketMapJoin should be allowed only when the join key expression exactly matches the sort/cluster key
  • [Regression] TestMTQueries test is failing on trunk
  • Convert runtime exceptions to semantic exceptions for validation of alter table commands
  • Archives broken for hadoop 1.0
  • Change the rules in SemanticAnalyzer to use Operator.getName() instead of hardcoded names
  • shims unit test failures block further test progress
  • Making hive tests run against different MR versions
  • Hive: Query misaligned result for Group by followed by Join with filter and skip a group-by result
  • Add junit exclude utility to disable testcases
  • Upgrade Hive's Avro dependency to version 1.7
  • bucketed map join should check that the number of files match the number of buckets
  • stats are not being collected correctly for analyze table with dynamic partitions
  • fpair on creating external table
  • Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key
  • GenMRSkewJoinProcessor uses File.Separator instead of Path.Separator
  • map-reduce jobs do not work for a partition containing sub-directories
  • Missing column causes null pointer exception
  • Parallel test script doesn't run all tests
  • Dynamic partition queries producing no partitions fail with hive.stats.reliable=true
  • hive unit tests fail to get lock using zookeeper on windows
  • insert into statement overwrites if target table is prefixed with database name
  • Duplicate data possible with speculative execution for dynamic partitions
  • Remove the specialized logic to handle the file schemas in windows vs unix from build.xml
  • Bug fix: Return the child JVM exit code to the parent process to handle the error conditions
  • Fix the file handle leaks in Symbolic & Symlink related input formats.
  • Hiveserver is not closing the existing driver handle before executing the next command. This results in file handle leaks.
  • joins using partitioned table give incorrect results on windows
  • RetryingRawStore logic needs to be significantly reworked to support retries within transactions
  • Hive List Bucketing - Skewed DDL doesn't support skewed value with string quote
  • CTAS in database with location on non-default name node fails
  • Some of the Metastore unit tests failing on Windows because of the static variables initialization problem in HiveConf class.
  • aggName of SemanticAnalyzer.getGenericUDAFEvaluator is generated in two different ways
  • Some of the JDBC test cases are failing on Windows because of the longer class path.
  • For UDAFs, when generating a plan without map-side-aggregation, constant agg parameters will be replaced by ExprNodeColumnDesc
  • Query plan for multi-join where the third table joined is a subquery containing a map-only union with hive.auto.convert.join=true is wrong
  • Avoid NPE in skewed information read
  • hivetest.py fails with --revision option
  • log4j template has logging threshold that hides all audit logs
  • Some of the tests are not deterministic
  • metadata_export_drop.q causes failure of other tests
  • QTestUtil side-effects
  • partition to directory comparison in CombineHiveInputFormat needs to accept partitions dir without scheme
  • ivysettings.xml does not let you override .m2/repository
  • Make separator for Entity name configurable
  • Hive info logging is broken
  • Avro Maps with Nullable Values fail with NPE
  • Incorrect partition bucket/sort metadata when overwriting partition with different metadata from table
  • ZooKeeperHiveLockManager does not respect the option to keep locks alive even after the current session has closed
  • derby metastore upgrade script throw errors when updating from 0.7 to 0.8
  • Output of sort merge join is no longer bucketed
  • union involving double column with a map join subquery will fail or give wrong results
  • Test "Path -> Alias" for explain extended
  • Hive always prints a warning message when using remote metastore
  • Drop database cascade fails when there are indexes on any tables
  • get_json_object and json_tuple return null in the presence of new line characters
  • Regression - HiveConf static variable causes issues in long running JVM instances with /tmp/ data
  • name of some metastore scripts are not per convention
  • Use varbinary instead of longvarbinary to store min and max column values in column stats schema
  • Metastore: Sporadic unit test failures
  • Create index fails on CLI using remote metastore
  • Hive Driver leaks ZooKeeper connections
  • Metastore tests use hardcoded ports
  • Error in groupSetExpression rule in Hive grammar
  • Multiple aggregates in query fail the job
  • PTest doesn't work due to hive snapshot version upgrade to 11
  • hive unit test case build failure.
  • The derby metastore schema script for 0.10.0 doesn't run
  • Must publish new Hive-0.10 artifacts to apache repository.
  • RetryingMetaStoreClient Should Log the Caught Exception
  • hive pom file has missing conf and scope mapping for compile configuration.
  • Oracle upgrade script for Hive is broken
  • Cannot drop partitions on table when using Oracle metastore
  • Hive JIRA still shows 0.10 as unreleased in "Affects Version/s" dropdown
  • HIVE_AUX_JARS_PATH should have : instead of , as separator since it gets appended to HADOOP_CLASSPATH
  • TestCase TestMTQueries fails with Non-Sun Java
  • Doc update for .8, .9 and .10
  • closeAllForUGI causes failure in hiveserver2 when fetching large amount of data
  • Improvement:
  • Ability to enforce correct stats
  • Add a configuration property that sets the variable substitution max depth
  • metastore 0.8 upgrade script for PostgreSQL
  • Collapse hive.metastore.uris and hive.metastore.local
  • Support auto completion for hive configs in CliDriver
  • Add validation to HiveConf ConfVars
  • Improve the HWI interface
  • Move global .hiverc file
  • Support non-MR fetching for simple queries with select/limit/filter operations only
  • [hive] Provide error message when using UDAF in the place of UDF instead of throwing NPE
  • pass an environment context to metastore thrift APIs
  • hive custom scripts do not work well if the data contains new lines
  • Make the new header for RC Files introduced in HIVE-2711 optional
  • Collect_set Aggregate does unnecessary check for value.
  • JDBC cannot find metadata for tables/columns containing uppercase character
  • Improve HiveMetaStore logging
  • add findbugs in build.xml
  • Add option to make multi inserts more atomic
  • Release codecs and output streams between flushes of RCFile
  • Typo in dynamic partitioning code bits, says "genereated" instead of "generated" in some places.
  • Add hive command for resetting hive confs
  • Support Bucketed mapjoin on partitioned table which has two or more partitions
  • BucketizedHiveInputFormat should be automatically used with SMBJoin
  • getting the reporter in the recordwriter
  • Enable Metastore audit logging for non-secure connections
  • Propagates filters which are on the join condition transitively
  • enum to string conversions
  • Create Table Like should copy configured Table Parameters
  • As a follow up for HIVE-3276, optimize union for dynamic partition queries
  • Keep the original query in HiveDriverRunHookContextImpl
  • get_json_object and json_tuple should use Jackson library
  • .23 compatibility: shim job.tracker.address
  • Add Retries to Hive MetaStore Connections
  • Yet better error message in CLI on invalid column name
  • All operators' conf should inherit from a common class
  • Support partial partition specifications when enabling/disabling protections in Hive
  • perform a map-only group by if grouping key matches the sorting properties of the table
  • Provide backward compatibility for AvroSerDe properties
  • Hive maven-publish ant task should be configurable
  • To add instrumentation to capture if there is skew in reducers
  • Log client IP address with command in metastore's startFunction method
  • Allow Partition Offline Enable/Disable command to be specified at the ds level even when Partition is based on more columns than ds
  • Refactor Partition Pruner so that logic can be reused.
  • Storing certain Exception objects thrown in HiveMetaStore.java in MetaStoreEndFunctionContext
  • Early skipping for limit operator at reduce stage
  • Access to external URLs in hivetest.py
  • Add/fix facility to collect operator specific statistics in hive + add hash-in/hash-out counter for GroupBy Optr
  • Revert HIVE-3268
  • TCP KeepAlive and connection timeout for the HiveServer
  • Make prompt in Hive CLI configurable
  • Reset operator-id before executing parse tests
  • RetryingHMSHandler should wrap JDOException inside MetaException
  • Catch the NPE when using ^D to exit from CLI
  • getBoolVar in FileSinkOperator can be optimized
  • Round map/reduce progress down when it is in the range [99.5, 100)
  • New Feature:
  • Allow SELECT without a mapreduce job
  • Add SerDe for Avro serialized data
  • Implement "show create table"
  • Support with rollup option for group by
  • replace or translate function in hive
  • Implement SHOW TBLPROPERTIES
  • Support standard cross join syntax
  • Add FORMAT UDF
  • Optionally use framed transport with metastore
  • SHOW COLUMNS table_name; to provide a comma-delimited list of columns.
  • Support for Oracle-backed Hive-Metastore ("longvarchar" to "clob" in package.jdo)
  • Returning Meaningful Error Codes & Messages
  • Create a new metastore tool to bulk update location field in Db/Table/Partition records
  • Add the option -database DATABASE in hive cli to specify a default database to use for the cli session.
  • Add ability to export table metadata as JSON on table drop
  • Hive List Bucketing - DDL support
  • Skewed Join Optimization
  • Disallow certain character patterns in partition names
  • A table generating, table generating function
  • sort merge join should work if both the tables are sorted in descending order
  • Implement CUBE and ROLLUP operators in Hive
  • Implement grouping sets in hive
  • Hive List Bucketing - Query logic
  • Add a command "Explain dependency ..."
  • Hive List Bucketing - set hive.mapred.supports.subdirectories
  • Hive List Bucketing - enhance DDL to specify list bucketing table
  • Adding authorization capability to the metastore
  • Add support for phonetic algorithms in Hive
  • Task:
  • Move RegexSerDe out of hive-contrib and over to hive-serde
  • RCFileMergeMapper Prints To Standard Output Even In Silent Mode
  • Implement INCLUDE_HADOOP_MAJOR_VERSION test macro
  • Revert HIVE-2986
  • Add hive.exec.rcfile.use.explicit.header to hive-default.xml.template
  • hive.binary.record.max.length is a magic string
  • Extract global limit configuration to optimizer
  • Improve Performance of UDF PERCENTILE_APPROX()
  • Track table and keys used in joins and group bys for logging
  • Unescape partition names returned by show partitions
  • Update website with info on how to report security bugs
  • Test:
  • TestHiveServerSessions hangs when executed directly
  • TestRemoteHiveMetaStoreIpAddress always uses the same port
  • Stop testing concat of partitions containing control characters.
  • Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
  • Add tests for 'm' big tables sortmerge join with 'n' small tables where both m,n>1
  • add tests to use bucketing metadata for partitions
  • Add more tests where output of sort merge join is sorted
  • New test cases added by HIVE-3676 in insert1.q are not deterministic
  • Wish:
  • Log Time To Submit metric with PerfLogger

New in version 0.9.0 (May 21st, 2013)

  • Sub-task:
  • add DOAP file for Hive
  • Enable/Add type-specific compression for rcfile
  • Move retry logic in HiveMetaStore to a separate class
  • Add support for filter pushdown for key ranges in hbase for keys of type string
  • Bug:
  • Hive Server getSchema() returns wrong schema for "Explain" queries
  • "hdfs" is hardcoded in a few places in the code which inhibits use of other file systems
  • show functions also returns internal operators
  • Not using map aggregation, fails to execute group-by after cluster-by with same key
  • HiveServer should provide per session configuration
  • Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory
  • left semi join will duplicate data
  • Compact index table's files merged in creation
  • Passing user identity from metastore client to server in non-secure mode
  • Insert overwrite table db.tname fails if partition already exists
  • Describe partition returns table columns but should return partition columns
  • Make a single Hive binary work with both 0.20.x and 0.23.0
  • Make Hive work with Hadoop 1.0.0
  • ignore exception for external jars via reflection
  • wrong class loader used for external jars
  • Force Bash shell on parallel test slave nodes
  • Parallel tests fail if master directory is not present
  • Allow multiple ptest runs by the same person
  • Parallel test commands that include cd fail
  • "hive.querylog.location" requires parent directory to exist or else folder creation fails
  • builtins JAR is not being published to Maven repo & hive-cli POM does not depend on it either
  • Need better exception handling in RCFile tolerate corruptions mode
  • StackOverflowError when using custom UDF in map join
  • Eclipse launch configurations fail due to unsatisfied builtins JAR dependency
  • get_partitions_ps throws TApplicationException if table doesn't exist
  • SUCESS is misspelled
  • a bug in 'alter table concatenate' that causes filenames getting double url encoded
  • SemanticAnalyzer twice swallows an exception it shouldn't
  • StackOverflowError when using custom UDF after adding archive after adding jars
  • Lots of special characters are not handled in LIKE
  • NPE in union followed by join
  • Remove unused lib/log4j-1.2.15.jar
  • Fix flaky testing infrastructure
  • Fix some nondeterministic test output
  • PlanUtils.configureTableJobPropertiesForStorageHandler() is not called for partitioned table
  • Single binary built against 0.20 and 0.23, does not work against 0.23 clusters.
  • Metastore client doesn't log properly in case of connection failure to server
  • CONV returns incorrect results sometimes
  • Hive multi group by single reducer optimization causes invalid column reference error
  • Remove empty java files
  • NPE in union with lateral view
  • union followed by union_subq does not work if the subquery union has reducers
  • Metastore is caching too aggressively
  • Change global_limit.q into linux format file
  • Remove lib/javaewah-0.3.jar
  • Alter Table Partition Concatenate Fails On Certain Characters
  • union with a multi-table insert is not working
  • make union31.q deterministic
  • Fail on table sampling
  • New BINARY type produces unexpected results with supported UDFS when using MapReduce2
  • filter is still removed due to regression of HIVE-1538 although HIVE-2344
  • SUBSTR(CAST( AS BINARY)) produces unexpected results
  • Disable loadpart_err.q on 0.23
  • Export LANG=en_US.UTF-8 to environment while running tests
  • typo in configuration parameter
  • TestContribCliDriver.dboutput and TestCliDriver.input45 fail on 0.23
  • Fix test failures caused by HIVE-2716
  • insert into external tables should not be allowed
  • cleanup readentity/writeentity
  • INPUT__FILE__NAME virtual column returns unqualified paths on Hadoop 0.23
  • Fix TestCliDriver escape1.q failure on MR2
  • QTestUtil.cleanUp() fails with FileNotException on 0.23
  • Ambiguous table name or column reference message displays when table and column names are the same
  • Renaming partition changes partition location prefix
  • Metastore client doesn't close connection properly
  • Hive union with NULL constant and string in same column returns all null
  • BlockMergeTask Doesn't Honor Job Configuration Properties when used directly
  • TestStatsPublisherEnhanced throws NPE on JDBC connection failure
  • testAclPositive in TestZooKeeperTokenStore failing in clean checkout when run on Mac
  • HiveFileFormatUtils should use Path.SEPARATOR instead of File.Separator
  • GROUP BY causing ClassCastException [LazyDioInteger cannot be cast LazyInteger]
  • several jars in hive tar generated are not required
  • JOIN + LATERAL VIEW + MAPJOIN fails to return result (seems to stop halfway through and no longer do the final reduce part)
  • Regression - HiveConf static variable causes issues in long running JVM instances with /tmp/ data
  • TestCliDriver (script_pipe.q) failed with IBM JDK
  • Doc update for .8, .9 and .10
  • Improvement:
  • use sed rather than diff for masking out noise in diff-based tests
  • parallelize test query runs
  • Add java_method() as a synonym for the reflect() UDF
  • Extend concat_ws() UDF to support arrays of strings
  • When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
  • Add timestamp column to the partition stats table.
  • pull junit jar from maven repos via ivy
  • Add target to install Hive JARs/POMs in the local Maven cache
  • Expose the HiveConf in HiveConnection API
  • Newly created partition should inherit properties from table
  • Make index table output of create index command if index is table based
  • move one line log from MapOperator to HiveContextAwareRecordReader
  • Add alterPartition to AlterHandler interface
  • fix Hive-2566 and make union optimization more aggressive
  • The variable hive.exec.mode.local.auto.tasks.max should be changed
  • Change arc config to hide generated files from Differential by default
  • Add Ant configuration property for dumping classpath of tests
  • Support for metastore service specific HADOOP_OPTS environment setting
  • The row count loaded to a table may not be right
  • Add 'ivy-clean-cache' and 'very-clean' Ant targets
  • Make ZooKeeper token store ACL configurable
  • Views should be added to the inputs of queries.
  • TestCliDriver should log elapsed time
  • Obtain delegation tokens for MR jobs in secure hbase setup
  • hbase handler uses ZooKeeperConnectionException which is not compatible with HBase versions other than 0.89
  • HiveStorageHandler.configureTableJobProperties() should let the handler know whether it is configuration for input or output
  • Improve hooks run in Driver
  • HBaseSerDe should allow users to specify the timestamp passed to Puts
  • View partitions do not have a storage descriptor
  • Make the IP address of a Thrift client available to HMSHandler.
  • Add logging of total run time of Driver
  • Concatenating a partition does not inherit location from table
  • Implement nullsafe equi-join
  • Cache error messages for additional logging
  • Change default configuration for hive.exec.dynamic.partition
  • Fix javadoc warnings
  • Remove zero length files
  • Add pre event listeners to metastore
  • Cache remote map reduce job stack traces for additional logging
  • Support eventual constant expression for filter pushdown for key ranges in hbase
  • If hive history file's directory doesn't exist don't crash
  • hive-config.sh should honor HIVE_HOME env
  • Cache local map reduce job errors for additional logging
  • Add a new hook to run at the beginning and end of the Driver.run method
  • Store which configs the user has explicitly changed
  • Add "rat" target to build to look for missing license headers
  • Remove redundant key comparing in SMBMapJoinOperator
  • TextConverter for UDF's is inefficient if the input object is already Text or Lazy
  • Hive: Extend ALTER TABLE DROP PARTITION syntax to use all comparators
  • Add license to the Hive files
  • Hive metastore does not have any log messages while shutting itself down.
  • Remove need for storage descriptors for view partitions
  • Add support for filter pushdown for composite keys
  • New Feature:
  • Allow access to Primitive types stored in binary format in HBase
  • Implement BETWEEN operator
  • Implement sort_array UDF
  • Add reset operation and average time attribute to Metrics MBean.
  • add support for insert partition overwrite(...) if not exists
  • support hive table/partitions that exist in more than one region
  • Allow multiple group bys with the same input data and spray keys to be run on the same reducer.
  • Add PRINTF() Udf
  • Enable Hadoop-1.0.0 in Hive
  • Implement NULL-safe equality operator
  • Filter pushdown in hbase for keys stored in binary format
  • Closed range scans on hbase keys
  • Add JSON output to the hive ddl commands
  • RCFile Reader doesn't provide access to Metadata
  • Add nicer helper functions for adding and reading metadata from RCFiles
  • Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory
  • Task:
  • Hive Web Server startup messages logs incorrect path it is searching for WAR
  • Fix test failures caused by HIVE-2589
  • Upgrade Hbase and ZK dependencies
  • Add a getAuthorizationProvider to HiveStorageHandler
  • Move metastore upgrade scripts labeled 0.10.0 into scripts labeled 0.9.0
  • Remove unnecessary JAR dependencies
  • Revert HIVE-2612
  • Revert HIVE-2795
  • Row number issue in hive
  • Test:
  • Test ppr_pushdown.q is failing on trunk
  • add a testcase for partitioned view on union and base tables have index
  • Wish:
  • Clean-up logs

New in version 0.8.0 (May 21st, 2013)

  • New Feature:
  • Add TIMESTAMP column type for thrift dynamic_type
  • Support "INSERT [INTO] destination"
  • Triggers when a new partition is created for a table
  • Create a Hive CLI that connects to hive ThriftServer
  • Allow type widening on COALESCE/UNION ALL
  • Add support of columnar binary serde
  • optimize metadata only queries
  • Partitioning columns should be of primitive types only
  • add an interface in RCFile to support concatenation of two files without (de)compression
  • Allow users to specify LOCATION in CREATE DATABASE statement
  • Accelerate GROUP BY execution using indexes
  • Implement map_keys() and map_values() UDFs
  • Extend Explode UDTF to handle Maps
  • Implement bitmap indexing in Hive
  • Add export/import facilities to the hive system
  • support explicit view partitioning
  • Block merge for RCFile
  • Add "DROP DATABASE ... CASCADE/RESTRICT"
  • Input Sampling By Splits
  • extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)
  • Add get_table_objects_by_name() to Hive MetaStore
  • Add api for marking / querying set of partitions for events
  • support grouping on complex types in Hive
  • Purge expired events
  • Cli: Print Hadoop's CPU milliseconds
  • Add a Plugin Developer Kit to Hive
  • add TIMESTAMP data type
  • Support archiving for multiple partitions if the table is partitioned by multiple columns
  • Add Binary Datatype in Hive
  • Allow Hive to be debugged remotely
  • Literal bigint
  • Allow UDFs to specify additional FILE/JAR resources necessary for execution
  • Bug:
  • better error code from Hive describe command
  • Join operation fails for some queries
  • Improve the error messages for missing/incorrect UDF/UDAF class
  • CREATE TABLE t LIKE some_view should create a new empty base table, but instead creates a copy of view
  • describe parse_url throws an error
  • Predicate push down get error result when sub-queries have the same alias name
  • Clean up references to 'hive.metastore.local'
  • FilterOperator is applied twice with ppd on.
  • ProxyFileSystem.close calls super.close twice.
  • job name for alter table archive partition is not correct
  • JDBC driver returns wrong precision, scale, or column size for some data types
  • SAXParseException on plan.xml during local mode.
  • Different defaults for hive.metastore.local
  • alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)
  • Potential risk of resource leaks in Hive
  • DDLSemanticAnalyzer won't take newly set Hive parameters
  • Metastore operations (like drop_partition) could be improved in terms of maintaining consistency of metadata and data
  • Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection.
  • Don't set ivy.home in build-common.xml
  • Auto convert mapjoin should not throw exception if the top operator is union operator.
  • Getting error when join on tables where name of table has uppercase letters
  • In error scenario some opened streams may not be closed in ScriptOperator.java, Utilities.java
  • "insert overwrite directory" Not able to insert data with multi level directory path
  • Exception should be thrown when invalid jar,file,archive is given to add command
  • Merging using mapreduce rather than map-only job failed in case of dynamic partition inserts
  • HWI admin_list_jobs JSP page throws exception
  • Make the delegation token issued by the MetaStore owned by the right user
  • Add inputs and outputs to authorization DDL commands
  • LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
  • keyword_1.q is failing
  • Making JDO thread-safe by default
  • In Driver.execute(), mapred.job.tracker is not restored if one of the tasks fails.
  • Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022
  • Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load statement.
  • create database does not honour warehouse.dir in dbproperties
  • A database's warehouse.dir is not used for tables created in it.
  • Backport HIVE-1991 after overridden by HIVE-1950
  • Merge result file size should honor hive.merge.size.per.task
  • the retry logic in Hive's concurrency is not working correctly.
  • In error scenario some opened streams may not be closed
  • TCTLSeparatedProtocol.SimpleTransportTokenizer.nextToken() throws Null Pointer Exception in some cases
  • Exception on windows when using the jdbc driver. "IOException: The system cannot find the path specified"
  • CLI local mode hit NPE when exiting by ^D
  • Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility
  • HivePreparedStatement.executeImmediate always throws an exception
  • NullPointerException on getSchemas
  • Few code improvements in the ql and serde packages.
  • Bug: RowContainer was set to 1 in JoinUtils.
  • Add test coverage for external table data loss issue
  • auto convert map join bug
  • throw an error if the input is larger than a threshold for index input format
  • Make a couple of convenience methods in EximUtil public
  • virtual column references inside subqueries cause execution exceptions
  • Log4J initialization info should not be printed out if -S is specified
  • In shell mode, local mode continues if a local-mode task throws exception in pre-hooks
  • insert overwrite ignoring partition location
  • auto convert map join may miss good candidates
  • Remove usage of deprecated methods from org.apache.hadoop.io package
  • alter table concatenate fails and deletes data
  • Bitmap Operation UDF doesn't clear return list
  • Exception when no splits returned from index
  • Jobs do not get killed even when they created too many files.
  • NPE during parsing order-by expression
  • Block Sampling should adjust number of reducers accordingly to make it useful
  • Too many open files in running negative cli tests
  • Stats JDBC LIKE queries should escape '_' and '%'
  • NPE in MapJoinObjectKey
  • TableSample(percent ) uses one intermediate size to be int, which overflows for large sampled size, making the sampling never triggered.
  • Few code improvements in the metastore,hwi and ql packages.
  • Schema creation scripts are incomplete since they leave out tables that are specific to DataNucleus
  • Log related Check style Comments fixes
  • Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
  • Avoid null pointer exception when executing UDF
  • In Task class and its subclasses logger is initialized in constructor
  • Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()
  • Dynamic Partitioning Failing because of characters not supported by globStatus
  • Stats table schema incompatible after HIVE-2185
  • Ensure HiveConf includes all properties defined in hive-default.xml
  • SessionState used before ThreadLocal set
  • While using Hive in server mode, HiveConnection.close() is not cleaning up server side resources
  • incorrect success flag passed to jobClose
  • unable to get column names for a specific table that has '_' as part of its table name
  • Fix a bug caused by HIVE-243
  • CommandNeedRetryException.java is missing ASF header
  • runnable queue in Driver and DriverContext is not thread safe
  • hive fails to build in eclipse due to syntax error in BitmapIndexHandler.java
  • Can't publish maven release artifacts to apache repository
  • Comparison Operators convert number types to common type instead of double if possible
  • Merge failing of join tree in exceptional case
  • Enable TestHadoop20SAuthBridge
  • Skip comments in hive script
  • ExecDriver::addInputPaths should pass the table properties to the record writer
  • Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions
  • Fix Inconsistency between RB and JIRA patches for HIVE-2194
  • Regression introduced from HIVE-2155
  • ClassCastException when building index with security.authorization turned on
  • Error during UNARCHIVE of a partition
  • Comment clause should immediately follow identifier field in CREATE DATABASE statement
  • Allow ShimLoader to work with Hadoop 0.20-append
  • bad compressed file names from insert into
  • Fix UDAFPercentile to tolerate null percentiles
  • files with control-A,B are not delimited correctly.
  • Schema creation scripts for PostgreSQL use bit(1) instead of boolean
  • Incorrect regular expression for extracting task id from filename
  • DatabaseMetadata.getColumns() does not return partition column names for a table
  • Calling alter_table after changing partition comment throws an exception
  • Add ColumnarSerDe to the list of native SerDes
  • Turn off bitmap indexing when map-side aggregation is turned off
  • hive.zookeeper.session.timeout is set to null in hive-default.xml
  • Turn off compression when generating index intermediate results
  • DESCRIBE TABLE causes NPE when hive.cli.print.header=true
  • Indexes are still automatically queried when out of sync with their source tables
  • Predicate pushdown erroneously conservative with outer joins
  • Alter table always throws an unhelpful error on failure
  • mirror.facebook.net is 404ing
  • stats not updated for non "load table desc" operations
  • filter is removed due to regression of HIVE-1538
  • Fix udtf_explode.q and udf_explode.q test failures
  • JDBC DatabaseMetaData and ResultSetMetaData need to match for particular types
  • HiveConf properties not appearing in the output of 'set' or 'set -v'
  • Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table
  • Slow dropping of partitions caused by full listing of storage descriptors
  • Minor typo in error message in HiveConnection.java (JDBC)
  • Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation
  • Incorrect alias filtering for predicate pushdown
  • import of multiple partitions from a partitioned table with external location overwrites files
  • Add Mockito to LICENSE file
  • published POMs in Maven repo are incorrect
  • Fix whitespace test diff accidentally introduced in HIVE-1360
  • Hive server doesn't return schema for 'set' command
  • Function like with empty string is throwing null pointer exception
  • get_privilege does not get user level privilege
  • File extensions not preserved in Hive.checkPaths when renaming new destination file
  • Metastore server tries to connect to NN without authenticating itself
  • Update Eclipse configuration to include Mockito dependency
  • BlockMergeTask ignores client-specified jars
  • Merging of compressed rcfiles fails to write the valuebuffer part correctly
  • skip corruption bug that causes data not to be decompressed
  • upgrading thrift version didn't upgrade libthrift.jar symlink correctly
  • TABLESAMPLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
  • Pass correct remoteAddress in proxy user authentication
  • remove all @author tags from source
  • fix Eclipse for javaewah upgrade
  • Primitive Data Types returning null if the data is out of range of the data type.
  • mapjoin_subquery dump small table (mapjoin table) to the same file
  • Metastore statistics are not being updated for CTAS queries.
  • Hive PDK needs an Ivy configuration file
  • HadoopJobExecHelper does not handle null counters well
  • Phabricator for code review
  • Bug from HIVE-2446, the code that calls client stats publishers run() methods is in wrong place, should be in the same method but inside of while (!rj.isComplete()) {} loop
  • PDK tests failing on Hudson because HADOOP_HOME is not defined
  • PDK PluginTest failing on Hudson
  • partition pruning prunes some correct partitions under specific conditions
  • small table filesize for automapjoin is not consistent in HiveConf.java and hive-default.xml
  • When new instance of Hive (class) is created, the current database is reset to default (current database shouldn't be changed).
  • Hive throws Null Pointer Exception upon CREATE TABLE . .... if the given doesn't exist
  • cleanup QTestUtil: use test.data.files as current directory if one not specified
  • Dynamic partition insert should enforce the order of the partition spec is the same as the one in schema
  • HIVE-2446 bug (next one) - If constructor of ClientStatsPublisher throws runtime exception it will be propagated to HadoopJobExecHelper's progress method and beyond, whereas it shouldn't
  • Allow people to use only issue numbers without 'HIVE-' prefix with `arc diff --jira`.
  • Evaluation of non-deterministic/stateful UDFs should not be skipped even if constant oi is returned.
  • HiveIndexResult creation fails due to file system issue
  • Support scientific notation for Double literals
  • How to submit documentation fixes
  • Provide jira_base_url for improved arc commit workflow
  • upgrade script 008-HIVE-2246.mysql.sql contains syntax errors
  • HIVE-2247 Changed the Thrift API causing compatibility issues.
  • Add Java linter to Hive
  • HIVE-2246 upgrade script needs to drop foreign key in COLUMNS_OLD
  • eclipse template .classpath is broken
  • HIVE-2246 upgrade script changed the COLUMNS_V2.COMMENT length
  • ivy offline mode broken by changingPattern and checkmodified attributes
  • Debug mode in some situations doesn't work properly when child JVM is started from MapRedLocalTask
  • Hive build fails with error "java.io.IOException: Not in GZIP format"
  • explain task: getJSONPlan throws a NPE if the ast is null
  • bug in ivy 2.2.0 breaks build
  • Update arcconfig to include commit listener
  • HBase bulk load wiki page improvements
  • Update README.txt file to use description from wiki
  • HiveCli eclipse launch configuration hangs
  • Hive POMs reference the wrong Hadoop artifacts
  • Fix eclipse classpath template broken in HIVE-2523
  • Fix maven-build Ant target
  • TestHiveServer doesn't produce a JUnit report file
  • revert HIVE-2566
  • Recent patch prevents Hadoop confs from loading in 0.20.204
  • Improvement:
  • CREATE VIEW followup: CREATE OR REPLACE
  • Allow UDFs to access constant parameter values at compile time
  • increase hive.mapjoin.maxsize to 10 million
  • use filter pushdown for automatically accessing indexes
  • HivePreparedStatement.executeImmediate(String sql) is breaking the exception stack
  • Improve miscellaneous error messages
  • support NOT IN and NOT LIKE syntax
  • HiveInputFormat.readFields should print the cause when there's an exception
  • Ctrl+c should kill currently running query, but not exit the CLI
  • The class HiveResultSet should implement batch fetching.
  • Task-cleanup task should be disabled
  • HIVE-78 Followup: group partitions by tables when doing authorization and there is no partition level privilege
  • Change Default Alias For Aggregated Columns (_c1)
  • mapjoin operator should not load hashtable for each new inputfile if the hashtable to be loaded is already there.
  • recognize transitivity of predicates on join keys
  • Hive Shell to output number of mappers and number of reducers
  • Support new annotation @UDFType(stateful = true)
  • adding comments to Hive Stats JDBC queries
  • Expand exceptions caught for metastore operations
  • avoid loading Hive aux jars in CLI remote mode
  • Create a separate namespace for Hive variables
  • Performance instruments for client side execution
  • isEmptyPath() to use ContentSummary cache
  • Use block-level merge for RCFile if merging intermediate results is needed
  • Update bitmap indexes for automatic usage
  • Metastore listener
  • remove hadoop version check from hive cli shell script
  • getInputSummary() to call FileSystem.getContentSummary() in parallel
  • PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary
  • Generate single MR job for multi groupby query if hive.multigroupby.singlemr is enabled.
  • Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
  • SHOW GRANT grantTime field should be a human-readable timestamp
  • Reduce memory consumption in preparing MapReduce job
  • Increase the number of operator counters
  • No lock for some non-mapred tasks; config variable hive.lock.mapred.only.operation added
  • Optimizer on partition field
  • Hive's symlink text input format should be able to work with CombineHiveInputFormat
  • Improve stats gathering reliability by retries on failures with hive.stats.retries.max and hive.stats.retries.wait
  • Automatic Indexing with multiple tables
  • DROP TABLE IF EXISTS should not fail if a view of that name exists
  • Remove System.exit
  • Enables HiveServer to accept -hiveconf option
  • reduce workload generated by JDBCStatsPublisher
  • Add api to send / receive message to metastore
  • Add interface classification in Hive.
  • add exception handling to hive's record reader
  • Improve error messages emitted during semantic analysis
  • Improve error messages emitted during task execution
  • Allow custom serdes to set field comments
  • Allow optional [inner] on equi-join.
  • Add actions for alter table and alter partition events for metastore event listeners
  • reduce name node calls in hive by creating temporary directories
  • create a new API in Warehouse where the root directory is specified
  • Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object
  • ALTER VIEW RENAME
  • Optimize partial specification metastore functions
  • add Query text for debugging in lock data
  • speedup addInputPaths
  • Make "alter table drop partition" more efficient
  • Provide metastore upgrade script for HIVE-2215
  • Ability to add partitions atomically
  • Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
  • Show current database in hive prompt
  • Make CombineHiveInputFormat the default hive.input.format
  • Dedupe tables' column schemas from partitions in the metastore db
  • Display a sample of partitions created when Fatal Error occurred due to too many partitions created
  • Better error message in CLI on invalid column name
  • Local mode needs to work well with block sampling
  • bucketized map join should allow join key as a superset of bucketized columns
  • Improve error messages for DESCRIBE command
  • Optimize Hive query startup time for multiple partitions
  • Add hooks to run when execution fails.
  • Make Hadoop Job ID available after task finishes executing
  • Improve RCFile Read Speed
  • Support automatic rebuilding of indexes when they go stale
  • Make performance logging configurable.
  • Improve RCFileCat performance significantly
  • Warn user that precision is lost when bigint is implicitly cast to double.
  • Local Mode can be more aggressive if LIMIT optimization is on
  • RCFileReader Buffer Reuse
  • Allow RCFile Reader to tolerate corruptions
  • make hive mapper initialize faster when having tons of input files
  • The PerfLogger should log the full name of hooks, not just the simple name.
  • Introduction of client statistics publishers possibility
  • Add job ID to MapRedStats
  • Upgrade JavaEWAH to 0.3
  • move lock retry logic into ZooKeeperHiveLockManager
  • Need a way to categorize queries in hooks for improved logging
  • JDBCStatsAggregator DELETE STATEMENT should escape _ and %
  • Files in Avro-backed Hive tables do not have a ".avro" extension
  • Group-by query optimization Followup: add flag in conf/hive-default.xml
  • Add method to PerfLogger to perform cleanup/final steps.
  • make INNER a non-reserved keyword
  • HA Support for Metastore Server
  • Improve support for Constant Object Inspectors
  • Log more Hadoop task counter values in the MapRedStats class.
  • Enable ALTER TABLE SET SERDE to work on partition level
  • Update junit jar in testlibs
  • Get ConstantObjectInspectors working in UDAFs
  • Make Constant OIs work with UDTFs.
  • add a new builtins subproject
  • Consecutive string literals should be combined into a single string literal.
  • Use sorted nature of compact indexes
  • Make metastore log4j configuration file configurable again.
  • add explain formatted
  • Use hashing instead of list traversal for IN operator for primitive types
  • reduce the number of map-reduce jobs for union all
  • Too much debugging info on console if a job failed
  • avoid referencing /tmp in tests
  • Setting no_drop on a table should cascade to child partitions
  • Add caching to json_tuple
  • Add hook to run in metastore's endFunction which can collect more fb303 counters
  • Task:
  • Hive in Maven
  • Provide Metastore upgrade scripts and default schemas for PostgreSQL
  • Remaining patch for HIVE-2148
  • Use the version of commons-codec from Hadoop
  • Upgrade Hive's Thrift dependency to version 0.7.0
  • Metastore upgrade scripts for schema change introduced in HIVE-2215
  • Metastore upgrade script and schema DDL for Hive 0.8.0
  • Make Hive compile against Hadoop 0.23
  • Add pdk, hbase-handler etc as source dir in eclipse
  • Update wiki links in README file
  • Omit incomplete Postgres upgrade scripts from release tarball
  • Sub-task:
  • Support JDBC ResultSetMetadata
  • Bundle Log4j configuration files in Hive JARs
  • Push down partition pruning to JDO filtering for a subset of partition predicates
  • batch processing partition pruning process
  • Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema()
  • Partition Pruning bug in the case of hive.mapred.mode=nonstrict
  • Return correct Major / Minor version numbers for Hive Driver
  • add the HivePreparedStatement implementation based on current HIVE supported data-type
  • add a TM to Hive logo image
  • Update project naming and description in Hive wiki
  • Update project naming and description in Hive website
  • update project website navigation links
  • add trademark attributions to Hive homepage
  • Update project description and wiki link in ivy.xml files
  • Test:
  • Test that views with joins work properly
  • TestLazySimpleSerde fails randomly
  • create a test to verify that partition pruning works for partitioned views with a union
  • Wish:
  • ^C breaks out of running query, but not whole CLI
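
The entry "Use hashing instead of list traversal for IN operator for primitive types" above names a general technique that can be sketched as follows. This is only an illustration, not Hive's implementation: the names in_by_traversal and HashedIn are hypothetical, and SQL NULL semantics are deliberately left out.

```python
from typing import Hashable, Iterable


def in_by_traversal(value: Hashable, candidates: Iterable[Hashable]) -> bool:
    """Evaluate `value IN (candidates)` by scanning the list, O(n) per row."""
    for candidate in candidates:
        if value == candidate:
            return True
    return False


class HashedIn:
    """Evaluate `value IN (candidates)` with a hash set built once.

    Hypothetical helper for illustration: the constant list is hashed
    a single time, so each per-row check is an average O(1) lookup.
    """

    def __init__(self, candidates: Iterable[Hashable]) -> None:
        self._lookup = frozenset(candidates)

    def matches(self, value: Hashable) -> bool:
        return value in self._lookup


if __name__ == "__main__":
    constants = [1, 5, 9]
    predicate = HashedIn(constants)
    for row_value in (5, 4):
        # Both strategies must agree; only the cost per row differs.
        print(row_value, in_by_traversal(row_value, constants), predicate.matches(row_value))
```

The trade-off is a one-time cost to build the set in exchange for cheaper checks on every row, which tends to matter only when the IN list is long or the table is large.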

New in version 0.7.0 (May 21st, 2013)

  • New Feature:
  • Authorization infrastructure for Hive
  • Implement Indexing in Hive
  • Add reflect() UDF for reflective invocation of Java methods
  • Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
  • Authentication Infrastructure for Hive
  • Hive Variables
  • Concurrency Model for Hive
  • add row_sequence UDF
  • hive command line option -i to run an init file before other SQL commands
  • add option to let hive automatically run in local mode based on tunable heuristics
  • bring a table/partition offline
  • sentences() UDF for natural language tokenization
  • ngrams() UDAF for estimating top-k n-gram frequencies
  • Be able to modify a partition's fileformat and file location information.
  • context_ngrams() UDAF for estimating top-k contextual n-grams
  • Add json_tuple() UDTF function
  • Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
  • Add ANSI SQL correlation aggregate function CORR(X,Y).
  • Support partition filtering in metastore
  • Patch to allow scripts in S3 location
  • Implement "SHOW TABLES {FROM | IN} db_name"
  • parse_url_tuple: a UDTF version of parse_url
  • Default values for parameters
  • Implement GenericUDF str_to_map
  • Patch to support HAVING clause in Hive
  • track the joins which are being converted to map-join automatically
  • Call frequency and duration metrics for HiveMetaStore via jmx
  • maintain lastAccessTime in the metastore
  • Make Hive database data center aware
  • Add a new local mode flag in Task.
  • Better auto-complete for Hive
  • Support ALTER DATABASE to change database properties
  • Implement DROP TABLE/VIEW ... IF EXISTS
  • Implement DROP {PARTITION, INDEX, TEMPORARY FUNCTION} IF EXISTS
  • Make the MetaStore filesystem interface pluggable via the hive.metastore.fs.handler.class configuration property
  • add an option (hive.index.compact.file.ignore.hdfs) to ignore HDFS location stored in index files.
  • Verbose/echo mode for the Hive CLI
  • Improvement:
  • Provide option to export a HEADER
  • Support for distinct selection on two or more columns
  • describe extended table/partition output is cryptic
  • Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.
  • Tapping logs from child processes
  • support filter pushdown against non-native tables
  • replace dependencies on HBase deprecated API
  • use Ivy for fetching HBase dependencies
  • Make Hive work with Hadoop security
  • Return value for map, array, and struct needs to return a string
  • do not update transient_lastDdlTime if the partition is modified by a housekeeping operation
  • automatically invoke .hiverc init script
  • add CLI command for executing a SQL script
  • serializing/deserializing the query plan is useless and expensive
  • Extend ivy offline mode to cover metastore downloads
  • Add support to turn off bucketing with ALTER TABLE
  • Speed up reflection method calls in GenericUDFBridge and GenericUDAFBridge
  • potential NullPointerException
  • hive output file names are unnecessarily large
  • replace isArray() calls and remove LOG.isInfoEnabled() in Operator.forward()
  • supply correct information to hooks and lineage for index rebuild
  • support COMMENT clause on CREATE INDEX, and add new command for SHOW INDEXES
  • support IDXPROPERTIES on CREATE INDEX
  • Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version
  • hive starter scripts should load admin/user supplied script for configurability
  • ability to select across a database
  • Use ZooKeeper from maven
  • Add support for JDBC PreparedStatements
  • Ability to plug custom Semantic Analyzers for Hive Grammar
  • CompactIndexInputFormat should create split only for files in the index output file.
  • regression and improvements in handling NULLs in joins
  • Add alternative search-provider to Hive site
  • Add ProtocolBuffersStructObjectInspector
  • ScriptOperator's AutoProgressor can lead to an infinite loop
  • Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
  • convert commonly used udfs to generic udfs
  • add map joined table to distributed cache
  • Convert join queries to map-join based on size of table/row
  • ability to specify parent directory for zookeeper lock manager
  • Adding consistency check at jobClose() when committing dynamic partitions
  • Change get_partitions_ps to pass partition filter to database
  • FetchOperator.getInputFormatFromCache hides causal exception
  • drop support for pre-0.20 Hadoop versions
  • remove Hadoop 0.17 specific test reference logs
  • Optimize Key Comparison in GroupByOperator
  • Group-by to determine equals of Keys in reverse order
  • Support for using ALTER to set IDXPROPERTIES
  • ExecMapper and ExecReducer: reduce function calls to l4j.isInfoEnabled()
  • Remove Partition Filtering Conditions when Possible
  • Optimize ColumnarStructObjectInspector.getStructFieldData()
  • Remove JDBM component from Map Join
  • test cleanup for HIVE-1641
  • optimize group by hash map memory
  • Support show locks for a particular table
  • Add queryid while locking
  • Update transient_lastDdlTime only if not specified
  • add more debug information for hive locking
  • CommonJoinOperator optimize the case of 1:1 join
  • change Pre/Post Query Hooks to take in 1 parameter: HookContext
  • Improve documentation for str_to_map() UDF
  • optimize the code path when there are no outer joins
  • dumps time at which lock was taken along with the queryid in show locks extended
  • Compress the hashtable dump file before putting it into distributed cache
  • Clear empty files in Hive
  • HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice
  • Show the time the local task takes
  • create a new ZooKeeper instance when retrying lock, and more info for debug
  • Add an option to run task to check map-join possibility in non-local mode
  • more debugging for locking
  • add an option in dynamic partition inserts to throw an error if 0 partitions are created
  • Reduce unnecessary DFSClient.rename() calls
  • Include Process ID in the log4j log file name
  • redo zookeeper hive lock manager
  • add a factory method for creating a synchronized wrapper for IMetaStoreClient
  • a mapper should be able to span multiple partitions
  • Store jobid in ExecDriver
  • Provide config parameters to control cache object pinning
  • Allow any type of stats publisher and aggregator in addition to HBase and JDBC
  • Find a way to disable owner grants
  • Improve the implementation of the METASTORE_CACHE_PINOBJTYPES config
  • Have audit logging in the Metastore
  • Provide DFS initialization script for Hive
  • Make Stats gathering more flexible with timeout and atomicity
  • make a libthrift.jar and libfb303.jar in dist package for backward compatibility
  • Modify build to run all tests regardless of subproject failures
  • Hive SymlinkTextInputFormat does not estimate input size correctly
  • Bug:
  • "LOAD DATA LOCAL INPATH" fails when the table already contains a file of the same name
  • NULL is not handled correctly in join
  • HiveInputFormat.getInputFormatFromCache "swallows" cause exception when throwing IOException
  • add progress in join and groupby
  • Simple UDAFs with more than 1 parameter crash on empty row query
  • UDF field() doesn't work
  • Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
  • skip counter update when RunningJob.getCounters() returns null
  • FetchOperator(mapjoin) does not work with RCFile
  • bug in 'set fileformat'
  • Make Eclipse launch templates auto-adjust to Hive version number changes
  • Reporting progress in FileSinkOperator works in multiple directory case
  • hive-site.xml ${user.name} not replaced for local-file derby metastore connection URL
  • percentile_approx() fails with more than 1 reducer
  • CTAS should unescape the column name in the select-clause.
  • plan file should have a high replication factor
  • .gitignore files being placed in test warehouse directories causing build failure
  • TestCliDriver -Doverwrite=true does not put the file in the correct directory
  • fix or disable loadpart_err.q
  • Index followup: remove sort by clause and fix a bug in collect_set udaf
  • when generating reentrant INSERT for index rebuild, quote identifiers using backticks
  • Add cleanup method to HiveHistory class
  • Monitor the working set of the number of files
  • HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
  • hive.mapred.local.mem should only be used in case of local mode job submissions
  • ql tests no longer work in miniMR mode
  • Replace globStatus with listStatus inside Hive.java's replaceFiles.
  • Join filters do not work correctly with outer joins
  • alter partition should throw exception if the specified partition does not exist.
  • Unarchiving operation throws NPE
  • populate inputs and outputs for all statements
  • Fix TestContribCliDriver test
  • smb_mapjoin_8.q returns different results in miniMr mode
  • HBase tests broken
  • bucketizedhiveinputformat.q fails in minimr mode
  • referencing an added file by its name in a transform script does not work in hive local mode
  • Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
  • cleanup ExecDriver.progress
  • Hive should not override Hadoop specific system properties
  • wrong log files in contrib client positive
  • Add HBase/ZK JARs to Eclipse classpath
  • udtf_explode.q is an empty file
  • use SequenceFile rather than TextFile format for hive query results
  • need to sort hook input/output lists for test result determinism
  • Hadoop 0.17 ant test broken by HIVE-1523
  • For a null value in a string column, JDBC driver returns the string "NULL"
  • Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675
  • UDTF json_tuple should return null row when input is not a valid JSON string
  • Fix Base64TextInputFormat to be compatible with commons codec 1.4
  • Patch to fix hashCode method in DoubleWritable class
  • bug in NO_DROP
  • CombineHiveInputFormat fails with "cannot find dir for emptyFile"
  • ExecDriver.addInputPaths() error if partition name contains a comma
  • Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )
  • TestContribNegativeCliDriver fails
  • All TestJdbcDriver test cases fail in Eclipse unless a property is added in run config
  • join results are displayed wrongly for some complex joins using select *
  • Fix describe * [extended] column formatting
  • ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty
  • Eclipse build broken
  • MapJoin throws EOFException when the mapjoined table has 0 column selected
  • multithreading on Context.pathToCS
  • Create table bug causes the row format property lost when serde is specified.
  • count(*) returns wrong result when a mapper returns empty results
  • NPE in MapJoin
  • In the MapJoinOperator, the code uses tag as alias, which is not always true
  • ANALYZE TABLE command should check columns in partition spec
  • incorrect partition pruning ANALYZE TABLE
  • bug when different partitions are present in different dfs
  • CREATE TABLE LIKE should not set stats in the new table
  • Migrating metadata from derby to mysql throws NullPointerException
  • duplicated MapRedTask in Multi-table inserts mixed with FileSinkOperator and ReduceSinkOperator
  • make TestHBaseCliDriver use dynamic ports to avoid conflicts with already-running services
  • ant clean should delete stats database
  • hbase_stats.q is failing
  • Two Bugs for Estimating Row Sizes in GroupByOperator
  • Fix Eclipse templates (and use Ivy metadata to generate Eclipse library dependencies)
  • Statistics broken for tables with size in excess of Integer.MAX_VALUE
  • HIVE-1633 hit for Stage2 jobs with CombineHiveInputFormat
  • failures in fatal.q in TestNegativeCliDriver
  • Many important broken links on Hive web page
  • Mismatched open/commit transaction calls in case of connection retry
  • Merge files does not work with dynamic partition
  • pcr.q output is non-deterministic
  • ROUND(infinity) chokes
  • Assertion on inputObjInspectors.length in GroupBy operator
  • parallel execution and auto-local mode combine to place plan file in wrong file system
  • Outdated comments for GenericUDTF.close()
  • Typo in hive-default.xml
  • outputs not populated for dynamic partitions at compile time
  • GenericUDFOr and GenericUDFAnd cannot receive boolean typed object
  • outputs not correctly populated for alter table
  • Mapjoin will fail if there are no files associated with the join tables
  • The merge criteria on dynamic partitions should be per partition
  • No Element found exception in BucketMapJoinOptimizer
  • bug in auto_join25.q
  • Hive comparison operators are broken for NaN values
  • spurious rmr failure messages when inserting with dynamic partitioning
  • show locks should not use getTable()/getPartition
  • Fix intermittent failures in TestRemoteMetaStore
  • mappers in group followed by joins may die OOM
  • Hanging hive client caused by TaskRunner's OutOfMemoryError
  • Some attributes in the Eclipse template file are deprecated
  • change hive assumption that local mode mappers/reducers always run in same jvm
  • bug in MAPJOIN
  • add more logging to partition pruning
  • downgrade JDO version
  • Temporarily disable metastore tests for listPartitionsByFilter()
  • mixed case tablename on lefthand side of LATERAL VIEW results in query failing with confusing error message
  • Hive's smallint datatype is not supported by the Hive JDBC driver
  • Hive's float datatype is not supported by the Hive JDBC driver
  • Revive partition filtering in the Hive MetaStore
  • Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
  • test load_overwrite.q fails
  • Add mechanism for disabling tests with intermittent failures
  • TestRemoteHiveMetaStore.java accidentally deleted during commit of HIVE-1845
  • bug introduced by HIVE-1806
  • Fix 'tar' build target broken in HIVE-1526
  • fix HBase filter pushdown broken by HIVE-1638
  • Set the version of Hive trunk to '0.7.0-SNAPSHOT' to avoid confusing it with a release
  • HBase and Contrib JAR names are missing version numbers
  • Alter command execution "when HDFS is down" results in holding stale data in MetaStore
  • create script for the metastore upgrade due to HIVE-78
  • Can't join HBase tables if one's name is the beginning of the other
  • FileHandler leak on partial iteration of the resultset.
  • Double escaping special chars when removing old partitions in rmr
  • use partition level serde properties
  • failures in testhbaseclidriver
  • authorization on database level is broken.
  • CTAS (create-table-as-select) throws exception when showing results
  • Fix TestHadoop20SAuthBridge failure on Hudson
  • GRANT/REVOKE should handle privileges as tokens, not identifiers
  • alter table rename messes the location
  • hive.semantic.analyzer.hook cannot have multiple values
  • Fix test failure in TestContribCliDriver/url_hook.q
  • dynamic partition insert creating different directories for the same partition during merge
  • input16_cc.q is failing in testminimrclidriver
  • fix some outputs and make some tests deterministic
  • add fully deterministic ORDER BY in test union22.q and input40.q
  • TestMinimrCliDriver merge_dynamic_partition2 and 3 are failing on trunk
  • fix hbase_bulk.m by setting HiveInputFormat
  • TestHadoop20SAuthBridge failed on current trunk
  • Mismatched open/commit transaction calls when using get_partition()
  • Update README.txt and add missing ASF headers
  • Executing queries using Hive Server is not logging to the log file specified in hive-log4j.properties
  • Improve naming and README files for MetaStore upgrade scripts
  • upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000
  • Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
  • Make call to SecurityUtil.getServerPrincipal unambiguous
  • Sub-task:
  • table/partition level statistics
  • Add delegation token support to metastore
  • a followup patch for changing the description of hive.exec.pre/post.hooks in conf/hive-default.xml
  • upgrade the database thrift interface to allow parameters key-value pairs
  • Extend the CREATE DATABASE command with DBPROPERTIES
  • Add the local flag to all the map red tasks, if the query is running locally.
  • Task:
  • Hive should depend on a release version of Thrift
  • Remove Hive dependency on unreleased commons-cli 2.0 Snapshot
  • Update Metastore upgrade scripts to handle schema changes introduced in HIVE-1413
  • Remove CHANGES.txt
  • Create MetaStore schema upgrade scripts for changes made in HIVE-417
  • Provide MetaStore schema upgrade scripts for changes made in HIVE-1823
  • Test:
  • improve test query performance
  • JDBM diff in test caused by HIVE-1641
  • merge_dynamic_part's result is not deterministic
  • change the value of hive.input.format to CombineHiveInputFormat for tests

New in version 0.6.0 (May 21st, 2013)

  • New Feature
  • Add PERCENTILE aggregate function
  • add database/schema support Hive QL
  • Hive HBase Integration (umbrella)
  • row-wise IN would be useful
  • CommandProcessor should return DriverResponse
  • add udaf max_n, min_n to contrib
  • Bucketed Map Join
  • support views
  • multi-partition inserts
  • Create UDFs for XPath expression evaluation
  • Better Error Messages for Execution Errors
  • Let user script write out binary data into a table
  • CombinedHiveInputFormat for hadoop 19
  • Add UDF to create struct
  • Add column lineage information to the pre execution hooks
  • Add metastore API method to get partition by name
  • bucketing mapjoin where the big table contains more than 1 big partition
  • enforce bucketing for a table
  • Add UDF array_contains
  • ensure sorting properties for a table
  • sorted merge join
  • create a new input format where a mapper spans a file
  • More robust handling of metastore connection failures
  • Get partitions with a partial specification
  • Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
  • Thread pool size in Thrift metastore server should be configurable
  • Add SymlinkTextInputFormat to Hive
  • Partition name to values conversion method
  • More generic and efficient merge method
  • Archiving partitions
  • Tool to cat rcfiles
  • histogram() UDAF for a numerical column
  • Web Interface can only browse default
  • Add TCP keepalive option for the metastore server
  • Alter the number of buckets for a table
  • Bug
  • support count(*) and count distinct on multiple columns
  • getSchema returns invalid column names, getThriftSchema does not return old style string schemas
  • GenericUDTFExplode() throws NPE when given nulls
  • desc Table should work
  • typedbytes does not support nulls
  • function in a transform with more than 1 argument fails
  • Predicate push down does not work with UDTF's
  • NPE when operating HiveCLI in distributed mode
  • TestContribCliDriver failure in serde_typedbytes.q, serde_typedbytes2.q, and serde_typedbytes3.q
  • Make it possible for users to recover data when moveTask fails
  • ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.
  • Add "-Doffline=true" option to ant
  • Skew Join does not work in distributed env.
  • Conditional task does not increase finished job counter when filter job out.
  • Disable streaming last table if there is a skew key in previous tables.
  • bug with alter table rename when table has property EXTERNAL=FALSE
  • create view should expand the query text consistently
  • Hive CLI shows 'Ended Job=' at the beginning of the job
  • Assertion in ExecDriver.execute when assertions are enabled in HADOOP_OPTS
  • "datanucleus" typos in conf/hive-default.xml
  • Use TreeMap instead of Property to make explain extended deterministic
  • Job counter error if "hive.merge.mapfiles" equals true
  • 'create if not exists' fails for a table name with 'select' in it
  • Expression Not In Group By Key error is sometimes masked
  • Fix RCFile resource leak when opening a non-RCFile
  • Increase ObjectInspector[] length on demand
  • Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition
  • typedbytes: writing to stderr kills the mapper
  • RowContainer should flush out dummy rows when the table desc is null
  • ScriptOperator AutoProgressor does not set the interval
  • CombineHiveInputFormat does not work for compressed text files
  • hints cannot be passed to transform statements
  • Task breaking bug when breaking after a filter operator
  • date_sub() function returns wrong date because of daylight saving time difference
  • joins between HBase tables and other tables (whether HBase or not) are broken
  • set merge files to false when bucketing/sorting is being enforced
  • ql.metadata.Hive#close() should check for null metaStoreClient
  • Cannot start metastore thrift server on a specific port
  • Case sensitiveness of type information specified when using custom reducer causes type mismatch
  • UDF_Percentile NullPointerException
  • bug in sort merge join if the big table does not have any row
  • TestHBaseCliDriver hangs
  • Select query with specific projection(s) fails if the local file system directory for ${hive.user.scratchdir} does not exist.
  • problem in combinehiveinputformat with nested directories
  • Bucketing column names in create table should be case-insensitive
  • error/info message being emitted on standard output
  • sort merge join does not work with bucketizedhiveinputformat
  • Fix UDAFPercentile IndexOutOfBoundsException
  • HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
  • unit test symlink_text_input_format.q needs ORDER BY for determinism
  • = throws NPE
  • bug in use of hadoop supports splittable
  • hive trunk does not compile with hadoop 0.17 any more
  • bucketed sort merge join breaks after dynamic partition insert
  • CombineHiveInputFormat throws exception when partition name contains characters that are special in a URI
  • NPE with lineage in a query of union alls on joins.
  • bugs with temp directories, trailing blank fields in HBase bulk load
  • Cached FileSystem can lead to persistent IOExceptions
  • leading dash in partition name is not handled properly
  • dynamic partition insert should throw an exception if the number of target table columns + dynamic partition columns does not equal the number of select columns
  • RowContainer uses hard-coded '/tmp/' path for temporary files
  • Group by partition column returns wrong results
  • fatal error check omitted for reducer-side operators
  • select * does not work if different partitions contain different formats
  • Fix bin/ext/jar.sh to work with hadoop 0.20 and above
  • Filter Operator Column Pruning should preserve the column order
  • TypedBytesSerDe fails to create table with multiple columns.
  • hive.query.id is not unique
  • rcfilecat should use '\t' to separate columns and print '\r\n' at the end of each row.
  • load_dyn_part*.q tests need ORDER BY for determinism
  • partition level properties honored if they exist
  • Increase the maximum length of various metastore fields, and remove TYPE_NAME from COLUMNS primary key
  • Bug in SMBJoinOperator which may cause a final part of the results in some cases.
  • inputFileFormat error if the merge job takes a different input file format than the default output file format
  • remove blank in rcfilecat
  • Missing connection pool plugin in Eclipse classpath
  • getPartitionDescFromPath() in CombineHiveInputFormat should handle matching by path
  • combinehiveinputformat does not work if files are of different types
  • Reporting progress to JT during closing files in FileSinkOperator
  • Add hadoop-*-tools.jar to Eclipse classpath
  • File format information is retrieved from first partition
  • DataNucleus throws NucleusException if core-3.1.1 JAR appears more than once on CLASSPATH
  • CombineHiveInputFormat bug on tablesample
  • Archived partitions throw error with queries calling getContentSummary
  • column pruning not working with lateral view
  • problem when sequence files and rcfiles are mixed for null partitions
  • hive.task.progress should be added to conf/hive-default.xml
  • ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
  • Upgraded naming scheme causes JDO exceptions
  • bug in 'set fileformat'
  • insert overwrite and CTAS fail in hive local mode
  • lateral view does not work with column pruning
  • FileSinkOperator should remove duplicated files from the same task based on file sizes
  • parallel execution failed if mapred.job.name is set
  • Typo of hive.merge.size.smallfiles.avgsize prevents change of value
  • hive --service jar looks for hadoop version but was not defined
  • Web Interface JSP needs Refactoring for removed meta store methods
  • ObjectStore.commitTransaction() does not properly handle transactions that have already been rolled back
  • Migration scripts should increase size of PARAM_VALUE in PARTITION_PARAMS
  • Improvement
  • provide option to run hive in local mode
  • handle skewed keys for a join in a separate job
  • Incorporate CheckStyle into Hive's build.xml
  • Merge tasks in GenMRUnion1
  • CREATE VIEW followup: add a "table type" enum attribute in metastore's MTable, and also null out irrelevant attributes for MTable instances which describe views
  • CREATE VIEW followup: find and document current expected version of thrift, and regenerate code to match
  • Add a "skew join map join size" variable to control the input size of skew join's following map join job.
  • make number of concurrent tasks configurable
  • QueryPlan to be independent from BaseSemanticAnalyzer
  • Structured temporary directories
  • add counters to show that skew join triggered
  • Make QueryPlan serializable
  • Add hive.merge.size.per.task to HiveConf
  • Make all Tasks and Works serializable
  • In ivy offline mode, don't delete downloaded jars
  • Make ql/metadata/Table and Partition serializable
  • Let max/min handle complex types like struct
  • add type-checking setters for HiveConf class to match existing getters
  • CREATE VIEW followup: support ALTER TABLE SET TBLPROPERTIES on views
  • Add comment to explain why we check for dir first in add_partitions().
  • Add metastore API method to drop partition / append partition by name
  • drop_partition_by_name() should use drop_partition_common()
  • Configure build to download Hadoop tarballs from Facebook mirror instead of Apache
  • When checkstyle is activated for Hive in Eclipse environment, it shows all checkstyle problems as errors.
  • Explicitly say "Hive Internal Error" to ease debugging
  • Show the row with error in mapper/reducer
  • accept TBLPROPERTIES on CREATE TABLE/VIEW
  • allow HBase key column to be anywhere in Hive table
  • add pre-drops in bucketmapjoin*.q
  • add backward-compatibility constructor to HiveMetaStoreClient
  • mapjoin followed by another mapjoin should be performed in a single query
  • from_unixtime should implement an overloaded function to accept only bigint type
  • optimize bucketing
  • facilitate HBase bulk loads from Hive
  • CLI set and set -v commands should dump properties in alphabetical order
  • error message in Hive.checkPaths dumps Java array address instead of path string
  • support: alter table touch partition
  • cleanup the jobscratchdir
  • Increase the memory limit for CLI client
  • make mapred.input.dir.recursive work for select *
  • for ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL'='TRUE'), change TBL_TYPE attribute from MANAGED_TABLE to EXTERNAL_TABLE
  • DataNucleus should use connection pooling
  • Moving inputFileChanged() from ExecMapper to where it is needed
  • Do not pull counters of non initialized jobs
  • Hive should use NullOutputFormat for hadoop jobs
  • CombineHiveInputSplit should initialize the inputFileFormat once for a single split
  • New algorithm for variance() UDAF
  • allow HBase WAL to be disabled
  • Add PERCENTILE_APPROX which works with double data type
  • Make Hive build work with Ivy versions < 2.1.0
  • set abort in ExecMapper when Hive's record reader got an IOException
  • Make the compile target depend on thrift.home
  • Task
  • Automated source code cleanup
  • Cleanup Class names
  • Add .gitignore file
  • Suppress Checkstyle warnings for generated files
  • Replace instances of StringBuffer/Vector with StringBuilder/ArrayList
  • Checkstyle fixes
  • Use Anakia for version controlled documentation
  • build references IVY_HOME incorrectly
  • Update Eclipse project configuration to match Checkstyle
  • Eclipse launchtemplate changes to enable debugging
  • fix Hive logo img tag to avoid stretching
  • Provide metastore schema migration scripts (0.5 -> 0.6)
  • Provide Postgres metastore schema migration scripts (0.5 -> 0.6)
  • Include metastore upgrade scripts in release tarball
  • Update README file for 0.6.0 release
  • Satisfy ASF release management requirements
  • Sub-task
  • checking VOID type for NULL in LazyBinarySerde
  • Test
  • NPE when running TestJdbcDriver/TestHiveServer
  • test HBase input format plus CombineHiveInputFormat
  • temporarily disable HBase test execution
  • Unit test should be shim-aware