Extracted from SufficientlySmartCompiler:

A related issue is the SufficientlySmartVirtualMachine; early JavaLanguage proponents hypothesized about ultra-advanced VirtualMachine''''''s which, by employing various adaptive strategies (profile-directed optimizations and such), would achieve runtimes AsFastAsCee or faster.  The logic was that C/C++, being languages which are compiled in advance, could only use static optimizations whereas the Java code could tailor the optimizations being performed to each running instance of the program/class.

Certainly Sun and others have developed some rather advanced VM technology; but none of 'em are AsFastAsCee yet.  Given that many optimization strategies employed by C/C++ compilers are NpHard, I suspect that a runtime optimizer will not be able to avail itself of many of them.

Actually, there's no reason a JIT compiler can't use whatever algorithms it wants. By default the JVM interprets bytecode and only uses compilation where it sees fit (i.e. in the "hot spots", hence HotSpotVM. Compilation can also run in background threads out of band with normal execution, and can even avail itself of idle CPUs when they aren't being consumed by the running program.
----
''Hrm. Was the Java-specific CPU ever released? I remember hearing about it years ago - that it would "run Java not at VirtualMachine speed, but at hardware speed". . . never saw anything come of it though.''

There are quite a few CPUs, especially the various ARM derivatives used in PDAs, which have "Java accelerators" in them.  However, JavaByteCode is not a particularly good instruction set for efficient silicon implementation. Java's bytecode structure is optimized for loading into a compiler, not a processor.
* An interesting approach is the one of Azul - a general purpose RISC with a few primitives for excellent JVM support. Cliff Click is one of the best JVM engineers around, and he works for Azul. Here are a couple of links about this architecture:
** http://www.azulsystems.com/blog/cliff-click/2008-11-18-brief-conversation-david-moon (description of his architecture, comparing it with a Lisp Machine)
** http://www.azulsystems.com/blog/cliff-click/2010-04-21-un-bear-able, a discussion about why running Java bytecode in hardware is a bad idea, compared to JITting

Actually the traditional x86 machine code is also 'not a particularly good instruction set for efficient silicon implementation'. That is the reason why our beloved Intel processors internally decode (read: compile) it into a (very simplified and optimized) data flow machine instruction set and execute that. I don't see any reason why this coudn't be done with bytecode too. After all the http://en.wikipedia.org/wiki/Efficeon (aka Crusoe) also does this. So it is quite possible. It's just not done for historical reasons.

----
As far as I have seen in a former project (in a team that built a 32bit VM) the "Java accelerators" don't usually help very much.
The ones available at the time only helped to speed up the inner interpreter loop - i.e. looking up an address for a bytecode.
Some of them provide some stack manipulation shortcuts, possibly even simple mathematical computations using a stack, but as soon as you have to touch a 'real' object or perform a more complex function, you are on your own. 
Which in turn means that
* either you have to construct a "Java accelerator" that imposes a specific object model (and associated data structures in memory) 
* or you don't gain much performance from using these accelerators -- looking up an address in a function table does not take very long.

So - at least for common general-purpose processors - I doubt that a performance-boosting "Java accelerator" in hardware is possible.
Of course this does not rule out specific architectures that are tailor-made for efficient Java execution.
----
But sometimes the time of a SufficientlySmartVirtualMachine comes. An example is GarbageCollection. There are now a few studies that show that programs with automatic GarbageCollection are faster as their more complex counterparts with manual collection.

''Which is related to the virtual machine because ...?''
Because a garbage collector is a part of the hardware abstraction that makes up a VM.

* I'm surprised this hasn't generated a DemandForEvidence.
** Yes. I remember HansBoehm saying that sometimes GC beats malloc because keeping the free-lists up-to-date may be expensive. Trying to find a reference (http://www.hpl.hp.com/personal/Hans_Boehm/gc/) 
** Here's one: http://citeseer.ist.psu.edu/appel87garbage.html
*** Everyone should notice that fact from your link : ''With enough memory, explicit deallocation of garbage may be more expensive than collecting live data''
*** With enough memory. Truth is, using GarbageCollection can make your code in practice very memory hungry. I've yet to see a GarbageCollector that can be as fast as manual memory management AND as memory tight. It can't be both. 
*** {Depending '''entirely''' upon GarbageCollection seems wasteful, especially given the facilities already available for inference over memory usage in the source-code.  I'm looking forward to widespread implementation of memory-use inference performed statically upon the source-code... possibly with limited automatic runtime support (such as tracking whether a reference leaves a scope or is referenced at least twice outside its own scope - i.e. the 'smart virtual machine').  In theory, a compile-time inference for explicit deallocation can be nearly as memory-efficient as human programmers (since human programmers need to statically make the exact same deallocation decisions, with perhaps just a slightly more information than is available to the compiler) and is far less prone to error (where something is deleted prematurely).  Automatically injecting additional runtime support for tracking which objects can be deleted immediately goes even beyond what most humans are willing to provide explicitly.  '''(Inferred implicit deallocations calculated at compile-time + garbage collection + compiler support for efficient runtime collection)''' seems to me to be the best of both worlds... much like SoftTyping allows the best of static + dynamic typing.  Perhaps we should dub it SoftGarbageCollection?}

** Here is a very relevant reference: http://lambda-the-ultimate.org/node/2552 (Quantifying the Performance of Garbage Collection vs. Explicit Memory Management) this quantifies the "enough memory" part above. 

----
See VirtualMachine