Gaming Hardware News
Achieving High Web Throughput with VMware vSphere 4 on Intel Xeon 5500 series (Nehalem) servers
Something came across my inbox recently that I just had to share. I just finished watching The Bucket List. One of my favorite scenes was when Morgan Freeman and Jack Nicholson are racing around the track driving these classic muscle cars. The need for speed is alive and well it seems.
We've been working with our friends at VMware and have some nice results to report. The first paragraph of their blog post says it all:
We just published a SPECweb2005 benchmark score of 62,296 -- the highest result published to date on a virtual configuration. This result was obtained on an HP ProLiant DL380 G6 server running VMware vSphere 4 and featuring Intel Xeon 5500 series processors, and Intel 82598EB 10 Gigabit AF network interface cards.
To read more about these results, here are a few links.
- Achieving High Web Throughput with VMware vSphere 4 on Intel Xeon 5500 series (Nehalem) servers from VMware Performance Team's Vroom! Blog
- The story on VMware communities site, with some additional detail
- SPECweb2005 Result
Using the Media SDK - Simple DirectShow Transcoding
Calling all developers working with media technologies! If you haven’t already, download and learn Intel’s Media SDK 1.5. It's easier than DXVA code, and using it ensures you are forward compatible with future Intel hardware. Check it out. Download it here.
Need a quick way to transcode a clip? Use the Media SDK’s DirectShow filters to build a solution using the Windows SDK tool GraphEdit. GraphEdit is a visual way to connect chains of DirectShow filters together to perform a media task. A series of connected filters comprises a “filter graph”. This tool can be found in the "bin" directory of the Windows SDK - version 6 or 7.
The following filter graph transcodes an MP4 file that I had downloaded off the internet into an MPEG-2 file that I can burn onto a DVD. The process of transcoding is described below.
The file “Marty_ear_Training.mp4” (a guitar lesson I am learning) is read off the disk by a file “source” filter and then sent downstream to the next filter in the chain. At this point, the video and audio data are still mixed together. The next filter in the chain is the Intel® Media SDK MP4 splitter filter. It’s this filter’s job to separate the audio and video data and pass each along its own path. This MP4 file was encoded with AAC audio, which needs to be decoded; the Intel Media SDK AAC decoder handles this job, and its output is a WAVE-formatted audio stream. On the video side, the Intel Media SDK H.264 decoder accepts an H.264 compressed video stream and outputs an uncompressed video surface. At this point, my guitar lesson has been decoded and could be rendered on the screen.
But this blog is about transcoding, so the output of the decoding process needs to become the input of the encoding process. Fortunately, it’s just a matter of connecting the right filters to do the job. I want my resulting MPEG-2 file to have an MP3 audio stream, so I send the output of the AAC decoder into the MP3 encoder. I also want the video to be MPEG-2, so I send the output of the H.264 decoder right into the MPEG-2 encoder. The encoders work their magic, and I have two streams of data that need to be combined into a single file. So I plug in the Intel Media SDK MPEG-2 Muxer to combine the streams. The final filter is just a file writer, which puts my new MPEG-2 movie on the disk.
There are many different applications that can play back an MPEG-2 file, but for the sake of completeness, let’s look at the playback graph for our new transcoded file in GraphEdit.
Again, the file is read from the disk. Then the audio and video substreams are split and decoded. Finally, instead of transcoding, we send the output to the screen and speakers. The Video Renderer and DirectSound Device filters, both provided by DirectShow, play back the clip.
I encourage you to check out the Intel Media SDK and the DirectShow samples! The source code for the Intel Media SDK H.264 Decode and Intel Media SDK MPEG-2 Encoder filters is provided with the SDK, and the remaining SDK filters are distributed “as is”, with feedback welcomed. Becoming familiar with DirectShow, filters, and graphs will set the stage for delving deeper into the inner workings of the Intel Media SDK.
In the next blog, we will dig deeper into those encode and decode filters, and I’ll start to go over how they use Intel’s graphics hardware to accelerate the job they do. Comments and recommendations are always welcome.
Thanks
Chip Shot: Intel Sponsors of Tomorrow Goes Mobile - Postcard Style
Browzwear Approved for Accelerate Innovation Program in Israel
Member Browzwear Ltd. of Israel used Intel Software Development Tools to optimize some of its flagship products for Intel multi-core architecture. The company is also participating in the Accelerate Innovation Program in cooperation with the Office of the Chief Scientist in Israel. To learn more about Browzwear Ltd., click here. To learn more about the Accelerate Innovation Program in Israel, click here.
Enroll in the Intel Software Partner Program today and learn how the program can help you deliver innovative solutions to meet your users' demands. Learn More
In what way can the C++0x standard help you eliminate 64-bit errors?
Programmers see in the C++0x standard an opportunity to use lambda functions and other entities I do not quite understand :). Personally, though, I see in it convenient means that let us get rid of many 64-bit errors.
Consider a function that returns "true" if at least one string contains the sequence "ABC".
```cpp
#include <string>
#include <vector>
using namespace std;

typedef vector<string> ArrayOfStrings;

bool Find_Incorrect(const ArrayOfStrings &arrStr)
{
  ArrayOfStrings::const_iterator it;
  for (it = arrStr.begin(); it != arrStr.end(); ++it) {
    unsigned n = it->find("ABC");  // 64-bit error: find() returns string::size_type
    if (n != string::npos)
      return true;
  }
  return false;
}
```

This function is correct when compiled as a Win32 application but fails when built in Win64 mode. Consider the following example of using the function:
```cpp
#include <stdio.h>
#include <tchar.h>

#ifdef IS_64
const char WinXX[] = "Win64";
#else
const char WinXX[] = "Win32";
#endif

int _tmain(int argc, _TCHAR* argv[])
{
  ArrayOfStrings array;
  array.push_back(string("123456"));
  array.push_back(string("QWERTY"));
  if (Find_Incorrect(array))
    printf("Find_Incorrect (%s): ERROR!\n", WinXX);
  else
    printf("Find_Incorrect (%s): OK!\n", WinXX);
  return 0;
}
```

The Win32 and Win64 builds print:

```
Find_Incorrect (Win32): OK!
Find_Incorrect (Win64): ERROR!
```

The error lies in choosing the type "unsigned" for the variable "n", even though find() returns a value of type string::size_type. In the 32-bit program, string::size_type and unsigned coincide, so we get the correct result.
In the 64-bit program, these types differ. Since the substring is not found, find() returns string::npos, which equals 0xFFFFFFFFFFFFFFFFui64. This value is truncated to 0xFFFFFFFFu when it is written into the 32-bit variable. As a result, the condition 0xFFFFFFFFu == 0xFFFFFFFFFFFFFFFFui64 is always false, and we get the message "Find_Incorrect (Win64): ERROR!".
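The truncation can be reproduced on any compiler without a Win64 build. The sketch below is illustrative only: the helper names and the constant kNpos64 (standing in for string::npos on a 64-bit target) are assumptions, not part of the original project. It models the 32-bit store and the widening that happens in the comparison, and also repeats the mistake with the real std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed stand-in for string::npos on a 64-bit target (2^64 - 1).
constexpr std::uint64_t kNpos64 = 0xFFFFFFFFFFFFFFFFull;

// Storing a 64-bit result in a 32-bit 'unsigned' keeps only the low 32 bits.
std::uint32_t store_in_32_bits(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);
}

// 'n != string::npos' with a 32-bit n: n is zero-extended to 64 bits first,
// so 0x00000000FFFFFFFF is compared with 0xFFFFFFFFFFFFFFFF and never matches.
bool differs_from_npos64(std::uint32_t n) {
    return static_cast<std::uint64_t>(n) != kNpos64;
}

// The same mistake with the real std::string: it reports a false match
// whenever size_t is wider than unsigned (any 64-bit target).
bool find_with_unsigned(const std::string& s) {
    unsigned n = static_cast<unsigned>(s.find("ABC"));  // explicit cast mirrors the implicit truncation
    return n != std::string::npos;
}
```

On a 64-bit build, find_with_unsigned("QWERTY") returns true even though "ABC" is absent, which is exactly the Win64 behavior described above.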
We may correct the code using the type string::size_type.
```cpp
bool Find_Correct(const ArrayOfStrings &arrStr)
{
  ArrayOfStrings::const_iterator it;
  for (it = arrStr.begin(); it != arrStr.end(); ++it) {
    string::size_type n = it->find("ABC");
    if (n != string::npos)
      return true;
  }
  return false;
}
```

Now the code works as it should, but it is lengthy, and constantly writing out the type string::size_type is not very nice. You could shorten it with a typedef, but it still looks somewhat complicated. With C++0x we can make the code much smarter and safer.
Let us use the keyword "auto" to do that. Previously, this keyword meant that the variable was created on the stack, which was implied anyway unless you specified something different, for example, register. Now the compiler deduces the type of a variable declared as "auto" on its own, from the expression that initializes it.
Note that an auto variable cannot store values of different types during one program run. C++ remains a statically typed language, and "auto" only makes the compiler deduce the type on its own: once the variable is initialized, its type cannot change.
Let us use the keyword "auto" in our code. The project was created in Visual Studio 2005, while the C++0x standard is supported only beginning with Visual Studio 2010. So I chose the Intel C++ compiler included in Intel Parallel Studio 11.1, which supports the C++0x standard, to perform compilation. The option that enables C++0x support is located in the Language section and reads "Enable C++0x Support". As you can see in Figure 1, this option is Intel-specific.
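A small sketch of both points, written for any C++11 compiler (the function names are hypothetical). The static_assert statements stop compilation if the deduced types are not the ones claimed: n gets exactly the return type of find(), and x stays an int even after a double is assigned to it.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// 'auto' deduces the type from the initializer at compile time.
bool contains_abc(const std::string& s) {
    auto n = s.find("ABC");
    static_assert(std::is_same<decltype(n), std::string::size_type>::value,
                  "n has exactly the type returned by find()");
    return n != std::string::npos;
}

// Static typing is preserved: after deduction the type is fixed, and later
// assignments are converted to it instead of changing it.
int auto_type_is_fixed() {
    auto x = 1;      // deduced as int
    double d = 2.9;
    x = d;           // converted to int (truncated toward zero); x does not become a double
    static_assert(std::is_same<decltype(x), int>::value, "x is still an int");
    return x;
}
```

Here auto_type_is_fixed() returns 2, since the double is converted to the already-fixed type int.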
Figure 1 - Support of C++0x standard
The modified code looks as follows:
```cpp
bool Find_Cpp0X(const ArrayOfStrings &arrStr)
{
  for (auto it = arrStr.begin(); it != arrStr.end(); ++it) {
    auto n = it->find("ABC");
    if (n != string::npos)
      return true;
  }
  return false;
}
```

Consider the way the variable "n" is defined now. Smart, isn't it? It also eliminates some errors, including 64-bit ones: the variable "n" will have exactly the type returned by find(), i.e. string::size_type. Note also that there is no longer a line with the iterator definition:
```cpp
ArrayOfStrings::const_iterator it;
```

Defining the variable "it" inside the for statement used to be inconvenient because its type name is rather lengthy, so the definition was kept outside the loop. Now the code is short and accurate:
```cpp
for (auto it = arrStr.begin(); ......)
```

Let us examine one more keyword, "decltype". It lets you declare a variable whose type is the type of an expression, such as another variable or a function call. If we had to declare all the variables in our code beforehand, we could write it this way:

```cpp
bool Find_Cpp0X_2(const ArrayOfStrings &arrStr)
{
  decltype(arrStr.begin()) it;
  decltype(it->find("")) n;
  for (it = arrStr.begin(); it != arrStr.end(); ++it) {
    n = it->find("ABC");
    if (n != string::npos)
      return true;
  }
  return false;
}
```

Of course, this is pointless in our case, but it may be useful in others.
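One case where decltype does earn its keep is generic code, where the container type is a template parameter and the iterator and position types cannot be spelled out in advance. The following is a hypothetical sketch (AnyContains is not from the original project) assuming the container holds std::string elements; the expressions inside decltype(...) are never evaluated.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Works for any container of std::string; the types of 'it' and 'n' are
// taken from expressions instead of being written out by hand.
template <typename Container>
bool AnyContains(const Container& c, const char* needle) {
    decltype(c.begin()) it;        // the container's const_iterator
    decltype(it->find(needle)) n;  // std::string::size_type
    for (it = c.begin(); it != c.end(); ++it) {
        n = it->find(needle);
        if (n != std::string::npos)
            return true;
    }
    return false;
}
```

The same function compiles unchanged for std::vector<std::string> and std::list<std::string>, which have different iterator types.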
Unfortunately (or fortunately for us :-), although the new standard really simplifies writing safe 64-bit code, it does not eliminate defects that already exist in the code. To fix an error with the help of a memsize type or "auto", you must find the error first. So the Viva64 tool will not become less relevant with the appearance of the C++0x standard.
P.S.
You may download the project with the code here.
Activision Publishing Announces 10 Minute Solution, An Upcoming Video Game For Wii(TM) Based On Anchor Bay Entertainment's Hit Fitness DVDs
Mesh development update
Yesterday we released a new version of the mesh tools. Nothing earth-shattering yet, but the main new feature I have been working on is connecting a mesh of computers to an external web site. Basically, you use the mesh to coordinate computers in such a way as to keep one of them connected to an external web site at all times. This scales because we don't need all the computers connected to the site at once, just one or a few. You can then have the mesh send information about the local computers to the site and have the site send commands back to the mesh.
A few days ago I demonstrated waking up a computer from the web site. Just click a button on the external site, and the command is sent down to one of the members of the mesh, which then sends a wake-on-LAN packet on the local network. Pretty cool. We hope to launch a "managing your computers as a service" trial at some point in the future.
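The post does not show the mesh's wake-on-LAN code. As background only, here is a minimal sketch of the standard "magic packet" payload such a member would broadcast: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, 102 bytes in total. The function name is hypothetical, and sending the packet (typically as a UDP broadcast, commonly to port 7 or 9) is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Builds a standard Wake-on-LAN magic packet for the given MAC address.
std::vector<std::uint8_t> make_magic_packet(const std::array<std::uint8_t, 6>& mac) {
    std::vector<std::uint8_t> packet(6, 0xFF);  // synchronization stream: six 0xFF bytes
    packet.reserve(6 + 16 * 6);
    for (int i = 0; i < 16; ++i)                // target MAC repeated 16 times
        packet.insert(packet.end(), mac.begin(), mac.end());
    return packet;
}
```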
Yesterday I also released a new version of the Developer Tools for UPnP Technologies with very minor fixes. Hope to get more fixes in soon.
Ylian
opentools.homeip.net
GPU Renderer for Maya - Furry Ball from Art and Animation Studio - New on CUDA Zone
Dude! Who killed my 1st Person Shooter?
[Disclaimer] So bear with me, because this post is intentionally aimed at being both whimsical and controversial at the same time. Not to mention that it points out some huge biases I have around gaming. Some… might even think I’m a purist or worse… a snob! Phooey.
Approximately twelve years ago, when some of today’s gamers were still in diapers, I was struggling with a game addiction. Perhaps I still do, but I’m a fan of eating my own dog food. At any rate, the game in question happened to be Quake II from id Software/Activision. I loved this game! It was super fast and had awesome weapons, which were arguably unbalanced in some cases. Best of all, it had a robust modding and level design community. One could download all sorts of ‘skins’, ‘levels’, and so on that added more dimensions and flavor to the game.
This leads to my next key point. When I say this game was fast... I mean it was fast! As fast as one could react to the game. To make it even faster, I found myself overclocking, scrambling for more cooling, running a couple of 3dfx Voodoo 2 cards in SLI mode, and cranking up everything I possibly could. If memory serves, I was able to get my frames per second (fps) up to ~70-80 fps on my 21” CRT monitor. With that configuration the game was butter smooth. It enabled me to be more competitive, and I was soon mastering things like the grappling hook and rocket jumping.
So what happened? What is the state of my beloved 1st person shooter genre today? I’ve played Crysis and Quake 4, and even tried some of these games on an Xbox 360, such as Halo 1-3, Gears of War, etc. However, they still just don’t hold that ‘magic’ for me like they used to.
Then it dawned on me. Most of these 1st person shooter games, even though they can achieve much higher frame rates, have sort of blown it (IMHO) on two major fronts. First: this genre suffers from being slowed wayyyy down to accommodate the Console & Gamepad paradigms. That might be OK for some game designs, but not all. Seriously, when I play this genre now on a Console or PC, I feel like I’m in a bad dream wading through a sea of molasses. In a few words, that’s just plain lame. Second: game balance. I think it’s OK to have a few weapons & tweaks in the game be a bit imbalanced. This is what getting your hands on the “BFG” was all about!!! The only other weapon I’ve seen to date that surpasses the BFG was to be found in the game Shogo: Mobile Armor Division from Monolith. It was the mini-nuke!!! Now THAT was what I call imbalanced! But it was sooo fun to use.
So how do we fix this predicament? Well, game consumers, we have to ask for it! Start by contacting the game publishers, then the developers, and let them know what we want and desire. GDC 2010 is just next week, and we can start there! Start demanding more! Let’s try to restore this genre to its rightful place! Personally, I’d love to see more competitive game play come back to this genre. I’d also love to see a more robust level-modding community than what we have today. (e.g. I’d like to see more companies like Valve get started!) Let’s toss those crutch-based-algorithm-aiming-enhanced-gamepads out the window and see the real skill of the players start to shine through again!
So there you have it. That is my personal opinion on one of my top 3 favorite game genres. I’d definitely love to hear everyone’s feedback.
Parallel Programming Talk #66 - Listener Question "What is “acquire memory access semantics” and do I need to worry about this in parallel programming?"
PPTalk #66: Welcome to Show 66 of Parallel Programming Talk. Today is March 2nd. On this episode, Clay and Aaron will be answering listener questions.

First, the big news: congratulations to Parallel Programming Talk and the entire Intel Software Network TV team for being recognized by the American Marketing Association with a MAX award on February 25th. Intel Software Network TV was recognized in the category of Single Medium Advertising, where the finalists included SawStop, Icebreaker, Port of Vancouver, Mt Hood Meadows, Oregon Lottery and Adidas MLS.

Game Developers Conference® Mar 9 – 13 2010 @ Moscone Center, San Francisco, CA, USA
The Game Developers Conference® (GDC) is the world’s largest professionals-only game industry event. Presented every spring in San Francisco, it is the essential forum for learning, inspiration, and networking for the creators of computer, console, handheld, mobile, and online games. The GDC attracts over 17,000 attendees and is the primary forum where programmers, artists, producers, game designers, audio professionals, business decision-makers and others involved in the development of interactive games gather to exchange ideas and shape the future of the industry.

SIGCSE 2010 - The 41st ACM Technical Symposium on Computer Science Education Mar 10 – 13 2010 @ Milwaukee, WI, USA
SIGCSE 2010 will take place in the Midwest Airlines Center in Milwaukee, WI. The two conference hotels, the Hyatt Regency and the Hilton City Center, are connected to the Center by a heated skywalk. Unless otherwise noted, all room numbers refer to the Midwest Airlines Center. The SIGCSE Technical Symposium addresses problems common among educators working to develop, implement and/or evaluate computing programs, curricula, and courses. The symposium provides a forum for sharing new ideas for syllabi, laboratories, and other elements of teaching and pedagogy, at all levels of instruction. We invite those interested in computer science education and computer science education research to contribute to SIGCSE 2010. Following SIGCSE tradition, the symposium will provide a diverse selection of technical sessions and opportunities for learning and interaction.

Intel® Software Parallelism Techdays: Free 1-Day Course on Parallelism and Threading
Learn directly from Intel when you attend this free one-day course on parallelism and threading. This is a great opportunity to learn about threading your applications for multi-core platforms. This course is targeted at Windows* C++ developers using Microsoft Visual Studio* 2005 or 2008.
- March 16 Iselin, NJ
- March 17 New York, NY
- March 18 Waltham, MA
Don't Hinder Concurrency!
I've just read the article Use Thread-local Storage to Reduce Synchronization from the Intel Guide for Developing Multithreaded Applications and here is my take on it.
Indeed. If threads are a tool for concurrency, synchronization is a tool for suppressing concurrency. Any form of synchronization (be it locks, semaphores or atomic operations) hinders concurrency. It requires a distributed system to reach strong global consensus, and consensus in a distributed system can't be cheap. Period.
So the best design of a concurrent system reduces synchronization to a bare minimum (total elimination of synchronization is impossible; otherwise the system would break up into several independent systems). There are various techniques for reducing the need for synchronization: partitioning, privatization, replication, amortization.
The idea of partitioning is to split the whole data set into several mostly independent partitions and bind a worker thread to each of them; there must also be a partitioning function that maps an external key to the partition where the data resides. All requests are then routed directly to the thread bound to the required partition. As a result, a worker thread works with its partition's data without any synchronization.
Privatization is a special case of partitioning with a single partition, i.e. the whole data set is handed over to a single thread, which can work with it without any synchronization. The downside of this technique is that the single thread can become a bottleneck: other threads running concurrently on other cores can overwhelm it with requests.
The idea of replication is to have several independent replicas of a data set and propagate updates between replicas explicitly via messages. Data in replicas can be temporarily inconsistent; however, a lot of systems can tolerate some inconsistency.
Amortization is usually based on some form of thread-local data (placed either on a thread's stack or in compiler/OS-provided storage). The idea is simple: we collect some updates in thread-local storage and then apply them later in batches. That's what we saw in the article. The main advantage of amortization based on thread-local storage is its simplicity: you do not need to reorganize your data, route requests to particular threads based on data placement, cope with inconsistencies, etc. So, if it's applicable, it's the first thing you should consider.
Well, there is much more I could say about these techniques, far too much to fit into this blog. But what I want to communicate is that you should treat them as a starting point rather than a final destination: they are primitive tools for reducing synchronization in your concurrency toolbox. Choose the best tool for a particular situation, combine them, adapt them.
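A minimal C++11 sketch of partitioning, under simplifying assumptions: the requests are all known up front, the partitioning function is simply std::hash modulo the partition count, and the names (Partitioned, count_partitioned, lookup) are hypothetical. Each worker thread touches only its own map, so the maps need no locks; the joins at the end make the results visible to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// One map per partition; partition i is owned by worker thread i.
struct Partitioned {
    std::vector<std::unordered_map<std::string, long>> parts;
    explicit Partitioned(std::size_t n) : parts(n) {}  // n must be > 0

    // Partitioning function: maps an external key to the partition that owns it.
    std::size_t partition_for(const std::string& key) const {
        return std::hash<std::string>{}(key) % parts.size();
    }
};

// Routes each request (a key to increment) to its owner, then runs one worker per partition.
void count_partitioned(Partitioned& p, const std::vector<std::string>& requests) {
    std::vector<std::vector<std::string>> inbox(p.parts.size());
    for (const auto& key : requests) inbox[p.partition_for(key)].push_back(key);

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < p.parts.size(); ++i) {
        workers.emplace_back([&p, &inbox, i] {
            // Worker i reads only inbox[i] and writes only parts[i]: no synchronization.
            for (const auto& key : inbox[i]) p.parts[i][key] += 1;
        });
    }
    for (auto& t : workers) t.join();  // join publishes every worker's updates
}

long lookup(const Partitioned& p, const std::string& key) {
    const auto& part = p.parts[p.partition_for(key)];
    auto it = part.find(key);
    return it == part.end() ? 0 : it->second;
}
```

In a long-running server the routing step would be a per-worker request queue rather than a pre-filled inbox; the pre-filled inbox just keeps the sketch short.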
Now a few comments directly on the article.
This solution trades synchronization per event for synchronization per thread. Performance will improve if the number of events is much larger than the number of threads.
I would disagree here: no such tradeoff is involved. If a thread has not collected any events in its thread-local storage, it simply does not access the centralized data at all. The additional overhead is a single 'if' statement per thread, which is negligible in the context of inter-thread work distribution. So the technique never synchronizes more often than the per-event approach would.
An additional advantage of using thread-local storage during time-critical portions of the program is that the data may stay live in a processor’s cache longer than shared data, if the processors do not share a data cache. When the same address exists in the data cache of several processors and is written by one of them, it must be invalidated in the caches of all other processors, causing it to be re-fetched from memory when the other processors access it. But thread-local data will never be written by any other processors than the one it is local to and will therefore be more likely to remain in the cache of its processor.
In general this is very true. Indeed, thread-local data reduces the amount of inter-core communication, thus reducing the amount of costly cache-coherence traffic.
But this has little to do with shared caches: even if cores share an L3 cache, data will still be transferred between their L1 caches (L1 caches are not shared between cores on most current processors). So I would recommend just ignoring the part about shared caches. Prefer thread-local data, and you are on the safe side with any current or future architecture.
There is another important consideration with regard to shared L2/L3 caches (which are featured on many current processors), and this one argues against thread-local data. Consider the following situation: a moderately sized shared object is frequently accessed for reading but infrequently for writing. If it is split into thread-local parts (which usually implies an increase in size), it will no longer fit into the shared L2/L3 cache, so threads will constantly evict each other's data from the cache. However, if the object is implemented as a single centralized entity, it fits into the cache, and threads will work with cached data without evictions.
So, the tradeoff frequently involved is reduction of synchronization versus increase of total working set. Which to prefer is highly dependent on the situation.
One must be careful about the trade-offs involved in this technique. The technique does not remove the need for synchronization, but only moves the synchronization from a time-critical section of the code to a non-time-critical section of the code.
Well, I would say that the main point of the technique is reducing synchronization rather than moving it from one part of the code to another. Amortization via thread-local storage need not involve any movement of synchronization at all. The technique can be applied in two forms: a single final aggregation or periodic aggregations. The latter does not move synchronization anywhere, yet still reduces synchronization overhead. It also has the additional benefit that a separate monitoring thread can periodically fetch and output intermediate results.
Consider, for example, the following program:
```cpp
#include <intrin.h>   // _InterlockedExchangeAdd (MSVC / Intel C++)
#include <stddef.h>

// predicate() and THRESHOLD are assumed to be defined elsewhere (not shown in the post)
long total_event_count;
__declspec(thread) long thread_event_count; // thread-local cache

void thread_function(size_t begin, size_t end)
{
  for (size_t i = begin; i != end; i += 1) {
    if (predicate(i)) {
      thread_event_count += 1;
      // if we have cached enough events,
      // transfer them to global shared variable
      if (thread_event_count == THRESHOLD) {
        _InterlockedExchangeAdd(&total_event_count, thread_event_count);
        thread_event_count = 0;
      }
    }
  }
  // transfer the remainder of locally cached events
  if (thread_event_count) {
    _InterlockedExchangeAdd(&total_event_count, thread_event_count);
    thread_event_count = 0; // avoid re-adding it if this thread runs the function again
  }
}
```

In the above example, synchronization is not moved anywhere, but it is still reduced by a factor of THRESHOLD. A separate monitoring thread can periodically read and output the total_event_count variable, and total_event_count is guaranteed not to lag behind the real number of discovered events by more than NUMBER_OF_THREAD * THRESHOLD.
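For readers without the MSVC extensions, the same amortized counter can be sketched with standard C++11 facilities, swapping thread_local for __declspec(thread) and std::atomic::fetch_add for _InterlockedExchangeAdd. The threshold value, the predicate (every third index) and count_events are illustrative assumptions, not part of the original program.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr long kThreshold = 64;  // illustrative batch size

std::atomic<long> total_event_count{0};
thread_local long thread_event_count = 0;  // per-thread cache of unpublished events

bool predicate(std::size_t i) { return i % 3 == 0; }  // hypothetical event test

void thread_function(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i != end; ++i) {
        if (predicate(i)) {
            thread_event_count += 1;
            if (thread_event_count == kThreshold) {  // one atomic op per kThreshold events
                total_event_count.fetch_add(thread_event_count, std::memory_order_relaxed);
                thread_event_count = 0;
            }
        }
    }
    if (thread_event_count != 0) {  // publish the remainder
        total_event_count.fetch_add(thread_event_count, std::memory_order_relaxed);
        thread_event_count = 0;
    }
}

// Splits [0, n) among num_threads (> 0) workers and returns the final total.
long count_events(std::size_t n, std::size_t num_threads) {
    total_event_count.store(0);
    std::vector<std::thread> workers;
    std::size_t chunk = n / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t b = t * chunk;
        std::size_t e = (t + 1 == num_threads) ? n : b + chunk;
        workers.emplace_back(thread_function, b, e);
    }
    for (auto& w : workers) w.join();  // join makes all fetch_add results visible
    return total_event_count.load();
}
```

Relaxed ordering is enough here because the counter carries no other data; a monitoring thread could call total_event_count.load() at any time and see a value that lags by at most one unpublished batch per thread.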
Note that thread-local data may actually be shared between threads; nothing prevents this. The way a variable is declared is orthogonal to its “sharedness”. The address of a variable declared with __declspec(thread)/__thread/omp threadprivate/pthread_key_create()/TlsAlloc() can be passed to another thread, and thus the variable becomes shared. Conversely, a plain global variable may only ever be accessed by a single thread, in which case it is effectively local to that thread.
Also note that you can get a flavor of thread-local data with a plain global array indexed by a unique thread index. This technique is less dependent on a particular compiler/OS and makes sharing of thread-local data much easier (infrequent sharing is not dangerous, and it is necessary anyway in any real-world program). Here is a simple example:

```cpp
#include <windows.h>  // Sleep
#include <intrin.h>   // _InterlockedIncrement
#include <stdio.h>

// MAX_THREAD_COUNT, predicate() and termination_condition are assumed
// to be defined elsewhere (not shown in the post)

// array of "thread-local" data
long volatile event_counts [MAX_THREAD_COUNT] = {};
// sequence used to generate unique thread indexes
long volatile thread_sequence = 0;

// worker thread routine
void worker_thread(size_t begin, size_t end)
{
  // obtain unique thread index
  long my_idx = _InterlockedIncrement(&thread_sequence) - 1;
  for (size_t i = begin; i != end; i += 1) {
    if (predicate(i))
      event_counts[my_idx] += 1;
  }
}

// monitoring thread routine
void monitor_thread()
{
  while (termination_condition == false) {
    // obtain current thread count
    long thread_count = thread_sequence;
    long sum = 0;
    for (long i = 0; i != thread_count; i += 1)
      sum += event_counts[i];
    printf("event count: %u\n", (unsigned)sum);
    Sleep(1000);
  }
}
```
However, be aware that the above example contains a nasty instance of false-sharing which kills performance. You can read about how to cope with it in the article Avoiding and Identifying False Sharing Among Threads.
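One common remedy, sketched here as an illustration rather than taken from the referenced article, is to give each slot its own cache line. The sketch assumes 64-byte cache lines (typical for current x86 processors) and uses C++11 alignas and std::atomic instead of volatile; kMaxThreads and the even-number predicate are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size
constexpr long kMaxThreads = 16;        // hypothetical counterpart of MAX_THREAD_COUNT

// Each slot fills a whole cache line, so one thread's writes do not
// invalidate the line holding a neighbouring thread's counter.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

PaddedCounter event_counts[kMaxThreads];
std::atomic<long> thread_sequence{0};

void worker_thread(std::size_t begin, std::size_t end) {
    long my_idx = thread_sequence.fetch_add(1);  // unique index: 0, 1, 2, ...
    assert(my_idx < kMaxThreads);
    std::atomic<long>& slot = event_counts[my_idx].value;
    for (std::size_t i = begin; i != end; ++i) {
        if (i % 2 == 0) {  // hypothetical predicate: count even numbers
            // Single writer per slot: a relaxed load + store suffices, and the
            // atomic type keeps the monitor's concurrent reads well-defined.
            slot.store(slot.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
        }
    }
}

// One pass of the monitoring loop: sum the slots of all threads started so far.
long sum_counts() {
    long thread_count = thread_sequence.load();
    long sum = 0;
    for (long i = 0; i < thread_count; ++i)
        sum += event_counts[i].value.load(std::memory_order_relaxed);
    return sum;
}
```

The padding trades memory (64 bytes per counter instead of a few) for the absence of cache-line ping-pong between writers.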
Keep threading!
GDC 2010
Next week is GDC in San Francisco, where I will be a co-presenter for a talk on vectorization. The talk is geared towards intermediate-level programmers who are interested in SIMD programming, but it should be accessible to almost anyone. See you there!
ConfigurationErrorsException
While working with Visual Studio 2008 SP1 yesterday, I began receiving a ConfigurationErrorsException, as the following screen capture shows. What was strange was that the same application domain configuration had been working just fine earlier that day. In fact, there hadn't been any code changes; according to Team Foundation Server, the code had not changed in days.
After opening the same solution on another computer, where the code ran without the ConfigurationErrorsException, I hoped that I would be able to resolve the issue without having to resort to uninstalling and reinstalling Visual Studio.
I'm unable to tell you why the exception began appearing although I am able to tell you how I was able to resolve the issue this morning.
After I backed up the Visual Studio 2008 settings and restored the standard C# developer settings, the issue disappeared. I was also able to restore the syntax color settings and a few other settings from the backup, and the ConfigurationErrorsException has not returned.
Have you USED the Partner Program Product Launch KIT?
The Intel Software Partner Program offers members a Product Launch Kit that helps maximize product launch results with a plan that helps orchestrate and closely manage the details.
The official trigger to start is when your new product idea is approved and you start your project plan in the Intel Software Partner Program. Use the templates in the Launch Toolkit to transfer critical launch information to all your stakeholders. In turn, you will be able to improve your brand and maintain a consistent message in your marketplace.
This benefit is a Flash file that walks through the necessary steps of a successful product launch. The Flash file also includes a downloadable zip file with a collection of templates for launch planning, messaging, competitive analysis, and more.
Follow these steps to get started
1) Join the Intel Software Partner Program
2) Launch the Product Tool Kit flash demo
3) Start your new project plan
Novell’s Virtualization and Hypervisor Strategy
After the recent joint announcement between Novell and Citrix for collaboration on enterprise-class virtualization and cloud computing, Michael Applebaum took some time to discuss Novell’s virtualization and hypervisor strategy on the Novell blogs.
For the most part, Novell's virtualization strategy hasn't really changed. According to Michael, "Novell is pursuing a heterogeneous, multi-platform hypervisor and guest OS approach that gives our customers maximum choice, flexibility and value." SUSE Linux Enterprise is optimized for several Xen implementations, KVM, VMware ESX and Microsoft Hyper-V. Novell also has a flexible virtualization pricing strategy for SUSE Linux, along with cooperative relationships and support agreements with VMware, Microsoft and Citrix to provide integrated support options for customers. In addition to these relationships with vendors, Novell contributes improvements back into the Xen and KVM projects.
Michael goes on to say that "Novell’s commitment to interoperability, customer choice and value are fully apparent in our virtualization strategy, partnerships and offerings." You can read the rest of Michael's post to learn more about Novell's virtualization strategy.
New on YouTube - 7 People Talk About CUDA on Tesla, Quadro, GeForce
Manageability Community: What's up?
I thought it might be interesting to take our pulse once in a while. Maybe I'll do this weekly. Or monthly. (That is, if anyone is really interested in seeing this kind of information.)
Events: On March 3, we hosted a Virtual Seminar announcing our Intel(r) AMT 6.0 release and its new features. Attendance was around 50 people worldwide. Topics discussed were the Intel Developer Network, the Business Client Ecosystem, new Intel AMT features and what's new in the SDK, and finally some information on the Intel Software Partner Program and the SAT testing tool that is coming out soon. This event was recorded and will be available in about a week on our Community Site.
- If you attended this event, let us know your thoughts!
Blogs: Two new blogs this week: one from Ylian St-Hilaire, My First Look at Intel AMT 6.0, and one from Ajith Illendula, vPro Enabled Gateway - Fast Call For Help.
Content Posted: The following content has been updated on our community:
- The Start Here Guide
- Use Case Documents
- The Setup and Configuration Service (Intel (r) SCS)
- The AMT 6.0 SDK is now out there
- We are now offering the QST SDK
- The AES-NI SDK is also new
Forum Posts: The community has 21 posts so far this week.
Poll: We have had 23 responses so far to our current poll (see the Community home page).
Tweets: Follow ISNMANAGE (1 tweet so far this week) and GaelHofemeier (8 tweets so far this week). We have been tweeting updates.
Facebook Group: Yes, we have one, and our Community Manager, Kathy Farell, has been posting. I'm thinking we need to move to a Fan Page, because I did not see any of Kathy's posts until I physically visited the Group Page.
- Thoughts? Is anyone interested in getting Manageability updates via a Facebook Fan Page?
I Market the World's Best Game Performance Tools
An introduction is in order: my name is Aaron Davies, and I've been playing video games since I was 5. My dad brought home an Apple II and I was instantly hooked. I learned to type playing Zork and Mystery House. I learned to code so I could write my own games. I vividly remember going to a Stanford football game with my dad, wishing the game would end soon because he had promised to buy me Karateka on the way home. My first console was the Atari 2600. Yes, I personally remember playing Pong, Pitfall, and even E.T. My passion grew stronger with Excitebike and Metroid on my NES. Of all my gaming affinities, however, I was most drawn (and faithful) to PC gaming. The first time I built my own system, I did it just so I could play Falcon 4.0. I remember the first Grand Theft Auto. I was there for Duke Nukem, Wolfenstein 3D, Doom, Red Baron, Space Quest, King's Quest, Flight Simulator, you name it. I knew at a young age that I not only loved to play games, but wanted to turn my passion for games into my career.
There were no game development programs offered "back in the day", so I opted for industrial design. The reason? A lab stocked with 20 bleeding-edge SGI O2 systems and licenses for both Alias Wavefront and Maya. This was where I'd get my 3D experience. To make a long story short, I was hired out of school early and had my coursework written off as work experience credits for an early graduation. My first "real" job was creating flight simulators for the US Navy and NASA; I couldn't believe my luck. It fulfilled two of my passions: flying and 3D graphics. Yet I would go home every night and play Jetfighter III, which visually blew the socks off the multi-million-dollar simulations I was working on. It was time to level up, and I took my first job in the games space at a little game dev start-up. Just a few years later I joined THQ, and after doing my fair share of crunch, I transitioned from the Art track to the Production track. When the time for a change eventually came along, I considered multiple publisher and studio jobs, alongside Intel.
People inside and outside of Intel find out about my background and ask, almost without fail: "Why did you choose Intel?" I'll assume you care to know as well, since you're still reading my epistle.
First of all, I got to move to Oregon. If you haven't been to Oregon, make it a bucket list item. This state is absolutely beautiful. I live 10 minutes from downtown to the east, yet only 10 minutes from work to the west. One hour to Mt. Hood for snowboarding. One hour to the coast. You get the idea.
Secondly, after teaching collegiate level courses on game development in my off-hours, and becoming intrigued/inspired by game industry-changing events, I wanted to actually be part of a big technological revolution. I had participated, on the game developer/publisher end, in multiple console and hardware launches and wanted to represent developers from within a hardware company. I wanted to see how (or if) hardware vendors listened to what game developers needed, and designed to fulfill such needs. Larrabee was my promise of involvement in something unconventional and ground-breaking.
I look back over the past two years since I joined Intel and realize that I've been involved in some really cool things, some public and others not so much. When I came onboard, I inherited marketing and strategic responsibility for what was at the time a small, new tool supporting existing and forthcoming Intel® Graphics parts. I'll never forget our first trip to meet with game studios and show them what would later be released as Intel® Graphics Performance Analyzers. With eyes bloodshot from lack of sleep, thanks to some last-minute fixes and long flights, we began to etch a virtual Rosetta Stone for game developers, so they could scale or optimize performance and tap into the millions of Intel® Integrated Graphics based machines in the market today. We returned invigorated by the response we received from developers, armed with feedback supporting some tricky strategic decisions I'd have to go to battle for, and confident in the work we had done. Not only did we internally share the vision of what Intel® GPA had to offer; we now had external game studios who believed in it as well.
After a really incredible private beta program with some partners even I was surprised we were able to work with, we launched Intel® GPA at GDC'09. Leading up to launch, I shared with the engineering team my expectation for downloads and users of the new, platform-agnostic graphics performance tools. At the time I thought it was a pretty aggressive number. The engineering manager challenged me to a bet: "Through the end of the year, I'll give you $1 for each download above that number, and you give me $1 for each download below that number." In a moment of weak conviction, thinking that maybe I was just full of myself, I passed on the bet. With a smirk on his face, he handed me a check at the end of 2009 for a substantial amount of money, with the word "VOID" written across it in bold. Had I made the bet, I'd be picking up the tab at GDC this year. The good news is that we all realized we really do have something incredible here.
It's hard to believe it's only been one year since last GDC. It's even harder to believe how much better the new version (3.0) we're releasing at GDC next week is than what we released last year. We've added a breadth of features that allow game devs to visualize and solve performance problems across the entire PC hardware platform (CPU and GPU). I'm thrilled we've been able to partner with 2K/Firaxis to support Civilization V, a real-world example of how the tools can help developers scale their titles to play on as many PC configurations as possible. If you'll be at GDC this year, you've got to join us Thursday at 4:30pm in North Hall Room 122, where you'll see a real-time Civ V/GPA 3.0 demo with Dan Baker from Firaxis. I firmly believe we've created something unique, superior, and (critically, in my opinion) accessible whether or not you're running Intel® Graphics or CPUs. This has been, and will continue to be, driven by the needs of game developers; our momentum has only just begun.
As the Sr. Marketing Manager for Game Performance Tools, I've been able to work with, and become friends with, a broad array of genuine AAA game developers, literally across the globe. I guess you could say that after 11 years in the game dev space I'm not personally making or producing games any more. Yet in an uncanny way, I've been able to positively impact more titles than I could have in my previous roles. I market the world's best game performance tools.


