It's not perfect, however. How could you expect it to be? The number of combinations of physical hardware, virtual hardware, and guest/host operating systems is vast.
I'd like to share a few solutions to common problems I came across lately while working with it. From what I can tell these problems are widespread and cause many people headaches. In the same spirit as my previous tools post, I'll keep adding to it as time goes by.
Problem: Linux installs freeze when booting at:
NET: Registered protocol family 2
Host: At least Windows 7, 64-bit
Guest: Many Linux distros.
Solution: Add the following kernel parameters:
noapic nolapic noacpi
Problem: Even with Guest Additions I can't get a resolution better than 1024x768.
Host: Any
Guest: Various Linux distros
Solution: Add a "Modes" line to the Display subsection of the Screen section of
your /etc/X11/xorg.conf as follows:
Section "Screen"
    Identifier "Screen0"
    Device "Videocard0"
    DefaultDepth 24
    SubSection "Display"
        Viewport 0 0
        Depth 24
        Modes "1440x900" "1280x800" "1024x768" "800x600"
    EndSubSection
EndSection
Problem: Guest operating system freezes upon any significant network traffic.
Host: At least Windows 7, 64-bit
Guest: Many Linux distros.
Solution: Use one of the Intel virtual network cards rather than the default PCnet-FAST III.

It's no secret that parallel computing is becoming more important. As clock
speeds have stagnated and the number of cores per die has increased,
one thing has become clear: software developers have to adapt to the current
state of processors by writing code that's more parallelizable.
In the past many programmers have avoided parallel processing when possible, mainly due to its complexity, even in the face of an obvious increase in throughput. Those that have parallelized have often done it poorly and suffered through some serious misery as a result.
We're running out of options, though. In order to get more done faster we have to do more at once.
Development platform providers have been scrambling lately to try to simplify parallel development and minimize the amount of work we'll have to invest in writing parallel code. It's clear that there's much market share to be gained by handling this problem well.
Microsoft's Approach
In .Net 4.0 Microsoft's attacking the problem by way of the Task Parallel Library and Parallel LINQ. These were previously available as a separate package under the name Parallel Extensions but have now officially joined the framework.
Task Parallel Library
The Task Parallel Library, or TPL, provides a set of tools that simplify common chores associated with parallel programming. Not only does it ease initial development but it also simplifies maintenance by auto-scaling to the capabilities of the machine hosting the code.
Tasks
The core component of the TPL is the Task, which is a single operation that can be executed asynchronously. These tasks are enqueued in a thread pool which manages their execution. The plumbing of maintaining the queue is entirely automated making the job of the application programmer quite simple.
Consider the following example which executes two crude tasks simultaneously.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
/* import the task parallel library */
using System.Threading.Tasks;
namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            /* create a task that prints a bunch of "A"s */
            Task t1 = new Task(() =>
            {
                for (int i = 0; i <= 255; i++)
                {
                    Console.WriteLine("A");
                }
            });
            /* create a task that prints a bunch of "B"s */
            Task t2 = new Task(() =>
            {
                for (int i = 0; i <= 255; i++)
                {
                    Console.WriteLine("B");
                }
            });
            /* execute each task */
            t1.Start();
            t2.Start();
            /* wait for both tasks to complete */
            t1.Wait();
            t2.Wait();
        }
    }
}
Here's a sample of possible output:
A A A A A A B B B B A A A A B B B
In that example two tasks run more or less concurrently. The real advantage of the thread pool in the back end is that I could queue up more tasks than my processors could handle and the pool would manage how they're worked through.
Simpler still, you can execute tasks with the Invoke static method of the Parallel class. Note that the Invoke method blocks until all of the operations passed in have completed.
Parallel.Invoke(
    () => /* operation 1 */
    {
        for (int i = 0; i <= 255; i++)
        {
            Console.WriteLine("A");
        }
    },
    () => /* operation 2 */
    {
        for (int i = 0; i <= 255; i++)
        {
            Console.WriteLine("B");
        }
    }
);
The Parallel class boasts a few other goodies as we'll see below.
Parallel Looping
With the basics out of the way let's look at a real time-saver. It's common to simply repeat an operation a number of times or execute an operation on each item in a list. The TPL provides a means of accomplishing that in parallel with the For and ForEach static methods of the Parallel class.
Here's a Parallel.For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Parallel.For(0, 24, i =>
            {
                Console.WriteLine(i);
            });
        }
    }
}
Which could produce output similar to:
0 1 2 12 13 3 4 5 6 8 9 10 11 15 16 17 14 18 19 20 21 22 23 7
Parallel.ForEach is similar except it iterates an IEnumerable. For demonstrative purposes I'll create the functional equivalent of the example above, except it will iterate a sequence of integers.
Parallel.ForEach<int>(Enumerable.Range(0, 24), n =>
{
    Console.WriteLine(n);
});
Parallel LINQ
Another useful parallel processing tool available in .Net 4.0 is Parallel LINQ, or PLINQ. There really isn't much to it. Consider the following example, which looks like your typical LINQ operation except that the AsParallel extension method is called on the IEnumerable, resulting in a ParallelQuery. This allows the LINQ operations that follow to be performed in a parallel fashion.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
    class Program
    {
        struct Item
        {
            public int Id;
            public int Value;
        }
        static void Main(string[] args)
        {
            List<Item> items = new List<Item>();
            Random rand = new Random();
            foreach (int i in Enumerable.Range(0, 256))
            {
                items.Add(new Item() { Id = i, Value = rand.Next(10) });
            }
            int ct = (from item in items.AsParallel()
                      where item.Value > 5
                      select item).Count();
            Console.WriteLine(ct);
        }
    }
}
An interesting and simple twist on the above sample, one that would look familiar to T-SQL developers who know MAXDOP, is that it's easy to cap the number of concurrently executing tasks (and so, in practice, the number of cores used) with ParallelEnumerable's WithDegreeOfParallelism method.
int ct = (from item in items.AsParallel().WithDegreeOfParallelism(2)
          where item.Value > 5
          select item).Count();
Conclusion
Based on the examples above it's clear that parallel programming has become at least a bit simpler with .Net 4.0. I've only scratched the surface here. There are still many more parallel tools (some good, some bad) available in .Net 4.0 and I recommend checking out the Parallel Programming documentation at MSDN for a more complete story.

The reason I find Clojure particularly interesting is that it's designed to be
hosted in the Java Virtual Machine and the .Net Common Language Runtime (via the DLR). From a
practical perspective that's wonderful considering integration with other
commonly used libraries in the business world is a snap. I'm sure it annoys
Lisp purists, but it makes Clojure much more adoptable between 9 and 5.
As is frequently the case, this is not something I'm an expert in, and I'm by no means qualified to write a proper tutorial. I would, however, like to share some of the code I whipped up while acquainting myself with Clojure in the hope it can help other Clojure neophytes. I'll cover the basic language, JVM/CLR integration and Software Transactional Memory (STM).
Factorials, the Classic Example
To demonstrate the basics for someone unfamiliar with Lisp I'm going to take an approach that I haven't taken in my previous demonstrations. Here I'm going to solve the same problem a few different ways. It's a common, simple and functional problem: factorials. Consider the following functionally equivalent function declarations:
Factorial Style 1: Recursion
(defn fact [n]
  (if (<= n 1)
    1
    (* n (fact (dec n)))))
There's a simple, more imperative looking recursive implementation. By no means idiomatic in Lisp terms, but mathematically correct.
Factorial Style 2: Tail Recursion
(defn fact [n]
  ;; this is where tail recursion enters
  (loop [cnt n acc 1]
    ;; we're done
    (if (<= cnt 1)
      acc
      ;; otherwise recurse
      (recur (dec cnt) (* acc cnt)))))
Example 2 uses the loop/recur form which, while recursive in shape, won't consume additional stack space as style 1 would: recur rebinds the loop variables and jumps back to loop rather than making a new call. That gives you the benefit of tail recursion without relying on the JVM to optimize tail calls.
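To make the stack-space point concrete, here's roughly what the loop/recur form amounts to, sketched in Java since that's the host platform. The class and method names are mine, and this is only an illustration of the idea, not what the Clojure compiler literally emits.

```java
class FactLoop {
    /* Mirrors (loop [cnt n acc 1] ...): two locals that are rebound on every pass. */
    static long fact(long n) {
        long cnt = n;
        long acc = 1;
        /* each (recur (dec cnt) (* acc cnt)) becomes one more trip
           around the loop rather than a new stack frame */
        while (cnt > 1) {
            acc = acc * cnt;
            cnt = cnt - 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(fact(5)); // prints 120
    }
}
```

However large n gets, the call stack stays one frame deep, which is the whole point of recur.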
Factorial Style 3: Reduction
(defn fact [n]
  (reduce * (range 1 (inc n))))
The approach in example 3, while iterative, is perhaps the most interesting. It uses the reduce and range functions that would be familiar to Python programmers. reduce essentially starts out by applying a specified function, multiplication in this case, to the first two members of a sequence. The result of the function and the next member of the sequence are then fed back into the reduction function until the entire sequence has been processed. The range function is used here to produce the sequence upon which reduce will iterate.
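If you're more at home on the Java side of the JVM, the same fold can be sketched with streams. This is only an analogy: the class name is made up, and unlike the Clojure version above I pass an explicit starting value of 1 instead of letting reduce seed itself with the first two members.

```java
import java.util.stream.LongStream;

class FactReduce {
    static long fact(int n) {
        /* rangeClosed(1, n) plays the part of (range 1 (inc n)) */
        return LongStream.rangeClosed(1, n)
                /* 1 is the starting accumulator; every member of the
                   sequence is multiplied into it in turn */
                .reduce(1L, (acc, x) -> acc * x);
    }

    public static void main(String[] args) {
        System.out.println(fact(5)); // prints 120
    }
}
```

An empty range simply yields the starting value, so fact(0) comes out as 1 here.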
Java Runtime
Now that we've covered some basics let's look at some interaction with the Java runtime. The following is my classic example of Twitter status retrieval via the REST API:
(import '(javax.xml.parsers DocumentBuilderFactory DocumentBuilder)
        '(javax.xml.xpath XPathFactory XPath XPathExpression XPathConstants))
;; function to retrieve a twitter user's status
(defn getStatus [userName]
  (let [domFactory (. DocumentBuilderFactory newInstance)
        builder (. domFactory newDocumentBuilder)
        ;; build the url
        url (str "http://twitter.com/users/" userName ".xml")
        ;; load contents located at the url into the builder
        doc (. builder parse url)
        ;; create xpath plumbing
        factory (. XPathFactory newInstance)
        xpath (. factory newXPath)
        expr (. xpath compile "/user/status/text/text()")]
    ;; pull the user's status out of the document
    (. expr evaluate doc (. XPathConstants STRING))))
(println (getStatus "chrisumbel"))
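Each of those interop forms maps onto an ordinary Java call. As a rough side-by-side, here's the same XPath plumbing in plain Java, run against an in-memory XML string so it can be tried without touching the network. The class name and the sample document are my own inventions for illustration; only the XPath expression comes from the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StatusExtractor {
    static String statusFromXml(String xml) {
        try {
            /* (. DocumentBuilderFactory newInstance) and friends */
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            /* parse a string here rather than a url */
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            /* create xpath plumbing */
            XPathFactory factory = XPathFactory.newInstance();
            XPath xpath = factory.newXPath();
            XPathExpression expr = xpath.compile("/user/status/text/text()");
            /* pull the status text out of the document */
            return (String) expr.evaluate(doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        /* hypothetical payload shaped to fit the XPath expression */
        String sample = "<user><status><text>hello from wave</text></status></user>";
        System.out.println(statusFromXml(sample));
    }
}
```

Line for line it's the same work; the Clojure version just sheds the type declarations and the checked-exception ceremony.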
CLR
Here we go again with a twitter status example, this time in the .Net runtime.
(System.Reflection.Assembly/Load "System.Xml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")
(import '(System.Xml XmlDocument))
;; function to retrieve a twitter user's status
(defn getStatus [userName]
  (let [;; build the url
        url (str "http://twitter.com/users/" userName ".xml")
        ;; instantiate an XML document and load the contents located
        ;; at the url into it
        doc (doto (XmlDocument.) (. Load url))]
    ;; pull the user's status out of the document
    (. (. doc SelectSingleNode "/user/status/text") InnerText)))
(println (getStatus "chrisumbel"))
Software Transactional Memory
It's hard to discuss a language these days without evaluating how it handles concurrency. Clojure does not disappoint in this regard as it brings Software Transactional Memory (STM) to the table. Note the dosync call in the example below. Changes made to refs within that scope are wrapped in a transaction and applied atomically. If a conflict occurs the transaction will be retried, which also means side effects such as the println calls can run more than once.
(import '(java.util.concurrent Executors))
(let [val (ref 0)
      ;; thread pool will contain 4 threads
      pool (Executors/newFixedThreadPool 4)
      ;; create a list of 4 tasks that count to 1000
      tasks (map (fn [t]
                   (fn []
                     (dotimes [n 1000]
                       ;; start a transaction
                       (dosync
                         ;; add one to the val
                         (alter val + 1)
                         (println (str "OP1: " t " : " n " : " (deref val)))
                         ;; add another one to the val
                         (alter val + 1)
                         (println (str "OP2: " t " : " n " : " (deref val)))))))
                 (range 4))]
  ;; spawn the threads
  (doseq [future (.invokeAll pool tasks)]
    (.get future))
  (.shutdown pool))
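The retry-on-conflict idea can be felt out in plain Java too. To be clear, the following is not STM: it's a compare-and-set loop over a single AtomicInteger, which only approximates the optimistic "try, and start over if somebody beat you to it" behaviour of a transaction on one value. The class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class RetryCounter {
    /* read, compute, then commit only if nobody changed the value meanwhile */
    static void addTwo(AtomicInteger val) {
        while (true) {
            int seen = val.get();
            int next = seen + 2;
            if (val.compareAndSet(seen, next)) {
                return; /* committed */
            }
            /* conflict: another thread got in first, so go around again */
        }
    }

    static int run(int threads, int iterations) {
        final AtomicInteger val = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    for (int n = 0; n < iterations; n++) {
                        addTwo(val);
                    }
                    return null;
                });
            }
            /* wait for every task to finish */
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return val.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 8000
    }
}
```

What Clojure's refs add on top of this is coordination across several values at once, which a lone compare-and-set can't give you.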
Conclusion
Clojure is an interesting combination of old and new. Lisp was designed in 1958 and is the second oldest high level language. The virtual machines that Clojure runs on are far newer and concepts like STM are only now making their way into the mainstream. Somehow all of these technologies still appear to fit together at least well enough to get the job done. I look forward to attempting a serious project in Clojure and definitely recommend giving it a look.
Relevant Links
- clojure.org - Official Clojure homepage.
- Clojure GitHub - GitHub home of Clojure.
- enclojure - Clojure plugin for the NetBeans IDE.
- Clojure-CLR GitHub - GitHub home of the CLR/DLR port.
- Clojure API - Clojure's core API documentation.

When Google Wave was first announced I was pretty excited. The concept seemed
perfect. Broad like Twitter but rich like email. Brief like instant
messenger but collaborative like a message board.
Things have been somewhat slow going in beta thus far. But hey, it's still beta. If Google refines it a bit and Wave catches on (what actually catches on these days seems to be a crap-shoot) it has the potential to provide tons of value.
One of the possibilities I find particularly interesting is the use of robots: automated programs that are participants in the conversation. No, there's nothing underhanded about it; a robust robot API is provided for that very purpose.
Shortly after getting development sandbox access I had to get to work on one. While I'm going to keep the features of the actual bot I'm writing close to the vest for now I'll at least share an example I used while learning.
Platform
Google Wave robots must exist on Google's AppEngine, at least for now (this
restriction will ultimately go away). That limits your language choice to
either Python or Java while using the AppEngine SDK. When using Java you also have to include
the json.jar and jsonrpc.jar libraries in your /war/WEB-INF/lib/, both of which can be
found here.
I got started developing for Wave with Java. I'm not exactly sure how that happened considering how I love me some Python. Nonetheless I dusted off my Java cap and got to work. It's been a while, be patient with me, please.
Handling
From a Java point of view Wave robots are simply servlets that process events. What kind of events? Anything from a new participant entering a wave (a conversation) to a blip (the basic atom of a wave) being started or completed. What's important, however, is that you declare what events you plan on handling up front. That's accomplished by creating a /war/_wave/capabilities.xml file similar to what follows.
<?xml version="1.0" encoding="utf-8"?>
<w:robot xmlns:w="http://wave.google.com/extensions/robots/1.0">
  <w:capabilities>
    <w:capability name="BLIP_SUBMITTED" content="true" />
  </w:capabilities>
  <w:version>1</w:version>
</w:robot>
That example specifies that the servlet will be called after a blip is completed. Note that if you want to change what events are handled in this file you must increment the version tag in order for your changes to take effect.
Servlet
I might as well hit you straight up with it. Essentially you have to subclass com.google.wave.api.AbstractRobotServlet and override processEvents. It's within processEvents that you'll perform your magic.
In the case of this example I'll read out the text of the previously
completed blip (the one that fired this event) and try to find stock ticker
symbols by way of the pattern "ticker:" followed immediately by the symbol:
import com.google.wave.api.*;
import java.net.*;
import java.io.*;
import org.json.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class StockPriceBotServlet extends AbstractRobotServlet {
    private static final long serialVersionUID = 1L;
    @Override
    public void processEvents(RobotMessageBundle bundle) {
        Wavelet wavelet = bundle.getWavelet();
        String ticker;
        for (Event e: bundle.getEvents()) {
            if (e.getType() == EventType.BLIP_SUBMITTED) {
                /* grab the text of the blip that fired this event */
                String userBlipText = e.getBlip().getDocument().getText();
                /* search for the trigger to act; \\w+ requires at least one symbol character */
                Matcher matcher = Pattern.compile("(ticker\\:)(\\w+)").matcher(userBlipText);
                /* iterate all matches */
                while (matcher.find()) {
                    /* add a blip to the wave */
                    Blip blip = wavelet.appendBlip();
                    TextView textView = blip.getDocument();
                    ticker = matcher.group(2);
                    try {
                        /* connect to google */
                        URL url = new URL(String.format("http://www.google.com/finance/info?client=ig&q=%s", ticker));
                        BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
                        String inputLine;
                        StringBuilder sb = new StringBuilder();
                        /* read response from google into a StringBuilder */
                        while ((inputLine = reader.readLine()) != null)
                            sb.append(inputLine);
                        reader.close();
                        /* get rid of wrapper google adds */
                        sb.delete(0, 4);
                        /* parse response into a JSON object */
                        JSONObject o = new JSONObject(sb.toString());
                        /* pull out the property named "l" and send it back to wave */
                        textView.append(String.format("%s: %s", ticker, o.getString("l")));
                    } catch(Exception ex) {
                        /* lookup failed; the new blip is left empty */
                    }
                }
            }
        }
    }
}
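The ticker-matching step is easy to try on its own, away from Wave and AppEngine. Here's a minimal sketch using nothing but java.util.regex; the class and method names are made up for illustration and aren't part of the Wave API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TickerFinder {
    /* group 1 is the literal trigger, group 2 is the symbol;
       \\w+ requires at least one symbol character */
    private static final Pattern TICKER = Pattern.compile("(ticker\\:)(\\w+)");

    static List<String> findTickers(String blipText) {
        List<String> tickers = new ArrayList<String>();
        Matcher matcher = TICKER.matcher(blipText);
        /* collect every match, in the order it appears in the text */
        while (matcher.find()) {
            tickers.add(matcher.group(2));
        }
        return tickers;
    }

    public static void main(String[] args) {
        System.out.println(findTickers("how are ticker:GOOG and ticker:MSFT doing?"));
    }
}
```

Pulling the matching out like this also makes it simple to poke at odd inputs before deploying, since a round trip through AppEngine is a slow way to debug a regular expression.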
Deployment
Before you deploy you must set up your /war/WEB-INF/web.xml like so:
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
  <servlet>
    <servlet-name>StockPriceBot</servlet-name>
    <servlet-class>StockPriceBotServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>StockPriceBot</servlet-name>
    <url-pattern>/_wave/robot/jsonrpc</url-pattern>
  </servlet-mapping>
</web-app>
Now you must deploy the bot to AppEngine as you would any Java AppEngine project.
Use
Now it's time to actually use this contraption. It's rather straightforward. Just invite <app name>@appspot.com to your wave, where <app name> is the appspot designation of the code you just deployed. From there just have a conversation with the bot: submit a blip containing the ticker pattern and the bot appends a blip with the price.

Next Steps
Wave Eliza, perhaps?

I've been fiddling with
Lucene a good bit of late and have been quite
impressed. It's more than just a "blazing fast" full-text indexing system,
especially when implemented via
Solr. With Solr it becomes
an incredibly scalable, full-featured and extensible search engine platform.
I had always assumed that the Lucene stack wasn't for me. For the most part I store my data either in SQL Server or MySQL, both of which have perfectly adequate full-text search capability. It turns out that I could have saved myself a few headaches and saved my employer some money by adopting Solr and not writing my own faceting, caching, etc.
Naturally, Lucene/Solr isn't for everyone. If you just have a few hundred-thousand rows of text that you want to perform some basic searches on under light load then you're probably better off using the full-text search facility within your RDBMS.
However, if you need to scale out widely, perform faceted searches or use some advanced/custom search techniques then it's probably worth looking into Solr, even if you're already deployed under an RDBMS with full-text support.
In this article I'll outline the *VERY* basics of getting Solr up and running using SQL Server as a data source. While I'm actually doing this in production under Linux I'm going to tailor my instructions to Windows here to appeal to the average SQL Server DBA. I'll also employ the AdventureWorks sample database for demonstrative purposes.
Note that you'll have to have TCP/IP enabled in your SQL Server instance. Named pipes, VIA and shared memory won't cut it.
Step 1: Download and install Java
Solr and Lucene are written in Java so a Java Runtime is a prerequisite. It can be downloaded here.
After installation make sure to set the JRE_HOME environment variable to your Java install directory, e.g. C:\Program Files\Java\jre6
Step 2: Download Tomcat
Solr requires a servlet container. I recommend
Tomcat which can be downloaded
here. Then extract it to C:\tomcat6 (Note that I'm going to hang this all right off C:\ to keep the tutorial simple).
Step 3: Download Solr
This whole thing's about Solr, right? You can pick it up here. Extract the contents to a temporary location.
Step 4: Move Solr into Tomcat
Copy:
- apache-solr-1.4.0\example\solr to c:\tomcat6
- apache-solr-1.4.0\dist\apache-solr-1.4.0.war to c:\tomcat6\webapps\solr.war
Congratulations! Solr is essentially operational now, or would be upon starting Tomcat. It'd just be devoid of data.
Step 5: Download and install a SQL Server JDBC driver
In order for Java to talk to SQL Server we'll have to supply a JDBC driver. There are many available but I used Microsoft's which can be downloaded here. Note that there's also a unix version available.
Now create a C:\tomcat6\solr\lib folder. Copy the file sqljdbc4.jar out of the archive downloaded above into it.
Step 6: Configure the import
Create a C:\tomcat6\solr\conf\data-config.xml file and put the following content in it, modifying it to the details of your configuration, naturally. This file defines what data we're going to import (SQL statement), how we're going to get it (definition of JDBC driver class) and where from (connection string and authentication information). The resultant columns are then mapped to fields in Lucene.
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\INSTANCENAME;databaseName=AdventureWorks"
              user="TESTUSER"
              password="TESTUSER"/>
  <document name="productreviews">
    <entity name="review" query="
        SELECT ProductReviewID, ProductID, EmailAddress, Comments
        FROM Production.ProductReview">
      <field column="ProductReviewID" name="id"/>
      <field column="ProductID" name="product_id"/>
      <field column="EmailAddress" name="email"/>
      <field column="Comments" name="comments"/>
    </entity>
  </document>
</dataConfig>
Step 7: Tell Solr about our import
Add the following request handler to C:\tomcat6\solr\conf\solrconfig.xml:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:\tomcat6\solr\conf\data-config.xml</str>
  </lst>
</requestHandler>
This essentially allows Solr to run the data import we defined above when the /dataimport URL is requested.
Step 8: Configure schema
Ensure the fields are set up correctly in C:\tomcat6\solr\conf\schema.xml. There will be plenty of example fields, copy fields, dynamic fields and a default search field in there to start with. Just get rid of them and replace them with the following:
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="comments" type="text" indexed="true" stored="true"/>
  <field name="email" type="string" indexed="true" stored="true"/>
  <field name="product_id" type="int" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>
<copyField source="comments" dest="text"/>
<copyField source="email" dest="text"/>
<defaultSearchField>text</defaultSearchField>
There's quite a bit of power in Solr schemas that I won't go into in this article: dynamic fields, copy fields, compression... Needless to say it's worth reading up on, which you can do here.
Step 9: Start Tomcat
OK! We're finally configured well enough for an import. All we have to do is start up Tomcat. Make sure you're in Tomcat's directory as the quick-and-dirty configuration I showed you here requires it in order to find the Solr webapp.
c:\tomcat6>.\bin\startup.bat
If you'd like to move the Solr webapp elsewhere on the filesystem, remove the requirement for starting in Tomcat's directory or perform an advanced configuration please see the Solr with Apache Tomcat article in the Solr Wiki. Pay special attention to the section labeled, "Installing Solr instances under Tomcat" where they show you how to create contexts.
Step 10: Import
Now visit http://localhost:8080/solr/dataimport?command=full-import with a web browser. That'll trigger the import. Because we're just importing a small amount of test data the process will be nearly instantaneous.
Step 11: Observe your results
That's it! You can verify your work by issuing a query against Solr with a RESTful query like http://localhost:8080/solr/select/?q=heavy&version=2.2&start=0&rows=10&indent=on that searches the index for all reviews with the word heavy in the comments.
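If you'd rather build that query URL in code than by hand, here's a minimal Java sketch. The class and method names are hypothetical; the parameter names and values are simply the ones from the URL above, with the search term URL-encoded so that terms with special characters survive the trip.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SolrQueryUrl {
    /* assembles the same select URL shown above from its parts */
    static String build(String solrBase, String query, int start, int rows) {
        try {
            return solrBase + "/select/?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&version=2.2&start=" + start
                    + "&rows=" + rows
                    + "&indent=on";
        } catch (UnsupportedEncodingException e) {
            /* UTF-8 is always available, so this shouldn't happen */
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/solr", "heavy", 0, 10));
    }
}
```

Bumping start by rows on each request is then all it takes to page through a larger result set.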
Pitfalls
There are a number of reasons a data import could fail, most likely due to a problem with the configuration of data-config.xml. To see for sure what's going on you'll have to look in C:\tomcat6\solr\logs\catalina.*.
If you happen to find that your import is failing due to the system running out of memory, however, there's an easy, SQL Server specific fix. Add responseBuffering=adaptive and selectMethod=cursor to the url attribute of the dataSource node in data-config.xml. That stops the JDBC driver from trying to load the entire result set into memory before reads can occur.
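To spell out what the resulting url value looks like, here's a small Java sketch that assembles it from the pieces. The helper class is purely illustrative (in practice you'd just type the string into data-config.xml); the server, instance and property names are the ones used in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JdbcUrlBuilder {
    /* SQL Server JDBC urls are the server part followed by ;name=value pairs */
    static String build(String serverAndInstance, Map<String, String> properties) {
        StringBuilder url = new StringBuilder("jdbc:sqlserver://").append(serverAndInstance);
        for (Map.Entry<String, String> p : properties.entrySet()) {
            url.append(';').append(p.getKey()).append('=').append(p.getValue());
        }
        return url.toString();
    }

    static String adventureWorksUrl() {
        /* LinkedHashMap keeps the properties in the order they were added */
        Map<String, String> props = new LinkedHashMap<String, String>();
        props.put("databaseName", "AdventureWorks");
        props.put("responseBuffering", "adaptive");
        props.put("selectMethod", "cursor");
        return build("localhost\\INSTANCENAME", props);
    }

    public static void main(String[] args) {
        System.out.println(adventureWorksUrl());
    }
}
```

The printed value is what goes into the url attribute in place of the shorter one from step 6.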
Next Steps
So we've gone from zero to a functioning Solr instance rather quickly there. Not too shabby! However, we've only queried Solr through REST. Libraries like SolrNet are handy for wrapping objects around the data in .Net. For example:
using System;
using System.Collections.Generic;
using Microsoft.Practices.ServiceLocation;
using SolrNet;
using SolrNet.Attributes;
/* review domain object */
public class Review
{
    /* attribute decorations tell solrnet how to map
       the properties to Solr fields. */
    [SolrUniqueKey("id")]
    public string Id { get; set; }
    [SolrField("product_id")]
    public string ProductID { get; set; }
    [SolrField("email")]
    public string EmailAddress { get; set; }
    [SolrField("comments")]
    public string Text { get; set; }
}
class Program
{
    static void Main(string[] args)
    {
        /* create a session */
        Startup.Init<Review>("http://localhost:8080/solr");
        ISolrOperations<Review> solr =
            ServiceLocator.Current.GetInstance<ISolrOperations<Review>>();
        /* issue a lucene query */
        ICollection<Review> results = solr.Query("comments:heavy");
        foreach (Review r in results)
        {
            Console.WriteLine(r.Id);
        }
    }
}
Resulting in:
2 4
If you're totally new to Solr it's worth checking out the wiki. It outlines the handy features such as replication, facets and distribution.
