<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: A quick foray into linear algebra and Python: tf-idf</title>
	<atom:link href="http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/feed/" rel="self" type="application/rss+xml" />
	<link>http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/</link>
	<description>Everything tech...</description>
	<pubDate>Mon, 07 Jul 2008 10:04:45 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Tim Trueman</title>
		<link>http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-110</link>
		<dc:creator>Tim Trueman</dc:creator>
		<pubDate>Mon, 30 Jun 2008 20:19:00 +0000</pubDate>
		<guid isPermaLink="false">http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-110</guid>
		<description>KARL,

This code doesn't read from a file at all. To keep it simple and avoid unnecessary code, I just made it use strings as if they were separate documents.

To use this code, replace &lt;code&gt;"""DOCUMENT #1 TEXT"""&lt;/code&gt; with your space-delimited document inside the triple quotes (e.g. &lt;code&gt;"""The quick brown fox jumped over the lazy dog"""&lt;/code&gt;). Do this for each document you are using.</description>
		<content:encoded><![CDATA[<p>KARL,</p>
<p>This code doesn&#8217;t read from a file at all. To keep it simple and avoid unnecessary code, I just made it use strings as if they were separate documents.</p>
<p>To use this code, replace <code>"""DOCUMENT #1 TEXT"""</code> with your space-delimited document inside the triple quotes (e.g. <code>"""The quick brown fox jumped over the lazy dog"""</code>). Do this for each document you are using.</p>
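<p>If the documents live in files rather than in the script, one option is to read each file into a string first and then treat those strings exactly like the triple-quoted ones. The following is only a minimal sketch, assuming Python 3: the helper <code>load_documents</code> and the file names are hypothetical and are not part of the code in the post.</p>

```python
import os
import tempfile


def load_documents(paths):
    """Return the text of each file in `paths`, one string per document.

    Hypothetical helper for illustration; not part of the original post.
    """
    documents = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            documents.append(handle.read())
    return documents


# Inline strings, as described above: each triple-quoted string is one document.
inline_documents = [
    """The quick brown fox jumped over the lazy dog""",
    """The lazy dog slept""",
]

# The same texts written to temporary files (made-up names) and read back.
with tempfile.TemporaryDirectory() as folder:
    paths = []
    for number, text in enumerate(inline_documents, start=1):
        path = os.path.join(folder, "document%d.txt" % number)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        paths.append(path)
    file_documents = load_documents(paths)
```

<p>Either list can then be used wherever the post uses its document strings.</p>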
]]></content:encoded>
	</item>
	<item>
		<title>By: KARL</title>
		<link>http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-109</link>
		<dc:creator>KARL</dc:creator>
		<pubDate>Sun, 29 Jun 2008 10:31:32 +0000</pubDate>
		<guid isPermaLink="false">http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-109</guid>
		<description>Hi Tim

Thanks for this code. I need to do a term frequency calculation for entries in a database.
I'm not really that familiar with Python. If I have a space-separated file of words, where in the code above do I put the file name?
Thanks</description>
		<content:encoded><![CDATA[<p>Hi Tim</p>
<p>Thanks for this code. I need to do a term frequency calculation for entries in a database.<br />
I&#8217;m not really that familiar with Python. If I have a space-separated file of words, where in the code above do I put the file name?<br />
Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: m13</title>
		<link>http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-106</link>
		<dc:creator>m13</dc:creator>
		<pubDate>Fri, 20 Jun 2008 10:53:26 +0000</pubDate>
		<guid isPermaLink="false">http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-106</guid>
		<description>Yes, I am currently writing my own tokenizer that will be applied before feature selection/weighting. That's why I needed to understand whether this code includes (simple) tokenization or not (as I already mentioned, I don't know Python that well) ;)

Thank you so much for replying in such a short time!</description>
		<content:encoded><![CDATA[<p>Yes, I am currently writing my own tokenizer that will be applied before feature selection/weighting. That&#8217;s why I needed to understand whether this code includes (simple) tokenization or not (as I already mentioned, I don&#8217;t know Python that well) <img src='http://timtrueman.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Thank you so much for replying in such a short time!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Trueman</title>
		<link>http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-105</link>
		<dc:creator>Tim Trueman</dc:creator>
		<pubDate>Thu, 19 Jun 2008 21:43:56 +0000</pubDate>
		<guid isPermaLink="false">http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-105</guid>
		<description>Glad you've found it somewhat useful! &lt;code&gt;document.split(None)&lt;/code&gt; takes the document and splits it into a list containing each word in the document (None means split on any run of whitespace: spaces, tabs, or newlines). This is effectively a simple tokenizer. A more sophisticated tokenizer would probably give better results, but I decided this was good enough for the example.

Does that answer your question(s)?</description>
		<content:encoded><![CDATA[<p>Glad you&#8217;ve found it somewhat useful! <code>document.split(None)</code> takes the document and splits it into a list containing each word in the document (None means split on any run of whitespace: spaces, tabs, or newlines). This is effectively a simple tokenizer. A more sophisticated tokenizer would probably give better results, but I decided this was good enough for the example.</p>
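<p>To make the behaviour of <code>split(None)</code> concrete, here is a small sketch. The function <code>simple_tokenize</code> is a hypothetical example of a slightly more careful tokenizer (lowercasing and dropping punctuation); it is an assumption for illustration and not what the code in the post does.</p>

```python
import re

document = "The quick,  brown\tfox\njumped!"

# split(None) splits on any run of whitespace and never yields empty strings.
whitespace_tokens = document.split(None)

# split(" ") splits on single spaces only, so the double space produces an
# empty string and the tab/newline stay inside the last token.
space_tokens = document.split(" ")


def simple_tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes.

    Hypothetical helper for illustration; not part of the original post.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


regex_tokens = simple_tokenize(document)

print(whitespace_tokens)  # ['The', 'quick,', 'brown', 'fox', 'jumped!']
print(space_tokens)       # ['The', 'quick,', '', 'brown\tfox\njumped!']
print(regex_tokens)       # ['the', 'quick', 'brown', 'fox', 'jumped']
```

<p>With plain whitespace splitting, <code>quick,</code> and <code>quick</code> would count as different terms, which is one reason a more careful tokenizer can change tf-idf weights.</p>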
<p>Does that answer your question(s)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: m13</title>
		<link>http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-104</link>
		<dc:creator>m13</dc:creator>
		<pubDate>Thu, 19 Jun 2008 11:36:28 +0000</pubDate>
		<guid isPermaLink="false">http://timtrueman.com/2008/02/10/a-quick-foray-into-linear-algebra-and-python-tf-idf/#comment-104</guid>
		<description>Hey Tim, I am not planning to use your code (besides, I don't know Python); I am just trying to understand it. I am doing a project on TC and I will be using tf-idf for feature selection/weighting.

From what I understand, the above code is not for tokenized text, right?

In any case, I've been searching Google for 2 days now, and your post is the most helpful thing I've found ;)

Many thanks for explaining tf-idf in terms of (pseudo)code.</description>
		<content:encoded><![CDATA[<p>Hey Tim, I am not planning to use your code (besides, I don&#8217;t know Python); I am just trying to understand it. I am doing a project on TC and I will be using tf-idf for feature selection/weighting.</p>
<p>From what I understand, the above code is not for tokenized text, right?</p>
<p>In any case, I&#8217;ve been searching Google for 2 days now, and your post is the most helpful thing I&#8217;ve found <img src='http://timtrueman.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Many thanks for explaining tf-idf in terms of (pseudo)code.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
