
October 31, 2004

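So Much Spam

There is too much spam and I can't stand it anymore, so I've decided to turn off comments for now.
I originally wanted to install mt-blacklist, but the package dependency problems turned out to be a real hassle.
When I have time I'll modify mt-comments.cgi myself so that users must enter a password before posting a comment;
that way spam comments can no longer be posted automatically.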

Posted by Roy at 12:50 PM | Comments (0)

October 30, 2004

CE BCE BC AD

CE(Common Era) = AD(Anno Domini, or the year of the Lord).
BCE(Before Common Era) = BC(Before Christ).

CE and BCE are eventually expected to replace AD and BC.

From http://www.religioustolerance.org/ce.htm:
"Wikipedia: the free encyclopedia" states that the new notation is used by "Many non-Christians or secular persons." However, we suspect that the majority of users are actually Christians who want a notation that does not offend or distress persons of other religions.

The word "common" simply means that this is the most frequently used calendar system: the Gregorian Calendar. There are many religious calendars in existence, but each of these are normally in use in only a small geographic area of the world -- typically by followers of a single religion.

Also refer to wikipedia:
http://en.wikipedia.org/wiki/Common_Era
http://en.wikipedia.org/wiki/Anno_Domini

Posted by Roy at 03:47 PM | Comments (0)

Life is struggle

From: steve (阿粗 改过自新), Board: EE
Subject: Re: *****The direct consequence is Re: [Repost] Faculty Position
Site: Unknown Space - 未名空间 (Fri Oct 29 22:38:08 2004), forwarded

[ Quoting the post by thanksgiving (LEFT): ]

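Let me tell a story from close to home. Our school is mid-ranked, and our advisor is
neither famous nor a slacker; in short, somewhere in the middle. Two senior students
have graduated from the group, both Chinese. The first graduated in 2002, working in a
very cold area. He is a simple, honest person devoted to academia; in six years he never
did an internship and just tended his own small, unfashionable plot. He published a lot
of papers, and we all admired him. Unfortunately his timing was bad: too many people now
apply for faculty positions, and many Chinese students crowd in whether or not they
actually want to do research, so there are far too many candidates for too few jobs.
Most of the many resumes he sent out vanished without a reply. With his OPT about to
expire he still had nowhere to go. Industry was out of the question as well: his area
was so cold that he didn't even know how to write his resume. When his OPT ended, he
went back to China, and we heard nothing from him for a long time. Recently I heard that
he found a job at a small-to-medium company there, doing work that has nothing to do
with his research. The second graduated in 2003 and, as luck would have it, found a
tenure-track job at a mid-tier school, with a salary of 6K (for 9 months). Our group
even opened champagne to celebrate! But once he got there, things turned out differently.
Last month, on a private phone call, he poured out his grievances to us. It turns out
the school deliberately over-hires on the tenure track and then eliminates whoever ranks
last (a "ty" in the department told him this; the school would of course never say so
officially). No wonder: with so many faculty applicants it is a seller's market, and
schools have no trouble finding assistant professors. He is worried about where he will
go if he doesn't get tenure after six years, and has actually started to envy the first one.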

Posted by Roy at 12:15 AM | Comments (0)

October 29, 2004

Halloween coming soon

I went to a party and carved a pumpkin all by myself!
It's my first pumpkin.

pumpkin.JPG
pumpkin2.JPG

I haven't got a camera, so I took two pictures of it with a grungy webcam.

Mean, isn't it?

And there'll be a "trick or treat" party on Sunday at Lawn.
I guess I won't have time to go there.

I saw some people in really weird costumes today.
I'm expecting to see more on Halloween.

Posted by Roy at 11:20 PM | Comments (0)

October 26, 2004

On Booking Flights

From: stephen (木匠*从水木到土木), Board: Oversea
Subject: More on booking flights
Site: BBS 水木清华站 (Wed Oct 27 00:17:50 2004), local

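People usually go to expedia.com to book flights, but I've recently noticed that
expedia.com's fares are often not very low. The reason is that quite a few discount
carriers have appeared in the past couple of years. Their fares are not necessarily
the lowest, but they generally don't become extremely high as the departure date
approaches, and there are occasional promotions, so the prices are quite reasonable.
However, these airlines don't sell tickets through third parties, so you won't find
them on expedia.com. Here is a list of them; you might find some unexpectedly low fares.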

JetBlue www.jetblue.com
Independence Air www.flyi.com
Song www.flysong.com
Ted www.flyted.com

I mostly travel in the East, so these airlines mostly fly in the East.
--


Some people also suggest flychina.com for booking tickets to China.

Posted by Roy at 04:49 PM | Comments (0)

October 22, 2004

On Programming Languages

From: DVanguard (Blue+Crab), Board: CS
Subject: Re: top CS conferences
Site: Unknown Space - 未名空间 (Fri Oct 22 19:05:34 2004) WWW-POST

In my humble opinion, programming languages might have "several" top
conferences because:

1) programming language research is actually a VAST domain with a vaguely
defined boundary. Different researchers self-identified with this area might
be working on completely different issues. For a typical non-PL CS student,
the area can be easily associated with some compiler construction stuff, such
as parsing or optimization, since a course on compilers is normally what we
first learned at undergrad time (remember those sleepless nights of hacking on
scanning and parsing?:-)), but in effect, this research area has some deep
reaches on both theoretical and applied ends. On the theory side, higher order
logic, formal semantics, theorem proving, type systems are all very relevant,
and that's I guess why conferences like LICS (a top conference in logic) and
POPL (arguably "the" top theory conference in PL) were mentioned by a previous
post. On the less theoretical side, program analysis/model checking, i.e. what
properties we can know of a program without running it, such as whether it is
secure or whether there might be memory leak, has always been a big topic.
There are also many projects with an experimental flavor, just like what you
see in a typical systems conference. A conference like PLDI has always been
good at this style of projects. (PLDI is a lot more than this I should say).
Of course let's not forget language design, depending on whether you are a
believer in object-oriented languages or functional languages, you might have
different target conferences in mind, OOPSLA or ECOOP for OO languages, and
ICFP for functional languages. Last but not least, software engineering and
programming languages also have a lot of things overlapping each other. Many
software engineering researchers go to OOPSLA or ECOOP, and many programming
language researchers go to a conference like ICSE (a top conference in
software engineering).

2) because of 1), if you are a PL researcher, especially a Ph.D. student, the
conferences you can submit papers to might be far fewer than the PL conferences
available, because each conference has its own "flavor" and the innate
characteristic of your research has more or less "disqualified" you of
publishing in certain conferences. For instance, do you expect a Ph.D student
who researches on register allocation to submit a paper on theorem proving,
or vice versa? :-)

3) conferences in PL tend to have a small number of accepted papers. The
number of papers accepted in each conference tends to be within the 20-30
range, if not fewer, and this is true as far as I know of all these high
profile conferences including POPL, PLDI, OOPSLA, ECOOP and ICFP. I have heard
from friends some top conferences in other areas might even accept 60 papers,
so I guess one such conference could cancel out 2 or 3 in PL. :-) In addition,
the aforementioned PL conferences all have a low acceptance rate. The six
conferences mentioned above seem to have never had an acceptance rate beyond
20%, and normally a lot lower.

BTW, it's really hard to estimate the number of groups in US doing programming
languages now, but I guess most schools in reasonable standing have some
professors in this area. Corporate research centers like Microsoft, IBM and
SUN all have sizable groups. Besides, Europe has always been a powerhouse in
this area. Places like INRIA France are perhaps as competitive as any research
centers on US soil, and many researchers in UK, Switzerland, Italy, Germany
and Denmark, etc, are out there with established status. So, it's actually
sweatily crowded. :-)

Posted by Roy at 10:42 PM | Comments (0)

October 21, 2004

Useful review of Register Renaming and Out-of-order Execution

Again, the almighty Wikipedia:
http://en.wikipedia.org/wiki/Register_renaming
http://en.wikipedia.org/wiki/Out_of_Order_execution

Many of the terms are confusing for historical reasons.

Posted by Roy at 02:37 PM | Comments (0)

October 16, 2004

Research Project

I am currently working on combining ISR and Strata to achieve both security and efficiency.
Prof. Jack Davidson wants to show that derandomization can be done efficiently in Strata, while Prof. Dave Evans wants to see stronger security, since this project is also part of the Malware project, which is mainly about security.

First, a simple jump to malicious code can easily be prevented in Strata by checking whether the target address falls outside the allowed range. So ISR against code injection doesn't buy much here. To justify combining ISR and Strata, I must find exploits that get around this simple check but would be stopped by ISR.
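The range check mentioned above can be sketched roughly as follows. This is not Strata's actual interface: the bounds `ALLOWED_START` and `ALLOWED_END` and both function names are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical bounds of the application's legitimate code region (assumed values).
ALLOWED_START = 0x08048000
ALLOWED_END = 0x080A0000  # exclusive


def is_allowed_target(target: int) -> bool:
    """Return True if a jump target lies inside the allowed code range."""
    return ALLOWED_START <= target < ALLOWED_END


def check_jump(target: int) -> int:
    """Reject control transfers that leave the allowed range."""
    if not is_allowed_target(target):
        raise ValueError(f"blocked jump to {target:#x}")
    return target
```

A jump into injected code on the stack or heap fails this check without any randomization, which is why the check alone already covers the simplest injection attacks.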

Second, which encryption method should I use? Of course I'll try simple XOR first, but is it secure enough? For example, say the key is 64 bits long. x86 has variable-length instructions, and simple instructions such as short jumps take only a byte or two. So if an attacker somehow gets the first 16 bits of the key by brute force, he can inject code like this:
A Malicious instruction; JUMP L1; xxx
L1: Another Malicious instruction; JUMP L2; xxx
L2: ...

Therefore, although the attacker doesn't know the high part of the key, he can still mount an exploit.
So I should look for a stronger encryption method. But would that increase the overhead?
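A minimal sketch of the partial-key problem, assuming a repeating 8-byte XOR key and an attacker who has learned only its first two bytes. The helper names are made up for illustration, and a two-byte x86 short jump (`EB 06`) stands in for the "instruction; JUMP" pairs above; the sketch only demonstrates that the controlled bytes decode as chosen.

```python
import os

KEY_LEN = 8  # 64-bit key, as in the example above


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; the same call randomizes and derandomizes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def forge_payload(wanted: list[bytes], known_prefix: bytes) -> bytes:
    """Build injected bytes when only the first len(known_prefix) key bytes are known.

    Each entry of `wanted` is placed at the start of its own KEY_LEN-byte
    window; the rest of the window is filler the attacker cannot control.
    """
    out = bytearray()
    for chunk in wanted:
        assert len(chunk) <= len(known_prefix)
        encoded = bytes(c ^ k for c, k in zip(chunk, known_prefix))
        out += encoded + b"\x00" * (KEY_LEN - len(encoded))
    return bytes(out)


key = os.urandom(KEY_LEN)            # secret key held by the derandomizer
leaked = key[:2]                     # the 16 bits the attacker is assumed to have guessed
wanted = [b"\xeb\x06", b"\xeb\x06"]  # short jumps that skip the 6 uncontrolled bytes

decoded = xor_with_key(forge_payload(wanted, leaked), key)
```

After derandomization the first two bytes of every 8-byte window are exactly what the attacker chose, while the other six decode to garbage that the jumps skip over. A cipher in which every output byte depends on the whole key would not have this property, at the cost of more work per fetched block.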

Third, indirect jumps. Can Diablo identify the headers of all basic blocks, given that the targets of indirect jumps are undecidable statically? It claims to handle those cases by preserving more information. I must check whether misidentified blocks can actually occur. If they can, the randomizer and the derandomizer must coordinate, namely on how to do the alignment.

Posted by Roy at 11:53 PM | Comments (0)

Diablo, a link time program rewriter

I think I have to write down some notes here.
Currently I'm trying to do Instruction Set Randomization on statically linked executables on x86 Linux machines.

ELF files can be manipulated directly or through the BFD library. Many GNU tools (more specifically, binutils) such as objcopy, objdump, ld, and gdb use BFD. (Not sure whether as also uses it.)

But operating directly on ELF doesn't suit my purpose, since I want to transform some basic blocks but keep the others intact.

Diablo is in some sense a full-fledged tool that can handle these issues.


Diablo reads in an executable and a linker map file. This map file can be created by most linkers and it describes the memory mapping of the relocatable object files in memory. It then builds a graph representation of the object files listed in the map that is very close to the final executable (all used relocatable object files should still be available. This means you should compile each object file in turn, using the -c option of your compiler, or use the option -save-temps in gcc. The executable is also needed, but is mainly used to verify the graph). In this graph the nodes are the sections in the object files and the edges are memory references from one object file section to another (the relocation information is used to find these references). Next the code sections in the object files are split into smaller blocks (basic blocks) by using the disassembly of this section.
By doing this a new graph is created that doubles as both an interprocedural control flow graph (ICFG) and a fine grained memory reference graph. On these graphs all kinds of analyses and optimizations can be applied. Instructions in the basic blocks are represented with both a very simple architecture independent representation (like the used registers) and an architecture dependent representation (the full disassembly of the instruction). Some analyses/optimizations use the independent representation, others use the dependent. How well you need to understand both program graphs depends on what you are trying to do. For your purpose it might be sufficient to simply work on the ICFG, and in that case, you shouldn't have to worry about the other representation. As said, some more information about what you will be using Diablo for would be useful.

Finally some notes on why we patch the toolchain. We patch the toolchain for two reasons: first of all, some toolchains (like binutils) try to compact relocation information. In doing so, they make it impossible for us to infer the target address from the memory reference the relocation represents. We simply turn off that optimization. Second, disassembling a program when data is present in the code sections is recursive undecidable. But the assembler knows what bytes in the code section are data, and what bytes in the code section are instructions. We let the assembler export this information to diablo.

Below are my analysis and guesses.
At the very beginning, Diablo reads in the executable and the map file and does some computation. After that, some basic information is stored in the root object and its subobjects.
However, to get the full set of information you have to merge them.

After the merge, Diablo can iterate through all the sections of the object. At this point the data field of a section holds the raw data. But the data field is a void pointer whose content varies from stage to stage. After disassembly the raw data is converted into a simple data graph whose nodes are of type t_node; each node is actually a t_ins, that is, an instruction. After control flow graph construction, the data field points to a t_cfg. At this point the data is a control flow graph, from which functions, basic blocks, and instructions can be extracted.

To write the result back, the data must be converted back to raw bytes. To do so Diablo first deflowgraphs it and then assembles it. After that the raw data can be written back.

The code directory is organized into several parts: the Kernel directory provides core services and data structures, the Fileformats directory provides common read/write support, and the Arch directory provides architecture-dependent services.

For my purpose, I think I must add a new field before disassembly to preserve the original data. During disassembly I should track the offset of each instruction so that I can tell where every basic block starts and ends. And instead of calling deflowgraph and assemble, I only need to write back the transformed raw data.
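The offset tracking described above might look roughly like this. It is a toy model, not Diablo code: `Ins` is a hypothetical stand-in for a decoded instruction (not Diablo's t_ins), and the leader rule used here is the textbook one (section start, direct branch targets, and the instruction after a branch).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ins:
    """Hypothetical decoded instruction; not Diablo's t_ins."""
    size: int                     # encoded length in bytes
    is_branch: bool = False       # ends a basic block
    target: Optional[int] = None  # byte offset of a direct branch target, if known


def block_ranges(instructions: list[Ins]) -> list[tuple[int, int]]:
    """Return [start, end) byte ranges of basic blocks within one section."""
    # Pass 1: record the byte offset of every instruction.
    offsets, pos = [], 0
    for ins in instructions:
        offsets.append(pos)
        pos += ins.size
    section_end = pos

    # Pass 2: collect leaders (block start offsets).
    leaders = {0} if instructions else set()
    for off, ins in zip(offsets, instructions):
        if ins.is_branch:
            nxt = off + ins.size
            if nxt < section_end:
                leaders.add(nxt)
            if ins.target is not None and ins.target in offsets:
                leaders.add(ins.target)

    starts = sorted(leaders)
    return list(zip(starts, starts[1:] + [section_end]))
```

With these ranges, only the bytes of the selected blocks need to be transformed in the preserved raw data. Indirect branches have `target=None` here, so any block reachable only through them would be missed; that is the gap the extra information from the toolchain is meant to close.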

A difficult task is telling which original object file a basic block belongs to. To do this, some information must be preserved during the merge or while reading in.

Posted by Roy at 09:57 PM | Comments (0)

October 14, 2004

Reading Books

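GEB is a universally acknowledged classic, but it covers so much ground that I'm not sure I can get through it.
Right now I'm reading the introduction to Bach: fugue, canon, a pile of technical terms...
Some links: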
http://www.faqs.org/faqs/books/hofstadter-GEB-FAQ/
http://geb.stenius.org/old/
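Quoted from http://www.cp1897.com.hk/Choice?Cpid=5
  The book's content is remarkably wide-ranging: it covers music (Bach), art (Maurits Cornelius Escher, 1898 - 1972), molecular biology, computer languages, artificial intelligence, and even Zen. Over the years many readers have finished the whole book and still could not sum up what it is really about. For this reason the author added a new 23-page preface to the 1999 20th-anniversary edition to explain. What the author wants to discuss is in fact a general question: what is the "self".

  This is the question the author really wanted to write about. Although the book ranges widely, its core is Gödel, the foundations of mathematics. Kurt Gödel (1906 - 1978) was an Austrian-born mathematician. The theorem he published in 1931, now known as the incompleteness theorem, is the most revolutionary discovery of the 20th century. Roughly, the theorem says that in any axiomatic system there must be propositions that the system "itself" can neither prove true nor prove false. On Gödel, one critical biography is worth recommending: 《哥德爾》 by the well-known Chinese-American mathematician Hao Wang (王浩), published by Shanghai Translation Publishing House in 1997.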

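I also borrowed Mark Twain's The Adventures of Tom Sawyer, which I plan to read aloud to improve my spoken English. Some people think American literature only began with Mark Twain; terrible.
When I have time I want to finish all 3 novels in the series; this seems to be the first one.
Later, when I'm free, I'll borrow the original LOTR and read it, hoho.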

Those links aren't that great.
I recommend these two:
http://www.geocities.com/g0del_escher_bach/
http://www.people.vcu.edu/~elhaij/GEB/

Posted by Roy at 03:23 PM | Comments (0)

Google Desktop

Google is really a killer.
But it seems you need a powerful machine to install it.
http://desktop.google.com/

Posted by Roy at 03:21 PM | Comments (0)

October 10, 2004

Academic Writing Tips

http://www.cis.upenn.edu/~stevez/writing-tips.html

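This weekend I visited my aunt's home in Fairfax. The photos taken at Great Falls Park and in front of the Mormon temple are in my Personal directory. I'm told Mormons must donate 20% of their income, so the church is very wealthy.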

Here are a few thoughts about academic writing.

Avoid using "we".
Philosophy: The word "we" is often used by lazy writers because it provides an easy way to give a sentence a subject. The problem is that doing so usually dilutes the impact of the sentence or obscures the true subject.

Here is a real-world example (taken from a published paper): "In this paper we focus on statically checking behavioral properties of ..." The authors of the paper have little to do with the main point of the paper. The sentence above would be better as: "This paper focuses on statically checking behavioral properties of ..." This version emphasizes the true subject of the sentence, "this paper". It's also shorter.

Unless the true subject of the sentence is the authors, avoid using "we". An acceptable use is: "We would like to thank the anonymous referees for providing helpful feedback on the earlier draft of this work."

Parallelism is good.

When a paragraph, bullet list, or sentence contains similar components, those components should use parallel construction. Opportunities for parallelism include: similar sentence structure, repeated verbs, repeated subjects. Required parallelism: verb tense and noun plurality.

Citation references are not nouns.
Philosophy: The point of writing is communication to the reader. Because citation references are often numbers or alpha-numeric strings, it is difficult for the reader to ascribe them meaning. The reader should not need to refer to the bibliography to understand a sentence.

Example: "As shown in [7], static type systems ..." should be "As shown by Harper et al. [7], static type systems..." or "As shown previously, static type systems ... [7]."

With citation styles that use the author's name as the index it is sometimes permissible to use the reference as a noun. For example, "As shown in (Harper et al. 1999), static type systems ...". But in this style even better would be "As shown by Harper et al. (1999), static type systems...".

Good writing is readable. (Read your writing out loud.)
Reading a sentence or paragraph aloud can reveal defects in its structure. Paragraphs that use the same sentence structure too frequently often sound choppy or awkward when read aloud. Complex phrases that trip up the tongue indicate that the sentence may need to be edited.

When in doubt, look it up.
There are many excellent resources to improve writing skills. Two of my favorite online resources are Strunk & White and www.dictionary.com. One writing book intended especially for Computer Scientists is "Bugs in Writing: A guide to debugging your prose" by Lyn Dupre. Also see "An Evaluation of the Ninth SOSP Submissions, or, How (and How Not) to Write a Good Systems Paper" by Roy Levin and David D. Redell and "How To Have Your Abstract Rejected" by Mary-Claire van Leunen and Richard Lipton.

Posted by Roy at 10:03 PM | Comments (0)

October 05, 2004

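Finally finished the second SimpleScalar assignment

Getting it to work wrecked my plans to go to bed early several nights in a row.. shit
The simulator models program execution, and the worst kind of error is probably fetching the wrong instruction from the wrong place, after which everything keeps going wrong...
It is also very hard to debug, whether with gdb or by reading the output.
I had forgotten to squash all the side effects of an instruction and squashed only some of them, which produced an error that was extremely hard to track down..
This one is done, but what about my research project? These days I dread meeting my advisor every week...
Not enough sleep. How do other people manage to go to bed at three or four every night? shit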

Posted by Roy at 11:46 PM | Comments (2)

October 04, 2004

CMU is CMU, after all

Wow, the CS faculty all have a wiki now:
http://pl.ug.cs.cmu.edu/csd/

Posted by Roy at 11:21 PM | Comments (0)

Automobile Words

http://216.239.39.104/search?q=cache:yC2xRzSqvEgJ:www.pcauto.com.cn/teach/qczs/10309/22081.html+stick+shift&hl=en&lr=lang_zh-CN

So many used cars are stick shifts; I want an automatic.

Posted by Roy at 04:49 PM | Comments (0)

October 02, 2004

Debian on Dell Inspiron 600m

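ACPI doesn't seem to be supported. At first the BIOS version was A13, and the kernel messages showed "Dell Inspiron with broken BIOS detected. Refusing to enable the local APIC". After upgrading to A14 the error seems to be gone, but poweroff still does not turn off the power automatically.
The Fn keys aren't set up yet. On Debian there is a package that can be installed to support them; it probably needs some configuration.
These articles are useful references: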
http://zhouzheng.8u8.com/dell300m/linuxon300m-cs.htm
http://www.softlab.ece.ntua.gr/~amanous/Inspiron-Linux/
http://www.users.fast.net/~eclectic/debian-8600.html

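Following the instructions on the ndiswrapper homepage step by step, I did get the driver installed. Running ndiswrapper -m maps wlan0 to ndiswrapper in /etc/modprobe.d/ndiswrapper.
I'm still not clear on how Linux manages its network interfaces or how drivers register their interfaces. Oh right, I can ask 雄鸡.
雄鸡, if you see this post, remember to leave a comment.
Of course wireless-tools and waproamd also need to be installed; the former is for manual configuration, the latter for automatic configuration.
iwconfig configures the card, and iwlist wlan0 scan tells wlan0 to scan for available access points.
dhclient wlan0 obtains an IP address automatically.
The problem now is that I can't tell whether waproamd is making my system crawl, or whether ndiswrapper's wrapping of the Windows driver isn't good enough and that is what slows the system down. Removing both with modprobe -r ndiswrapper and waproamd -k doesn't help either; the driver has to be disabled before boot.
Even weirder, top shows very little CPU and VM usage. Is this method not accurate enough to show what the kernel is doing?

Sigh. I can get online now, but the system is almost too slow to use, so I still need to find a solution.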

Posted by Roy at 05:51 PM | Comments (0)

The best ReiserFS tool under Windows

http://yareg.akucom.de/
Requires .NET.

Posted by Roy at 11:48 AM | Comments (0)

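That damned Broadcom BCM4306

Broadcom has released no open driver for its wireless cards at all, which is really perverse. Maybe it (802.11g) contains some advanced technology they aren't willing to disclose yet.
There is an open-source project for the BCM4301 on sf, but the BCM4306 doesn't seem to be supported yet.
There is a brief guide to installing Debian on the Dell Inspiron 600m here, but my wireless card was an upgrade that 帅旗 paid extra for on my behalf, so I didn't read that part carefully.
Everything else works fine and mostly needs no driver installation: choose ATI for the graphics card, and the wired NIC is supported directly by the tg3 module.
I'm still not proficient with Linux. I assumed that any device listed by lspci already had a driver, so I kept frantically configuring the wireless card, and only later realized that no interface would ever show up, probably because no driver was loaded.
At first I wondered why I couldn't find any message about the wireless card in any of the logs.
Only when I couldn't find a driver under /lib/modules/2.6.../kernel/drivers/net/wireless/ did I believe there really was no driver.
I guess PCI bus devices follow a standard, so the system can query the device name directly without having to support the device. When I saw information such as the IRQ displayed there, I assumed the driver was installed. lspci -vv shows IRQ 7; Windows uses a different one.
In the log, parport0 (parallel port 0?) reports an IRQ 7 conflict and falls back to polling; I don't know whether that is the cause.
In the end I searched online and finally accepted that only ndiswrapper on sf can solve this. Frustrating: several hours last night were basically wasted.
Damn Broadcom!
Even more frustrating, slashdot says the 802.11g standard hasn't been fully finalized (maybe the post is old?) and tells people not to buy. I could faint.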

> It seems, that the specs haven't been released yet. There are quite a few Wlan
> cards out there based on the Broadcom chips (nearly all cards, that support
> 802.11g), so it's quite a shame. (Actually this fits the TrueMobile 1180,
> 1300 and 1400, speaking of Dell wireless lan cards).
...
> The same problem is with the Intel Prowireless 2100 (Centrino) WLan card. No
> Linux support available yet, which is another choice for the Dell notebooks at
> the moment.


Don't expect specs or opensource drivers for any of these pieces
of hardware until these vendors figure out a way to hide the frequency
programming interface.


Ie. these cards can be programmed to transmit at any frequency,
and various government agencies don't like it when f.e. users can
transmit on military frequencies and stuff like that.


The only halfway plausible idea I've seen is to not document the
frequency programming registers, and users get a "region" key file that
has opaque register values to program into the appropriate registers.
The file is per-region (one for US, Germany, etc.)and the wireless
kernel driver reads in this file to do the frequency programming.


So don't blame the vendors on this one, several of them would love
to publish drivers publicly for their cards, but simply cannot without
upsetting federal regulators.


Posted by Roy at 11:28 AM | Comments (0)

Reverse engineering approaches

http://dxr3.sourceforge.net/re.html

Posted by Roy at 11:27 AM | Comments (0)

October 01, 2004

Categories of automobile

http://en.wikipedia.org/wiki/Category:Automobiles
http://en.wikipedia.org/wiki/List_of_automobile_manufacturers

Posted by Roy at 10:39 PM | Comments (0)