Wednesday, December 30, 2009

How to find the physical address of the kernel code segment on X86_64


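1. Goal: I want to find the physical address of the kernel code segment of X86_64 Linux 2.6.18. What is this address good for? Most people will never need it. But if you know what a DMA attack is and want to use one against X86_64 Linux, you may well need the kernel's physical address.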

2. Method:

The wrong approach:

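a. At first, I immediately thought of the kernel's virt_to_phys function. As the name says, it converts a virtual address into a physical address. And the virtual address of the kernel code segment is easy to get: just grep _text System.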

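But after I wrote a kernel module based on this idea, the address it returned was obviously wrong: the resulting physical address was larger than the total amount of RAM installed in my machine. Later I read the description of virt_to_phys more carefully and found that it is only valid for memory allocated with kmalloc(). On x86-32, however, it does seem to work for the kernel code segment.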

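Either way, this road was a dead end, so I had to try something else. For over 20 days I could not find a good method, until today I happened to read this article by ZX_WING: http://linux.chinaunix.net/bbs/thread-1032711-1-1.html
(BTW: thanks also to whoever bumped this thread back to the top.)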

The right approach:

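b. After reading ZX_WING's article on linker scripts, you will see that the physical address at which the kernel is loaded is specified in the linker script. So the next step is to read that script.
c. First, find the linker script for X86_64. In 2.6.32 the 32-bit and 64-bit x86 scripts have been merged into one file, but in 2.6.18, which I use, they are still two separate files. It is located under arch/x86_64/kernel/ (for the merged version, arch/x86/kernel/).
d. Here is an excerpt: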
http://lxr.linux.no/#linux+v2.6.18/arch/x86_64/kernel/vmlinux.lds.S
SECTIONS
{
  . = __START_KERNEL;
  phys_startup_64 = startup_64 - LOAD_OFFSET;
  _text = .;                    /* Text and read-only data */
  .text :  AT(ADDR(.text) - LOAD_OFFSET) {

It clearly uses the macro __START_KERNEL, so let's dig out the related definitions. They are spread over several .h files:
#define __PHYSICAL_START CONFIG_PHYSICAL_START
#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
#define __START_KERNEL_map 0xffffffff80000000UL
#define LOAD_OFFSET __START_KERNEL_map
The description of CONFIG_PHYSICAL_START reads:
http://lxr.linux.no/#linux+v2.6.18/arch/x86_64/Kconfig#L493
config PHYSICAL_START
    hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
    default "0x1000000" if CRASH_DUMP
    default "0x200000"
    help
      This gives the physical address where the kernel is loaded. Normally
      for regular kernels this value is 0x200000 (2MB). But in the case
      of kexec on panic the fail safe kernel needs to run at a different
      address than the panic-ed kernel.

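From this we can see that the default PHYSICAL_START is 0x200000, and the starting virtual address of the kernel code is 0xffffffff80000000. So the contents at these two addresses should be the same; the former is a physical address and the latter a virtual one. Finally, I installed an X86_64 QEMU virtual machine to verify this (I used QEMU 0.11.91; older versions such as 0.9.x and 0.10 did not seem to work well, and I could never install the X86_64 version of CentOS 5.1 on them). Press Ctrl+Alt+2 to switch to the QEMU monitor, then enter these two commands: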
x /10x 0xffffffff80000000
xp /10x 0x200000
The contents are the same, which confirms the reasoning above.

There is still one thing I don't quite understand, though.
Judging from these three lines:
. = __START_KERNEL;
phys_startup_64 = startup_64 - LOAD_OFFSET;
_text = .;
the start address of _text should be __START_KERNEL, i.e. 0xffffffff80000000 + 0x200000 = 0xffffffff80200000. But grep _text gives 0xffffffff80000000, so PHYSICAL_START is not applied. I don't know why. Can anyone explain? Thanks.

Sunday, December 20, 2009

Notes on the APIC chapter of the Intel manual


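I didn't read the APIC part carefully when I went through the manual before. I need it now, so I read it again and finished a few minutes ago. While it is still fresh, I'm writing down what I remember, so that I don't forget it later and in the hope that it helps newcomers. Corrections are welcome.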


References:
Download links for the Intel manuals:
http://biosren.com/thread-92-1-1.html
The APIC part is CHAPTER 10 of Volume 3A.

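Main text:
APIC stands for Advanced Programmable Interrupt Controller; it is used to manage interrupts. Don't confuse it with ACPI (Advanced Configuration and Power Interface, for power management). The APIC's predecessor is the PIC, such as the 8259A, which is rarely used nowadays.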

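There are actually two kinds of APIC. One is the LOCAL APIC, which is attached directly to a PROCESSOR, one per PROCESSOR. The other is the I/O APIC, which manages interrupts coming from peripherals; there is usually just one in a machine (even on a multi-core system). The Intel IA32 manual describes the LOCAL APIC, and below, APIC always means LOCAL APIC. "LOCAL" is presumably relative to the PROCESSOR: it sits close to the PROCESSOR, hence LOCAL.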

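A LOCAL APIC can handle interrupts from the following sources:
1) Locally connected I/O devices, such as devices wired directly to the LINT0 and LINT1 pins. I don't know what kind of devices are usually connected this way, though.
2) Externally connected I/O devices. Interrupts from these devices first go through the I/O APIC and then reach the processor through the LOCAL APIC.
3) Inter-processor interrupts (IPIs). Multiprocessor systems are common now, and when one processor wants to interrupt another, it can use an IPI.
4) APIC timer interrupts. The APIC has a built-in timer, which OSes use a lot.
5) Performance monitoring counter interrupts. Intel is clearly thinking of software developers here, building performance monitoring counters right into the hardware.
6) Thermal sensor interrupts, presumably to protect against the CPU overheating. These are available on PENTIUM 4 AND XEON processors.
7) APIC internal error interrupts.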

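The APIC can be thought of as a separate piece of hardware with its own set of registers, which can be read and written to control its features and settings. Among them, the local vector table (LVT) registers specify how each of the local interrupt sources above is delivered to the processor.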

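The APIC itself comes in 3 versions. (An aside: I used to think only software had lots of versions and never noticed it much with hardware, but Intel's stuff has quite a few versions too.) The early P6 family uses the APIC, PENTIUM 4 AND XEON use the xAPIC, and then there is the x2APIC. Who knows whether an x4APIC OR x8APIC will show up later.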

The APIC version can be detected with the CPUID instruction.

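In xAPIC mode, the registers are memory-mapped to a range of physical addresses with a default base. To avoid conflicts with other addresses, this base address can be relocated elsewhere. BIOS developers probably use this feature when dealing with the APIC.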

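x2APIC mode drops memory-mapped access to the APIC registers and uses MSRs instead. MSR stands for model-specific register, i.e. a register specific to a particular processor model. The advantage is that you no longer have to worry about memory address conflicts.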

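There are also rules for enabling and disabling the different APIC modes and for switching between them. If you change these yourself, follow the rules. Also note that in x2APIC mode, writes to the registers are not guaranteed to be ordered, so you have to take care yourself, for example by using a barrier.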

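Finally, a word on MSI (MESSAGE SIGNALLED INTERRUPTS). MSI already exists in conventional PCI (it was introduced in PCI 2.2), where it is optional; in PCI EXPRESS it became mandatory. It works mainly through two registers: a Message Data Register (MDR) and a Message Address Register (MAR). To send an MSI, the PCI device writes the value held in the MDR to the address held in the MAR. See the manual for the meaning of the individual fields.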

A brief comparison of PCI EXPRESS and (conventional) PCI


Glossary:
PCIe: PCI EXPRESS
PCI: conventional PCI.

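Introduction:
I used to work on the PCI BUS, then stopped for a while and missed PCI-X. Now I'm back to working on peripherals, so I've started reading about PCI EXPRESS. While reading the PCIe SPEC, I deliberately compared PCIe with PCI to deepen my understanding. What follows is my personal view, aimed at people moving straight from PCI to PCIe. My knowledge is limited, so there may be mistakes; corrections are welcome.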

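1) LAYERED DESIGN. PCIe is a lot like a network: it is divided into 3 layers, from top to bottom the TRANSACTION LAYER, the DATA LINK LAYER and the PHYSICAL LAYER. PCI has no layers. The benefit of a layered design is flexibility: each layer can have different protocol implementations, and as long as the interface between two layers stays the same, the lower layer's implementation can change without touching the upper layer. From PCI EXPRESS 1.0 to 3.0 (not officially released yet), the effective bandwidth per LANE roughly doubles with each generation, which is probably partly thanks to the layered design. I also think that PCIe's use of a serial bus instead of PCI's parallel bus is an important reason for the network-style design.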

Downside: more complex to implement?

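A side note: when I first read the PCI EXPRESS spec, I almost thought I had downloaded a networking document by mistake, because it is full of networking terms such as the 3 layers, PACKET, SWITCH and so on. This surprised me a lot. But on reflection, the design does have the benefits described above. I once discussed with a teacher the idea of doing flow control and retransmission of corrupted packets directly on the network card. His conclusion was that if you did that, you would end up reinventing TCP/IP. I didn't expect PCI EXPRESS to do pretty much the same. Quite a surprise.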


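2) ADDRESS SPACES. PCIe has 5 ADDRESS SPACES, PCI has 3. Besides PCI's IO, MEMORY and CONFIGURATION spaces, PCIe adds two more: TRUSTED CONFIGURATION and MESSAGE. Security seems to be getting more and more important. The MESSAGE space is probably there to get rid of the interrupt pins (which lowers cost).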

Wednesday, December 9, 2009

A problem when installing the Xen kernel devel package on CentOS 5

Problem: I wanted to compile kernel modules in Xen's Dom0, so I installed the Xen kernel development package with the following command:
yum install kernel-xen-devel
However, I still could not compile the module afterwards. It turned out that the installed kernel-xen-devel was the newest version (2.6.18-164.6.1), while my running kernel was an older one (2.6.18-128). Apparently yum simply installed the newest package regardless of the kernel I was running.

Solution: install the new Xen kernel so that it matches kernel-xen-devel (or update it, but that takes a lot of space):

yum update kernel-xen

Tuesday, December 8, 2009

Monday, December 7, 2009

Finding the current process's information while debugging the KERNEL

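As we know, a frequently used kernel data structure is current. It points to the current process and holds a lot of useful information, such as the PID and the process name. In your own kernel code you can refer to current directly. In a debugger, however, p current does not work, because current is a macro.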

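So how do we find current? In Linux 2.6, the struct thread_info of the running process sits at the very bottom (lowest address) of its kernel stack, and its task field points to current. Suppose the kernel stack is 8K and starts at 0xC1002000; then thread_info is at 0xC1000000 (the stack grows downward). Based on this, current can be found as follows:
1) Find ESP: p $esp. Suppose the output is 0xC1234566.
2) AND esp with 0xFFFFE000 (clearing the low 13 bits) to get 0xC1234000. This is the address of thread_info; cast it and follow its task pointer to reach current, for example:
3) p ((struct thread_info *) 0xc1234000)->task->comm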

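This method works on X86-32 and ARM. I'm not sure about other architectures, though it may well work there too. Additions from anyone who knows are welcome.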

Sunday, December 6, 2009

printk format for 64bit

---------------------------------------------------------
int %d or %x
unsigned int %u or %x
long %ld or %lx
unsigned long %lu or %lx
long long %lld or %llx
unsigned long long %llu or %llx
size_t %zu or %zx
ssize_t %zd or %zx

ref:
http://groups.google.com/group/linux.kernel/browse_thread/thread/2ea0571f2f5a72da/22a6f3295e25688b?lnk=raot&pli=1

Monday, November 30, 2009

PCI DMA

http://bbs.driverdevelop.com/read.php?tid-115630-fpage-5.html

Some thoughts on optimization

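Optimization happens at several levels: 1) design, 2) C implementation, 3) assembly implementation. Level 3) is mostly handled by the compiler, and the programmer has little control over it beyond choosing an optimization strategy, such as GCC's O1, O2 or O3, or Visual Studio's choice between favoring speed and favoring size.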

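Before optimizing, first be clear about the goal. Saving space and improving speed cannot always be achieved together, and sometimes they conflict. So you need to know whether saving space or gaining speed matters more.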

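Next, find out which parts of the program are the least efficient and need optimizing. In most cases I think the 20/80 rule applies: 20% of the code accounts for 80% of the space or time, so focus on optimizing that 20% and don't worry too much about the rest. Likewise, programmers should know which operations waste time or space. Time-consuming operations include I/O, SYSTEM CALLs and unnecessary PRINTFs. An example of wasted space is a huge array of which only a few elements are actually used. To find the least efficient parts of the code, use PROFILE tools.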

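Finally, a couple of examples of optimization at the design stage. Example 1: suppose your program needs to process a file many times. There are two ways to do it. Approach 1: open the file each time you need to process it, and close it when done. Approach 2: open the file only once and keep its contents in memory; afterwards, work directly on the data in memory, and at the end write it back to the file if necessary. Approach 1 saves space but costs time; approach 2 costs some space but saves time. Which one to choose depends on the size of the file and on how many times it has to be opened.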

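Example 2: suppose you have a timer that does some work at regular intervals. The interval is then an important parameter; the difference between a 1-second interval and a 10-second interval is obvious. As long as the design requirements are met, the longer the interval, the less CPU time it uses.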

In short, when designing, first be clear about the goals, then optimize while still meeting them.

Monday, November 16, 2009

Notes on the Intel e1000 NIC transmit path


Recently I became interested in how Intel 1G NICs transmit packets, so I went through the code and am writing down what I found.

References:
1. Intel 82547 NIC developer's manual. Manuals for other Intel NICs should also be available online. http://linux.chinaunix.net/bbs/thread-1142051-1-2.html
2. Linux e1000 NIC driver. http://lxr.linux.no/#linux+v2.6.30/drivers/net/e1000/e1000_main.c

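Transmit path:
1. The Linux OS calls the NIC driver's start_xmit() function. In e1000, the corresponding function is e1000_xmit_frame.
2. e1000_xmit_frame in turn calls e1000_tx_queue(adapter, tx_ring, tx_flags, count). Here tx_queue refers to the queue of transmit descriptors.
3. After checking some parameters, e1000_tx_queue finally calls writel(i, hw->hw_addr + tx_ring->tdt). The tdt in tx_ring->tdt stands for tx_descriptor_tail. According to the NIC developer's manual, once the descriptor tail is written, the NIC fetches the descriptors by itself and sends the packets out.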

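The main contents of a descriptor are an address pointer and a length. The former is the starting physical address of the packet to send; the latter is the packet's length. With these, the hardware can read the packet via DMA and send it. Other NICs mostly use a descriptor structure as well.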

TSO:

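The INTEL E1000 is a fairly complex NIC with many features. The old RTL8139, by contrast, is much simpler. Early RTL8139 cards have very few features: they just put the packets the OS hands them on the wire, and their top speed is 10Mbit or 100Mbit, I believe. As technology has progressed, the INTEL E1000 supports more features. An obvious one is TCP SEGMENTATION OFFLOADING (TSO for short, which you will see often in the driver code).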

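First, what is TSO? We know that networking is divided into layers: TCP is in the middle, with IP and ETHERNET below it (at different layers). TCP can send a large packet, say 2 KB, but ETHERNET may not support that size; for example, ETHERNET may only support 1.5 KB. So how do you send a 2 KB TCP packet? The simple way is to split it into two: the first 1.5 KB and the second 0.5 KB. This process is called TCP SEGMENTATION. And what does OFFLOADING mean? Its literal meaning is roughly "unloading", which here can be read as "putting it down". Down where? Since we usually say software (the OS) runs "on top of" the hardware, "down" means down onto the hardware (the NIC). So TSO means doing TCP SEGMENTATION on the NIC. This work used to be done by the OS; now the NIC hardware can do it, so the OS gets simpler, and a hardware implementation is generally faster as well. This helps the INTEL E1000 sustain 1Gbit.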

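Another way in which the INTEL E1000 differs from the RTL8139 is how it handles the packet to be sent (the SKB). In the 8139 driver, a DMA-able buffer is first allocated with pci_alloc_consistent (in 2.6.18; this changed again by 2.6.29), and then skb_copy_and_csum_dev copies the data passed down by the OS into that DMA-able memory. The copy takes time and hurts performance. The INTEL E1000 takes a different approach: before e1000_tx_queue, it calls e1000_tx_map(), whose main job is to set up a DMA-able address for the data in the SKB, so no memory copy is needed. Setting up a DMA address seems to be relatively fast (this is my guess), so performance should improve as well.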


From the Intel manual:

3.5.2 Transmission Process

The transmission process for regular (non-TCP Segmentation packets) involves:
• The protocol stack receives from an application a block of data that is to be transmitted.
• The protocol stack calculates the number of packets required to transmit this block based on the MTU size of the media and required packet headers.
• For each packet of the data block:
• Ethernet, IP and TCP/UDP headers are prepared by the stack.
• The stack interfaces with the software device driver and commands the driver to send the individual packet.
• The driver gets the frame and interfaces with the hardware.
• The hardware reads the packet from host memory (via DMA transfers).
• The driver returns ownership of the packet to the operating system when the hardware has completed the DMA transfer of the frame (indicated by an interrupt).

The transmission process for the Ethernet controller TCP segmentation offload implementation involves:
• The protocol stack receives from an application a block of data that is to be transmitted.
• The stack interfaces to the software device driver and passes the block down with the appropriate header information.
• The software device driver sets up the interface to the hardware (via descriptors) for the TCP Segmentation context.
• The hardware transfers the packet data and performs the Ethernet packet segmentation and transmission based on offset and payload length parameters in the TCP/IP context descriptor including:
— Packet encapsulation
— Header generation & field updates including IP and TCP/UDP checksum generation
• The driver returns ownership of the block of data to the operating system when the hardware has completed the DMA transfer of the entire data block (indicated by an interrupt).

TIPS:
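1) The E1000 supports 3 types of DESCRIPTOR, and several DESCRIPTORs can make up one PACKET. On the RTL8139, one DESCRIPTOR corresponds to exactly one ETHERNET PACKET.
2) A quick look at a few key data structures: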
struct e1000_tx_ring {
    /* pointer to the descriptor ring memory */
    void *desc;
    /* physical address of the descriptor ring */
    dma_addr_t dma;
    /* length of descriptor ring in bytes */
    unsigned int size;
    /* number of descriptors in the ring */
    unsigned int count;
    /* next descriptor to associate a buffer with */
    unsigned int next_to_use;
    /* next descriptor to check for DD status bit */
    unsigned int next_to_clean;
    /* array of buffer information structs */
    struct e1000_buffer *buffer_info;

    spinlock_t tx_lock;
    uint16_t tdh;
    uint16_t tdt;
    boolean_t last_tx_tso;
};


/* wrapper around a pointer to a socket buffer,
 * so a DMA handle can be stored along with the buffer */
struct e1000_buffer {
    struct sk_buff *skb;
    dma_addr_t dma;
    unsigned long time_stamp;
    uint16_t length;
    uint16_t next_to_watch;
};

3) The E1000 supports up to 64K TX DESCRIPTORS, and it also supports RX DESCRIPTORS. The RTL8139, by contrast, supports only 4 TX DESCRIPTORS and no RX DESCRIPTORS. A typical E1000 PACKET consists of several DESCRIPTORS (4 on average).


Another version:
http://linux.chinaunix.net/bbs/thread-1144212-1-3.html

Thursday, November 5, 2009

Using qemu to find out physical address of a given virtual address for Xen

Using qemu to find out physical address of a given virtual address for Xen

Environment: Xen 3.3 (32-bit, PAE) installed as a QEMU virtual machine.
Input: 0xc0100000 (the virtual address of the domain 0 kernel)
Output: the physical address of 0xc0100000.

Process:
1. Use the "info registers" command in the QEMU monitor to get CR3. CR3 is 0x29cd00. This is the physical base address of the page directory pointer table (PDPT).
2. Get the top two bits of the virtual address; they are the index into the PDPT.
For 0xc0100000, the highest hex digit is 0xc, which is 1100(b), so the top two bits are 11(b) = 3.
3. Each PDPT entry is 64 bits = 8 bytes (Intel CPU manual 3A, 3.8.5), so the entry's offset is 3*8 = 24(d) = 0x18.
4. cr3+0x18 holds the entry that points to the page directory.
cr3+0x18 = 0x0029cd00+0x18 = 0x0029cd18.
xp /20hx 0x0029cd18 shows 0x390b6001. Ignoring the flag bits in the low 12 bits, 0x390b6000 is the base address of the page directory.
5. Bits 29 to 21 of the virtual address are the index into the page directory.
For 0xc0100000, the top 16 bits are 1100,0000,0001,0000 (b). Bits 29 to 21 are 00,0000,000(b), i.e. 0, so the page directory index is 0.
6. xp /20hx 0x390b6000 (the low bits of the previous entry are flags, which we ignore for now).
The output is 0x3dbc,a067.
7. Bits 20 to 12 of the virtual address are the index into the page table. (For 2MB pages it is different.)
Here they are 1,0000,0000(b), which is 0x100. Since each entry is 8 bytes (64 bits), the entry's offset in the page table is 0x100*8 = 0x800.
8. The low bits of 0x3dbc,a067 are flags; ignore the 067 for now. So the page table entry is at
0x3dbc,a000 + 0x800 = 0x3dbc,a800.
xp /20hx 0x3dbca800 outputs 0x3d10,0063. Again, the low bits are flags, so the final result is 0x3d10,0000. (I skipped adding the offset within the page, which is 0 here anyway.)

Note: To verify that both addresses really point to the same data, use the "x" and "xp" commands in the QEMU monitor to show their contents, e.g. x /20hx 0xc0100000 and xp /20hx 0x3d100000. The output should be the same.

Reference:
Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3A

Wednesday, October 28, 2009

using openssl to calculate sha1 hash in linux

http://buttom-meal.blogspot.com/2009/04/linux-c-openssl-sha1.html
http://www.ibm.com/developerworks/linux/library/l-openssl.html

On CentOS 5:
yum install openssl openssl-devel

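/*
Code snippet to calculate SHA1sum using openssl libs.
Copyright 2005 Junichi Uekawa, given to public domain.

$ gcc openssltest.c -lcrypto
$ ./a.out < ./a.out
eae8189278303caaa78f2d89e6a6ebeb7d37b554
$ sha1sum ./a.out
eae8189278303caaa78f2d89e6a6ebeb7d37b554 ./a.out
*/

#include <stdio.h>
#include <unistd.h>          /* read() */
#include <openssl/sha.h>     /* SHA_CTX, SHA1_*(); these live in libcrypto */

int main(void)
{
    SHA_CTX s;
    int i;
    ssize_t size;
    char c[512];
    unsigned char hash[SHA_DIGEST_LENGTH];   /* 20 bytes */

    SHA1_Init(&s);
    /* feed stdin into the hash 512 bytes at a time */
    while ((size = read(0, c, sizeof c)) > 0)
        SHA1_Update(&s, c, (size_t)size);
    SHA1_Final(hash, &s);

    /* print the digest as 40 lowercase hex digits */
    for (i = 0; i < SHA_DIGEST_LENGTH; i++)
        printf("%.2x", (int)hash[i]);
    printf("\n");
    return 0;
}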

Intel e1000 NIC driver analysis

http://linux.chinaunix.net/bbs/viewthread.php?tid=1094725

Wednesday, October 14, 2009

Meanings of wa, hi, si in "top" command output

The summary area fields describing CPU statistics are abbreviated. They provide information about times spent in:
us = user mode
sy = system mode
ni = low priority user mode (nice)
id = idle task
wa = I/O waiting
hi = servicing IRQs
si = servicing soft IRQs
st = steal (time given to other DomU instances)

Tuesday, September 29, 2009

Translating a virtual address into a physical address

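Today I did an experiment in QEMU and manually traced the translation of a virtual address into a physical one, which made a few more things clear to me. Here is a record of the experiment.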

Goal: given the virtual address 0xC1000000, find its physical address. (We already know the answer is 0x01000000, but we want to see how that result is obtained.)

Environment: QEMU 0.9.2. The virtual machine runs LINUX 2.6.18.

Method and steps:
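1) In QEMU, press Ctrl+Alt+2 to switch to the MONITOR. Enter info registers to get the contents of CR3: 0x1132b000. This is the start address of the PAGE DIR TABLE.
2) The top 10 bits of the virtual address, i.e. the 8 bits of C1 plus the next two bits (both 0), determine its position in the PAGE DIR TABLE. In binary these 10 bits are 1100,0001,00, which is 772 in decimal and 0x304 in hex.
3) Entry 0x304 of the PAGE DIR TABLE is at offset 0x304*4 = 0xC10, because each PAGE DIR ENTRY is 4 bytes and x86 CPUs are byte-addressed.
4) Look at the contents of entry 0x304: in QEMU, enter xp /10hx 0x1132bc10 (0x1132bc10 = cr3 + 0xc10). The content at this address is 0x0130F163. Note that x86 is LITTLE ENDIAN.
5) 0x0130F (the top 20 bits of the PAGE DIR ENTRY) is the base of the PAGE TABLE; the rest are FLAGs. The real address of the PAGE TABLE is that value times 0x1000, so the physical address of the PAGE TABLE is 0x0130F000.
6) Now look at the middle 10 bits of the virtual address. In this example they are all 0, so we only need the first ENTRY of the PAGE TABLE: xp /10hx 0x130f000. The content is 0x01000163.
7) In 0x01000163, the top 20 bits are the PAGE FRAME NUMBER (the trailing 163 are FLAGs). The real address is that number times 0x1000, i.e. 0x01000000. Since the low 12 bits of the virtual address are all 0, the final physical address is 0x01000000, which is the expected answer.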

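Finally, a few thoughts:
1) The PAGE DIR and PAGE TABLE hold physical addresses or physical PAGE FRAME NUMBERs. This is easy to understand: if they held virtual addresses, yet another PAGE DIR and PAGE TABLE would be needed to translate them, and we would end up in an infinite loop.
2) Since 0xC1000000 is the kernel's start address in this example, it doesn't matter when you switch to the QEMU MONITOR: this part of the address space is the same for all processes. I did the translation a second time later. The CR3 value was different, meaning a different process was running, but its PAGE DIR ENTRY was the same as the first time.
3) Translating the virtual address 0xC0000000 gives somewhat strange results. My guess is that it has to do with the lowest physical addresses being used by the BIOS and the like. I haven't fully figured it out yet.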

Saturday, September 12, 2009

change CD-ROM in QEMU

http://www.chkh.com/Article/HTML/19486.html

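In qemu, press ctrl+alt+2 to switch to the qemu monitor; enter ? or help to see the available commands and how to use them. (In other versions of qemu, once qemu has loaded the OS, this shell automatically becomes the qemu monitor.) The entry "change device filename -- change a removable media" is apparently the one for switching discs: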

change ide1-cd0 /rhel4/EL_disc2.iso

http://blog.chinaunix.net/u/7793/showart_1793074.html
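When installing a system with Qemu you need to switch installation discs. Besides doing it through the GUI, you can use a command in the console, but note one thing: in the change command, the device name of the cdrom is ide1-cd0, not cdrom.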

Tuesday, September 8, 2009

QEMU core functions call map

cpu-exec.c:
static TranslationBlock *tb_find_slow(target_ulong pc, target_ulong cs_base, uint64_t flags)
  calls tb_gen_code() (exec.c)
    calls cpu_gen_code() (translate-all.c)
      calls gen_intermediate_code() (target-i386/translate.c)

Friday, September 4, 2009

SVN usage

setup, email notification, cvs:
http://www.51testing.com/?uid-75198-action-viewspace-itemid-114376
svn book:
http://svnbook.red-bean.com/

Sendmail problem:
When I used sendmail on an Ubuntu server, it reported error 67 (or 66), and /var/log/mail.log said that it could not qualify my own hostname. It turned out that /etc/hosts was not correct. After I added the real IP, host name and domain name to that file (not only the 127.0.0.1 entry), it worked.

Friday, August 28, 2009

livekd: error finding i386kd.exe

Problem: after running livekd.exe in a console window, it says: error finding i386kd.exe
Reason: livekd cannot find the path of "Debugging Tools for Windows".
Solution: edit the PATH environment variable and add the path to "Debugging Tools for Windows".

Monday, August 17, 2009

Trailing a Growing File in Perl

http://oreilly.com/catalog/cookbook/chapter/ch08.html
Trailing a Growing File

Problem
You want to read from a continually growing file, but the read fails when you reach the (current) end of file.
Solution
Read until the end of file. Sleep, clear the EOF flag, and read some more. Repeat until interrupted. To clear the EOF flag, either use seek:
for (;;) {
    while (<FH>) { .... }
    sleep $SOMETIME;
    seek(FH, 0, 1);
}
or the IO::Handle module's clearerr method:
use IO::Seekable;

for (;;) {
    while (<FH>) { .... }
    sleep $SOMETIME;
    FH->clearerr();
}
Discussion
When you read to the end of a file, an internal flag is set that prevents further reading. The most direct way to clear this flag is the clearerr method, if supported: it's in the IO::Handle and FileHandle modules.
$naptime = 1;

use IO::Handle;
open (LOGFILE, "/tmp/logfile") or die "can't open /tmp/logfile: $!";
for (;;) {
    while (<LOGFILE>) { print }     # or appropriate processing
    sleep $naptime;
    LOGFILE->clearerr();            # clear stdio error flag
}
If that simple approach doesn't work on your system, you may need to use seek. The seek code given above tries to move zero bytes from the current position, which nearly always works. It doesn't change the current position, but it should clear the end-of-file condition on the handle so that the next <LOGFILE> picks up new data.
If that still doesn't work (e.g., it relies on features of your C library's (so-called) standard I/O implementation), then you may need to use the following seek code, which remembers the old file position explicitly and returns there directly.
for (;;) {
    for ($curpos = tell(LOGFILE); <LOGFILE>; $curpos = tell(LOGFILE)) {
        # process $_ here
    }
    sleep $naptime;
    seek(LOGFILE, $curpos, 0);  # seek to where we had been
}
On some kinds of filesystems, the file could be removed while you are reading it. If so, there's probably little reason to continue checking whether it grows. To make the program exit in that case, stat the handle and make sure its link count (the third field in the return list) hasn't gone to 0:
exit if (stat(LOGFILE))[3] == 0
If you're using the File::stat module, you could write that more readably as:
use File::stat;
exit if stat(*LOGFILE)->nlink == 0;

See Also
The seek function in perlfunc (1) and in Chapter 3 of Programming Perl ; your system's tail (1) and stdio (3) manpages

Friday, August 14, 2009

ABOUT TCPDUMP DROPPED PACKETS

http://blog.tianya.cn/blogger/post_show.asp?BlogID=227219&PostID=16646525&idWriter=0&Key=0

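tcpdump packet-loss analysis
Author: waitquiet, posted 2009-3-3 18:29:00

When you capture with tcpdump, it prints statistics like the following when it finishes:
1552 packets captured
1586 packets received by filter
34 packets dropped by kernel

Here the "captured" count is the number of packets the application layer actually received, while the "received by filter" and "dropped by kernel" counts are maintained by the kernel, and the application obtains them through getsockopt. Each time a packet arrives, "received by filter" is incremented; if the socket's receive buffer is full, the packet is dropped and "dropped by kernel" is incremented:
if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >= (unsigned)sk->sk_rcvbuf) {
    spin_lock(&sk->sk_receive_queue.lock);
    po->stats.tp_drops++;
    spin_unlock(&sk->sk_receive_queue.lock);
}
The size of sk_rcvbuf can be changed through /proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_max.

Normally "captured" plus "dropped by kernel" should equal "received by filter". When they don't match, it is probably because some packets are still sitting in sk_rcvbuf and have not yet been read by the application.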

Sunday, August 9, 2009

cell phone anti-theft software

General phones:
http://techzoogle.com/antitheft-technologies-for-cell-phones/
http://www.ghacks.net/2008/05/17/anti-theft-software-for-mobile-phones/
http://www.wimp-software.co.uk/

iPhones:
http://www.gadgettrak.com/products/iphone/
http://theory.isthereason.com/?p=2302
http://thinkabdul.com/2008/02/23/iphone-findme-free-anti-theft-software-for-apple-iphone-to-retrieve-gsm-cell-id-location-via-twitter/

Monday, August 3, 2009

Developing Windows drivers: resources and tips

http://students.cs.byu.edu/~nbushman/drivers.htm


Author: Nate Bushman
Last Modified: 18 August 2003

People keep on asking me, "Hey, how can I learn how to write drivers?" Well, I wrote up this document to help answer that question (at least partially). The first part of this document lists various driver dev resources, and the second part contains a bunch of tips and tricks.


WinNT/2K/XP Kernel Mode Device Driver Development Resources




THE MICROSOFT WINDOWS DRIVER DEVELOPMENT KIT (DDK):



The DDK and its associated documentation and samples are an essential resource for learning about device driver development. Once you're familiar with the basics (read one of the books below) you should definitely go through the sample drivers that come with the DDK. The "toaster" sample code contains generic code for several types of drivers. You can order the DDK at: http://www.microsoft.com/ddk

The Windows Driver Development Kit also ships with the MSDN.



BOOKS:

"Windows NT Device Driver Development" by Peter Viscarola and Tony Mason. This is the first NT driver book I read, and still the first one that I recommend. It's got all of the essentials and is clearly written. 95% of the stuff in this book still applies to W2K/XP drivers too.

"Programming the Microsoft Windows Driver Model, 2nd ed." by Walter Oney et al. Read this book as soon as you can. It's filled with guidelines that will help you to avoid the common mistakes. I couldn't believe how many things I'd learned the hard way that I could have avoided if I'd read this book first.

"Inside Windows 2000, 3rd Ed." by David Solomon and Mark Russinovich. This isn't a driver dev book, but you'll soon see drivers don't run in their own sandbox. They're an integral part of the OS, and in order for them to work properly you need to have a good understanding of how the OS works. This book is an absolute goldmine of OS internals. Highly recommended.

"The Windows 2000 Device Driver Book, 2nd Ed." Be sure to get the second edition which was recently published. The first edition has many errors. I like this book. It's only around 400 pages, so it's not overkill, and it has a lot of really useful insights. It's helped me to solve some pretty tricky problems that I didn't find solutions to anywhere else. I'd still recommend that you read Viscarola and Mason's book first.

"Windows NT/2000 Native API Reference" by Gary Nebbett. NT provides tons of useful functionality through the native NT API. Most of this API is undocumented and may change at any time. However, sometimes using these undocumented functions is the only way you can accomplish certain tasks. This book documents this API for the first time.

"Developing Windows NT Device Drivers" by Ed Dekker and Joseph Newcomer. I've only been using this one as reference, so I can't say how good the content is. I love the useful tables in the back of this book, and what I have read is well written and clear.

"Windows NT File System Internals" by Rajeev Nagar. This is the only file system driver book available for the NT platform, and it doesn't cover any of the new stuff found in W2K or XP. However, it's got a lot of useful info that you can't find anywhere else.

"Windows 2000 Kernel Debugging" by Steven McDowell. For me, this book was mostly a waste of time. But if you have zilch exposure to kernel debugging then this book may help you to get up to speed a bit faster. It's a very fast read as it only has a little over 200 pages of useful text. When you really want to get to know the kernel debugger, read the debugger documentation that comes with WinDbg. You can download WinDbg at http://www.microsoft.com/ddk/debugging/ and its documentation is fantastic. When you start writing drivers you'll be crashing your machine all the time, so getting to know the debugger is very important.

"Writing Windows WDM Device Drivers" by Chris Cant. I've only read parts of this book, and what I read was useful. I've heard several people complain that it has a lot of errors.

"Undocumented Windows 2000 Secrets" by Sven Schreiber. Not really an essential book, but it may help you if you have to do some tricky stuff like patching the system service descriptor table (this would allow you to filter native API calls), or if you need to understand the until-now undocumented format of Microsoft's .PDB symbol files.




Microsoft has also published the DDK documentation as a three-volume set.



WEB SITES:



http://www.microsoft.com/whdc/ - Windows Hardware and Driver Central Home

http://www.microsoft.com/ddk - The home of the DDK

http://www.microsoft.com/ddk/debugging/ - Get your kernel debugger here

http://www.microsoft.com/hwdev - More driver dev resources from Microsoft

http://www.osr.com - Home of "The NT Insider," the online DDK docs, & more

http://www.osronline.com - OSR's new online presence

http://www.sysinternals.com - Tons of useful apps and docs for systems programmers

http://www.hollistech.com - Home of ddkbuild.bat - build your driver from the GUI

http://www.oneysoft.com - Home of "Programming the Windows Driver Model" book

http://www.pcausa.com - Useful if you're writing any kind of network driver

http://www.cmkrnl.com/faq.html - Jamie Hanrahan's (of Kernel Mode Systems) useful faq

http://www.compuware.com - Makers of the SoftICE debugger and other driver dev tools

http://www.wd-3.com - The Windows Driver Developer's Digest.



PERIODICALS:

"The NT Insider" - Sign up for the free distribution of this bimonthly publication at http://www.osr.com (it's packed with useful tips).

Matt Pietrek's "Under the Hood" column articles found in the Microsoft Systems Journal (available in the MSDN Library) - Tons of useful systems programming tips over several years. Microsoft stopped printing the MSJ and most of the articles from MSJ (including Matt's) can now be found in MSDN Magazine.

Jim Finnegan's "Nerditorium" column articles found in the Microsoft Systems Journal (available in the MSDN Library) - Even more driver oriented than Pietrek's stuff. Read his March 1998 article for a quick intro to NT drivers:
http://www.microsoft.com/msj/defaultframe.asp?page=/msj/0398/driver.htm

Mark Russinovich's column articles found in Windows 2000 Magazine. Mark's articles are clear and detailed, and provide very useful information on how various parts of the OS are implemented and work. If you don't have a subscription to Windows 2000 Magazine you can still get access to all but his latest articles at www.windows2000mag.com (he's also provided a ton of systems programming resources at his website: http://www.sysinternals.com).



EMAIL LISTS:



OSR hosts two fantastic email lists that are very active and an excellent resource to driver developers. They are NTDEV and NTFSD. Post general driver dev questions to NTDEV and post file-system driver specific questions to NTFSD. You can sign up for these email lists at http://www.osr.com and archives of these lists are found at http://www.ntdev.org.


NEWSGROUPS:



comp.os.ms-windows.programmer.nt.kernel-mode (Very Good)

microsoft.public.development.device.drivers (Very Good)

microsoft.public.win32.programmer.kernel (Very Good)

microsoft.public.windbg (For help with the kernel debugger)

microsoft.public.ddk.* (There are several of these)



SEMINARS:




http://www.osr.com/seminars_main.shtml



http://www.azius.com


http://www.oneysoft.com/wdmclass.htm




OTHER RESOURCES:



The Microsoft Installable File System Development kit (IFS kit). This is a $1000 CD that comes with the complete source for FastFAT and CDFS. If you're writing file system or file system filter drivers then you'll want to buy the IFS kit. I don't know about the XP IFS kit, but the W2K IFS kit comes with the NT IFS kit. You can find more info at:
http://www.microsoft.com/ddk/IFSkit/




 


WinNT/2K/XP Device Driver Development Tips'n'Tricks


PERFORM BASIC TESTING



1) Turn the Driver Verifier on when you’re testing your driver. On your test
machine just run “verifier” from the Start | Run dialog. Driver Verifier will
let you perform a number of tests on the driver(s) of your choice. Click on the
Settings tab and select the driver(s) that you want Verifier to test, then in
the same tab select which tests you want Verifier to perform. You should at a
minimum select the Special Pool, Force IRQL Checking, Pool Tracking and I/O
Verification Level 1 tests.

2) Test your driver on both the free and checked builds of the OS. The checked
OS is full of assertions that will help you to find bugs. To quote Peter
Viscarola of OSR, “To me, there is very little that is sloppier or more
unprofessional looking than to have your released driver cause ASSERTS in the
Checked build. I mean, the Checked build contains very basic tests for
correctness. If you can’t pass those, it implies that you haven’t even done the
basics in terms of testing.” If you’re testing with a hybrid OS (a free OS with
checked versions of the HAL and kernel), you should also add checked versions of
the other drivers that your driver interacts with. For example, if you’re
testing a storage filter driver on a hybrid system, then in addition to the
checked HAL and kernel, you should also install the checked versions of ntfs.sys,
fastfat.sys, diskperf.sys, ftdisk.sys, classpnp.sys, scsiport.sys, etc.

3) Enable pool tagging on your test system, and be sure to use the “WithTag”
variations of any function calls that allocate memory. Pool tagging will help
you to discover memory leaks because it will associate a tag (which you provide
in the allocation call) with each memory allocation. Download PoolTag from OSR.
It’s a nice GUI app that will both enable pool tagging and display the current
memory allocated to each tag.


4) Use WinDbg. Have a kernel debugger attached to the test machine(s) on which
you’re running your driver. If Driver Verifier finds a problem, or when the
checked build of the OS asserts, you’ll end up with a blue screen. It’ll save
you quite a bit of time if you have WinDbg attached to your test systems so that
these bug checks will simply trap in the debugger, allowing you to analyze the
problem immediately. Usually, this is simply a matter of dumping the stack of
the offending code to find out who caused the problem. Be sure to generate
symbol files for your driver so that kernel debuggers can analyze stacks, etc.
You do this in your SOURCES file, which I’ll discuss later.

5) Test your driver on multiprocessor systems. The more processors the better.
Basically you’ll be testing your code to make sure that it properly synchronizes
access to shared resources. Multiprocessor systems are more likely to reveal
synchronization bugs in your code, such as deadlocks and race conditions.





USE GOOD DRIVER IMPLEMENTATION PRACTICES



1) Use ASSERT() for all assumptions in your code. Every function should have an
ASSERT() which checks the current IRQL to see if it is within the assumed range.
Every function that’s internal to the driver (that can’t be called by code
outside of the driver) should have ASSERTs that validate parameter values. If a
function assumes that it’s running in the system context, it should perform an
ASSERT to see if its process is the system process. By using ASSERTs to check
IRQL and process context, you’ll self-document these requirements for every
function.

2) Functions that can be called by code that is external to the driver must
validate their parameters. This check should always occur, regardless of whether
your driver is built in the free or checked environment. It should NOT be
checked using ASSERT(). Such functions include your driver’s dispatch, AddDevice,
DriverEntry, and Unload routines.

3) Beware of macros. Many of the APIs in the DDK are actually implemented as
macros. Never use arguments that are expressions with side effects. These
arguments could be passed to a macro instead of a function and the macro can
evaluate the expression multiple times. Use block notation for conditional
statements to avoid problems with multi-statement macros. For example:
if (flag) DoSomething(); If the function DoSomething() is actually a
multi-statement macro, then only the first statement in the macro would be
conditionally executed. Be safe and use block notation for your conditional
statements, like: if (flag) { DoSomething(); }

4) Synchronize access to any resources that can be touched by more than one
thread at the same time. It’s far better to experience a performance hit due to
synchronization code and to have your code properly synchronized than to have
code that’s a little faster but full of bugs.


5) Use structured exception handling. More on this below.

6) Don’t touch paged memory at IRQL >= DISPATCH_LEVEL. Also, keep in mind that
the UNICODE character tables are in paged memory, so if you attempt to use a
DbgPrint() statement to print out a WCHAR/UNICODE string, the system will
attempt to access paged memory in order to convert that string to text that can
be sent to the attached kernel debugger. At DISPATCH_LEVEL, the statement
DbgPrint("%S", unicodeString.Buffer); will cause a 0x0A bug check.

7) Some entry points to your driver (such as AddDevice()) are documented to be
called at PASSIVE_LEVEL. For this reason these functions are often placed in a
pageable code section. Be aware that if you use a spinlock in such a function,
you must not place the code for the function in a pageable code section.
The reason for this is that acquiring a spinlock raises IRQL to DISPATCH_LEVEL
for as long as the lock is held (on the uniprocessor kernel, that IRQL raise is
all that acquiring the lock does), and code running at DISPATCH_LEVEL must not be paged.

8) Avoid stack overflows. Every NT thread has two stacks, a user-mode and a
kernel-mode stack. The user-mode stack is quite large, but the kernel-mode stack
is limited to 12K on x86. To avoid overflowing the kernel stack, avoid deeply nested
function calls, large local variables, and recursive calls. Also, some system
calls (especially the registry I/O calls) consume quite a bit of stack. You may
find it necessary to farm off some of your API calls to worker threads whose
stacks are empty. You can use ASSERTs with IoGetRemainingStackSize() to quickly
locate areas in your code where you might overflow the stack.

9) Avoid undocumented system calls. The behavior of these calls can change from
one implementation to the next, and even between service packs. If you do use
undocumented calls, document exactly which ones you used and the reasons you
were forced to use them.

10) Be very careful to avoid deadlocks if your driver initiates new I/O that
will pass through the stack in which your driver participates. This is
especially true if your driver blocks on that new I/O.

11) Keep in mind that the data that an IRP points to through its MDL can be
changing while your driver owns the IRP. An IRP MDL’s data is not static.

12) All completion routines (except routines which set events that dispatch
routines are waiting on; see the IoMarkIrpPending documentation) must include this
code: if (Irp->PendingReturned) { IoMarkIrpPending(Irp); } This is necessary in
order to propagate the PendingReturned flag up to the top of the stack so that
the IRP can be properly freed.


13) Zero out (or fill with some other non-random value) structs that you are about to
free. This will help you to identify when you are referencing memory that has
already been freed. If you leave data in the struct, it’s likely that your code
will not complain about the data that it finds in the memory when it uses a
pointer that points to memory that’s already been freed. This technique is
easily accomplished by #defining a new version of the free-memory function that,
if DBG is defined, will zero out the memory before freeing it.

14) Build in unit tests that fully test the features of your driver. Make it
possible for these tests to be run by a script. This will enable you to run a
daily battery of tests (a daily “smoke test”) to see if any of the code you’ve
added that day has broken pre-existing code.
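Tips 3 and 13 meet in practice, because the debug-only free wrapper suggested in tip 13 is itself a multi-statement macro. Below is a minimal user-mode sketch in standard C, assuming malloc/free stand in for the kernel pool routines (in a driver you would use the "WithTag" pool functions instead) and that DBG is defined only in checked builds. DEBUG_FREE, debug_scrub, DEBUG_FILL_BYTE and BAD_MAX are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fill value: any recognizable, non-random byte will do. */
#define DEBUG_FILL_BYTE 0xDD

/* Tip 13: overwrite a block with a known pattern. A NULL pointer is ignored. */
void debug_scrub(void *p, size_t size)
{
    if (p != NULL)
        memset(p, DEBUG_FILL_BYTE, size);
}

/* Tip 13 written with tip 3 in mind: the DBG version wraps its statements in
 * do { ... } while (0), so "DEBUG_FREE(p, n);" is one statement even in an
 * unbraced if/else, and each argument is evaluated exactly once. */
#ifdef DBG
#define DEBUG_FREE(p, size)                  \
    do {                                     \
        void *debug_free_p_ = (p);           \
        debug_scrub(debug_free_p_, (size));  \
        free(debug_free_p_);                 \
    } while (0)
#else
/* Free build: still evaluate size once, so side effects match the DBG build. */
#define DEBUG_FREE(p, size) ((void)(size), free(p))
#endif

/* The hazard tip 3 warns about: an argument can be evaluated twice. */
#define BAD_MAX(a, b) ((a) > (b) ? (a) : (b))
```

Because DEBUG_FREE behaves as a single statement, a line such as if (p != NULL) DEBUG_FREE(p, size); else ... compiles and runs as written in both builds. BAD_MAX(i++, 0) shows the side-effect problem: starting with i equal to 5, it returns 6 and leaves i at 7, because i++ runs twice.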



PERFORMANCE ENHANCING TIPS



1) Use lookaside lists (zones in NT 3.x) when you’re repeatedly allocating blocks
of memory that are the same size. Lookaside lists will hold onto previously
deallocated blocks instead of returning them to the Memory Manager. The next
time you need a block of memory the list can return one to you immediately
without all of the memory management overhead.

2) Use dispatcher synchronization objects (such as a mutex) instead of spinlocks
to synchronize access to resources where all of the code that accesses the
resource is guaranteed to run at IRQL PASSIVE_LEVEL.
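Lookaside lists (tip 1) are a kernel facility (ExInitializeNPagedLookasideList and its matching allocate/free routines), but the idea fits in a few lines of portable C. The following is a single-threaded sketch under that assumption: the type and function names are hypothetical, and unlike the system's lookaside lists it does no locking, so it must not be shared between threads as written.

```c
#include <stddef.h>
#include <stdlib.h>

/* A cached block reuses its own storage to link to the next cached block. */
typedef struct free_block {
    struct free_block *next;
} free_block;

typedef struct {
    size_t block_size;   /* every block handed out has this size */
    free_block *head;    /* previously freed blocks kept for reuse */
    size_t depth;        /* number of blocks currently cached */
    size_t max_depth;    /* beyond this, freed blocks go back to free() */
} lookaside;

void lookaside_init(lookaside *l, size_t block_size, size_t max_depth)
{
    /* A block must be large enough to hold the free-list link. */
    l->block_size = block_size < sizeof(free_block) ? sizeof(free_block) : block_size;
    l->head = NULL;
    l->depth = 0;
    l->max_depth = max_depth;
}

void *lookaside_alloc(lookaside *l)
{
    if (l->head != NULL) {            /* fast path: reuse a cached block */
        free_block *b = l->head;
        l->head = b->next;
        l->depth--;
        return b;
    }
    return malloc(l->block_size);     /* slow path: ask the allocator */
}

void lookaside_free(lookaside *l, void *p)
{
    if (p == NULL)
        return;
    if (l->depth < l->max_depth) {    /* keep the block instead of freeing it */
        free_block *b = p;
        b->next = l->head;
        l->head = b;
        l->depth++;
    } else {
        free(p);
    }
}

void lookaside_destroy(lookaside *l)
{
    while (l->head != NULL) {
        free_block *b = l->head;
        l->head = b->next;
        free(b);
    }
    l->depth = 0;
}
```

The max_depth cap mirrors the trade-off of a real lookaside list: it bounds how much idle memory the cache can hold on to.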





USE WINDBG




1) Step through your code in WinDbg. WinDbg is a symbolic debugger capable of
source-level debugging. Stepping through your code and watching your variables
is one of the best ways to see if you actually implemented what you thought you
implemented.

2) WinDbg needs the symbol files (.PDB files) for your OS and for the driver
that you’re testing in order for it to analyze data in memory. I’ll discuss how
you can generate these files when I discuss tips on building your drivers.

3) Use cookies in your structs so that you can identify them when you’re poking
around in memory or in a crash dump file.

4) There’s so much to be said about taking advantage of the kernel debuggers
that it’s beyond the scope of this document. There is a book on using the kernel
debuggers, but it’s very basic. I’d recommend you read the documentation for
WinDbg. It’s been updated and is very clear and complete. WinDbg has undergone a
major transformation during the last year, and is significantly more powerful
and stable.

5) If you’re using WinDbg to analyze a crash dump file, use dumpchk.exe first to
check the file to see if it’s corrupt or not. If a driver has trampled over
critical data structures in memory, the crash dump file may not be usable.
Dumpchk.exe will tell you if a crash dump file is usable or not.
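Tip 3 can be as simple as a fixed 32-bit value stored at the start of each struct and checked before the struct is used. A minimal sketch in portable C follows; the cookie values, the struct and the function names are all hypothetical. Overwriting the cookie when the struct is released also fits the "zero out before freeing" advice, because a stale pointer then fails validation instead of looking valid.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cookies. On little-endian x86 the first one shows up in a
 * byte dump as the characters "MyDv". */
#define MYDEV_COOKIE      0x7644794DU
#define MYDEV_DEAD_COOKIE 0xDEADBEEFU   /* written when the struct is released */

typedef struct {
    uint32_t cookie;      /* first field, so it sits at the start of the struct */
    int      open_count;  /* example payload */
} my_device_context;

my_device_context *context_create(void)
{
    my_device_context *ctx = calloc(1, sizeof *ctx);
    if (ctx != NULL)
        ctx->cookie = MYDEV_COOKIE;
    return ctx;
}

/* Returns nonzero only for a live, correctly typed context. */
int context_is_valid(const my_device_context *ctx)
{
    return ctx != NULL && ctx->cookie == MYDEV_COOKIE;
}

void context_destroy(my_device_context *ctx)
{
    if (ctx != NULL) {
        ctx->cookie = MYDEV_DEAD_COOKIE;  /* stale pointers now fail validation */
        free(ctx);
    }
}
```

In a checked build, context_is_valid() is a natural thing to ASSERT at the top of every function that receives a context pointer.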



HAVE A SUPPORT PLAN



1) In order to support customers who experience system crashes, you’ll find that
it’s incredibly useful to be able to use WinDbg to symbolically debug their
crash dump files. In order to do this, you must provide WinDbg with the
necessary symbolic information. This symbolic information is found in the .PDB
file (the symbol file) that’s generated at build time. Unfortunately this file
is not generated by default. You must include instructions in your SOURCES file,
which I’ll describe in the next section. Occasionally you may run up against a
crash dump file that WinDbg has trouble analyzing. In this case you’ll end up
doing a lot of manual debugging. For this you will need to know which addresses
the linker mapped your symbols to. This is found in the .MAP file. You will also
need to view the mixed source/assembler for your driver. WinDbg isn’t capable of
doing this, but the compiler can generate .COD files for each of your .C files.
In the next section I’ll discuss how you can build your driver so that you
generate .PDB, .MAP and .COD files.

2) You may find it useful to create a debugger extension dll that will give you
extra commands in WinDbg specifically tailored to debugging your driver. WinDbg
comes with sample code for a simple debugger extension dll. They’re very easy to
write.


3) Do not release your .PDB files to the public. These files expose your function and
variable names, type definitions and source line information, which makes your driver far easier to reverse-engineer.





BUILDING YOUR DRIVER



1) When you build your driver, you’ll want the compiler/linker to generate the
following files: A symbol file (.PDB), a linker map file (.MAP), mixed
source/assembler files for each .C file (.COD) and a browse file that will let
you browse symbols when you’re editing your driver’s source code (.BSC).

2) You’ll want to set a suitable warning level for the compiler.

3) You’ll want to enable/disable the correct optimizations.

4) Here is a sample SOURCES file for a Windows 2000 driver that contains all of
the commands needed to accomplish items 1-3:


NOTE: This SOURCES file will only work, as is, with the Windows 2000 DDK. If you
are using a newer version of the DDK (such as the XP DDK), then you may run into build issues, especially
with the switch that specifies that mixed source/assembly .cod output files are
desired. Tommy Tam (VISTA Controls) says that he's used this SOURCES file with
a more recent DDK by removing the /Fc$*.cod switch from MSC_OPTIMIZATION and by adding
either one of the following lines:


USER_C_FLAGS=$(USER_C_FLAGS) -fc$(O)/

USER_C_FLAGS=$(USER_C_FLAGS) /FAcs /Fa$O\$B

You may also need to modify the free/checked paths for the linker .MAP file.





# START OF SAMPLE SOURCES FILE

# Specify the target file/type of the build
TARGETNAME=MyDriver
TARGETPATH=obj
TARGETTYPE=DRIVER

# Produce the same symbolic information for both free and checked builds.
# This will allow us to perform full source level debugging on both
# builds without affecting the free build's performance
!IF "$(DDKBUILDENV)" != "checked"
NTDEBUG=ntsdnodbg
NTDEBUGTYPE=both
USE_PDB=1
!ELSE
NTDEBUG=ntsd
NTDEBUGTYPE=both
USE_PDB=1
!ENDIF

# Set compiler optimizations:
# /Ox - Full optimization enabled
# /Os - favor small code size over speed when optimizing
# /Od - Disable all optimizations
# /Oi - Enable optimization for intrinsic functions
# /Fc$*.cod - Generate mixed assembler/source code files
#
# For both checked and free builds, make sure that any intrinsic
# functions are compiled correctly. To do this, ensure that /Oi
# is selected for both free and checked builds. There is a bug in
# VC++ 6.0 (at least through SP4) where, if you specify any
# intrinsic functions in your code with "#pragma intrinsic" but
# you don't have the /Oi optimization enabled, neither a call
# to the function, nor the intrinsic inline version of the function
# will end up in your object code. This bug only applies to free
# builds, but just to be safe we'll make sure that the flag is
# enabled for all builds.
!IF "$(DDKBUILDENV)" != "checked"
MSC_OPTIMIZATION=/Ox /Os /Oi /Fc$*.cod
!ELSE
MSC_OPTIMIZATION=/Od /Oi /Fc$*.cod
!ENDIF

# Generate a linker map file just in case we need one for debugging
!IF "$(DDKBUILDENV)" != "checked"
LINKER_FLAGS=$(LINKER_FLAGS) -MAP:.\objfre\i386\$(TARGETNAME).map \
-MAPINFO:EXPORTS -MAPINFO:LINES -MAPINFO:FIXUPS
!ELSE
LINKER_FLAGS=$(LINKER_FLAGS) -MAP:.\objchk\i386\$(TARGETNAME).map \
-MAPINFO:EXPORTS -MAPINFO:LINES -MAPINFO:FIXUPS
!ENDIF

# Generate a browser information file for use in IDE development
BROWSER_INFO=1
BROWSERFILE=$(TARGETNAME).BSC -n

# Set the compiler's warning level
MSC_WARNING_LEVEL=-W3 -WX

# Specify the files to be used in the build
INCLUDES=$(BASEDIR)\inc;.
SOURCES=MyDriver.c

# END OF SAMPLE SOURCES FILE



USE THE IDE



1) Visit http://www.hollistech.com or http://www.osr.com and download DDKBUILD.BAT. This batch file
can be used to build drivers from the IDE. You’ll be able to edit, compile and
browse symbols (assuming your SOURCES file instructs the compiler to build a .BSC
file) all within the Visual Studio IDE. DDKBUILD.BAT does not interfere with the
default build mechanism. You’ll still be able to run “build -cZ” for your driver from
the command line build environment.



DEALING WITH COMMON BUG CHECKS (BLUE SCREENS)



Here are some of the more common bug checks, and methods for avoiding them:




0x0A (IRQL_NOT_LESS_OR_EQUAL)

0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL)

0x1E (KMODE_EXCEPTION_NOT_HANDLED)

0x7F (UNEXPECTED_KERNEL_MODE_TRAP)



IRQL_NOT_LESS_OR_EQUAL or DRIVER_IRQL_NOT_LESS_OR_EQUAL



These bug checks occur when your driver has touched paged memory when the
processor’s IRQL was >= DISPATCH_LEVEL. It’s pretty easy to avoid this bug
check. Only make API calls at IRQLs that those APIs are documented to support.
If it’s reasonable, use NonPaged pool whenever you allocate memory. Don’t place
your code in pageable sections unless that code is guaranteed to run at IRQL <
DISPATCH_LEVEL and never uses spinlocks. Don’t use DbgPrint or KdPrint to print
out unicode strings if IRQL >= DISPATCH_LEVEL. Don’t assume that your dispatch
functions will be called at PASSIVE_LEVEL. Usually they will, but IoCallDriver()
can be called at IRQL <= DISPATCH_LEVEL, so it’s legal for the driver above you
to pass an IRP down to you at DISPATCH_LEVEL.




KMODE_EXCEPTION_NOT_HANDLED



This bug check means that kernel-mode code raised an exception (such as an access violation) that no
handler caught. You should use structured exception handling in your driver so that it
can gracefully detect these exceptions and recover from them. Walter Oney has an
excellent section on structured exception handling in his book Programming the
Windows Driver Model.



UNEXPECTED_KERNEL_MODE_TRAP



The most common cause of this bug check is a kernel-stack overflow. The
documentation doesn’t mention this. Use WinDbg to print out the stack to find
out which one of your functions is eating up all of the 12K of kernel stack. To
avoid this bug check, don’t use recursive calls, deeply nested calls, or
functions with large local variables. Use assertions on the current stack size
to determine which areas of your code are potential stack eaters.
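One way to follow the advice about recursion is to replace the call stack with an explicit stack in memory you control. A minimal sketch in portable C, assuming a simple binary tree (the node type and tree_sum are hypothetical): each pending node costs one pointer in a heap buffer instead of a whole stack frame, and a bounded buffer turns a would-be stack overflow into an error return. In a driver, the buffer would come from pool rather than malloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tree node used only for this example. */
typedef struct node {
    struct node *left;
    struct node *right;
    int value;
} node;

/* Sums every value in the tree without recursion. Pending nodes live in a
 * heap buffer of max_pending entries, so tree depth costs heap memory rather
 * than call depth. Returns 0 on success, or -1 if the buffer could not be
 * allocated or would overflow (an error return instead of a stack overflow). */
int tree_sum(const node *root, size_t max_pending, long *sum_out)
{
    const node **stack;
    size_t top = 0;
    long sum = 0;

    if (max_pending == 0)
        return -1;
    stack = malloc(max_pending * sizeof *stack);
    if (stack == NULL)
        return -1;
    if (root != NULL)
        stack[top++] = root;

    while (top > 0) {
        const node *n = stack[--top];
        const node *children[2] = { n->left, n->right };
        sum += n->value;
        for (int k = 0; k < 2; k++) {
            if (children[k] == NULL)
                continue;
            if (top == max_pending) {   /* explicit stack is full */
                free(stack);
                return -1;
            }
            stack[top++] = children[k];
        }
    }
    free(stack);
    *sum_out = sum;
    return 0;
}
```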

 









IRP HANDLING RULES

by Maxim S. Shatskih



- IoMarkIrpPending must be called before the IRP reaches any context (a queue or such) from which other code paths (DPCs or such)
can complete it asynchronously.

- IoMarkIrpPending requires STATUS_PENDING to be returned, and vice versa. No exceptions.

- if you want to complete the IRP right in the dispatch routine, return the same status that you put in Irp->IoStatus.Status.
This cannot be STATUS_PENDING. Do not call IoMarkIrpPending.

- Irp->IoStatus.Status must be filled in only just before IoCompleteRequest. You can use Irp->IoStatus as temporary storage
before this :-) (at least if you do not pass the IRP down). Also note that Irp->IoStatus.Information can hold a pointer (some
PnP IRPs really do return one there).

- drivers which are not bottom-most, including all filters, can also pass the IRP down by "return IoCallDriver". In this case,
IoMarkIrpPending must not be called. NT4's CLASS2 code had exactly this bug; it went unnoticed only because SCSIPORT always
returned STATUS_PENDING, and my own full SCSI port driver hit it fairly soon.

- this sequence is also valid, though slower. It is necessary as a workaround for filtering some buggy drivers like NT4's
CDFS:




IoMarkIrpPending(Irp);

(VOID)IoCallDriver(BottomDeviceObject, Irp);

return STATUS_PENDING;



- now the completion routines. Only two return values are valid: STATUS_MORE_PROCESSING_REQUIRED and STATUS_SUCCESS.

- STATUS_MORE_PROCESSING_REQUIRED means that IoCompleteRequest exits immediately. The remaining completion routines are not called, and
the IRP is not passed to the I/O manager for destruction.

- if (and only if) the completion routine returns STATUS_SUCCESS, it must do the following:




if( Irp->PendingReturned )

IoMarkIrpPending(Irp);



Just copy-paste it. It is necessary. Too bad IoCompleteRequest does not do this itself.



- now the kinds of IRPs. "Full-blown" and "not-full-blown".

- the first kind is created by IoBuildDeviceIoControlRequest and IoBuildSynchronousFsdRequest, and also by the I/O system calls made from
user mode or through the Zwxxx functions (the NtxxxFile functions build their IRPs much like the Ioxxx functions mentioned). These IRPs are
associated with the thread, and IoGetRequestorProcess works for them. These IRPs must be passed to the I/O manager for destruction by
returning STATUS_SUCCESS from the completion routine. IoFreeIrp cannot be called for them.

- "not-full-blown" IRPs are created by IoBuildAsynchronousFsdRequest and IoAllocateIrp. They are not associated with any
thread. They cannot be passed to the I/O manager for destruction; the completion routine must return
STATUS_MORE_PROCESSING_REQUIRED. They must be freed by IoFreeIrp, possibly in the completion routine.


- if you send your own IRP down to some driver, you can use either of the above kinds. The lower driver does not care how the
IRP coming from above was built.

- for the first way, create an event, pass it to IoBuildDeviceIoControlRequest, then call IoCallDriver, then wait for the event. The IRP's
result will be in the IO_STATUS_BLOCK passed to IoBuildDeviceIoControlRequest.

- for the second way, create an event and write a completion routine like this:



KeSetEvent((PKEVENT)Context, IO_xxx_INCREMENT, FALSE);

return STATUS_MORE_PROCESSING_REQUIRED;



then call IoAllocateIrp, then IoSetCompletionRoutine with the event as the context, then call IoCallDriver, then wait for the event.
After this, IRP's result is in Irp->IoStatus. You can either free the IRP by IoFreeIrp now, or call IoReuseIrp and then reuse
the IRP for next calls, calling IoFreeIrp later.

- now Irp->RequestorMode. It governs how buffer pointers are checked in drivers which use Irp->UserBuffer, such as
FSDs. If the mode is KernelMode, no checks are made. If the mode is UserMode, Irp->UserBuffer is probed to verify that it is a
user-mode pointer. So, for your own IRPs which fill Irp->UserBuffer, set Irp->RequestorMode to KernelMode.


- note that some stacks like 1394 require Irp->RequestorMode to be KernelMode.








Marking an IRP as pending in a file system filter driver

by Ravisankar Pudipeddi

extracted from the OSR NTFSD email list



Ah, confusion reigns again regarding what pending, cancel, synchronous i/o etc. mean. Time for a refresh yet again: let me
try to explain succinctly as a set of rules (hopefully this will be clearer). You have to read all the rules to
understand, and this is specifically targeted towards FSD/filter developers.




1.) You can return STATUS_PENDING by default for any IRP, including IRP_MJ_CREATE - except for a few special IRPs for which
STATUS_PENDING indicates something more. Currently these are the FSCTL_REQUEST*OPLOCK* IRPs. If you return pending for them,
the caller assumes the oplock is granted. For every other IRP, all STATUS_PENDING means is that if it's synchronous i/o (such
as a create), the I/O manager will wait for the IRP to complete.



2.) When you return STATUS_PENDING, if it's a synchronous i/o, NT i/o manager automatically waits for the i/o to complete. You
needn't worry that you are making an inherently synchronous IRP asynchronous by returning pending. You are not.



3.) You HAVE to do IoMarkIrpPending() if you return STATUS_PENDING from your dispatch. You HAVE to return STATUS_PENDING if
you are doing IoMarkIrpPending() in your dispatch.



4.) If you are going to return STATUS_MORE_PROCESSING_REQUIRED from your *completion routine* for an IRP, then in your
*dispatch* (not in the completion): you have to mark the irp pending & return pending. This means you need to know in your
dispatch if you are going to abort processing in your completion. That's the way it works.



5.) If you are going to return STATUS_PENDING from your dispatch routine, and you have a completion routine, then you HAVE to
propagate the pending flag in your completion and vice versa. Propagating the pending flag up the stack in your completion
routine is as follows:

    if (Irp->PendingReturned) IoMarkIrpPending( Irp );


    Note that the Irp->PendingReturned field is NOT valid while an IRP is on its way down the stack. It comes into existence
only when an IRP is completing.



6.) As the converse of rule 5.) if you ARE going to wait for the IRP to complete in your dispatch - i.e. your completion
routine sets an event and returns STATUS_MORE_PROCESSING_REQUIRED, and then you pick up in your dispatch and return a
non-pending status, you must NOT propagate the pending flag in your completion routine. This is because you are not going to
return pending from your dispatch; you are essentially synchronizing the i/o in your dispatch. The rules can probably be
further simplified as:

    If returning pending from dispatch, mark irp pending.

    If you marked an IRP pending in dispatch, return STATUS_PENDING from dispatch.

    If you are NOT returning STATUS_PENDING from dispatch, then do not propagate the pending flag in your completion.



7.) You do not necessarily have to have a cancel routine even if you return pending. You do need a cancel routine if you are
going to hold on to the IRP for a long time in your driver - if you are queueing it up in your driver, then please, do
implement a cancel routine. If you are sending it off to a work queue, unfortunately since work items cannot be cancelled,
there is no point in implementing one.



8.) You have to clear the cancel routine before you do an IoCompleteRequest() on it, or before you forward it via an
IoCallDriver().




By the way - the i/o verifier catches almost every violation of the above rules & more. Turn it on for your driver. Ravi



This posting is provided "AS IS" with no warranties, and confers no rights.



windows command line

http://commandwindows.com/runline.htm

Wednesday, July 15, 2009

Install Bochs on CentOS 5 from src

0. Install the same prerequisite packages as for QEMU, then download the source from bochs.sourceforge.net
1. yum install gcc-c++ libXpm-devel
2. ./.conf.linux (./configure reports some errors, but .conf.linux works)
3. make
4. make install

Install QEMU on CentOS 5.3 from src

0. download qemu from qemu.org.
tar xzf qemu-0.10.5.tar.gz
1. Download zlib from http://www.zlib.net
tar xzf zlib-1.2.3.tar.gz
cd zlib-1.2.3
./configure
make
make install
2. Install SDL develop lib
yum install SDL-devel
3. cd qemu-0.10.5
./configure
make
make install

Friday, July 10, 2009

VM in VM: QEMU, Xen, VMware ESXi and HyperV

I am trying to test some virtualization technologies. I don't want to install them directly on real hardware, because that setup is hard to change and maintain, so I want to install each one in a VM running inside another VM. The following is a short summary:

Underlying virtualization   Xen    Hyper-V    VMware ESXi
QEMU 0.10.5                 Yes    No         No
VirtualBox 3                ?      No         No
VMware WS 6.5               Yes    ?          Yes
(Yes = runs, No = does not run, ? = not known)

For example, Xen can run in a QEMU VM, but Hyper-V and ESXi cannot run on the current QEMU.

Install Hyper-V

Server core:
http://blogs.msdn.com/virtual_pc_guy/archive/2007/12/26/installing-the-hyper-v-beta-in-a-core-configuration.aspx
http://rickyfang.blog.51cto.com/1213/125167

Hyper-V manager:
http://go.microsoft.com/fwlink/?LinkId=122188
http://technet.microsoft.com/en-us/library/cc512503(WS.10).aspx

Develop GPL VMware ESXi drivers

http://open-vdrivers.wiki.sourceforge.net/Getting_Started

Install VMware ESXi in a VMware Workstation VM

http://wangchunhai.blog.51cto.com/225186/91400
http://wangchunhai.blog.51cto.com/225186/127783
http://wangchunhai.blog.51cto.com/225186/104710

Wednesday, July 8, 2009

Hyper-V architecture

http://msdn.microsoft.com/en-us/library/cc768520.aspx

Installing Hyper-V in a QEMU VM

fig 1

1. Use this cmd:


qemu-system-x86_64 -cdrom GRIC1HVxFRE1_DVD.iso -hda win2k8-2.img -m 2047 -boot d -no-kqemu -vga std

Got the BSOD shown in fig 1. It happened at the screen where the installer lets the user choose the language.
Check the error code from here: http://msdn.microsoft.com/en-us/library/ms793648.aspx
Decoding it: 0x1E (KMODE_EXCEPTION_NOT_HANDLED) is the bug check code passed to KeBugCheckEx(), and its four parameters are defined at the link above. The first parameter, the exception code, is 0xffffffffc0000005, which is the 32-bit NTSTATUS 0xC0000005 (STATUS_ACCESS_VIOLATION) sign-extended to 64 bits. The second parameter is the address where the exception occurred: 0xFFFFF80010c1c3DE. For an access violation, the fourth parameter is the address the driver attempted to access: 0xfffffffffffffff. This looks like a serious problem.

2. Then I used this cmd:

qemu-system-x86_64 -cdrom GRIC1HVxFRE1_DVD.iso -hda win2k8-2.img -m 2047 -boot d -no-kqemu

Got the BSOD shown in fig 2, again at the language selection screen.

Check the error code again:

Windows Driver Kit: Driver Development Tools
Bug Check 0xD1: DRIVER_IRQL_NOT_LESS_OR_EQUAL

The DRIVER_IRQL_NOT_LESS_OR_EQUAL bug check has a value of 0x000000D1. This indicates that a kernel-mode driver attempted to access pageable memory at a process IRQL that was too high.

3. Then I used this cmd:
qemu-system-x86_64 -cdrom GRIC1HVxFRE1_DVD.iso -hda win2k8-2.img -m 2047 -boot d -no-kqemu -vga vmware
Got the BSOD shown in fig 3, again at the language selection screen.

4. On qemu 0.9.1 on Windows, cmd:

qemu-system-x86_64.exe -L . -hda win2k8.img -m 1024 -M pc -soundhw all -localtime -cdrom GRC1HVxFRE1_DVD.iso -boot d

Got the BSOD shown in fig 4.

5. On qemu 0.10.5, CentOS 5.3, cmd:

qemu-system-x86_64 -cdrom GRIC1HVxFRE1_DVD.iso -hda win2k8.img -m 2047 -boot d -no-kqemu

Got the BSOD shown in fig 5.
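The decoding done by hand in step 1 can be captured in a few lines of C. This is a sketch, not a debugger extension: the function names are hypothetical, and the tables hold only the bug checks mentioned in this blog and a few common exception codes (STATUS_ACCESS_DENIED is included to show it is a different code from the access violation). The key step is that a 64-bit bug check parameter holding an NTSTATUS is the 32-bit code sign-extended, as 0xffffffffc0000005 in step 1 shows, so keeping the low 32 bits recovers the status.

```c
#include <stddef.h>
#include <stdint.h>

/* Names of the bug checks discussed above; NULL for anything else. */
const char *bugcheck_name(uint32_t code)
{
    switch (code) {
    case 0x0A: return "IRQL_NOT_LESS_OR_EQUAL";
    case 0x1E: return "KMODE_EXCEPTION_NOT_HANDLED";
    case 0x7F: return "UNEXPECTED_KERNEL_MODE_TRAP";
    case 0xD1: return "DRIVER_IRQL_NOT_LESS_OR_EQUAL";
    default:   return NULL;
    }
}

/* A 32-bit NTSTATUS stored in a 64-bit parameter is sign-extended on x64;
 * converting back to uint32_t keeps the low 32 bits, i.e. the original code. */
uint32_t ntstatus_from_param(uint64_t param)
{
    return (uint32_t)param;
}

/* A few well-known NTSTATUS values; NULL for anything else. */
const char *ntstatus_name(uint32_t status)
{
    switch (status) {
    case 0x80000003u: return "STATUS_BREAKPOINT";
    case 0xC0000005u: return "STATUS_ACCESS_VIOLATION";
    case 0xC000001Du: return "STATUS_ILLEGAL_INSTRUCTION";
    case 0xC0000022u: return "STATUS_ACCESS_DENIED";
    default:          return NULL;
    }
}

/* True when parameter 1 of a 0x1E bug check is an access violation, in which
 * case parameter 4 holds the address that was being accessed. */
int is_access_violation(uint64_t param1)
{
    return ntstatus_from_param(param1) == 0xC0000005u;
}
```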
Discussions about hyper-v:

Install WIN 2008 and x64 on QEMU

2003 x64
http://qemu-forum.ipi.fi/viewtopic.php?f=9&t=4906

win2008
http://qemu-forum.ipi.fi/viewtopic.php?f=9&t=5052

Monday, July 6, 2009

Using STREAM benchmark on CentOS 5

1. Download binary from here:
http://www.cs.virginia.edu/stream/ref.html#start
2. chmod +x stream_l
3. Download the old libstdc++ compatibility package (compat-libstdc++-296) and install it:
http://rpm.pbone.net/index.php3/stat/4/idpl/8076489/com/compat-libstdc++-296-2.96-138.i386.rpm.html
4. Since stream_l is linked against an old version of libstdc++, we have to "fake" one with a symlink:
ln -s libstdc++-3-libc6.2-2-2.10.0.so libstdc++-libc6.0-1.so.2
5. ./stream_l 2400 20
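For reference, the number STREAM reports for its Triad test comes from a very simple loop. The sketch below reproduces that loop and the bandwidth arithmetic the STREAM source uses (three arrays of 8-byte doubles touched per element, 1 MB = 10^6 bytes). It is not the benchmark itself, which also times Copy, Scale and Add and reports the best of several runs, and the function names are hypothetical.

```c
#include <stddef.h>

/* The Triad kernel as STREAM defines it: a[j] = b[j] + scalar * c[j]. */
void triad(double *a, const double *b, const double *c, double scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Bytes STREAM counts for one Triad pass: read b and c, write a. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Rate in MB/s, using STREAM's convention of 1 MB = 10^6 bytes. */
double triad_mb_per_s(size_t n, double seconds)
{
    return 1.0e-6 * triad_bytes(n) / seconds;
}
```

So a Triad pass over 10^6 elements that takes 0.01 s works out to about 2400 MB/s.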

Friday, June 19, 2009

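How to debug the Linux kernel with QEMU + GDB

1. Install QEMU and install a Linux VM. The VM is the machine being debugged.
2. Recompile the Linux kernel inside the VM with the debugging options turned on. For the specific settings, see chapter 4 of LDD3.
3. When starting the VM to be debugged with QEMU, add the -S option. Note that it is a capital S; it means the VM is paused (suspended) as soon as it starts.
4. Select the QEMU VM window, press CTRL+ALT+2 to switch to the monitor console, and enter
gdbserver 1234. This starts the GDB server on port 1234. At this point you can enter c to let QEMU continue running. Then press CTRL+ALT+1 to switch back to the VM's own display.
5. Start GDB on the host. If you want to see Linux kernel symbols, it is best to put the kernel source at the same path as the one where the kernel was built inside the VM.
Start GDB with gdb vmlinux, where vmlinux is the uncompressed kernel image that still contains the symbol table; it should be larger than 30 MB.
6. In GDB, enter target remote localhost:1234. The 1234 here matches the port set in QEMU.
7. Press Enter. The VM should now be stopped, and GDB shows some seemingly random source lines indicating where it stopped.
8. Now you can set breakpoints in GDB with b xxx, then enter c to continue running.
9. If the VM is already running and you want to stop it to add breakpoints, press CTRL+C in GDB.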

GDB tutorials

More complete one:
http://www.delorie.com/gnu/docs/gdb/gdb.html#SEC_Top

Simple one:
http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html

Note: to run a program in GDB with arguments, just type:
run cmdline-args

http://sunny-day.blogbus.com/logs/10603407.html
http://www.ibm.com/developerworks/aix/library/au-gdb.html?S_TACT=105AGX52&S_CMP=cn-a-aix
http://blog.chinaunix.net/u/9577/showart_408305.html
http://www.scribd.com/doc/15086590/GDB