Wednesday, December 15, 2010

Windows Server 2008 as Workstation(zz)

http://weblogs.asp.net/israelio/archive/2008/02/21/windows-server-2008-as-workstation.aspx

Tuesday, December 14, 2010

@bios, BIOS utility for Gigabyte BIOS

http://www.gigabyte.com/MicroSite/121/tech_a_bios.htm

Monday, November 22, 2010

Implementing, Testing And Debugging ACPI On Windows Platforms

Thursday, November 18, 2010

coreboot hibernation and wakeup

Wakeup:
http://tracker.coreboot.org/trac/coreboot/browser/trunk/src/arch/i386/boot/wakeup.S

Using libpcap on CentOS 5

1. Download the libpcap source from tcpdump.org.
2. Install it:
2.1. yum install flex
2.2. Download byacc from http://invisible-island.net/byacc/byacc.html
2.3. Install byacc.
3. A sample program should now compile. Running it, however, may fail with an error saying that libpcap.so.1 cannot be found. To fix this, add
/usr/local/lib
/usr/lib
to /etc/ld.so.conf and then run the ldconfig command.

Tuesday, November 16, 2010

SWITCH FROM REAL MODE TO PROTECTED MODE

http://blog.csdn.net/pengyang/archive/2009/03/10/3977909.aspx

http://blog.sina.com.cn/s/blog_414c0121010005o3.html

http://blog.chinaunix.net/u3/95743/showart.php?id=2286642

Intel CPU manual, 3A, chapter 9.9.1


1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI interrupts can be disabled with external circuitry. (Software must guarantee that no exceptions or interrupts are generated during the mode switching operation.)
2. Execute the LGDT instruction to load the GDTR register with the base address of the GDT.
3. Execute a MOV CR0 instruction that sets the PE flag (and optionally the PG flag) in control register CR0.
4. Immediately following the MOV CR0 instruction, execute a far JMP or far CALL instruction. (This operation is typically a far jump or call to the next instruction in the instruction stream.)
5. The JMP or CALL instruction immediately after the MOV CR0 instruction changes the flow of execution and serializes the processor.
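
Step 2 presumes that a GDT already sits in memory. As a rough illustration (not taken from the Intel manual), the sketch below packs two flat 4 GB descriptors and the 6-byte operand that LGDT reads. The function names and the load address 0x800 are assumptions made for this example; the access bytes 0x9A/0x92 and the flags value 0xC are the usual choices for flat 32-bit code and data segments.

```python
import struct

def gdt_descriptor(base, limit, access, flags):
    """Pack one 8-byte segment descriptor in the x86 GDT layout."""
    desc = limit & 0xFFFF                    # limit bits 15..0
    desc |= (base & 0xFFFFFF) << 16          # base bits 23..0
    desc |= (access & 0xFF) << 40            # access byte (present, DPL, type)
    desc |= ((limit >> 16) & 0xF) << 48      # limit bits 19..16
    desc |= (flags & 0xF) << 52              # flags (granularity, 32-bit size)
    desc |= ((base >> 24) & 0xFF) << 56      # base bits 31..24
    return struct.pack("<Q", desc)

def gdtr_operand(gdt_bytes, gdt_base):
    """Pack the 6-byte LGDT operand: 16-bit limit (size - 1), 32-bit base."""
    return struct.pack("<HI", len(gdt_bytes) - 1, gdt_base)

CR0_PE = 1 << 0  # protection-enable flag, bit 0 of CR0

def set_pe(cr0):
    """Return the CR0 value with the PE flag set (step 3)."""
    return cr0 | CR0_PE

null_desc = gdt_descriptor(0, 0, 0, 0)             # entry 0 must be the null descriptor
code_desc = gdt_descriptor(0, 0xFFFFF, 0x9A, 0xC)  # flat code segment, selector 0x08
data_desc = gdt_descriptor(0, 0xFFFFF, 0x92, 0xC)  # flat data segment, selector 0x10
gdt = null_desc + code_desc + data_desc

GDT_BASE = 0x800  # hypothetical address where the table would be placed
operand = gdtr_operand(gdt, GDT_BASE)

print(code_desc.hex(), data_desc.hex(), operand.hex())
```

With such a table, the far jump in step 4 would use selector 0x08 (index 1, the code descriptor).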

Tuesday, August 31, 2010

Count the total number of lines in a text file on Linux

command:
wc -l myfile

Jump to a given line in a large file:

http://superuser.com/questions/113039/less-quickly-jump-to-line-number-in-large-file
vim myfile +$n


If the file is already open in less, you can type:
100g to go to the 100th line.
50p to go to 50% into the file.
100P to go to the line containing the 100th byte.
You can use these from the terminal by adding + in front of them: less +100g bigfile.txt
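
For scripts that need the same count without shelling out, here is a minimal Python sketch that counts newline bytes the way wc -l does. The function name and the 1 MB chunk size are choices made for this example.

```python
def count_lines(path, chunk_size=1 << 20):
    """Count newline characters in a file, reading it in fixed-size chunks."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            count += chunk.count(b"\n")
    return count
```

Like wc -l, this counts newline characters, so a final line without a trailing newline is not counted. Reading in chunks keeps memory use flat even for very large files.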

Tuesday, August 17, 2010

How to find the Windows kernel base with GDB + QEMU

Goal: find the Windows kernel base address.

Environment and tools: Linux host, GDB, QEMU, Windows XP VM
Method:

1. Start the QEMU Windows VM. Then press Ctrl+Alt+2 to switch to the QEMU monitor and type: gdbserver 1234

2. On the Linux host, start gdb and type "target remote localhost:1234". GDB is now attached to the Windows VM.

3. In the QEMU monitor, type "info registers" and look at the FS segment. Its base address (the second number, as shown in figure 1) is the address of the KPCR. It is 0xffdff000 on the test VM.

4. Then get the value of KdVersionBlock:
kdversionblock = Dword(kpcr + 0x34). It is 0x8054c738 on the test VM.

5. Then get "kernbase":
kernbase = Dword(kdversionblock + 16). It is 0x804d7000 on the test VM.

6. Verify that this is the correct address. The memory there should begin with 0x4d 0x5a 0x90, the signature at the start of a PE file.

(Steps 4, 5 and 6 are shown in figure 2.)
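
The pointer chase in steps 4 to 6 can be sketched in Python. There is no live guest here: FakeMemory is a made-up stand-in that is preloaded with the values observed on the test VM, and all names are hypothetical. Only the offsets 0x34 and 16 come from the steps above.

```python
import struct

KPCR_KDVERSIONBLOCK = 0x34   # offset of KdVersionBlock inside the KPCR (step 4)
KDVB_KERNBASE = 16           # offset of the kernel base inside that block (step 5)

class FakeMemory:
    """Sparse stand-in for guest memory; unwritten bytes read as zero."""
    def __init__(self):
        self._bytes = {}

    def write(self, addr, data):
        for i, b in enumerate(data):
            self._bytes[addr + i] = b

    def read(self, addr, size):
        return bytes(self._bytes.get(addr + i, 0) for i in range(size))

    def dword(self, addr):
        return struct.unpack("<I", self.read(addr, 4))[0]

def find_kernel_base(mem, kpcr):
    """Follow KPCR -> KdVersionBlock -> kernel base."""
    kdversionblock = mem.dword(kpcr + KPCR_KDVERSIONBLOCK)
    kernbase = mem.dword(kdversionblock + KDVB_KERNBASE)
    return kdversionblock, kernbase

def has_mz_signature(mem, base):
    """Step 6: a PE image starts with the bytes 0x4d 0x5a ("MZ")."""
    return mem.read(base, 2) == b"MZ"

# Preload the values reported for the test VM.
mem = FakeMemory()
mem.write(0xffdff000 + 0x34, struct.pack("<I", 0x8054c738))
mem.write(0x8054c738 + 16, struct.pack("<I", 0x804d7000))
mem.write(0x804d7000, bytes([0x4d, 0x5a, 0x90]))

kdvb, kernbase = find_kernel_base(mem, 0xffdff000)
print(hex(kdvb), hex(kernbase), has_mz_signature(mem, kernbase))
```

Against a real guest, the read method would have to be backed by GDB or the QEMU monitor instead of a dictionary.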

Friday, July 30, 2010

4GB memory issues

http://www.biosren.com/thread-56-1-5.html
Intel® Chipset 4 GB System
http://www.biosren.com/viewthread.php?tid=283&extra=page%3D2&frombbs=1

Physical address layout (Chinese)
http://www.biosren.com/viewthread.php?tid=3200&highlight=

Can the kernel reserve memory at boot time ?(zz)

http://www.eggheadcafe.com/software/aspnet/35648681/can-the-kernel-reserve-memory-at-boot-time-.aspx

Hello,
I have a bad memory (from 589MB to 605MB according to memtest86+)
I'd like to know if the kernel can reserve this portion of memory; then it
will not be used anymore ?
Is it possible under Windows 7 ?
I Found that the Linux kernel have the parameter 'memmap' for this !
Unfortunately I cannot find the same thing for Windows :(
Any ideas to solve my memory corruption problem ?

---------------------------------------------------------------------------------------
Actually, it does have the option. Unfortunately, it is not documented.
There are calls MmMarkPhysicalMemoryAsBad and MmMarkPhysicalMemoryAsGood to
do this, which have been used for fault tolerant environments. I have never
seen any doc's that tell you how to use them, but they are in there, they
were even in the Windows XP DDK includes.
I agree with Alexander in this case the right thing to do is replace the
memory.

------------------------------------------------------------------------------------
With msconfig, it is possible to set the maximum use of memory, but not a
specific zone :(

Seems I have no choice to replace it !
Too bad, I solved the problem for the Linux OS (memmap=17M$589M), but not
for Windows...
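
The Linux parameter has the form memmap=nn$ss, which reserves nn of memory starting at address ss. A small sketch of how the value quoted above could be derived from the memtest86+ range follows. It assumes both ends of the reported range are inclusive, which is one way to arrive at 17M rather than 16M; the function name is made up for this example.

```python
def memmap_reserve(first_bad_mb, last_bad_mb):
    """Build a memmap=nn$ss kernel parameter covering an inclusive MB range."""
    if last_bad_mb < first_bad_mb:
        raise ValueError("range end is below range start")
    size_mb = last_bad_mb - first_bad_mb + 1   # inclusive range (assumption)
    return f"memmap={size_mb}M${first_bad_mb}M"

print(memmap_reserve(589, 605))
```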

Monday, June 7, 2010

Install new kernel on old distros (such as CentOS)

In short, answer YES to one option when configuring the new kernel:
General setup --> enable deprecated sysfs features which may confuse old userspace tools
(or similar; on 2.6.31.13 it is called "remove sysfs features that may confuse old userspace tools")

Otherwise, you will get a kernel panic during boot, with errors such as "could not mount root filesystem".

ref: (in Chinese, with some explanation of initrd.img)
http://hi.baidu.com/mhlovejn/blog/item/930c96314f6311f21a4cffc6.html
http://hi.baidu.com/mhlovejn/blog/item/53740d298870c8f0e6cd4095.html
http://hi.baidu.com/mhlovejn/blog/item/7a4a55fe65de7488b801a020.html

Update: 11.4.2011
On kernel 3.0.8 (and some other recent versions), this setting has become a run-time option. After enabling it in menuconfig, you also need to append one parameter to the kernel line in the GRUB configuration, like:

kernel xxxx sysfs.deprecated=1

Thursday, May 27, 2010

How to debug Linux kernel using QEMU?

1. Install QEMU and set up a QEMU VM. The Linux kernel running in the VM is the one to be debugged.
2. Enable the kernel debugging options, then recompile the kernel. The debugging options are described in LDD3 (Linux Device Drivers, 3rd edition), chapter 4.
3. Start the QEMU VM with the -S argument, which keeps the VM paused at startup.
4. Click on the QEMU VM window and press Ctrl+Alt+2 to switch to the QEMU monitor, then type: gdbserver 1234. This starts the gdbserver built into QEMU, listening on port 1234. Then type "c" to continue, and press Ctrl+Alt+1 to switch back to the OS console.
5. Start GDB on the host. So that GDB can find the kernel sources, put the source tree of the compiled kernel at the same path on the host. Use "gdb vmlinux" to start gdb. Note that vmlinux is the uncompressed kernel image, about 30MB in size. If it is much smaller, it may lack the necessary debug symbols.
6. At the gdb prompt, type: "target remote localhost:1234". 1234 is the port number given to gdbserver in step 4.
7. Press Enter. The VM should stop, and gdb will show the source location of the function currently executing.
8. To set a breakpoint, use "b xxx" in gdb, then type "c" to continue.

Wednesday, May 12, 2010

Introduction to Bochs

1. Bochs is another emulator (virtual machine) that supports x86.
2. It supports multiple platforms: Linux, Windows, *BSD.
3. It can be compiled on Linux and Windows.
4. Bochs-2.4.1 does not compile on 64-bit CentOS. Bochs-2.4.5 is OK.
5. The online documentation is out of date. The bochs-dlx command mentioned in chapter 3.3 does not exist.
6. ./configure --enable-debugger --enable-disasm --enable-debugger-gui=0


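11. Commonly used Bochs debugger commands
c              continue execution
s count        single-step; count is the number of instructions
vb seg:off     virtual-address breakpoint, e.g. vb 0x0000:0x7c00
lb addr        linear-address breakpoint
pb addr        physical-address breakpoint
info break     show breakpoint information
x /10 addr     examine 10 units of memory starting at linear address addr
info cpu       show CPU state
info r         show registers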

Quick ref:
http://wiki.minix3.org/en/UsersGuide/RunningMinixOnBochs
http://box.matto.nl/minix3bochs.html
http://dev.csdn.net/article/83/83404.shtm (old)

Wednesday, May 5, 2010

Linux Suspend / Hibernate Functionality Support

http://www.cyberciti.biz/faq/linux-suspend-hibernate-functionality-support/

Commonly used basic Xen commands (ZZ)

http://hi.baidu.com/huangj/blog/item/2541bf38db8671cbd462252a.html


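RHEL5 ships with Xen, and the graphical tools are decent. Still, the command line is more convenient, if you know it. Some notes:
Xen's logs are in /var/log/xen.
Usually you first prepare a virtual disk on Domain-0 for installing System1: dd if=/dev/zero of=/opt/Xen/system1.img bs=1024k count=10000 creates a disk of about 10 GB. (Note: this step can be skipped; virt-install will create the file itself.)
Start the installation: virt-install -n system1 -r 1024 --file=/opt/Xen/system1.img --nographic -l nfs:127.0.0.1:/media/cdrom
The parameters are easy to remember. If you cannot remember them, just run virt-install with no arguments and it will prompt you step by step. You can also run virt-install -h for help, which covers the formats for nfs, ftp and http sources, so it takes little effort. The key is to tell Xen the name of the VM, how much memory to give it, where its disk is, and where the installation files are. The installation itself is no different from a normal install. It seems, however, that you cannot choose to install Xen inside the VM.

After installation, an entry such as system1 appears under /etc/xen; it is the configuration file of this VM. Looking at the file (system1 in my case), you will find that it simply records information about the VM: where the disk is, which MAC address the NIC uses, the CPUs, the machine name, and so on.

Common commands:
xm list            list all VMs on this machine
xm create xxx      start the VM named xxx
xm shutdown xxx
xm reboot xxx
xm pause xxx
xm resume xxx

For administrative login, xm console xxx lets you operate the VM as if you were at its local console. To return to Domain-0, press Ctrl+].

If disk space runs short you can add a disk. It takes effect immediately, with no reboot, just like plugging in a USB drive.
xm block-list xxx      list the disk devices available to VM xxx
xm block-attach xxx tap:aio:/xxxx.img xvdb w
Here you can refer to the configuration file under /etc/xen mentioned above. To attach a file created with dd, the tap:aio: form seems to be the one to use; for a VMware vmdk file, write vmdk instead (I have only tried files created with dd). Xen supports conversion between quite a few virtual disk formats. The new disk then shows up inside the VM, and from there you work with it as on a physical machine.

Note:
xm list on a Xen 3.1 CentOS host cannot show all the VMs. You still have to look at the existing configuration files under /etc/xen yourself.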

Monday, May 3, 2010

A brief introduction to ASL in ACPI


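A distinctive feature of ACPI (Advanced Configuration and Power Interface) is that it has its own dedicated language for writing the ACPI tables. This language is called ASL (ACPI Source Language). A compiler turns ASL into AML (ACPI Machine Language), which is then executed by the OSPM (in practice, usually the OS).

AML is a bytecode, similar to Java bytecode. That is, it is not binary code that runs directly on the machine; the OS has to interpret it. The advantage is easier error checking and less damage from badly written code.

This article gives a brief introduction to ASL and compares it with other common programming languages such as C, C++, Java and Perl. It is aimed at beginners. I have only just started learning ACPI and ASL myself, so there may be mistakes; corrections are welcome.

Before learning ASL I had learned some programming languages such as C, C++, Java and Perl, so when I started on ASL I kept comparing it, consciously or not, with those languages. Gradually I found that ASL differs from them quite a lot. The sections below briefly describe ASL's characteristics and the differences.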


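1. The ACPI namespace versus ordinary constants and variables.
Ordinary programming languages operate on constants and variables. These variables generally have no relationship to one another; they are just a set of parallel memory addresses (ordered or randomly arranged). ACPI changes this noticeably by introducing the notion of a namespace: all objects are arranged in a hierarchy. As in a file system or the registry, every ACPI object (roughly analogous to a constant) lives under some path. The root is written as "\", and the levels of the hierarchy are joined with ".".

For example, \_SB_.FOO.BAR denotes the object BAR under the object FOO under the object _SB_ under the root.

Consequently, many ACPI operations act on some object in this namespace, which brings in a series of related concepts, such as Scope.

Why was it designed this way? Because ACPI is a narrowly targeted specification: power management. Arranging and classifying the commonly used objects in advance makes them easy to handle. It is less flexible than a general-purpose programming language, but it is simple and meets the design requirements.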
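
To make the path notation concrete, here is a toy Python model of the namespace as nested dictionaries. It only illustrates how a path such as \_SB_.FOO.BAR is split and resolved; it is not how an AML interpreter stores objects, and the function names and the value 42 are invented for this example.

```python
def split_path(path):
    """Split an absolute ACPI-style path into its name segments."""
    if not path.startswith("\\"):
        raise ValueError("only absolute paths (starting at the root) are handled")
    body = path[1:]
    return body.split(".") if body else []

def lookup(namespace, path):
    """Walk nested dicts from the root, one name segment per level."""
    node = namespace
    for name in split_path(path):
        node = node[name]
    return node

# Hypothetical namespace: BAR lives under FOO, which lives under _SB_.
namespace = {"_SB_": {"FOO": {"BAR": 42}}}

print(split_path(r"\_SB_.FOO.BAR"))
print(lookup(namespace, r"\_SB_.FOO.BAR"))
```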


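2. ASL has a large number of operators.
In a typical piece of ASL code, operators make up most of the text. Much of ASL has the form Device(PCI0). Whatever precedes the parentheses is generally an operator, that is, something predefined. This again is because ASL's purpose is narrow, so many things can be defined in advance.


References:
ACPI Spec 4.0, chapters 5 and 18.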

Wednesday, April 7, 2010

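Interrupt handling in Windows and Linux compared

I recently read the interrupt-handling parts of both Windows and Linux. While it is still fresh, I am writing down my impressions for later reference.

I. Differences: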

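Windows has the concept of IRQL (note: not IRQ). At first I assumed it was part of the CPU design. Later I read the Intel CPU manuals and found that it apparently is not. Recently I reread Windows Internals, 4th edition, and my impression is that it is implemented through the PIC or APIC (for the APIC, see my earlier post). On x86-32, the relation between a hardware device's IRQ and its IRQL is: IRQL = 27 - IRQ. The motivation for IRQL seems to be this: when the CPU is running at a low IRQL and an interrupt with a higher IRQL arrives, the lower-level ISR is preempted by the higher-level one. In other words, a low-priority ISR can itself be interrupted by a higher-priority ISR.

The benefit is that high-priority ISRs get serviced sooner. In the actual implementation, because changing the IRQL by programming the PIC or APIC is fairly slow, Windows avoids touching the hardware directly and defers the change until it is unavoidable.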
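
The IRQ-to-IRQL relation and the preemption rule described above fit in a few lines of Python. This is only a model of the rule as stated in this post, not of the actual Windows implementation; the function names are made up.

```python
def irq_to_irql(irq):
    """Device IRQL on x86-32 as given above: IRQL = 27 - IRQ."""
    return 27 - irq

def can_preempt(current_irql, incoming_irql):
    """An incoming interrupt preempts only if its IRQL is strictly higher."""
    return incoming_irql > current_irql

low = irq_to_irql(12)    # IRQ 12 -> IRQL 15
high = irq_to_irql(1)    # IRQ 1  -> IRQL 26
print(low, high, can_preempt(low, high), can_preempt(high, low))
```

Note the inversion: a lower IRQ number maps to a higher IRQL, so the ISR for IRQ 1 can interrupt the ISR for IRQ 12 but not the other way round.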

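Linux does not seem to have a concept like IRQL. Judging from the books and code I have read so far, an ISR or the kernel in Linux at most manipulates the interrupt flag (IF) on the CPU to enable or disable interrupts. That is, interrupts are either all on or all off, and it seems a low-priority ISR cannot be interrupted by a higher-priority one. In this respect the Linux design is simpler than that of Windows.
(Update 2015: the above applies to old Linux kernels. From Linux 2.6 on, request_irq takes a flag that controls whether all interrupts are disabled or only the handler's own interrupt; the default is to disable only its own.)

II. Similarities

Both Windows and Linux seem to split interrupt handling into two parts. In Linux they are called the ISR (or something else?) and the bottom half. In Windows, DPCs (Deferred Procedure Calls) and APCs (Asynchronous Procedure Calls) are very similar to the bottom half. The motivation for the split is much the same in both: keep the ISR as short as possible.

In Linux, interrupts are generally disabled inside an ISR, so if it runs too long, other interrupts go unserviced. In Windows, an ISR runs at a very high IRQL and likewise blocks work at lower IRQLs. The Linux bottom half comes in two kinds, tasklets and softirqs. They differ mainly in complexity and concurrency. The following is copied from a book.

Tasklet: Only one instance of each tasklet can run at any time. Different tasklets can run concurrently on different CPUs.
Softirq: Only one instance of each softirq can run at the same time on a CPU. However, the same softirq can run on different CPUs concurrently.

The Windows DPC is somewhat like a tasklet or softirq. DPCs are system-wide and run at DPC IRQL, in an environment resembling interrupt context. An APC differs from a DPC in that it runs at the lower APC IRQL. In addition, an APC is per-thread: it executes in the context of a particular thread. Its main purpose, too, is to defer some work until later. APCs are further divided into kernel APCs and user APCs.

The APC concept does not seem to have a counterpart in Linux. At least I have not thought of one yet.

III. References:

1. Windows Internals, 4th edition
2. Understanding Linux Network Internals, 2005

There may be mistakes in this article; discussion is welcome.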

Tuesday, March 23, 2010

XP 64 cannot hibernate issue

I have a server machine with 8GB of RAM running XP 64-bit. I tried to put it into hibernation, but I could not find the option in the Control Panel. Pressing the Shift key when shutting down the machine did not work either.
After some Google searching, I found out that when the machine has more than 4GB of memory and PAE is enabled, XP (and Vista, Server 2003, 2008) does not support hibernation.

I tested this on another machine, also with 8GB of RAM but running 32-bit XP without PAE. That machine can hibernate.

ref:
http://www.velocityreviews.com/forums/t525208-no-hibernate-tab-in-power-option-properties-in-xp-x64-in-pae-mode-its-not-allowed.html

Friday, March 12, 2010

Increase MTU of the NIC on Linux

http://www.intel.com/support/network/sb/cs-009209.htm

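Fixed a segmentation fault in 5 minutes today

A small Linux application. Its main job is to mmap a region of memory and memcpy some data into it.

But when the program ran, it printed a segmentation fault error. In the past I would probably have panicked, with no idea how to fix it. But since reading that article by Wing, this kind of message no longer scares me.

Back to the point: we know a segmentation fault generally means a memory access went wrong. Since I was using mmap, memcpy and so on, an error there was quite likely. But on reflection both calls looked right, with correct ranges and sizes. So I added a few printf calls and found that the fault happened after the memcpy. Odd. What comes after the memcpy?

So I started GDB. After the segmentation fault appeared, I typed "where" to see the location. It showed that line 531 of main called puts, followed by glibc functions. Line 531 of main is a printf. Can that cause a segmentation fault? I had my doubts, but the call to puts did point at the printf. Strange. I looked around the printf and suddenly noticed that the statement just above it was munmap. That is memory-related too, so it was a suspect. On closer inspection, it was indeed the culprit: the size argument passed to it was wrong. I had changed that size several times, each time only at the mmap call, and overlooked munmap!

I changed it to the correct size and that fixed it. About 5 minutes from start to finish.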

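Finally solved a problem that had bothered me for more than a week

The feature to implement was not very complicated: drive the network card from within SMM.

I started debugging in a QEMU virtual machine, and it went fairly smoothly. Once I understood what was needed, I wrote a small program in assembly plus a small loader, and it worked. I was in a good mood and full of confidence; all that remained was to port the code to a real physical machine.

Then the nightmare began. The program that ran fine in QEMU simply would not work on the physical machine. I reread the NIC spec, tried all sorts of approaches and added lots of debugging, to no avail. At first I concentrated on whether I had missed something when programming the NIC, because I suspected that QEMU's emulated NIC differed considerably from the real one. I also read back all the data used in my SMI handler and verified that it was correct. So I kept suspecting the NIC, and read a pile of material on Linux NIC driver implementation.

After several days it still did not work. I was very discouraged and nearly gave up, but with my boss pushing, giving up was not an option. So I looked for other approaches. Then it occurred to me that SMM itself might be the problem. What if I drove the NIC under Linux in the same way as in SMM and checked whether I got the expected result? If it worked, the problem was in SMM; if not, my NIC setup was wrong somewhere. I modified the program and tried it: it worked under Linux. So my NIC programming was correct, and the problem was in SMM. But what? It suddenly struck me that I had placed the NIC's TX_DESC and related structures in SMRAM. Could it be that the NIC cannot read SMRAM when it does DMA? So I changed the program again to put TX_DESC in system memory.

Nervously I tried once more, and it worked!

The conclusion: the NIC indeed cannot access the contents of SMRAM during DMA. I still find this a little odd. Perhaps it applies only to the old machine I experimented on. In theory, my SMRAM was not locked and the NIC was being driven from SMM, so a PCI card should have been able to reach SMRAM? It does not matter, though; ordinary RAM works just as well.

PS: along the way I also fixed a bunch of small bugs, such as the SMI firing only once, SMM and the OS being unable to communicate through shared memory, and how to check the state of the NIC's internal FIFO. I learned quite a lot.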

Thursday, February 25, 2010

How to find all processes in Linux manually


The Linux kernel already provides a macro called for_each_process to walk through every process. Sometimes, however, you may need to find all the processes manually. The method is as follows:

Environment:

QEMU 0.10.2
QEMU VM: Cent OS 5.3 (Linux 2.6.18)
ARCH :X86-32

Basic idea:

We know that every process in Linux has a task_struct. The task_struct has a member called "tasks". It is a list_head, and all processes are linked together through task_struct.tasks.

Therefore, to find all processes, we can locate one task_struct and then follow its tasks member. Other useful members of task_struct are: pid (similar to the thread ID on Windows), tgid (similar to the process ID on Windows), and comm (the executable name, which can serve as the process name).

Next, we will use the QEMU monitor to find all processes manually. In GDB we could use the "p task->tgid" command to print the tgid, but the QEMU monitor does not support that, so we have to know the exact offset of each interesting member of task_struct.

To find the offsets of members such as tasks and pid, I used a kernel module to print them. The result follows (pid = 0xbc means that pid is at offset 0xbc in task_struct):

pid = 0xbc
tgid = 0xc0
comm = 0x1ac
tasks = 0x80

The next thing we need to know is how to find the first task_struct. You can read ULK3 or LKD2 for the details. In short, find esp first; in the QEMU monitor it can be obtained with "p $esp". Say it is 0xc1004566. Clearing the low 13 bits of esp gives 0xc1004000. This is the address of the thread_info of the current task, and its first member is a pointer to the task_struct.

OK, let's do a real experiment:
1) Set up a QEMU VM with CentOS 5.3 installed as the guest OS, and run it.
2) Press Alt+Ctrl+2 to switch to the QEMU monitor, then type "stop". This pauses the emulation so that we don't need to worry about memory changing when a process exits.
3) Type p $esp to find esp. It is 0xc3a94f78 in my experiment.
4) Find thread_info, that is, clear the low 13 bits of esp. We get 0xc3a94000.
5) x /20w 0xc3a94000. This shows the contents of thread_info. The first member is the task pointer. The result is 0xc3aaa370. (Note that the value here is sometimes 0. In that case type "c" to let QEMU continue, then "stop", and try again; it may take several tries.)
6) Now that we know the address of the task_struct, it is easy to find tgid, tasks and comm. For example, pid is at 0xc3aaa42c (task_struct + 0xbc). Note that to view the contents of "comm" you should use the following command in QEMU: x /20b 0xc3aaa51c. Otherwise QEMU displays the data as words and the byte order appears reversed.
7) Now we know the information about the current process. To find the next one, just follow task_struct->tasks. Note that the address in task_struct->tasks->next is not the base address of the next task_struct; it points at the tasks member inside it, so the base address of the next task_struct is 0x80 smaller. Then repeat the steps above to find all the processes.
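
The address arithmetic in steps 3 to 7 can be checked with a few lines of Python. The offsets are the ones printed by the kernel module above; the 8 KB stack size follows from clearing the low 13 bits; the helper names are made up for this sketch.

```python
THREAD_SIZE = 8192  # 8 KB kernel stack, i.e. the low 13 bits of esp are cleared

# Offsets reported by the kernel module for this particular 2.6.18 build.
OFFSETS = {"tasks": 0x80, "pid": 0xbc, "tgid": 0xc0, "comm": 0x1ac}

def thread_info_addr(esp):
    """thread_info sits at the bottom of the kernel stack."""
    return esp & ~(THREAD_SIZE - 1) & 0xFFFFFFFF

def field_addr(task_struct, field):
    """Address of a member inside a task_struct at the given base."""
    return task_struct + OFFSETS[field]

def next_task_struct(tasks_next):
    """tasks.next points at the next tasks member, not at the struct base."""
    return tasks_next - OFFSETS["tasks"]

print(hex(thread_info_addr(0xc3a94f78)))         # step 4
print(hex(field_addr(0xc3aaa370, "pid")))        # step 6
print(hex(field_addr(0xc3aaa370, "comm")))       # step 6
```

The offsets change with kernel version and configuration, so they have to be measured again for any other kernel.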

Friday, February 19, 2010

Function-call Conventions

Ref:
Assembly view:http://unixwiz.net/techtips/win32-callconv-asm.html
C view:http://unixwiz.net/techtips/win32-callconv.html

code from MS VS:

1.1 cdecl callee
void __cdecl ccall(int arg1)
{
004115A0 push ebp
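004115A1 mov ebp,esp //the first two lines are the standard function prologue; Linux does the same
004115A3 sub esp,0C0h //reserve some stack space
004115A9 push ebx
004115AA push esi
004115AB push edi //I never used to understand these three pushes. Now it is clear: these registers are used later, so they are saved on the stack first.
004115AC lea edi,[ebp-0C0h]
004115B2 mov ecx,30h
004115B7 mov eax,0CCCCCCCCh //does CCCCCCCC look familiar? If so, you often forget to initialize local variables
004115BC rep stos dword ptr es:[edi] //up to here, the goal is to fill the local-variable area of the stack with the initial value CCCCCCCCh, which makes problems easier to track down. This is specific to Windows, or rather to MS VS.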
arg1++;
004115BE mov eax,dword ptr [arg1]
004115C1 add eax,1
004115C4 mov dword ptr [arg1],eax
}
004115C7 pop edi
004115C8 pop esi
004115C9 pop ebx //a series of POPs restores the registers saved earlier. Note that the order matters.
004115CA mov esp,ebp
004115CC pop ebp
004115CD ret //finally, return.

1.2 cdecl caller
ccall(2);
00411670 push 2
00411672 call ccall (41114Ah)
00411677 add esp,4
The last line of 1.2 clearly shows the caller cleaning up the stack.

Now let's look at the stdcall example.
2.1 stdcall callee
void __stdcall stcall(int arg1)
{
004115E0 push ebp
004115E1 mov ebp,esp
004115E3 sub esp,0C0h
004115E9 push ebx
004115EA push esi
004115EB push edi
004115EC lea edi,[ebp-0C0h]
004115F2 mov ecx,30h
004115F7 mov eax,0CCCCCCCCh
004115FC rep stos dword ptr es:[edi]
arg1++;
004115FE mov eax,dword ptr [arg1]
00411601 add eax,1
00411604 mov dword ptr [arg1],eax
}
00411607 pop edi
00411608 pop esi
00411609 pop ebx
0041160A mov esp,ebp
0041160C pop ebp
0041160D ret 4 //this is where it differs from the cdecl version above: the callee itself pops the 4 bytes of arguments. Found it at last!

2.2 stdcall caller
stcall(3);
0041167A push 3
0041167C call stcall (411203h)

Wednesday, January 27, 2010

A small QEMU experiment: walking all processes by hand


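The kernel already provides ways to walk all processes, such as the for_each_process macro. But if you want a deeper understanding of this area, you can skip the macro and do the walk entirely by hand. This post shows how to find every process manually from the QEMU monitor.

Environment:
QEMU 0.9.1
QEMU VM: CentOS 5.3 (Linux 2.6.18)
ARCH: x86-32

Main idea:
Every process in Linux has a corresponding task_struct. Each task_struct has a member called tasks, of type list_head (with two members, next and prev, forming a doubly linked list). All processes are linked together through the tasks member of task_struct.

To walk the processes, find one such structure and follow the list one entry at a time. task_struct also holds other important information. Commonly used members are pid (thread ID), tgid (process ID) and comm (the process's command name).
(For pid and tgid, see this thread: http://linux.chinaunix.net/bbs/thread-1155667-1-1.html)

In GDB, once you have the task_struct, you can print members directly with p task->pid. But our aim is to avoid GDB's features and use only the QEMU monitor, finding members such as pid purely by hand.

So we need to work out the offsets of pid, tgid and the other members within task_struct ourselves. How? The naive way is to follow the declaration of task_struct in the .h file and add up the sizes one by one: a char takes one byte, an int four bytes, and so on. But task_struct is a very large structure (dozens or even hundreds of members), and the compiler's alignment rules also matter, so counting by hand is both laborious and error-prone.

The most powerful way is to write something like a C compiler yourself and print the layout it produces. If interested, look at CIL (C Intermediate Language). But CIL is a bit of a hassle: it is written in OCaml, which I am not familiar with.

So I took a middle road: write a kernel module that declares a task_struct directly, and print the addresses of the interesting members together with the base address. Or simply print the offsets.

That is how I obtained the offsets of several key members:
(the member of task_struct comes first, followed by its offset from the base of task_struct)
pid = 0xbc
tgid = 0xc0
comm = 0x1ac
tasks = 0x80

One more thing: how do we find the first task_struct? See this article (how to find current):
http://linux.chinaunix.net/bbs/viewthread.php?tid=1147973&extra=

Everything is ready, so let's begin.
1) Run a QEMU VM. I ran QEMU on a Windows host. QEMU also runs on Linux, but there it has one drawback: the size of the QEMU monitor window is fixed and too small. On Windows the QEMU monitor can be made very large.

2) Press Alt+Ctrl+2 to switch to the QEMU monitor and type stop. QEMU pauses. The benefit is that you can take as long as you like to walk the processes without worrying about addresses becoming invalid when a process exits.

3) p $esp: find the current ESP. The output in my experiment was 0xc3a94f78.

4) Find thread_info: AND ESP with 0xFFFFE000. This gives 0xc3a94000.

5) x /20w 0xc3a94000. The first pointer in the output (the first 4 bytes) is the current process's task_struct, because the first member of thread_info is struct task_struct *task.
The result I got was 0xc3aaa370.

(Note: sometimes the data shown here is all zeros. I suspect this is because a process switch is in progress and the kernel stack has just been cleared. In that case, type c in the QEMU monitor to let QEMU run for a while, then type stop and start over. It sometimes takes several tries.)

6) With the task_struct in hand, the offsets found earlier make it easy to read the values of pid, tgid, comm, tasks and so on. For example, pid is at 0xc3aaa42c (task + 0xbc).
Note: to view comm, use this command: x /20b 0xc3aaa51c. Otherwise QEMU treats the data as dwords and converts the endianness automatically, so the characters of the string appear in reversed order.

7) Now we know the current process. In my output, pid and tgid are 0 and comm is swapper. Let's find the next one. Processes are linked together through task_struct->tasks. One thing to note: the address in task_struct->tasks->next is not the start of the next task_struct but the address of the tasks member inside the next task_struct. So subtract 0x80 from it to get the base address. Then use the method above to read pid, tgid and comm.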
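
The list walk in step 7 can be sketched in Python over a simulated memory. Only the offsets and the first base address 0xc3aaa370 come from the experiment above; the other two task_struct addresses, the pids and all helper names are invented so that the example has something to walk.

```python
TASKS, PID, TGID = 0x80, 0xbc, 0xc0  # offsets measured for this 2.6.18 build

def walk_tasks(read_dword, first_task):
    """Follow tasks.next from one task_struct until the list wraps around."""
    found = []
    task = first_task
    while True:
        found.append((task, read_dword(task + PID), read_dword(task + TGID)))
        next_tasks = read_dword(task + TASKS)  # next is the first dword of list_head
        task = next_tasks - TASKS              # step back to the struct base
        if task == first_task:
            break
    return found

def make_fake_memory(tasks):
    """Build a dword-addressed dict holding a circular list of fake tasks."""
    mem = {}
    for i, (base, pid, tgid) in enumerate(tasks):
        next_base = tasks[(i + 1) % len(tasks)][0]
        mem[base + TASKS] = next_base + TASKS  # points at the next tasks member
        mem[base + PID] = pid
        mem[base + TGID] = tgid
    return mem

# First entry is the swapper task from the experiment; the rest are made up.
fake = make_fake_memory([(0xc3aaa370, 0, 0), (0xc1234000, 1, 1), (0xc1235000, 2, 2)])

for base, pid, tgid in walk_tasks(fake.__getitem__, 0xc3aaa370):
    print(hex(base), pid, tgid)
```

In the QEMU monitor each read_dword corresponds to one x /1w command typed by hand.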

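A brief introduction to task, process and thread in Linux

This post describes the differences and relationships between the three names task, process and thread in the Linux kernel, and compares them with the corresponding Windows concepts. If you already know this well, there is no need to read on.

Linux version: 2.6.18
ARCH: x86

First, according to LKD 2, a major difference between Linux and other operating systems such as Windows and Solaris is that Linux has no strictly defined thread. You may ask: if Linux has no threads, how does it represent the kind of execution unit that a Windows thread is? The answer is that in Linux a process can serve as a thread.

You may then ask how a multithreaded Windows program is represented in Linux. Specifically, Linux has two kinds of process. One is the independent process, with its own address space, resource list, code and so on. The other kind shares an address space and resource list with other processes. This kind of process is similar to a Windows thread.

When reading Linux kernel code you will come across all three names: process, task and thread. Their differences are briefly as follows:

1. task can be understood as a Linux process. The best-known data structure that defines a task is struct task_struct, in linux/sched.h. I think the name is poorly chosen: everyone is already familiar with concepts such as process and thread, and now a task appears as well, which is easily confusing. Perhaps it is for historical reasons that the name task has been kept.

task_struct has a whole pile of members, among them pid and tgid. pid is really like the thread ID in Windows, while tgid (thread group ID) corresponds to the PID in Windows. For an independent process, pid is simply its process ID, and pid == tgid. Processes that share an address space with other processes each have their own pid, but their tgid is the same.

2. thread: although Linux is said not to support threads, the name thread still appears in kernel code. There it can be mapped to the Windows thread. A well-known example is the thread_info structure.

3. kernel thread: in Linux, kernel thread is a specific term. A kernel thread has no address space of its own (its mm pointer is NULL). It runs only in kernel space and cannot switch to user space.

To sum up, a Linux process may correspond to a Windows process or to a Windows thread, and is sometimes also called a task (a little confusing). The most common meaning, though, is the one corresponding to a Windows process. For example, the kernel has a for_each_process macro, which walks only the main, independent processes.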
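
The pid/tgid split is visible from user space too. In the Python sketch below, os.getpid() reports the value the kernel calls tgid, and threading.get_native_id() (Python 3.8 or later) reports the per-thread ID that the kernel stores in pid. The helper names are made up for this example.

```python
import os
import threading

def ids():
    """Return (process ID, native thread ID) of the calling thread."""
    return os.getpid(), threading.get_native_id()

def ids_in_worker():
    """Collect the same pair from inside a second thread."""
    box = []
    worker = threading.Thread(target=lambda: box.append(ids()))
    worker.start()
    worker.join()
    return box[0]

main_ids = ids()
worker_ids = ids_in_worker()
print("main thread  :", main_ids)
print("worker thread:", worker_ids)
```

On Linux the main thread should print two equal numbers (pid == tgid), while the worker shares the first number and has its own second one.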

References:
1. LKD 2
2. http://blog.csdn.net/pppjob/archive/2009/02/05/3864020.aspx

Saturday, January 2, 2010

How to find the physical address of kernel text of Linux x86_64


1. Goal:
I need to find the physical address of the kernel text of Linux x86_64 (2.6.18). Why the physical address? If you know about DMA attacks, you will understand why it is useful: it may be needed when you want to attack Linux x86_64.

BTW: on Linux x86_32 it is easy to find the physical address of a kernel data structure. On x86_32 the kernel is normally mapped at 0xc0000000, so the physical address is the virtual address minus 0xc0000000.
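
For the x86_32 case the conversion is a single subtraction, shown here as a Python sketch. It assumes the usual 0xc0000000 mapping mentioned above and applies only to addresses inside the kernel's direct mapping; the function names are made up.

```python
PAGE_OFFSET = 0xC0000000  # usual start of the kernel mapping on x86_32

def virt_to_phys_32(vaddr):
    """Directly mapped kernel virtual address -> physical address."""
    if vaddr < PAGE_OFFSET:
        raise ValueError("not a kernel direct-mapped address")
    return vaddr - PAGE_OFFSET

def phys_to_virt_32(paddr):
    """Physical address -> directly mapped kernel virtual address."""
    return paddr + PAGE_OFFSET

print(hex(virt_to_phys_32(0xc0100000)))
```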

2. Method:

Wrong method:

At first, the virt_to_phys() function in the kernel came to mind. Its name suggests that it converts a virtual address to a physical address. It is also easy to get the virtual address of the kernel text segment: grep for _text in the kernel symbol file.

However, this method gave a wrong address. The physical address returned by virt_to_phys() was obviously wrong because it exceeded the amount of physical memory in my computer. After reading the description of virt_to_phys(), I found out that it only works for memory allocated by kmalloc().

Correct method:

After reading an article about linker scripts, I found out that the starting physical address of the kernel is defined in the linker script. So let's see what is inside this script.

First of all, the script is located in arch/x86_64/kernel (or under arch/x86 for 2.6.32).

The following is part of the script:
http://lxr.linux.no/#linux+v2.6.18/arch/x86_64/kernel/vmlinux.lds.S

SECTIONS
{
  . = __START_KERNEL;
  phys_startup_64 = startup_64 - LOAD_OFFSET;
  _text = .; /* Text and read-only data */
  .text : AT(ADDR(.text) - LOAD_OFFSET) {

__START_KERNEL is a macro, so let's find all the related definitions. They are in some .h files.


#define __PHYSICAL_START CONFIG_PHYSICAL_START
#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
#define __START_KERNEL_map 0xffffffff80000000UL
#define LOAD_OFFSET __START_KERNEL_map

The following is the description of CONFIG_PHYSICAL_START:

http://lxr.linux.no/#linux+v2.6.18/arch/x86_64/Kconfig#L493

config PHYSICAL_START
  hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
  default "0x1000000" if CRASH_DUMP
  default "0x200000"
  help
    This gives the physical address where the kernel is loaded. Normally
    for regular kernels this value is 0x200000 (2MB). But in the case
    of kexec on panic the fail safe kernel needs to run at a different
    address than the panic-ed kernel.

From the above, we can see that the default value of PHYSICAL_START is 0x200000. Also, the virtual address of the kernel starts at 0xFFFFFFFF80000000 (found by grepping for _text in the symbol file). Therefore, the contents at these two addresses should be the same: physical address 0x200000 corresponds to virtual address 0xFFFFFFFF80000000.

Last, let's test this theory on a virtual machine. I installed CentOS 5.3 x86_64 on QEMU (version 0.11.91; older versions such as 0.9.x or 0.10.x seem to have problems installing x86_64 CentOS). With the QEMU virtual machine focused, press Alt+Ctrl+2 to switch to the QEMU monitor, then enter the following two commands:
x /10x 0xffffffff80000000
xp /10x 0x200000
It turns out that the contents at these two addresses are the same, which means the theory is correct.