WordPress Spam: JavaScript antispam such as WP-SpamFree/WP-HashCash doesn’t work anymore

Being a user of websites myself, I have always been annoyed at Captchas like this:

An annoying captcha from Google

Which is why I’ve avoided using them on Codexon, opting to use JavaScript-based anti-spam like WP-SpamFree or WP-HashCash instead. Unfortunately, JavaScript anti-spam is defeated the moment spammers decide to run JavaScript in their bots.

It worked well up until a couple of months ago, but there has recently been a surge of spam getting past the JavaScript anti-spam. I believe spammers are now targeting these plugins. After installing a captcha, spam here at Codexon has dropped from 10+ a day to zero.

So if you have a website and are considering just using a convenient JavaScript solution, be wary.



Make a Python JIT compiler without writing a single line of C or using a 3rd party library

JIT is just a fancy term for making native code during runtime; another word for “eval” except you won’t be sneered at when you implement it.

I will show you how this can all be done in the comfort of Python without having to write a single line of C or use a 3rd party library.

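import platform
from ctypes import *

# Byte strings (b'...') so the machine code works on both Python 2.6+ and Python 3.
code32 = (b'\xb8\x7b\x00\x00\x00'  # mov eax, 123
          b'\xc3')                 # ret

code64 = (b'\x48\xc7\xc0\x7b\x00\x00\x00'  # mov rax, 123
          b'\xc3')                         # ret

# Holds references to ctypes buffers so they are not freed while the code is in use.
keep_alive = []

def windows(code):
    PAGE_EXECUTE_READWRITE = 0x40

    code_buffer = create_string_buffer(code, len(code))
    keep_alive.append(code_buffer)
    code_buffer_address = addressof(code_buffer)
    # Pass the address as c_void_p so it is not converted to a 32-bit C int.
    windll.kernel32.VirtualProtect(c_void_p(code_buffer_address), c_size_t(len(code)),
                                   PAGE_EXECUTE_READWRITE, byref(c_ulong()))
    return code_buffer_address

def linux(code):
    PROT_READ  = 1
    PROT_WRITE = 2
    PROT_EXEC  = 4

    libc = CDLL('libc.so.6')
    # The default return type is int, which would truncate a 64-bit pointer.
    libc.valloc.restype = c_void_p
    code_buffer_address = libc.valloc(len(code))  # page-aligned, as mprotect requires
    memmove(code_buffer_address, code, len(code))
    libc.mprotect(c_void_p(code_buffer_address), c_size_t(len(code)),
                  PROT_READ|PROT_WRITE|PROT_EXEC)
    return code_buffer_address

# Renamed from "bit, os" and "platform" so the platform module is not shadowed.
bits, linkage = platform.architecture()

code     = code32 if bits == '32bit' else code64
allocate = windows if linkage == 'WindowsPE' else linux

function_prototype = CFUNCTYPE(c_int)
jitfunction = function_prototype(allocate(code))
print(jitfunction())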

The code does the following:

  1. Allocates space for the code.
  2. Writes the code to memory.
  3. Enables execution privileges on that memory.
  4. Creates a function prototype gateway so you can call it from Python. (ctypes lets you instantiate a prototype with a plain integer address, which is easy to miss in the documentation!)
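
Step 4 can be tried on its own, without any executable memory. The following is a minimal sketch, assuming a POSIX system where `CDLL(None)` exposes the C library; the address of libc's `abs` stands in for the address of JIT-generated code, and the variable names are only illustrative.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)

# Take the raw integer address of libc's abs(); it stands in for JIT-generated code.
address = ctypes.cast(libc.abs, ctypes.c_void_p).value

# Describe the C signature: int abs(int).
prototype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

# Instantiating the prototype with an integer address yields a Python callable.
abs_from_address = prototype(address)
result = abs_from_address(-5)
print(result)
```

The same call shape, `prototype(address)`, is what turns the buffer address into `jitfunction` in the listing above.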

Now that I’ve shown the basic steps, it should be simple for you to hook this up to a lexer or look up additional machine-code opcodes to use.
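
As a first step toward generating code instead of hard-coding it, here is a small sketch that emits the same two instructions for any 32-bit constant. The function name `emit_return_constant` is made up for this example; the opcodes are the ones used in the listing above.

```python
import struct

def emit_return_constant(value, bits=64):
    """Build machine code for a function that returns a 32-bit signed constant."""
    imm32 = struct.pack('<i', value)           # little-endian 32-bit immediate
    if bits == 32:
        return b'\xb8' + imm32 + b'\xc3'       # mov eax, imm32 ; ret
    return b'\x48\xc7\xc0' + imm32 + b'\xc3'   # mov rax, imm32 ; ret

code32 = emit_return_constant(123, bits=32)
code64 = emit_return_constant(123)
```

The resulting byte strings can be passed to the `windows` or `linux` function in place of the hand-written constants.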



Fun with Facebook Graph

Facebook has released an API to access their data through the web. Apparently you could already do the same thing by logging in and using the GUI, but now everyone can benefit from the lack of privacy!

Who hates their boss?

http://graph.facebook.com/search?q=I%20hate%20my%20boss

The spammers are spamming their acai berry.

http://graph.facebook.com/search?q=acai%20berry

Interesting crab infections.

http://graph.facebook.com/search?q=itchy%20crabs

Juicy medical information for insurance companies.

http://graph.facebook.com/search?q=colorectal%20surgery
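
If you want to try your own phrases, the URLs above follow a simple pattern. This sketch only builds the URL string and does not contact Facebook; the helper name `search_url` is made up for this example.

```python
from urllib.parse import quote

GRAPH_SEARCH = 'http://graph.facebook.com/search'

def search_url(phrase):
    """Percent-encode the phrase and append it as the q parameter."""
    return '%s?q=%s' % (GRAPH_SEARCH, quote(phrase))

url = search_url('I hate my boss')
print(url)
```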

Got any other interesting searches? Leave a comment below.

Google Cache of the API (no Facebook login necessary)



Is Cassandra ready yet?

With Twitter, Facebook, Digg, and Reddit using the open-source Amazon Dynamo clone, Cassandra may be the winner of the NoSQL race.

But is it ready to be a general-purpose data store (excluding transactions and other obviously missing features)? There are a few kinks right now that may hamper your adoption of Cassandra.

You must restart Cassandra in order to add/delete/rename ColumnFamilies (tables)

This can be a problem if you want to run your website without interruption. A rolling restart is recommended, but that will also impact your uptime if you use a consistency level of “ALL”, since every replica must respond. A patch to solve this problem is being worked on and is scheduled for version 0.7. The current version is 0.5.

Issue 44

Getting a range of rows may return empty “tombstoned” rows

Because of the way Cassandra is engineered, it must mark deleted rows with “tombstones” so that other nodes know the row has been deleted, rather than assuming a copy simply has not propagated to them yet. Tombstones only disappear after a costly “compact” command.

The problem is that if you want to return users 40-50, you may get 10 empty rows. To fix this you have to query Cassandra repeatedly until you have 10 non-empty rows. This back and forth may cause significant overhead on your network. Previously the filtering was done server-side, but apparently the early adopters preferred to have empty rows returned. This is a mistake in my opinion.
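
To make the back and forth concrete, here is a rough sketch of the client-side loop. It does not use a real Cassandra client: `fetch_range` is a hypothetical callable that returns a list of `(key, columns)` pairs, and an empty `columns` dict stands for a tombstoned row.

```python
def collect_live_rows(fetch_range, start, wanted):
    """Keep querying until `wanted` non-empty rows are collected.

    fetch_range(position, count) is a hypothetical stand-in for a range query.
    Returns the live rows and the number of round trips that were needed.
    """
    live = []
    round_trips = 0
    position = start
    while len(live) < wanted:
        batch = fetch_range(position, wanted - len(live))
        round_trips += 1
        if not batch:
            break  # no more rows in the store
        live.extend(row for row in batch if row[1])  # skip tombstoned rows
        position += len(batch)
    return live, round_trips

# Fake store for illustration: every odd-numbered row has been deleted.
rows = [(i, {} if i % 2 else {'name': 'user%d' % i}) for i in range(100)]
fake_fetch = lambda position, count: rows[position:position + count]

live, round_trips = collect_live_rows(fake_fetch, 40, 10)
print(len(live), round_trips)
```

With half the rows tombstoned, this fake store needs five round trips to return ten live rows, which is the overhead described above.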

No distributed versioning (only last write wins)

Vector clock support is scheduled to be added, but there is no ETA. This may be important for some applications. For example, if you have a shopping cart and you add Item A on Server 1 and Item B on Server 2 while the servers are momentarily split, your shopping cart may end up with only Item B if you use “last write wins”. With vector clocks, you get the chance to merge them.

Issue 580
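
The shopping cart example can be sketched in a few lines. This is an illustration of the general vector clock idea, not of anything Cassandra ships; the function names and the dict-of-counters representation are assumptions made for the example.

```python
def compare(clock_a, clock_b):
    """Order two vector clocks (dicts mapping node name to counter)."""
    nodes = set(clock_a) | set(clock_b)
    a_le_b = all(clock_a.get(n, 0) <= clock_b.get(n, 0) for n in nodes)
    b_le_a = all(clock_b.get(n, 0) <= clock_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return 'equal'
    if a_le_b:
        return 'before'
    if b_le_a:
        return 'after'
    return 'concurrent'  # neither write has seen the other

def reconcile(cart_a, clock_a, cart_b, clock_b):
    """Pick the newer cart, or merge both when the writes are concurrent."""
    order = compare(clock_a, clock_b)
    if order == 'before':
        return cart_b, clock_b
    if order in ('after', 'equal'):
        return cart_a, clock_a
    merged_clock = {n: max(clock_a.get(n, 0), clock_b.get(n, 0))
                    for n in set(clock_a) | set(clock_b)}
    return cart_a | cart_b, merged_clock

# Item A was added on Server 1 and Item B on Server 2 during the split.
cart, clock = reconcile({'Item A'}, {'server1': 1}, {'Item B'}, {'server2': 1})
print(sorted(cart), clock)
```

Under “last write wins” one of the two carts would simply be discarded; here the concurrent writes are detected and the application decides how to merge, in this case by taking the union.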

No advanced queries through map/reduce (yet?)

Hadoop integration is planned, but to my knowledge, this isn’t ready yet. Your application may or may not need this feature.

Issue 342

No atomic increment/decrement

This feature is necessary for counting things like page views accurately. There is a hack here using a 3rd party application called ZooKeeper.

Issue 721

Even with these current shortcomings, you may find Cassandra to be suitable for you right away. It is definitely one of the best NoSQL solutions available.



Hot code swapping for servers not written in Erlang

The recent server benchmark posted here comparing Erlang, Haskell, and Python understandably upset many people. Today we will move away from polarizing benchmarks and talk more about features.

Even though Erlang may not be the fastest language, it still has at least one great feature: hot swapping code without restarting the server. This is something that every long running server should have, but unfortunately it comes at the cost of performance, and some languages simply are not built to support it.

However, here is a way to emulate it for a web server, thanks to techniques borrowed from Nginx.

Here are the steps:

  1. Use the spawn family of functions (e.g. spawnl, spawnvp) with the flag P_NOWAIT to have the server execute a new copy of itself. The new process inherits the open sockets.
  2. The new process opens a predetermined file, written by the old process, that contains the file descriptors of the sockets it needs to use.
  3. Start calling “accept” on the same listening sockets that the old process was accepting.
  4. Signal the old process to stop accepting, and to quit when all requests are finished.
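
Steps 2 to 4 can be sketched for a listening socket as follows. This is a simplified single-process illustration, assuming a POSIX system and Python 3: `os.dup` stands in for the descriptor copy a spawned process would inherit, and the helper names are made up for the example.

```python
import os
import socket

def save_listener_fd(listener):
    """Old process: allow the fd to survive exec and return the text to record."""
    os.set_inheritable(listener.fileno(), True)
    return str(listener.fileno())

def restore_listener(fd_text):
    """New process: rebuild a socket object around the recorded descriptor."""
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM, fileno=int(fd_text))

def demo():
    old_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old_listener.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
    old_listener.listen(5)
    port = old_listener.getsockname()[1]
    fd_text = save_listener_fd(old_listener)

    # Stand-in for inheritance: the "new process" gets its own copy of the fd.
    inherited_fd = os.dup(int(fd_text))
    new_listener = restore_listener(str(inherited_fd))

    old_listener.close()                  # step 4: the old side stops accepting

    # The port keeps accepting connections through the restored socket.
    client = socket.create_connection(('127.0.0.1', port))
    connection, _ = new_listener.accept()
    client.sendall(b'ping')
    data = connection.recv(4)
    for sock in (client, connection, new_listener):
        sock.close()
    return data

received = demo()
print(received)
```

The listening socket stays open as long as any process holds a descriptor for it, which is why the old process can close its copy without interrupting clients.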

While Python has a rudimentary reload, it may still be preferable to use this method as it greatly simplifies the upgrade logic. Upgrading functions piecewise may lead to corruption of logic and data.

Here is an example in Python that reloads itself in a loop using this method. It saves the file descriptors for 2 connected sockets which allows the new process to signal the old one at will. Simply run the program, and then edit the print statements below to watch it change.

# Written for Python 2 (print statement, private _multiprocessing module).
import os
import time
import sys
import _multiprocessing
from multiprocessing import Pipe
 
# global pipe variable
pipe       = None
spare_pipe = None
 
def PipeFromFD(fd):
    return _multiprocessing.Connection(fd)
 
def recordPipe(p1, p2):
    with open('fd.txt', 'w') as f:
        f.write('%s,%s' % (p1.fileno(), p2.fileno()))
 
def upgrade():
    os.spawnlp(os.P_NOWAIT, 'python', 'python', sys.argv[0], '-upgrade')
 
def upgradeBootstrap():
    global pipe, spare_pipe
    with open('fd.txt', 'r') as f:
        p1, p2 = (int(fd) for fd in f.read().split(','))
        pipe = PipeFromFD(p1)
        spare_pipe = PipeFromFD(p2)
 
    recordPipe(spare_pipe, pipe)
    pipe.send("I am upgraded!")
 
if __name__ == '__main__':    
 
    if '-upgrade' in sys.argv:
        upgradeBootstrap()
    else:
        pipe, spare_pipe = Pipe()
        recordPipe(spare_pipe, pipe)
 
    pid = os.getpid()
    print '%i: Version 1. Upgrade in 10 seconds.' % pid
    time.sleep(10)
    print '%i: Upgrading' % pid
    upgrade()
    print '%i: Message from new process: %s' % (pid, pipe.recv())
    print '%i: Upgrade Complete. Exiting' % pid

Example Output

user@test:/mnt/shared$ python upgrade.py
22930: Version 1. Upgrade in 10 seconds.
22930: Upgrading
22971: Version 2. Upgrade in 10 seconds.
22930: Message from new process: I am upgraded!
22930: Upgrade Complete. Exiting
user@test:/mnt/shared$ 22971: Upgrading
23025: Version 3. Upgrade in 10 seconds.
22971: Message from new process: I am upgraded!
22971: Upgrade Complete. Exiting
killall python